/*
 * QEMU System Emulator block driver
 *
 * Copyright (c) 2003 Fabrice Bellard
 * Copyright (c) 2020 Virtuozzo International GmbH.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include "qemu/osdep.h"
#include "block/trace.h"
#include "block/block_int.h"
#include "block/blockjob.h"
#include "block/dirty-bitmap.h"
#include "block/fuse.h"
#include "block/nbd.h"
#include "block/qdict.h"
#include "qemu/error-report.h"
#include "block/module_block.h"
#include "qemu/main-loop.h"
#include "qemu/module.h"
#include "qapi/error.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qnull.h"
#include "qapi/qmp/qstring.h"
#include "qapi/qobject-output-visitor.h"
#include "qapi/qapi-visit-block-core.h"
#include "sysemu/block-backend.h"
#include "qemu/notify.h"
#include "qemu/option.h"
#include "qemu/coroutine.h"
#include "block/qapi.h"
#include "qemu/timer.h"
#include "qemu/cutils.h"
#include "qemu/id.h"
#include "qemu/range.h"
#include "qemu/rcu.h"
#include "block/coroutines.h"

#ifdef CONFIG_BSD
#include <sys/ioctl.h>
#include <sys/queue.h>
#if defined(HAVE_SYS_DISK_H)
#include <sys/disk.h>
#endif
#endif

#ifdef _WIN32
#include <windows.h>
#endif

#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */

/* Protected by BQL */
static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
    QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);

/* Protected by BQL */
static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
    QTAILQ_HEAD_INITIALIZER(all_bdrv_states);

/* Protected by BQL */
static QLIST_HEAD(, BlockDriver) bdrv_drivers =
    QLIST_HEAD_INITIALIZER(bdrv_drivers);

static BlockDriverState *bdrv_open_inherit(const char *filename,
                                           const char *reference,
                                           QDict *options, int flags,
                                           BlockDriverState *parent,
                                           const BdrvChildClass *child_class,
                                           BdrvChildRole child_role,
                                           Error **errp);

static bool bdrv_recurse_has_child(BlockDriverState *bs,
                                   BlockDriverState *child);

static void bdrv_replace_child_noperm(BdrvChild *child,
                                      BlockDriverState *new_bs);
static void bdrv_remove_child(BdrvChild *child, Transaction *tran);

static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
                               BlockReopenQueue *queue,
                               Transaction *change_child_tran, Error **errp);
static void bdrv_reopen_commit(BDRVReopenState *reopen_state);
static void bdrv_reopen_abort(BDRVReopenState *reopen_state);

static bool bdrv_backing_overridden(BlockDriverState *bs);

static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
                                    GHashTable *visited, Transaction *tran,
                                    Error **errp);

/* If non-zero, use only whitelisted block drivers */
static int use_bdrv_whitelist;

#ifdef _WIN32
static int is_windows_drive_prefix(const char *filename)
{
    return (((filename[0] >= 'a' && filename[0] <= 'z') ||
             (filename[0] >= 'A' && filename[0] <= 'Z')) &&
            filename[1] == ':');
}

int is_windows_drive(const char *filename)
{
    if (is_windows_drive_prefix(filename) &&
        filename[2] == '\0')
        return 1;
    if (strstart(filename, "\\\\.\\", NULL) ||
        strstart(filename, "//./", NULL))
        return 1;
    return 0;
}
#endif

size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
    if (!bs || !bs->drv) {
        /* page size or 4k (hdd sector size) should be on the safe side */
        return MAX(4096, qemu_real_host_page_size());
    }
    IO_CODE();

    return bs->bl.opt_mem_alignment;
}

size_t bdrv_min_mem_align(BlockDriverState *bs)
{
    if (!bs || !bs->drv) {
        /* page size or 4k (hdd sector size) should be on the safe side */
        return MAX(4096, qemu_real_host_page_size());
    }
    IO_CODE();

    return bs->bl.min_mem_alignment;
}

/* check if the path starts with "<protocol>:" */
int path_has_protocol(const char *path)
{
    const char *p;

#ifdef _WIN32
    if (is_windows_drive(path) ||
        is_windows_drive_prefix(path)) {
        return 0;
    }
    p = path + strcspn(path, ":/\\");
#else
    p = path + strcspn(path, ":/");
#endif

    return *p == ':';
}
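
/*
 * Illustration only, not part of block.c: a minimal standalone sketch of the
 * non-Windows branch of path_has_protocol() above, assuming nothing beyond
 * standard C. The names demo_path_has_protocol() and demo_check_protocol()
 * are hypothetical. The point it shows: a ':' counts as a protocol separator
 * only when it appears before the first '/'.
 */

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for path_has_protocol(), POSIX branch only. */
static int demo_path_has_protocol(const char *path)
{
    /* Stop at the first ':' or '/', whichever comes first. */
    const char *p = path + strcspn(path, ":/");

    /* Only a ':' at that position marks a "<protocol>:" prefix. */
    return *p == ':';
}

/* Self-check documenting the expected results for a few inputs. */
static void demo_check_protocol(void)
{
    assert(demo_path_has_protocol("nbd:localhost:10809"));
    assert(demo_path_has_protocol("a:b"));
    assert(!demo_path_has_protocol("/tmp/a:b.img")); /* '/' comes first */
    assert(!demo_path_has_protocol("dir/a:b"));      /* '/' comes first */
    assert(!demo_path_has_protocol("plain.img"));    /* no ':' at all */
}
```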

int path_is_absolute(const char *path)
{
#ifdef _WIN32
    /* specific case for names like: "\\.\d:" */
    if (is_windows_drive(path) || is_windows_drive_prefix(path)) {
        return 1;
    }
    return (*path == '/' || *path == '\\');
#else
    return (*path == '/');
#endif
}

/* if filename is absolute, just return its duplicate. Otherwise, build a
   path to it by considering it is relative to base_path. URL are
   supported. */
char *path_combine(const char *base_path, const char *filename)
{
    const char *protocol_stripped = NULL;
    const char *p, *p1;
    char *result;
    int len;

    if (path_is_absolute(filename)) {
        return g_strdup(filename);
    }

    if (path_has_protocol(base_path)) {
        protocol_stripped = strchr(base_path, ':');
        if (protocol_stripped) {
            protocol_stripped++;
        }
    }
    p = protocol_stripped ?: base_path;

    p1 = strrchr(base_path, '/');
#ifdef _WIN32
    {
        const char *p2;
        p2 = strrchr(base_path, '\\');
        if (!p1 || p2 > p1) {
            p1 = p2;
        }
    }
#endif
    if (p1) {
        p1++;
    } else {
        p1 = base_path;
    }
    if (p1 > p) {
        p = p1;
    }
    len = p - base_path;

    result = g_malloc(len + strlen(filename) + 1);
    memcpy(result, base_path, len);
    strcpy(result + len, filename);

    return result;
}
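
/*
 * Illustration only, not part of block.c: a rough standalone sketch of what
 * path_combine() above does on non-Windows hosts. It assumes plain malloc()
 * in place of g_malloc() and inlines the protocol check; the names
 * demo_path_combine() and demo_check_combine() are hypothetical. The result
 * keeps base_path up to and including its last '/', or up to the
 * "<protocol>:" prefix if that ends later, and appends filename.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for path_combine(), non-Windows branch only.
 * Returns a malloc()ed string that the caller must free(), or NULL if
 * the allocation fails.
 */
static char *demo_path_combine(const char *base_path, const char *filename)
{
    const char *p = base_path;
    const char *sep, *p1;
    size_t len = 0;
    char *result;

    if (filename[0] != '/') {
        /* Skip a leading "<protocol>:" if the ':' precedes any '/'. */
        sep = base_path + strcspn(base_path, ":/");
        if (*sep == ':') {
            p = sep + 1;
        }

        /* Keep everything up to and including the last '/'. */
        p1 = strrchr(base_path, '/');
        p1 = p1 ? p1 + 1 : base_path;
        if (p1 > p) {
            p = p1;
        }
        len = (size_t)(p - base_path);
    }
    /* For an absolute filename len stays 0, so nothing of base_path is used. */

    result = malloc(len + strlen(filename) + 1);
    if (!result) {
        return NULL;
    }
    memcpy(result, base_path, len);
    strcpy(result + len, filename);
    return result;
}

/* Self-check documenting the expected results for a few inputs. */
static void demo_check_combine(void)
{
    char *r;

    r = demo_path_combine("/images/base.qcow2", "backing.img");
    assert(r && strcmp(r, "/images/backing.img") == 0);
    free(r);

    r = demo_path_combine("http://host/dir/base.img", "b.img");
    assert(r && strcmp(r, "http://host/dir/b.img") == 0);
    free(r);

    r = demo_path_combine("nbd:export", "x.img");
    assert(r && strcmp(r, "nbd:x.img") == 0);
    free(r);

    r = demo_path_combine("a/b", "/abs.img");
    assert(r && strcmp(r, "/abs.img") == 0);
    free(r);
}
```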

/*
 * Helper function for bdrv_parse_filename() implementations to remove optional
 * protocol prefixes (especially "file:") from a filename and for putting the
 * stripped filename into the options QDict if there is such a prefix.
 */
void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
                                      QDict *options)
{
    if (strstart(filename, prefix, &filename)) {
        /* Stripping the explicit protocol prefix may result in a protocol
         * prefix being (wrongly) detected (if the filename contains a colon) */
        if (path_has_protocol(filename)) {
            GString *fat_filename;

            /* This means there is some colon before the first slash; therefore,
             * this cannot be an absolute path */
            assert(!path_is_absolute(filename));

            /* And we can thus fix the protocol detection issue by prefixing it
             * by "./" */
            fat_filename = g_string_new("./");
            g_string_append(fat_filename, filename);

            assert(!path_has_protocol(fat_filename->str));

            qdict_put(options, "filename",
                      qstring_from_gstring(fat_filename));
        } else {
            /* If no protocol prefix was detected, we can use the shortened
             * filename as-is */
            qdict_put_str(options, "filename", filename);
        }
    }
}

/* Returns whether the image file is opened as read-only. Note that this can
 * return false and writing to the image file is still not possible because the
 * image is inactivated. */
bool bdrv_is_read_only(BlockDriverState *bs)
{
    IO_CODE();
    return !(bs->open_flags & BDRV_O_RDWR);
}

static int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
                                  bool ignore_allow_rdw, Error **errp)
{
    IO_CODE();

    /* Do not set read_only if copy_on_read is enabled */
    if (bs->copy_on_read && read_only) {
        error_setg(errp, "Can't set node '%s' to r/o with copy-on-read enabled",
                   bdrv_get_device_or_node_name(bs));
        return -EINVAL;
    }

    /* Do not clear read_only if it is prohibited */
    if (!read_only && !(bs->open_flags & BDRV_O_ALLOW_RDWR) &&
        !ignore_allow_rdw)
    {
        error_setg(errp, "Node '%s' is read only",
                   bdrv_get_device_or_node_name(bs));
        return -EPERM;
    }

    return 0;
}

/*
 * Called by a driver that can only provide a read-only image.
 *
 * Returns 0 if the node is already read-only or it could switch the node to
 * read-only because BDRV_O_AUTO_RDONLY is set.
 *
 * Returns -EACCES if the node is read-write and BDRV_O_AUTO_RDONLY is not set
 * or bdrv_can_set_read_only() forbids making the node read-only. If @errmsg
 * is not NULL, it is used as the error message for the Error object.
 */
int bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
                              Error **errp)
{
    int ret = 0;
    IO_CODE();

    if (!(bs->open_flags & BDRV_O_RDWR)) {
        return 0;
    }
    if (!(bs->open_flags & BDRV_O_AUTO_RDONLY)) {
        goto fail;
    }

    ret = bdrv_can_set_read_only(bs, true, false, NULL);
    if (ret < 0) {
        goto fail;
    }

    bs->open_flags &= ~BDRV_O_RDWR;

    return 0;

fail:
    error_setg(errp, "%s", errmsg ?: "Image is read-only");
    return -EACCES;
}
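
/*
 * Illustration only, not part of block.c: a simplified sketch of the decision
 * order in bdrv_apply_auto_read_only() above. Everything prefixed demo_ or
 * DEMO_ is hypothetical: the flag values are placeholders rather than QEMU's
 * real BDRV_O_* values, struct demo_node stands in for BlockDriverState, and
 * the copy_on_read test approximates the bdrv_can_set_read_only() veto.
 * Error reporting through Error **errp is left out.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder flag values for illustration; not QEMU's BDRV_O_* values. */
enum {
    DEMO_O_RDWR        = 1 << 0,
    DEMO_O_AUTO_RDONLY = 1 << 1,
};

/* Hypothetical, stripped-down stand-in for BlockDriverState. */
struct demo_node {
    int open_flags;
    bool copy_on_read;
};

static int demo_apply_auto_read_only(struct demo_node *node)
{
    if (!(node->open_flags & DEMO_O_RDWR)) {
        return 0;           /* already read-only, nothing to do */
    }
    if (!(node->open_flags & DEMO_O_AUTO_RDONLY)) {
        return -EACCES;     /* the downgrade was not permitted at open time */
    }
    if (node->copy_on_read) {
        return -EACCES;     /* read-only plus copy-on-read is rejected */
    }

    node->open_flags &= ~DEMO_O_RDWR;   /* silently fall back to read-only */
    return 0;
}

/* Self-check covering each branch once. */
static void demo_check_auto_read_only(void)
{
    struct demo_node ro   = { DEMO_O_AUTO_RDONLY, false };
    struct demo_node rw   = { DEMO_O_RDWR, false };
    struct demo_node cor  = { DEMO_O_RDWR | DEMO_O_AUTO_RDONLY, true };
    struct demo_node automatic = { DEMO_O_RDWR | DEMO_O_AUTO_RDONLY, false };

    assert(demo_apply_auto_read_only(&ro) == 0);
    assert(demo_apply_auto_read_only(&rw) == -EACCES);
    assert(demo_apply_auto_read_only(&cor) == -EACCES);
    assert(demo_apply_auto_read_only(&automatic) == 0);
    assert(!(automatic.open_flags & DEMO_O_RDWR));
}
```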

/*
 * If @backing is empty, this function returns NULL without setting
 * @errp. In all other cases, NULL will only be returned with @errp
 * set.
 *
 * Therefore, a return value of NULL without @errp set means that
 * there is no backing file; if @errp is set, there is one but its
 * absolute filename cannot be generated.
 */
char *bdrv_get_full_backing_filename_from_filename(const char *backed,
                                                   const char *backing,
                                                   Error **errp)
{
    if (backing[0] == '\0') {
        return NULL;
    } else if (path_has_protocol(backing) || path_is_absolute(backing)) {
        return g_strdup(backing);
    } else if (backed[0] == '\0' || strstart(backed, "json:", NULL)) {
        error_setg(errp, "Cannot use relative backing file names for '%s'",
                   backed);
        return NULL;
    } else {
        return path_combine(backed, backing);
    }
}

/*
 * If @filename is empty or NULL, this function returns NULL without
 * setting @errp. In all other cases, NULL will only be returned with
 * @errp set.
 */
static char *bdrv_make_absolute_filename(BlockDriverState *relative_to,
                                         const char *filename, Error **errp)
{
    char *dir, *full_name;

    if (!filename || filename[0] == '\0') {
        return NULL;
    } else if (path_has_protocol(filename) || path_is_absolute(filename)) {
        return g_strdup(filename);
    }

    dir = bdrv_dirname(relative_to, errp);
    if (!dir) {
        return NULL;
    }

    full_name = g_strconcat(dir, filename, NULL);
    g_free(dir);
    return full_name;
}
|
|
|
|
|
|
|
|
char *bdrv_get_full_backing_filename(BlockDriverState *bs, Error **errp)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-02-01 22:29:16 +03:00
|
|
|
return bdrv_make_absolute_filename(bs, bs->backing_file, errp);
|
2014-11-26 19:20:25 +03:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:51 +03:00
|
|
|
void bdrv_register(BlockDriver *bdrv)
|
|
|
|
{
|
2020-03-19 01:22:35 +03:00
|
|
|
assert(bdrv->format_name);
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2010-04-13 13:29:33 +04:00
|
|
|
QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
|
2004-08-02 01:59:26 +04:00
|
|
|
}
|
2004-03-15 00:38:54 +03:00
|
|
|
|
2014-10-07 15:59:03 +04:00
|
|
|
BlockDriverState *bdrv_new(void)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs;
|
|
|
|
int i;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-19 12:31:08 +04:00
|
|
|
bs = g_new0(BlockDriverState, 1);
|
2013-11-13 14:29:43 +04:00
|
|
|
QLIST_INIT(&bs->dirty_bitmaps);
|
2014-05-23 17:29:42 +04:00
|
|
|
for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
|
|
|
|
QLIST_INIT(&bs->op_blockers[i]);
|
|
|
|
}
|
2017-06-05 15:39:02 +03:00
|
|
|
qemu_co_mutex_init(&bs->reqs_lock);
|
2017-06-05 15:39:03 +03:00
|
|
|
qemu_mutex_init(&bs->dirty_bitmap_mutex);
|
2013-08-23 05:14:46 +04:00
|
|
|
bs->refcnt = 1;
|
2014-05-08 18:34:37 +04:00
|
|
|
bs->aio_context = qemu_get_aio_context();
|
2012-08-23 13:20:36 +04:00
|
|
|
|
2016-07-18 22:39:52 +03:00
|
|
|
qemu_co_queue_init(&bs->flush_queue);
|
|
|
|
|
block: block-status cache for data regions
As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform. The
main difference is that this time it is implemented as part of the
general block layer code.
The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
implementation is outside of qemu, there is little we can do about its
performance.
We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts. So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.
To address this, we want to cache data regions. Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.
(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine. While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)
We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-12 11:41:44 +03:00
|
|
|
qemu_co_mutex_init(&bs->bsc_modify_lock);
|
|
|
|
bs->block_status_cache = g_new0(BdrvBlockStatusCache, 1);
|
|
|
|
|
2018-03-28 19:29:18 +03:00
|
|
|
for (i = 0; i < bdrv_drain_all_count; i++) {
|
|
|
|
bdrv_drained_begin(bs);
|
|
|
|
}
|
|
|
|
|
2016-01-29 18:36:11 +03:00
|
|
|
QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
|
|
|
|
|
2004-03-15 00:38:54 +03:00
|
|
|
return bs;
|
|
|
|
}
|
|
|
|
|
blockdev: Add dynamic module loading for block drivers
Extend the current module interface to allow for block drivers to be
loaded dynamically on request. The only block drivers that can be
converted into modules are the drivers that don't perform any init
operation except for registering themselves.
In addition, only the protocol drivers are being modularized, as they
are the only ones which see significant performance benefits. The format
drivers do not generally link to external libraries, so modularizing
them is of no benefit from a performance perspective.
All the necessary module information is located in a new structure found
in module_block.h
This spoils the purpose of 5505e8b76f (block/dmg: make it modular).
Before this patch, if module build is enabled, block-dmg.so is linked to
libbz2, whereas the main binary is not. In downstream, theoretically, it
means only the qemu-block-extra package depends on libbz2, while the
main QEMU package needn't. With this patch, we (temporarily) change
the case so that the main QEMU depends on libbz2 again.
Signed-off-by: Marc Marí <markmb@redhat.com>
Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-4-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Do a signed comparison against the length of
block_driver_modules[], so it will not cause a compile error when
empty]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-08-12 16:27:03 +03:00
|
|
|
static BlockDriver *bdrv_do_find_format(const char *format_name)
|
2004-08-02 01:59:26 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv1;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-08-12 16:27:03 +03:00
|
|
|
|
2010-04-13 13:29:33 +04:00
|
|
|
QLIST_FOREACH(drv1, &bdrv_drivers, list) {
|
|
|
|
if (!strcmp(drv1->format_name, format_name)) {
|
2004-08-02 01:59:26 +04:00
|
|
|
return drv1;
|
2010-04-13 13:29:33 +04:00
|
|
|
}
|
2004-08-02 01:59:26 +04:00
|
|
|
}
|
2016-08-12 16:27:03 +03:00
|
|
|
|
2004-08-02 01:59:26 +04:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2016-08-12 16:27:03 +03:00
|
|
|
BlockDriver *bdrv_find_format(const char *format_name)
|
|
|
|
{
|
|
|
|
BlockDriver *drv1;
|
|
|
|
int i;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2016-08-12 16:27:03 +03:00
|
|
|
drv1 = bdrv_do_find_format(format_name);
|
|
|
|
if (drv1) {
|
|
|
|
return drv1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* The driver isn't registered, maybe we need to load a module */
|
|
|
|
for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
|
|
|
|
if (!strcmp(block_driver_modules[i].format_name, format_name)) {
|
module: add Error arguments to module_load and module_load_qom
Improve error handling during module load, by changing:
    bool module_load(const char *prefix, const char *lib_name);
    void module_load_qom(const char *type);
to:
    int module_load(const char *prefix, const char *name, Error **errp);
    int module_load_qom(const char *type, Error **errp);
where the return value is:
    -1 on module load error, and errp is set with the error
     0 if the module or one of its dependencies is not installed
     1 on module load success
     2 on module load success (module already loaded or built-in)
module_load_qom_one has been introduced in:
commit 28457744c345 ("module: qom module support"), which built on top of
module_load_one, but discarded the bool return value. Restore it.
Adapt all callers to emit errors, or ignore them, or fail hard,
as appropriate in each context.
Replace the previous emission of errors via fprintf in _some_ error
conditions with Error and error_report, so as to emit to the appropriate
target.
A memory leak is also fixed as part of the module_load changes.
audio: when attempting to load an audio module, report module load errors.
Note that still for some callers, a single issue may generate multiple
error reports, and this could be improved further.
Regarding the audio code itself, audio_add() seems to ignore errors,
and this should probably be improved.
block: when attempting to load a block module, report module load errors.
For the code paths that already use the Error API, take advantage of those
to report module load errors into the Error parameter.
For the other code paths, we currently emit the error, but this could be
improved further by adding Error parameters to all possible code paths.
console: when attempting to load a display module, report module load errors.
qdev: when creating a new qdev Device object (DeviceState), report load errors.
If a module cannot be loaded to create that device, now abort execution
(if no CONFIG_MODULE) or exit (if CONFIG_MODULE).
qom/object.c: when initializing a QOM object, or looking up class_by_name,
report module load errors.
qtest: when processing the "module_load" qtest command, report errors
in the load of the module.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220929093035.4231-4-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-29 12:30:33 +03:00
|
|
|
Error *local_err = NULL;
|
|
|
|
int rv = block_module_load(block_driver_modules[i].library_name,
|
|
|
|
&local_err);
|
|
|
|
if (rv > 0) {
|
|
|
|
return bdrv_do_find_format(format_name);
|
|
|
|
} else if (rv < 0) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
}
|
2016-08-12 16:27:03 +03:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2022-09-29 12:30:33 +03:00
|
|
|
return NULL;
|
2016-08-12 16:27:03 +03:00
|
|
|
}
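The register / find / load-on-demand flow above can be sketched in isolation. This is a hedged illustration, not QEMU code: the `Sketch*` types, the intrusive singly linked list (standing in for the QLIST macros), and the loader callbacks (standing in for block_module_load()) are all hypothetical. A loader returns a positive value on success, 0 when the module is not installed, and a negative value on error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SketchDriver {
    const char *format_name;
    struct SketchDriver *next;
} SketchDriver;

static SketchDriver *sketch_drivers;    /* head of the registry */

/* Insert at the head, as QLIST_INSERT_HEAD does. */
static void sketch_register(SketchDriver *drv)
{
    assert(drv->format_name);
    drv->next = sketch_drivers;
    sketch_drivers = drv;
}

/* Look up among the drivers that are already registered. */
static SketchDriver *sketch_do_find(const char *name)
{
    for (SketchDriver *d = sketch_drivers; d; d = d->next) {
        if (!strcmp(d->format_name, name)) {
            return d;
        }
    }
    return NULL;
}

/* Hypothetical module table mapping a format name to its loader. */
typedef struct SketchModule {
    const char *format_name;
    int (*load)(void);
} SketchModule;

static SketchDriver sketch_nbd = { "nbd", NULL };

static int sketch_load_nbd(void)
{
    sketch_register(&sketch_nbd);   /* a module registers itself on load */
    return 1;
}

static int sketch_load_missing(void)
{
    return 0;                       /* module not installed */
}

static const SketchModule sketch_modules[] = {
    { "nbd", sketch_load_nbd },
    { "rbd", sketch_load_missing },
};

/* Registered drivers first; otherwise try to load a matching module. */
static SketchDriver *sketch_find(const char *name)
{
    SketchDriver *d = sketch_do_find(name);
    size_t count = sizeof(sketch_modules) / sizeof(sketch_modules[0]);

    if (d) {
        return d;
    }
    for (size_t i = 0; i < count; i++) {
        if (!strcmp(sketch_modules[i].format_name, name)) {
            if (sketch_modules[i].load() > 0) {
                return sketch_do_find(name);    /* retry after loading */
            }
            break;
        }
    }
    return NULL;
}
```

The second lookup after a successful load is what makes the loader's side effect (self-registration) visible to the caller; a failed or skipped load simply falls through to NULL.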
|
|
|
|
|
2019-03-07 16:33:58 +03:00
|
|
|
static int bdrv_format_is_whitelisted(const char *format_name, bool read_only)
|
2009-10-27 20:41:44 +03:00
|
|
|
{
|
2013-05-29 15:35:40 +04:00
|
|
|
static const char *whitelist_rw[] = {
|
|
|
|
CONFIG_BDRV_RW_WHITELIST
|
2020-08-04 19:14:26 +03:00
|
|
|
NULL
|
2013-05-29 15:35:40 +04:00
|
|
|
};
|
|
|
|
static const char *whitelist_ro[] = {
|
|
|
|
CONFIG_BDRV_RO_WHITELIST
|
2020-08-04 19:14:26 +03:00
|
|
|
NULL
|
2009-10-27 20:41:44 +03:00
|
|
|
};
|
|
|
|
const char **p;
|
|
|
|
|
2013-05-29 15:35:40 +04:00
|
|
|
if (!whitelist_rw[0] && !whitelist_ro[0]) {
|
2009-10-27 20:41:44 +03:00
|
|
|
return 1; /* no whitelist, anything goes */
|
2013-05-29 15:35:40 +04:00
|
|
|
}
|
2009-10-27 20:41:44 +03:00
|
|
|
|
2013-05-29 15:35:40 +04:00
|
|
|
for (p = whitelist_rw; *p; p++) {
|
2019-03-07 16:33:58 +03:00
|
|
|
if (!strcmp(format_name, *p)) {
|
2009-10-27 20:41:44 +03:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
2013-05-29 15:35:40 +04:00
|
|
|
if (read_only) {
|
|
|
|
for (p = whitelist_ro; *p; p++) {
|
2019-03-07 16:33:58 +03:00
|
|
|
if (!strcmp(format_name, *p)) {
|
2013-05-29 15:35:40 +04:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2009-10-27 20:41:44 +03:00
|
|
|
return 0;
|
|
|
|
}
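The whitelist check above follows a common pattern: NULL-terminated string arrays, with the read-only list consulted only for read-only opens. A minimal sketch, assuming the two lists are passed in as parameters rather than taken from CONFIG_BDRV_RW_WHITELIST and CONFIG_BDRV_RO_WHITELIST; the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if @name appears in the NULL-terminated array @list. */
static int sketch_list_contains(const char *const *list, const char *name)
{
    for (const char *const *p = list; *p; p++) {
        if (!strcmp(name, *p)) {
            return 1;
        }
    }
    return 0;
}

/*
 * Returns 1 if @name may be used: always when both lists are empty,
 * when it is on the read-write list, or when the open is read-only
 * and it is on the read-only list.
 */
static int sketch_format_allowed(const char *name, int read_only,
                                 const char *const *rw,
                                 const char *const *ro)
{
    assert(name && rw && ro);
    if (!rw[0] && !ro[0]) {
        return 1;   /* no whitelist configured, anything goes */
    }
    if (sketch_list_contains(rw, name)) {
        return 1;
    }
    return read_only && sketch_list_contains(ro, name);
}
```

Note the asymmetry: a format on the read-write list is also acceptable for read-only use, while a format on the read-only list is rejected for writable opens.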
|
|
|
|
|
2019-03-07 16:33:58 +03:00
|
|
|
int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-03-07 16:33:58 +03:00
|
|
|
return bdrv_format_is_whitelisted(drv->format_name, read_only);
|
|
|
|
}
|
|
|
|
|
2016-03-21 17:11:48 +03:00
|
|
|
bool bdrv_uses_whitelist(void)
|
|
|
|
{
|
|
|
|
return use_bdrv_whitelist;
|
|
|
|
}
|
|
|
|
|
2012-05-07 12:50:42 +04:00
|
|
|
typedef struct CreateCo {
|
|
|
|
BlockDriver *drv;
|
|
|
|
char *filename;
|
2014-06-05 13:20:51 +04:00
|
|
|
QemuOpts *opts;
|
2012-05-07 12:50:42 +04:00
|
|
|
int ret;
|
2013-09-06 19:14:26 +04:00
|
|
|
Error *err;
|
2012-05-07 12:50:42 +04:00
|
|
|
} CreateCo;
|
|
|
|
|
2022-11-28 17:23:36 +03:00
|
|
|
int coroutine_fn bdrv_co_create(BlockDriver *drv, const char *filename,
|
|
|
|
QemuOpts *opts, Error **errp)
|
2012-05-07 12:50:42 +04:00
|
|
|
{
|
2013-09-06 19:14:26 +04:00
|
|
|
int ret;
|
2022-11-28 17:23:30 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
ERRP_GUARD();
|
2023-02-03 18:21:55 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2022-11-28 17:23:30 +03:00
|
|
|
|
|
|
|
if (!drv->bdrv_co_create_opts) {
|
|
|
|
error_setg(errp, "Driver '%s' does not support image creation",
|
|
|
|
drv->format_name);
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = drv->bdrv_co_create_opts(drv, filename, opts, errp);
|
|
|
|
if (ret < 0 && !*errp) {
|
|
|
|
error_setg_errno(errp, -ret, "Could not create image");
|
|
|
|
}
|
2013-09-06 19:14:26 +04:00
|
|
|
|
2022-11-28 17:23:30 +03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-01-22 19:45:29 +03:00
|
|
|
/**
|
|
|
|
* Helper function for bdrv_create_file_fallback(): Resize @blk to at
|
|
|
|
* least the given @minimum_size.
|
|
|
|
*
|
|
|
|
* On success, return @blk's actual length.
|
|
|
|
* Otherwise, return -errno.
|
|
|
|
*/
|
|
|
|
static int64_t create_file_fallback_truncate(BlockBackend *blk,
|
|
|
|
int64_t minimum_size, Error **errp)
|
2010-04-08 00:30:24 +04:00
|
|
|
{
|
2013-09-06 19:14:26 +04:00
|
|
|
Error *local_err = NULL;
|
2020-01-22 19:45:29 +03:00
|
|
|
int64_t size;
|
2013-09-06 19:14:26 +04:00
|
|
|
int ret;
|
2010-04-08 00:30:24 +04:00
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-04-24 15:54:41 +03:00
|
|
|
ret = blk_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, 0,
|
|
|
|
&local_err);
|
2020-01-22 19:45:29 +03:00
|
|
|
if (ret < 0 && ret != -ENOTSUP) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
size = blk_getlength(blk);
|
|
|
|
if (size < 0) {
|
|
|
|
error_free(local_err);
|
|
|
|
error_setg_errno(errp, -size,
|
|
|
|
"Failed to inquire the new image file's length");
|
|
|
|
return size;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (size < minimum_size) {
|
|
|
|
/* Need to grow the image, but we failed to do that */
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
|
|
|
error_free(local_err);
|
|
|
|
local_err = NULL;
|
|
|
|
|
|
|
|
return size;
|
|
|
|
}
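The tolerance for -ENOTSUP in the helper above can be illustrated with a toy image object. This is a sketch under stated assumptions, not QEMU code: `SketchImage` and the `sketch_*` functions are hypothetical replacements for BlockBackend, blk_truncate() and blk_getlength(), and error reporting is reduced to the negative errno return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef struct SketchImage {
    int64_t length;     /* current size in bytes */
    int can_resize;     /* 0 models a fixed-size device */
} SketchImage;

/* Toy truncate: fixed-size images report -ENOTSUP. */
static int sketch_truncate(SketchImage *img, int64_t size)
{
    if (!img->can_resize) {
        return -ENOTSUP;
    }
    img->length = size;
    return 0;
}

/*
 * Try to resize to @minimum_size. A failed resize with -ENOTSUP is
 * acceptable as long as the image is already large enough.
 * Returns the resulting length, or a negative errno value.
 */
static int64_t sketch_ensure_min_size(SketchImage *img, int64_t minimum_size)
{
    int ret;

    assert(img && minimum_size >= 0);
    ret = sketch_truncate(img, minimum_size);
    if (ret < 0 && ret != -ENOTSUP) {
        return ret;             /* a real failure */
    }
    if (img->length < minimum_size) {
        return -ENOTSUP;        /* needed to grow, but could not */
    }
    return img->length;
}
```

This is why creating an "image" on an existing fixed-size device can succeed: the resize is refused, but the device is already at least as large as requested.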
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Helper function for bdrv_create_file_fallback(): Zero the first
|
|
|
|
* sector to remove any potentially pre-existing image header.
|
|
|
|
*/
|
2022-09-22 11:49:00 +03:00
|
|
|
static int coroutine_fn
|
|
|
|
create_file_fallback_zero_first_sector(BlockBackend *blk,
|
|
|
|
int64_t current_size,
|
|
|
|
Error **errp)
|
2020-01-22 19:45:29 +03:00
|
|
|
{
|
|
|
|
int64_t bytes_to_clear;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-01-22 19:45:29 +03:00
|
|
|
bytes_to_clear = MIN(current_size, BDRV_SECTOR_SIZE);
|
|
|
|
if (bytes_to_clear) {
|
2022-10-13 15:37:02 +03:00
|
|
|
ret = blk_co_pwrite_zeroes(blk, 0, bytes_to_clear, BDRV_REQ_MAY_UNMAP);
|
2020-01-22 19:45:29 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_setg_errno(errp, -ret,
|
|
|
|
"Failed to clear the new image's first sector");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
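The clamping step above, clearing at most one sector and never more than the image holds, can be shown on an in-memory buffer. A minimal sketch: the buffer stands in for the image and memset() for blk_co_pwrite_zeroes(); `SKETCH_SECTOR_SIZE` and `sketch_clear_header` are hypothetical names, with 512 assumed to match BDRV_SECTOR_SIZE.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512

/*
 * Zero the first sector of @image, or all of it if it is shorter than
 * one sector. Returns the number of bytes cleared.
 */
static size_t sketch_clear_header(unsigned char *image, size_t current_size)
{
    size_t bytes_to_clear = current_size < SKETCH_SECTOR_SIZE
                          ? current_size : SKETCH_SECTOR_SIZE;

    assert(image || bytes_to_clear == 0);
    if (bytes_to_clear) {
        memset(image, 0, bytes_to_clear);
    }
    return bytes_to_clear;
}
```

Skipping the write for an empty image avoids issuing a zero-length request, which is the purpose of the `if (bytes_to_clear)` guard in the function above.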
|
|
|
|
|
2020-03-26 04:12:18 +03:00
|
|
|
/**
|
|
|
|
* Simple implementation of bdrv_co_create_opts for protocol drivers
|
|
|
|
* which only support creation via opening a file
|
|
|
|
* (usually existing raw storage device)
|
|
|
|
*/
|
|
|
|
int coroutine_fn bdrv_co_create_opts_simple(BlockDriver *drv,
|
|
|
|
const char *filename,
|
|
|
|
QemuOpts *opts,
|
|
|
|
Error **errp)
|
2020-01-22 19:45:29 +03:00
|
|
|
{
|
|
|
|
BlockBackend *blk;
|
2020-02-25 18:56:18 +03:00
|
|
|
QDict *options;
|
2020-01-22 19:45:29 +03:00
|
|
|
int64_t size = 0;
|
|
|
|
char *buf = NULL;
|
|
|
|
PreallocMode prealloc;
|
|
|
|
Error *local_err = NULL;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-01-22 19:45:29 +03:00
|
|
|
size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
|
|
|
|
buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
|
|
|
|
prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
|
|
|
|
PREALLOC_MODE_OFF, &local_err);
|
|
|
|
g_free(buf);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prealloc != PREALLOC_MODE_OFF) {
|
|
|
|
error_setg(errp, "Unsupported preallocation mode '%s'",
|
|
|
|
PreallocMode_str(prealloc));
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
2020-02-25 18:56:18 +03:00
|
|
|
options = qdict_new();
|
2020-01-22 19:45:29 +03:00
|
|
|
qdict_put_str(options, "driver", drv->format_name);
|
|
|
|
|
2023-01-26 20:24:31 +03:00
|
|
|
blk = blk_co_new_open(filename, NULL, options,
|
|
|
|
BDRV_O_RDWR | BDRV_O_RESIZE, errp);
|
2020-01-22 19:45:29 +03:00
|
|
|
if (!blk) {
|
|
|
|
error_prepend(errp, "Protocol driver '%s' does not support image "
|
|
|
|
"creation, and opening the image failed: ",
|
|
|
|
drv->format_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
size = create_file_fallback_truncate(blk, size, errp);
|
|
|
|
if (size < 0) {
|
|
|
|
ret = size;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = create_file_fallback_zero_first_sector(blk, size, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = 0;
|
|
|
|
out:
|
2023-05-04 14:57:33 +03:00
|
|
|
blk_co_unref(blk);
|
2020-01-22 19:45:29 +03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-11-28 17:23:31 +03:00
|
|
|
int coroutine_fn bdrv_co_create_file(const char *filename, QemuOpts *opts,
|
|
|
|
Error **errp)
|
2020-01-22 19:45:29 +03:00
|
|
|
{
|
block: remove format defaults from QemuOpts in bdrv_create_file()
QemuOpts is usually created by merging the QemuOptsList of format
and protocol. So, when the format calls bdrv_create_file(), the 'opts'
parameter contains a QemuOptsList with a combination of format and
protocol default values.
The format properly removes its options before calling
bdrv_create_file(), but the default values remain in 'opts->list'.
So if the protocol has options with the same name (e.g. rbd has
'cluster_size' as qcow2), it will see the default values of the format,
since for overlapping options, the format wins.
To avoid this issue, let's convert QemuOpts to QDict, so that we take
only the set options, and then convert it back to QemuOpts, using the
'create_opts' of the protocol. The new QemuOpts will then contain only the
protocol defaults.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210308161232.248833-1-sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 19:12:32 +03:00
|
|
|
QemuOpts *protocol_opts;
|
2020-01-22 19:45:29 +03:00
|
|
|
BlockDriver *drv;
|
2021-03-08 19:12:32 +03:00
|
|
|
QDict *qdict;
|
|
|
|
int ret;
|
2020-01-22 19:45:29 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-02-05 21:58:12 +03:00
|
|
|
drv = bdrv_find_protocol(filename, true, errp);
|
2010-04-08 00:30:24 +04:00
|
|
|
if (drv == NULL) {
|
2010-11-30 18:14:14 +03:00
|
|
|
return -ENOENT;
|
2010-04-08 00:30:24 +04:00
|
|
|
}
|
|
|
|
|
2021-03-08 19:12:32 +03:00
|
|
|
if (!drv->create_opts) {
|
|
|
|
error_setg(errp, "Driver '%s' does not support image creation",
|
|
|
|
drv->format_name);
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* 'opts' contains a QemuOptsList with a combination of format and protocol
|
|
|
|
* default values.
|
|
|
|
*
|
|
|
|
* The format properly removes its options, but the default values remain
|
|
|
|
* in 'opts->list'. So if the protocol has options with the same name
|
|
|
|
* (e.g. rbd has 'cluster_size' as qcow2), it will see the default values
|
|
|
|
* of the format, since for overlapping options, the format wins.
|
|
|
|
*
|
|
|
|
* To avoid this issue, let's convert QemuOpts to QDict, in this way we take
|
|
|
|
* only the set options, and then convert it back to QemuOpts, using the
|
|
|
|
* create_opts of the protocol. So the new QemuOpts, will contain only the
|
|
|
|
* protocol defaults.
|
|
|
|
*/
|
|
|
|
qdict = qemu_opts_to_qdict(opts, NULL);
|
|
|
|
protocol_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
|
|
|
|
if (protocol_opts == NULL) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2022-11-28 17:23:31 +03:00
|
|
|
ret = bdrv_co_create(drv, filename, protocol_opts, errp);
|
2021-03-08 19:12:32 +03:00
|
|
|
out:
|
|
|
|
qemu_opts_del(protocol_opts);
|
|
|
|
qobject_unref(qdict);
|
|
|
|
return ret;
|
2010-04-08 00:30:24 +04:00
|
|
|
}
|
|
|
|
|
2020-01-31 00:39:05 +03:00
|
|
|
int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp)
|
|
|
|
{
|
|
|
|
Error *local_err = NULL;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2020-01-31 00:39:05 +03:00
|
|
|
assert(bs != NULL);
|
2023-02-03 18:22:00 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2020-01-31 00:39:05 +03:00
|
|
|
|
|
|
|
if (!bs->drv) {
|
|
|
|
error_setg(errp, "Block node '%s' is not opened", bs->filename);
|
|
|
|
return -ENOMEDIUM;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!bs->drv->bdrv_co_delete_file) {
|
|
|
|
error_setg(errp, "Driver '%s' does not support image deletion",
|
|
|
|
bs->drv->format_name);
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = bs->drv->bdrv_co_delete_file(bs, &local_err);
|
|
|
|
if (ret < 0) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-12-17 20:09:03 +03:00
|
|
|
void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
Error *local_err = NULL;
|
|
|
|
int ret;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2020-12-17 20:09:03 +03:00
|
|
|
|
|
|
|
if (!bs) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = bdrv_co_delete_file(bs, &local_err);
|
|
|
|
/*
|
|
|
|
* ENOTSUP will happen if the block driver doesn't support
|
|
|
|
* the 'bdrv_co_delete_file' interface. This is a predictable
|
|
|
|
* scenario and shouldn't be reported back to the user.
|
|
|
|
*/
|
|
|
|
if (ret == -ENOTSUP) {
|
|
|
|
error_free(local_err);
|
|
|
|
} else if (ret < 0) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-02-16 14:47:54 +03:00
|
|
|
/**
|
|
|
|
* Try to get @bs's logical and physical block size.
|
|
|
|
* On success, store them in @bsz struct and return 0.
|
|
|
|
* On failure return -errno.
|
|
|
|
* @bs must not be empty.
|
|
|
|
*/
|
|
|
|
int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
|
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2019-06-12 18:03:38 +03:00
|
|
|
BlockDriverState *filtered = bdrv_filter_bs(bs);
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-02-16 14:47:54 +03:00
|
|
|
|
|
|
|
if (drv && drv->bdrv_probe_blocksizes) {
|
|
|
|
return drv->bdrv_probe_blocksizes(bs, bsz);
|
2019-06-12 18:03:38 +03:00
|
|
|
} else if (filtered) {
|
|
|
|
return bdrv_probe_blocksizes(filtered, bsz);
|
2015-02-16 14:47:54 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Try to get @bs's geometry (cyls, heads, sectors).
|
|
|
|
* On success, store them in @geo struct and return 0.
|
|
|
|
* On failure return -errno.
|
|
|
|
* @bs must not be empty.
|
|
|
|
*/
|
|
|
|
int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
|
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2019-06-12 18:03:38 +03:00
|
|
|
BlockDriverState *filtered = bdrv_filter_bs(bs);
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-02-16 14:47:54 +03:00
|
|
|
|
|
|
|
if (drv && drv->bdrv_probe_geometry) {
|
|
|
|
return drv->bdrv_probe_geometry(bs, geo);
|
2019-06-12 18:03:38 +03:00
|
|
|
} else if (filtered) {
|
|
|
|
return bdrv_probe_geometry(filtered, geo);
|
2015-02-16 14:47:54 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
2012-05-28 11:27:54 +04:00
|
|
|
/*
|
|
|
|
* Create a uniquely-named empty temporary file.
|
2022-10-10 07:04:31 +03:00
|
|
|
* Return the actual file name used upon success, otherwise NULL.
|
|
|
|
* This string should be freed with g_free() when not needed any longer.
|
|
|
|
*
|
|
|
|
* Note: creating a temporary file for the caller to (re)open is
|
|
|
|
* inherently racy. Use g_file_open_tmp() instead whenever practical.
|
2012-05-28 11:27:54 +04:00
|
|
|
*/
|
2022-10-10 07:04:31 +03:00
|
|
|
char *create_tmp_file(Error **errp)
|
2004-08-04 01:14:23 +04:00
|
|
|
{
|
2004-04-01 03:37:16 +04:00
|
|
|
int fd;
|
2008-09-14 10:45:34 +04:00
|
|
|
const char *tmpdir;
|
2022-10-10 07:04:31 +03:00
|
|
|
g_autofree char *filename = NULL;
|
|
|
|
|
|
|
|
tmpdir = g_get_tmp_dir();
|
|
|
|
#ifndef _WIN32
|
|
|
|
/*
|
|
|
|
* See commit 69bef79 ("block: use /var/tmp instead of /tmp for -snapshot")
|
|
|
|
*
|
|
|
|
* This function is used to create temporary disk images (like -snapshot),
|
|
|
|
* so the files can become very large. /tmp is often a tmpfs whereas
|
|
|
|
* /var/tmp is usually on a disk, so more appropriate for disk images.
|
|
|
|
*/
|
|
|
|
if (!g_strcmp0(tmpdir, "/tmp")) {
|
2014-02-26 13:42:37 +04:00
|
|
|
tmpdir = "/var/tmp";
|
|
|
|
}
|
2022-10-10 07:04:31 +03:00
|
|
|
#endif
|
|
|
|
|
|
|
|
filename = g_strdup_printf("%s/vl.XXXXXX", tmpdir);
|
|
|
|
fd = g_mkstemp(filename);
|
2012-09-05 17:26:22 +04:00
|
|
|
if (fd < 0) {
|
2022-10-10 07:04:31 +03:00
|
|
|
error_setg_errno(errp, errno, "Could not open temporary file '%s'",
|
|
|
|
filename);
|
|
|
|
return NULL;
|
2012-05-28 11:27:54 +04:00
|
|
|
}
|
2022-10-10 07:04:30 +03:00
|
|
|
close(fd);
|
2022-10-10 07:04:31 +03:00
|
|
|
|
|
|
|
return g_steal_pointer(&filename);
|
2012-05-28 11:27:54 +04:00
|
|
|
}
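
The create-then-close pattern used by create_tmp_file() can be sketched outside QEMU with plain POSIX calls. This is only an illustration under stated assumptions: make_tmp_name() is a hypothetical helper, it uses mkstemp(), malloc() and free() where the code above uses g_mkstemp(), g_strdup_printf() and g_autofree, and it does not report errors through an Error object.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical POSIX-only sketch, not QEMU code: create an empty,
 * uniquely named file under @dir and return its malloc()ed name,
 * or NULL on failure. The caller releases the name with free().
 */
static char *make_tmp_name(const char *dir)
{
    static const char suffix[] = "/vl.XXXXXX";
    size_t len = strlen(dir) + sizeof(suffix);   /* sizeof counts the NUL */
    char *name = malloc(len);
    int fd;

    if (name == NULL) {
        return NULL;
    }
    snprintf(name, len, "%s%s", dir, suffix);

    fd = mkstemp(name);   /* fills in XXXXXX and creates the file */
    if (fd < 0) {
        free(name);
        return NULL;
    }

    /* Only the name is handed back, so the descriptor is closed here. */
    close(fd);
    return name;
}
```

As the comment on create_tmp_file() warns, handing back a name for the caller to reopen is inherently racy; the sketch shares that limitation.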
|
2003-06-30 14:03:06 +04:00
|
|
|
|
2010-04-08 00:30:24 +04:00
|
|
|
/*
|
|
|
|
* Detect host devices. By convention, /dev/cdrom[N] is always
|
|
|
|
* recognized as a host CDROM.
|
|
|
|
*/
|
|
|
|
static BlockDriver *find_hdev_driver(const char *filename)
|
|
|
|
{
|
|
|
|
int score_max = 0, score;
|
|
|
|
BlockDriver *drv = NULL, *d;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2010-04-08 00:30:24 +04:00
|
|
|
|
|
|
|
QLIST_FOREACH(d, &bdrv_drivers, list) {
|
|
|
|
if (d->bdrv_probe_device) {
|
|
|
|
score = d->bdrv_probe_device(filename);
|
|
|
|
if (score > score_max) {
|
|
|
|
score_max = score;
|
|
|
|
drv = d;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return drv;
|
|
|
|
}
|
|
|
|
|
blockdev: Add dynamic module loading for block drivers
Extend the current module interface to allow for block drivers to be
loaded dynamically on request. The only block drivers that can be
converted into modules are the drivers that don't perform any init
operation except for registering themselves.
In addition, only the protocol drivers are being modularized, as they
are the only ones which see significant performance benefits. The format
drivers do not generally link to external libraries, so modularizing
them is of no benefit from a performance perspective.
All the necessary module information is located in a new structure found
in module_block.h
This spoils the purpose of 5505e8b76f (block/dmg: make it modular).
Before this patch, if module build is enabled, block-dmg.so is linked to
libbz2, whereas the main binary is not. In downstream, theoretically, it
means only the qemu-block-extra package depends on libbz2, while the
main QEMU package need not. With this patch, we (temporarily) change
the case so that the main QEMU depends on libbz2 again.
Signed-off-by: Marc Marí <markmb@redhat.com>
Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-4-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Do a signed comparison against the length of
block_driver_modules[], so it will not cause a compile error when
empty]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-08-12 16:27:03 +03:00
|
|
|
static BlockDriver *bdrv_do_find_protocol(const char *protocol)
|
|
|
|
{
|
|
|
|
BlockDriver *drv1;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-08-12 16:27:03 +03:00
|
|
|
|
|
|
|
QLIST_FOREACH(drv1, &bdrv_drivers, list) {
|
|
|
|
if (drv1->protocol_name && !strcmp(drv1->protocol_name, protocol)) {
|
|
|
|
return drv1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2013-07-10 17:47:39 +04:00
|
|
|
BlockDriver *bdrv_find_protocol(const char *filename,
|
2015-02-05 21:58:12 +03:00
|
|
|
bool allow_protocol_prefix,
|
|
|
|
Error **errp)
|
2006-08-01 20:21:11 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv1;
|
|
|
|
char protocol[128];
|
2009-07-02 17:12:26 +04:00
|
|
|
int len;
|
2006-08-01 20:21:11 +04:00
|
|
|
const char *p;
|
2016-08-12 16:27:03 +03:00
|
|
|
int i;
|
2006-08-19 15:45:59 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2010-04-14 16:17:38 +04:00
|
|
|
/* TODO Drivers without bdrv_file_open must be specified explicitly */
|
|
|
|
|
2010-06-23 14:25:17 +04:00
|
|
|
/*
|
|
|
|
* XXX(hch): we really should not let host device detection
|
|
|
|
* override an explicit protocol specification, but moving this
|
|
|
|
* later breaks access to device names with colons in them.
|
|
|
|
* Thanks to the brain-dead persistent naming schemes on udev-
|
|
|
|
* based Linux systems those actually are quite common.
|
|
|
|
*/
|
|
|
|
drv1 = find_hdev_driver(filename);
|
|
|
|
if (drv1) {
|
|
|
|
return drv1;
|
|
|
|
}
|
|
|
|
|
2013-07-10 17:47:39 +04:00
|
|
|
if (!path_has_protocol(filename) || !allow_protocol_prefix) {
|
2014-12-02 20:32:42 +03:00
|
|
|
return &bdrv_file;
|
2010-04-08 00:30:24 +04:00
|
|
|
}
|
2013-07-10 17:47:39 +04:00
|
|
|
|
2010-12-09 14:53:00 +03:00
|
|
|
p = strchr(filename, ':');
|
|
|
|
assert(p != NULL);
|
2009-07-02 17:12:26 +04:00
|
|
|
len = p - filename;
|
|
|
|
if (len > sizeof(protocol) - 1)
|
|
|
|
len = sizeof(protocol) - 1;
|
|
|
|
memcpy(protocol, filename, len);
|
|
|
|
protocol[len] = '\0';
|
2016-08-12 16:27:03 +03:00
|
|
|
|
|
|
|
drv1 = bdrv_do_find_protocol(protocol);
|
|
|
|
if (drv1) {
|
|
|
|
return drv1;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
|
|
|
|
if (block_driver_modules[i].protocol_name &&
|
|
|
|
!strcmp(block_driver_modules[i].protocol_name, protocol)) {
|
module: add Error arguments to module_load and module_load_qom
improve error handling during module load, by changing:
bool module_load(const char *prefix, const char *lib_name);
void module_load_qom(const char *type);
to:
int module_load(const char *prefix, const char *name, Error **errp);
int module_load_qom(const char *type, Error **errp);
where the return value is:
-1 on module load error, and errp is set with the error
0 on module or one of its dependencies are not installed
1 on module load success
2 on module load success (module already loaded or built-in)
module_load_qom_one has been introduced in:
commit 28457744c345 ("module: qom module support"), which built on top of
module_load_one, but discarded the bool return value. Restore it.
Adapt all callers to emit errors, or ignore them, or fail hard,
as appropriate in each context.
Replace the previous emission of errors via fprintf in _some_ error
conditions with Error and error_report, so as to emit to the appropriate
target.
A memory leak is also fixed as part of the module_load changes.
audio: when attempting to load an audio module, report module load errors.
Note that still for some callers, a single issue may generate multiple
error reports, and this could be improved further.
Regarding the audio code itself, audio_add() seems to ignore errors,
and this should probably be improved.
block: when attempting to load a block module, report module load errors.
For the code paths that already use the Error API, take advantage of those
to report module load errors into the Error parameter.
For the other code paths, we currently emit the error, but this could be
improved further by adding Error parameters to all possible code paths.
console: when attempting to load a display module, report module load errors.
qdev: when creating a new qdev Device object (DeviceState), report load errors.
If a module cannot be loaded to create that device, now abort execution
(if no CONFIG_MODULE) or exit (if CONFIG_MODULE).
qom/object.c: when initializing a QOM object, or looking up class_by_name,
report module load errors.
qtest: when processing the "module_load" qtest command, report errors
in the load of the module.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220929093035.4231-4-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-29 12:30:33 +03:00
|
|
|
int rv = block_module_load(block_driver_modules[i].library_name, errp);
|
|
|
|
if (rv > 0) {
|
|
|
|
drv1 = bdrv_do_find_protocol(protocol);
|
|
|
|
} else if (rv < 0) {
|
|
|
|
return NULL;
|
|
|
|
}
|
2016-08-12 16:27:03 +03:00
|
|
|
break;
|
2010-04-13 13:29:33 +04:00
|
|
|
}
|
2006-08-01 20:21:11 +04:00
|
|
|
}
|
2015-02-05 21:58:12 +03:00
|
|
|
|
2016-08-12 16:27:03 +03:00
|
|
|
if (!drv1) {
|
|
|
|
error_setg(errp, "Unknown protocol '%s'", protocol);
|
|
|
|
}
|
|
|
|
return drv1;
|
2006-08-01 20:21:11 +04:00
|
|
|
}
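
The prefix handling in bdrv_find_protocol() (find the first ':', clamp the length to the buffer, copy, NUL-terminate) can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named extract_protocol_prefix() that is not part of QEMU; unlike the function above, it returns -1 instead of asserting when no ':' is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of QEMU: copy the text before the first
 * ':' of @filename into @out, truncating it to fit @out_size bytes
 * (including the terminating NUL). Returns 0 on success, or -1 if
 * @filename has no ':' or @out_size is 0.
 */
static int extract_protocol_prefix(const char *filename, char *out,
                                   size_t out_size)
{
    const char *p = strchr(filename, ':');
    size_t len;

    if (p == NULL || out_size == 0) {
        return -1;
    }

    len = (size_t)(p - filename);
    if (len > out_size - 1) {
        len = out_size - 1;   /* silently truncate over-long prefixes */
    }
    memcpy(out, filename, len);
    out[len] = '\0';
    return 0;
}
```

With a 128-byte buffer, "nbd:localhost:10809" yields "nbd". A prefix longer than the buffer is truncated rather than rejected, so a later lookup by name would simply fail to match.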
|
|
|
|
|
2014-11-20 18:27:10 +03:00
|
|
|
/*
|
|
|
|
* Guess image format by probing its contents.
|
|
|
|
* This is not a good idea when your image is raw (CVE-2008-2004), but
|
|
|
|
* we do it anyway for backward compatibility.
|
|
|
|
*
|
|
|
|
* @buf contains the image's first @buf_size bytes.
|
2014-11-20 18:27:11 +03:00
|
|
|
* @buf_size is the buffer size in bytes (generally BLOCK_PROBE_BUF_SIZE,
|
|
|
|
* but can be smaller if the image file is smaller)
|
2014-11-20 18:27:10 +03:00
|
|
|
* @filename is its filename.
|
|
|
|
*
|
|
|
|
* For all block drivers, call the bdrv_probe() method to get its
|
|
|
|
* probing score.
|
|
|
|
* Return the first block driver with the highest probing score.
|
|
|
|
*/
|
raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.
Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest. A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.
Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing. Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing. QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.
All of these additions that allow to avoid format probing have to be
specified explicitly. The default still allows the attack.
In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.
Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.
The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).
Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.
Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-20 18:27:12 +03:00
|
|
|
BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
|
|
|
|
const char *filename)
|
2014-11-20 18:27:10 +03:00
|
|
|
{
|
|
|
|
int score_max = 0, score;
|
|
|
|
BlockDriver *drv = NULL, *d;
|
2022-03-03 18:15:58 +03:00
|
|
|
IO_CODE();
|
2014-11-20 18:27:10 +03:00
|
|
|
|
|
|
|
QLIST_FOREACH(d, &bdrv_drivers, list) {
|
|
|
|
if (d->bdrv_probe) {
|
|
|
|
score = d->bdrv_probe(buf, buf_size, filename);
|
|
|
|
if (score > score_max) {
|
|
|
|
score_max = score;
|
|
|
|
drv = d;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return drv;
|
|
|
|
}
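
The selection rule documented for bdrv_probe_all() ("return the first block driver with the highest probing score") follows from the strict `>` comparison against a running maximum that starts at 0. A minimal sketch of that rule, assuming a hypothetical ToyDriver type with a precomputed score in place of a real bdrv_probe() callback:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver entry; not QEMU's BlockDriver. */
typedef struct ToyDriver {
    const char *name;
    int score;   /* what the driver's probe callback would return */
} ToyDriver;

/*
 * Return the first entry with the highest score, or NULL if no entry
 * scores above 0. The strict '>' keeps the earliest entry on ties.
 */
static const ToyDriver *pick_best(const ToyDriver *drivers, size_t n)
{
    const ToyDriver *best = NULL;
    int score_max = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (drivers[i].score > score_max) {
            score_max = drivers[i].score;
            best = &drivers[i];
        }
    }
    return best;
}
```

Because the maximum starts at 0, a driver that returns 0 is never selected, which is why the loop can yield NULL and callers such as find_image_format() have to handle that case.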
|
|
|
|
|
2017-02-17 20:39:24 +03:00
|
|
|
static int find_image_format(BlockBackend *file, const char *filename,
|
2013-09-05 16:45:29 +04:00
|
|
|
BlockDriver **pdrv, Error **errp)
|
2009-06-15 15:55:19 +04:00
|
|
|
{
|
2014-11-20 18:27:10 +03:00
|
|
|
BlockDriver *drv;
|
2014-11-20 18:27:11 +03:00
|
|
|
uint8_t buf[BLOCK_PROBE_BUF_SIZE];
|
2012-11-12 20:35:27 +04:00
|
|
|
int ret = 0;
|
2010-05-17 20:45:57 +04:00
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2010-06-01 20:37:31 +04:00
|
|
|
/* Return the raw BlockDriver * for scsi-generic devices or empty drives */
|
2017-02-17 20:39:24 +03:00
|
|
|
if (blk_is_sg(file) || !blk_is_inserted(file) || blk_getlength(file) == 0) {
|
2014-12-02 20:32:42 +03:00
|
|
|
*pdrv = &bdrv_raw;
|
2010-07-21 23:51:51 +04:00
|
|
|
return ret;
|
2010-05-27 19:56:28 +04:00
|
|
|
}
|
2010-05-17 20:45:57 +04:00
|
|
|
|
block: Change blk_{pread,pwrite}() param order
Swap 'buf' and 'bytes' around for consistency with
blk_co_{pread,pwrite}(), and in preparation to implement these functions
using generated_co_wrapper.
Callers were updated using this Coccinelle script:
@@ expression blk, offset, buf, bytes, flags; @@
- blk_pread(blk, offset, buf, bytes, flags)
+ blk_pread(blk, offset, bytes, buf, flags)
@@ expression blk, offset, buf, bytes, flags; @@
- blk_pwrite(blk, offset, buf, bytes, flags)
+ blk_pwrite(blk, offset, bytes, buf, flags)
It had no effect on hw/block/nand.c, presumably due to the #if, so that
file was updated manually.
Overly-long lines were then fixed by hand.
Signed-off-by: Alberto Faria <afaria@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220705161527.1054072-4-afaria@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-05 19:15:11 +03:00
|
|
|
ret = blk_pread(file, 0, sizeof(buf), buf, 0);
|
2006-08-01 20:21:11 +04:00
|
|
|
if (ret < 0) {
|
2013-09-05 16:45:29 +04:00
|
|
|
error_setg_errno(errp, -ret, "Could not read image for determining its "
|
|
|
|
"format");
|
2010-07-21 23:51:51 +04:00
|
|
|
*pdrv = NULL;
|
|
|
|
return ret;
|
2006-08-01 20:21:11 +04:00
|
|
|
}
|
|
|
|
|
2022-07-05 19:15:09 +03:00
|
|
|
drv = bdrv_probe_all(buf, sizeof(buf), filename);
|
2010-07-21 23:51:51 +04:00
|
|
|
if (!drv) {
|
2013-09-05 16:45:29 +04:00
|
|
|
error_setg(errp, "Could not determine image format: No compatible "
|
|
|
|
"driver found");
|
2022-07-05 19:15:09 +03:00
|
|
|
*pdrv = NULL;
|
|
|
|
return -ENOENT;
|
2010-07-21 23:51:51 +04:00
|
|
|
}
|
2022-07-05 19:15:09 +03:00
|
|
|
|
2010-07-21 23:51:51 +04:00
|
|
|
*pdrv = drv;
|
2022-07-05 19:15:09 +03:00
|
|
|
return 0;
|
2004-08-02 01:59:26 +04:00
|
|
|
}

/**
 * Set the current 'total_sectors' value
 * Return 0 on success, -errno on error.
 */
int coroutine_fn bdrv_co_refresh_total_sectors(BlockDriverState *bs,
                                               int64_t hint)
{
    BlockDriver *drv = bs->drv;
    IO_CODE();
    assert_bdrv_graph_readable();

    if (!drv) {
        return -ENOMEDIUM;
    }

    /* Do not attempt drv->bdrv_co_getlength() on scsi-generic devices */
    if (bdrv_is_sg(bs))
        return 0;

    /* query actual device if possible, otherwise just trust the hint */
    if (drv->bdrv_co_getlength) {
        int64_t length = drv->bdrv_co_getlength(bs);
        if (length < 0) {
            return length;
        }
        hint = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE);
    }

    bs->total_sectors = hint;

    if (bs->total_sectors * BDRV_SECTOR_SIZE > BDRV_MAX_LENGTH) {
        return -EFBIG;
    }

    return 0;
}
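
/*
 * Illustration only, not part of block.c: a minimal standalone sketch of the
 * length-to-sectors rounding and the upper-bound check performed above.
 * All SKETCH_* names and sketch_sectors_from_length() are hypothetical.
 * The 512-byte sector size is an assumption; the limit follows the
 * "INT64_MAX aligned down to 2^30" rule quoted in the BDRV_MAX_LENGTH
 * commit message. Unlike the function above, the sketch checks the byte
 * length before rounding; because the limit is a multiple of the sector
 * size, both orderings should reject the same lengths.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed sector size for this sketch. */
#define SKETCH_SECTOR_SIZE 512
/* Integer division rounding up, for non-negative n and positive d. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* INT64_MAX aligned down to a multiple of 2^30. */
#define SKETCH_MAX_LENGTH (INT64_MAX / (1LL << 30) * (1LL << 30))

/*
 * Convert a byte length into a sector count.
 * Returns 0 and stores the count in *sectors on success, a negative
 * errno-style value if the length is itself an error code, or -EFBIG
 * if the length exceeds SKETCH_MAX_LENGTH.
 */
static int sketch_sectors_from_length(int64_t length, int64_t *sectors)
{
    if (length < 0) {
        /* Pass a negative errno-style length straight through. */
        return (int)length;
    }
    if (length > SKETCH_MAX_LENGTH) {
        return -EFBIG;
    }
    /* Cannot overflow: length + 511 stays below INT64_MAX here. */
    *sectors = SKETCH_DIV_ROUND_UP(length, SKETCH_SECTOR_SIZE);
    return 0;
}
```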

/**
 * Combines a QDict of new block driver @options with any missing options taken
 * from @old_options, so that leaving out an option defaults to its old value.
 */
static void bdrv_join_options(BlockDriverState *bs, QDict *options,
                              QDict *old_options)
{
    GLOBAL_STATE_CODE();
    if (bs->drv && bs->drv->bdrv_join_options) {
        bs->drv->bdrv_join_options(options, old_options);
    } else {
        qdict_join(options, old_options, false);
    }
}

static BlockdevDetectZeroesOptions bdrv_parse_detect_zeroes(QemuOpts *opts,
                                                            int open_flags,
                                                            Error **errp)
{
    Error *local_err = NULL;
    char *value = qemu_opt_get_del(opts, "detect-zeroes");
    BlockdevDetectZeroesOptions detect_zeroes =
        qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup, value,
                        BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, &local_err);
    GLOBAL_STATE_CODE();
    g_free(value);
    if (local_err) {
        error_propagate(errp, local_err);
        return detect_zeroes;
    }

    if (detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
        !(open_flags & BDRV_O_UNMAP))
    {
        error_setg(errp, "setting detect-zeroes to unmap is not allowed "
                   "without setting discard operation to unmap");
    }

    return detect_zeroes;
}

/**
 * Set open flags for aio engine
 *
 * Return 0 on success, -1 if the engine specified is invalid
 */
int bdrv_parse_aio(const char *mode, int *flags)
{
    if (!strcmp(mode, "threads")) {
        /* do nothing, default */
    } else if (!strcmp(mode, "native")) {
        *flags |= BDRV_O_NATIVE_AIO;
#ifdef CONFIG_LINUX_IO_URING
    } else if (!strcmp(mode, "io_uring")) {
        *flags |= BDRV_O_IO_URING;
#endif
    } else {
        return -1;
    }

    return 0;
}

/**
 * Set open flags for a given discard mode
 *
 * Return 0 on success, -1 if the discard mode was invalid.
 */
int bdrv_parse_discard_flags(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_UNMAP;

    if (!strcmp(mode, "off") || !strcmp(mode, "ignore")) {
        /* do nothing */
    } else if (!strcmp(mode, "on") || !strcmp(mode, "unmap")) {
        *flags |= BDRV_O_UNMAP;
    } else {
        return -1;
    }

    return 0;
}

/**
 * Set open flags for a given cache mode
 *
 * Return 0 on success, -1 if the cache mode was invalid.
 */
int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *writethrough = false;
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "directsync")) {
        *writethrough = true;
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *writethrough = false;
    } else if (!strcmp(mode, "unsafe")) {
        *writethrough = false;
        *flags |= BDRV_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        *writethrough = true;
    } else {
        return -1;
    }

    return 0;
}
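
/*
 * Illustration only, not part of block.c: the cache-mode parser above can
 * also be read as a lookup table from mode name to (flag bits, writethrough).
 * The sketch below restates that mapping in table form. Every sketch_* and
 * SKETCH_* name is hypothetical, the bit values are made up for the example,
 * and the cache mask is assumed to cover exactly the two cache bits.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values standing in for the real BDRV_O_* flags. */
enum {
    SKETCH_O_NOCACHE    = 1 << 0,
    SKETCH_O_NO_FLUSH   = 1 << 1,
    SKETCH_O_CACHE_MASK = SKETCH_O_NOCACHE | SKETCH_O_NO_FLUSH,
};

struct sketch_cache_mode {
    const char *name;
    int flags;          /* bits to set after clearing the cache mask */
    bool writethrough;  /* value stored through the out parameter */
};

/* Same mapping as the if/else chain above, one row per accepted name. */
static const struct sketch_cache_mode sketch_cache_modes[] = {
    { "off",          SKETCH_O_NOCACHE,  false },
    { "none",         SKETCH_O_NOCACHE,  false },
    { "directsync",   SKETCH_O_NOCACHE,  true  },
    { "writeback",    0,                 false },
    { "unsafe",       SKETCH_O_NO_FLUSH, false },
    { "writethrough", 0,                 true  },
};

/* Returns 0 on success, -1 if the mode name is not in the table. */
static int sketch_parse_cache_mode(const char *mode, int *flags,
                                   bool *writethrough)
{
    size_t i;

    /* The cache bits are cleared first, even for an unknown mode. */
    *flags &= ~SKETCH_O_CACHE_MASK;

    for (i = 0; i < sizeof(sketch_cache_modes) / sizeof(sketch_cache_modes[0]);
         i++) {
        if (!strcmp(mode, sketch_cache_modes[i].name)) {
            *flags |= sketch_cache_modes[i].flags;
            *writethrough = sketch_cache_modes[i].writethrough;
            return 0;
        }
    }
    return -1;
}
```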

static char *bdrv_child_get_parent_desc(BdrvChild *c)
{
    BlockDriverState *parent = c->opaque;
    return g_strdup_printf("node '%s'", bdrv_get_node_name(parent));
}

static void bdrv_child_cb_drained_begin(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;
    bdrv_do_drained_begin_quiesce(bs, NULL);
}

static bool bdrv_child_cb_drained_poll(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;
    return bdrv_drain_poll(bs, NULL, false);
}

static void bdrv_child_cb_drained_end(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;
    bdrv_drained_end(bs);
}

static int bdrv_child_cb_inactivate(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;
    GLOBAL_STATE_CODE();
    assert(bs->open_flags & BDRV_O_INACTIVE);
    return 0;
}

static bool bdrv_child_cb_change_aio_ctx(BdrvChild *child, AioContext *ctx,
                                         GHashTable *visited, Transaction *tran,
                                         Error **errp)
{
    BlockDriverState *bs = child->opaque;
    return bdrv_change_aio_context(bs, ctx, visited, tran, errp);
}

/*
 * Returns the options and flags that a temporary snapshot should get, based on
 * the originally requested flags (the originally requested image will have
 * flags like a backing file)
 */
static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
                                       int parent_flags, QDict *parent_options)
{
    GLOBAL_STATE_CODE();
    *child_flags = (parent_flags & ~BDRV_O_SNAPSHOT) | BDRV_O_TEMPORARY;

    /* For temporary files, unconditional cache=unsafe is fine */
    qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
    qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");

    /* Copy the read-only and discard options from the parent */
    qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
    qdict_copy_default(child_options, parent_options, BDRV_OPT_DISCARD);

    /* aio=native doesn't work for cache.direct=off, so disable it for the
     * temporary snapshot */
    *child_flags &= ~BDRV_O_NATIVE_AIO;
}

static void bdrv_backing_attach(BdrvChild *c)
{
    BlockDriverState *parent = c->opaque;
    BlockDriverState *backing_hd = c->bs;

    GLOBAL_STATE_CODE();
    assert(!parent->backing_blocker);
    error_setg(&parent->backing_blocker,
               "node is used as backing hd of '%s'",
               bdrv_get_device_or_node_name(parent));

    bdrv_refresh_filename(backing_hd);

    parent->open_flags &= ~BDRV_O_NO_BACKING;

    bdrv_op_block_all(backing_hd, parent->backing_blocker);
    /* Otherwise we won't be able to commit or stream */
    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                    parent->backing_blocker);
    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_STREAM,
                    parent->backing_blocker);
    /*
     * We do backup in 3 ways:
     * 1. drive backup
     *    The target bs is newly opened, and the source is the top BDS.
     * 2. blockdev backup
     *    Both the source and the target are top BDSes.
     * 3. internal backup (used for block replication)
     *    Both the source and the target are backing files.
     *
     * In cases 1 and 2, neither the source nor the target is the backing file.
     * In case 3, we will block the top BDS, so there is only one block job
     * for the top BDS and its backing chain.
     */
    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
                    parent->backing_blocker);
    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
                    parent->backing_blocker);
}

static void bdrv_backing_detach(BdrvChild *c)
{
    BlockDriverState *parent = c->opaque;

    GLOBAL_STATE_CODE();
    assert(parent->backing_blocker);
    bdrv_op_unblock_all(c->bs, parent->backing_blocker);
    error_free(parent->backing_blocker);
    parent->backing_blocker = NULL;
}

static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
                                        const char *filename, Error **errp)
{
    BlockDriverState *parent = c->opaque;
    bool read_only = bdrv_is_read_only(parent);
    int ret;
    GLOBAL_STATE_CODE();

    if (read_only) {
        ret = bdrv_reopen_set_read_only(parent, false, errp);
        if (ret < 0) {
            return ret;
        }
    }

    ret = bdrv_change_backing_file(parent, filename,
                                   base->drv ? base->drv->format_name : "",
                                   false);
    if (ret < 0) {
        error_setg_errno(errp, -ret, "Could not update backing file link");
    }

    if (read_only) {
        bdrv_reopen_set_read_only(parent, true, NULL);
    }

    return ret;
}

/*
 * Returns the options and flags that a generic child of a BDS should
 * get, based on the given options and flags for the parent BDS.
 */
static void bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
                                   int *child_flags, QDict *child_options,
                                   int parent_flags, QDict *parent_options)
{
    int flags = parent_flags;
    GLOBAL_STATE_CODE();

    /*
     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
     * Generally, the question to answer is: Should this child be
     * format-probed by default?
     */

    /*
     * Pure and non-filtered data children of non-format nodes should
     * be probed by default (even when the node itself has BDRV_O_PROTOCOL
     * set).  This only affects a very limited set of drivers (namely
     * quorum and blkverify when this comment was written).
     * Force-clear BDRV_O_PROTOCOL then.
     */
    if (!parent_is_format &&
        (role & BDRV_CHILD_DATA) &&
        !(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED)))
    {
        flags &= ~BDRV_O_PROTOCOL;
    }

    /*
     * All children of format nodes (except for COW children) and all
     * metadata children in general should never be format-probed.
     * Force-set BDRV_O_PROTOCOL then.
     */
    if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
        (role & BDRV_CHILD_METADATA))
    {
        flags |= BDRV_O_PROTOCOL;
    }

    /*
     * If the cache mode isn't explicitly set, inherit direct and no-flush from
     * the parent.
     */
    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
    qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);

    if (role & BDRV_CHILD_COW) {
        /* backing files are opened read-only by default */
        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
    } else {
        /* Inherit the read-only option from the parent if it's not set */
        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
        qdict_copy_default(child_options, parent_options,
                           BDRV_OPT_AUTO_READ_ONLY);
    }

    /*
     * bdrv_co_pdiscard() respects unmap policy for the parent, so we
     * can default to enable it on lower layers regardless of the
     * parent option.
     */
    qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");

    /* Clear flags that only apply to the top layer */
    flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);

    if (role & BDRV_CHILD_METADATA) {
        flags &= ~BDRV_O_NO_IO;
    }
    if (role & BDRV_CHILD_COW) {
        flags &= ~BDRV_O_TEMPORARY;
    }

    *child_flags = flags;
}
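
/*
 * Illustration only, not part of block.c: a standalone sketch of just the
 * "set, clear, or leave BDRV_O_PROTOCOL" decision made above, so the three
 * outcomes can be tried in isolation. The SKETCH_* constants and
 * sketch_inherit_protocol_flag() are hypothetical names with made-up bit
 * values; the two conditions are copied from the function above.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BDRV_CHILD_* role bits. */
enum {
    SKETCH_CHILD_DATA     = 1 << 0,
    SKETCH_CHILD_METADATA = 1 << 1,
    SKETCH_CHILD_FILTERED = 1 << 2,
    SKETCH_CHILD_COW      = 1 << 3,
};

/* Hypothetical stand-in for BDRV_O_PROTOCOL ("do not format-probe"). */
enum { SKETCH_O_PROTOCOL = 1 << 0 };

/* Returns parent_flags with the protocol bit cleared, set, or untouched. */
static int sketch_inherit_protocol_flag(unsigned role, bool parent_is_format,
                                        int parent_flags)
{
    int flags = parent_flags;

    /* Pure data children of non-format nodes: probe, so clear the bit. */
    if (!parent_is_format &&
        (role & SKETCH_CHILD_DATA) &&
        !(role & (SKETCH_CHILD_METADATA | SKETCH_CHILD_FILTERED)))
    {
        flags &= ~SKETCH_O_PROTOCOL;
    }

    /* Non-COW children of format nodes, and metadata children: never probe. */
    if ((parent_is_format && !(role & SKETCH_CHILD_COW)) ||
        (role & SKETCH_CHILD_METADATA))
    {
        flags |= SKETCH_O_PROTOCOL;
    }

    /* Every other combination inherits the parent's bit unchanged. */
    return flags;
}
```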

static void GRAPH_WRLOCK bdrv_child_cb_attach(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;

    assert_bdrv_graph_writable();
    QLIST_INSERT_HEAD(&bs->children, child, next);
    if (bs->drv->is_filter || (child->role & BDRV_CHILD_FILTERED)) {
        /*
         * Here we handle filters, and block/raw-format.c when it behaves
         * like a filter. They generally have a single PRIMARY child, which
         * is also the FILTERED child, and they may have further children,
         * which are neither PRIMARY nor FILTERED. We never have a COW child
         * here. So bs->file will be the PRIMARY child, unless the PRIMARY
         * child goes into bs->backing in exceptional cases; and bs->backing
         * will be nothing else.
         */
        assert(!(child->role & BDRV_CHILD_COW));
        if (child->role & BDRV_CHILD_PRIMARY) {
            assert(child->role & BDRV_CHILD_FILTERED);
            assert(!bs->backing);
            assert(!bs->file);

            if (bs->drv->filtered_child_is_backing) {
                bs->backing = child;
            } else {
                bs->file = child;
            }
        } else {
            assert(!(child->role & BDRV_CHILD_FILTERED));
        }
    } else if (child->role & BDRV_CHILD_COW) {
        assert(bs->drv->supports_backing);
        assert(!(child->role & BDRV_CHILD_PRIMARY));
        assert(!bs->backing);
        bs->backing = child;
        bdrv_backing_attach(child);
    } else if (child->role & BDRV_CHILD_PRIMARY) {
        assert(!bs->file);
        bs->file = child;
    }
}
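
/*
 * Illustration only, not part of block.c: the attach callback above decides
 * which of bs->file / bs->backing (if either) a new child is stored in. The
 * sketch below reduces that decision to a pure function. The sketch_* and
 * SKETCH_* names and bit values are hypothetical, and the consistency
 * assertions of the real callback are deliberately left out.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BDRV_CHILD_* role bits. */
enum {
    SKETCH_ROLE_FILTERED = 1 << 0,
    SKETCH_ROLE_COW      = 1 << 1,
    SKETCH_ROLE_PRIMARY  = 1 << 2,
};

/* Which parent pointer a child would be stored in. */
enum sketch_slot {
    SKETCH_SLOT_NONE,
    SKETCH_SLOT_FILE,
    SKETCH_SLOT_BACKING,
};

static enum sketch_slot sketch_pick_slot(unsigned role, bool is_filter,
                                         bool filtered_child_is_backing)
{
    if (is_filter || (role & SKETCH_ROLE_FILTERED)) {
        /* Filter-like parent: only the PRIMARY child gets a pointer. */
        if (role & SKETCH_ROLE_PRIMARY) {
            return filtered_child_is_backing ? SKETCH_SLOT_BACKING
                                             : SKETCH_SLOT_FILE;
        }
        return SKETCH_SLOT_NONE;
    } else if (role & SKETCH_ROLE_COW) {
        /* Copy-on-write child: always the backing pointer. */
        return SKETCH_SLOT_BACKING;
    } else if (role & SKETCH_ROLE_PRIMARY) {
        /* Any other primary child: the file pointer. */
        return SKETCH_SLOT_FILE;
    }
    /* Some other child; neither pointer is touched. */
    return SKETCH_SLOT_NONE;
}
```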

static void GRAPH_WRLOCK bdrv_child_cb_detach(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;

    if (child->role & BDRV_CHILD_COW) {
        bdrv_backing_detach(child);
    }

    assert_bdrv_graph_writable();
    QLIST_REMOVE(child, next);
    if (child == bs->backing) {
        assert(child != bs->file);
        bs->backing = NULL;
    } else if (child == bs->file) {
        bs->file = NULL;
    }
}
|
|
|
|
|
2020-05-13 14:05:24 +03:00
|
|
|
static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
|
|
|
|
const char *filename, Error **errp)
|
|
|
|
{
|
|
|
|
if (c->role & BDRV_CHILD_COW) {
|
|
|
|
return bdrv_backing_update_filename(c, base, filename, errp);
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash
Commit 3ca1f3225727419ba573673b744edac10904276f
"block: BdrvChildClass: add .get_parent_aio_context handler" introduced
new handler and commit 228ca37e12f97788e05bd0c92f89b3e5e4019607
"block: drop ctx argument from bdrv_root_attach_child" made a generic
use of it. But 3ca1f3225727419ba573673b744edac10904276f didn't update
child_vvfat_qcow. Fix that.
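The failure mode described above, a class table that lacks a handler which generic code calls unconditionally, can be reproduced in miniature. The sketch below uses hypothetical types and names (ChildClass, class_is_usable() and so on) rather than QEMU's BdrvChildClass. It only shows why a missing member is a NULL function pointer and how a check could detect it before the call.

```c
#include <stddef.h>

typedef struct Child Child;

/* Hypothetical class table with one handler that generic code requires. */
typedef struct ChildClass {
    const char *name;
    int (*get_parent_ctx)(Child *c);
} ChildClass;

struct Child {
    const ChildClass *klass;
    int ctx_id;
};

static int generic_get_parent_ctx(Child *c)
{
    return c->ctx_id;
}

/* A class that fills in the handler ... */
static const ChildClass class_complete = {
    .name = "complete",
    .get_parent_ctx = generic_get_parent_ctx,
};

/* ... and one that was never updated: the member is implicitly NULL. */
static const ChildClass class_incomplete = {
    .name = "incomplete",
};

/* A check that could run before any generic code relies on the handler. */
static int class_is_usable(const ChildClass *k)
{
    return k->get_parent_ctx != NULL;
}

/* Unconditional dispatch: jumps through NULL for class_incomplete. */
static int child_get_parent_ctx(Child *c)
{
    return c->klass->get_parent_ctx(c);
}
```

Calling child_get_parent_ctx() on a child of class_incomplete would jump through a NULL pointer, which matches the value gdb prints for the handler in the trace that follows.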
Before that fix the command
./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \
-drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none
crashes:
1 bdrv_child_get_parent_aio_context (c=0x559d62426d20)
at ../block.c:1440
2 bdrv_attach_child_common
(child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target",
child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
perm=3, shared_perm=4, opaque=0x559d62445690,
child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60)
at ../block.c:2795
3 bdrv_attach_child_noperm
(parent_bs=0x559d62445690, child_bs=0x559d62468190,
child_name=0x559d606f9e3d "write-target",
child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at
../block.c:2855
4 bdrv_attach_child
(parent_bs=0x559d62445690, child_bs=0x559d62468190,
child_name=0x559d606f9e3d "write-target",
child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
errp=0x7ffc74c2ae60) at ../block.c:2953
5 bdrv_open_child
(filename=0x559d62464b80 "/var/tmp/vl.h3TIS4",
options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target",
parent=0x559d62445690, child_class=0x559d60c58d20
<child_vvfat_qcow>, child_role=3, allow_none=false,
errp=0x7ffc74c2ae60) at ../block.c:3351
6 enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at
../block/vvfat.c:3176
7 vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650,
errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236
8 bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0
<bdrv_vvfat>, node_name=0x0,
options=0x559d6244adb0, open_flags=155650,
errp=0x7ffc74c2af70) at ../block.c:1557
9 bdrv_open_common (bs=0x559d62445690, file=0x0,
options=0x559d6244adb0, errp=0x7ffc74c2af70) at
...
(gdb) fr 1
#1 0x0000559d603ea3bf in bdrv_child_get_parent_aio_context
(c=0x559d62426d20) at ../block.c:1440
1440 return c->klass->get_parent_aio_context(c);
(gdb) p c->klass
$1 = (const BdrvChildClass *) 0x559d60c58d20 <child_vvfat_qcow>
(gdb) p c->klass->get_parent_aio_context
$2 = (AioContext *(*)(BdrvChild *)) 0x0
Fixes: 3ca1f3225727419ba573673b744edac10904276f
Fixes: 228ca37e12f97788e05bd0c92f89b3e5e4019607
Reported-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com>
Tested-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-24 13:12:56 +03:00
|
|
|
AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c)
|
2021-04-28 18:17:33 +03:00
|
|
|
{
|
|
|
|
BlockDriverState *bs = c->opaque;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2021-04-28 18:17:33 +03:00
|
|
|
|
|
|
|
return bdrv_get_aio_context(bs);
|
|
|
|
}
|
|
|
|
|
2020-05-13 14:05:24 +03:00
|
|
|
const BdrvChildClass child_of_bds = {
|
|
|
|
.parent_is_bds = true,
|
|
|
|
.get_parent_desc = bdrv_child_get_parent_desc,
|
|
|
|
.inherit_options = bdrv_inherited_options,
|
|
|
|
.drained_begin = bdrv_child_cb_drained_begin,
|
|
|
|
.drained_poll = bdrv_child_cb_drained_poll,
|
|
|
|
.drained_end = bdrv_child_cb_drained_end,
|
|
|
|
.attach = bdrv_child_cb_attach,
|
|
|
|
.detach = bdrv_child_cb_detach,
|
|
|
|
.inactivate = bdrv_child_cb_inactivate,
|
2022-10-25 11:49:47 +03:00
|
|
|
.change_aio_ctx = bdrv_child_cb_change_aio_ctx,
|
2020-05-13 14:05:24 +03:00
|
|
|
.update_filename = bdrv_child_cb_update_filename,
|
2021-05-24 13:12:56 +03:00
|
|
|
.get_parent_aio_context = child_of_bds_get_parent_aio_context,
|
2020-05-13 14:05:24 +03:00
|
|
|
};
|
|
|
|
|
2021-04-28 18:17:33 +03:00
|
|
|
AioContext *bdrv_child_get_parent_aio_context(BdrvChild *c)
|
|
|
|
{
|
block: Make bdrv_child_get_parent_aio_context I/O
We want to use bdrv_child_get_parent_aio_context() from
bdrv_parent_drained_{begin,end}_single(), both of which are "I/O or GS"
functions.
Prior to 3ed4f708fe1, all the implementations were I/O code anyway.
3ed4f708fe1 has put block jobs' AioContext field under the job mutex, so
to make child_job_get_parent_aio_context() work in an I/O context, we
need to take that lock there.
Furthermore, blk_root_get_parent_aio_context() is not marked as
anything, but is safe to run in an I/O context, so mark it that way now.
(blk_get_aio_context() is an I/O code function.)
With that done, all implementations explicitly are I/O code, so we can
mark bdrv_child_get_parent_aio_context() as I/O code, too, so callers
know it is safe to run from both GS and I/O contexts.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221107151321.211175-2-hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-11-07 18:13:19 +03:00
|
|
|
IO_CODE();
|
2021-04-28 18:17:33 +03:00
|
|
|
return c->klass->get_parent_aio_context(c);
|
|
|
|
}
|
|
|
|
|
2012-11-12 20:05:39 +04:00
|
|
|
static int bdrv_open_flags(BlockDriverState *bs, int flags)
|
|
|
|
{
|
2016-03-18 19:46:45 +03:00
|
|
|
int open_flags = flags;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2012-11-12 20:05:39 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Clear flags that are internal to the block layer before opening the
|
|
|
|
* image.
|
|
|
|
*/
|
2014-06-04 16:33:27 +04:00
|
|
|
open_flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_PROTOCOL);
|
2012-11-12 20:05:39 +04:00
|
|
|
|
|
|
|
return open_flags;
|
|
|
|
}
|
|
|
|
|
2015-05-08 18:49:53 +03:00
|
|
|
static void update_flags_from_options(int *flags, QemuOpts *opts)
|
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2018-11-12 17:00:48 +03:00
|
|
|
*flags &= ~(BDRV_O_CACHE_MASK | BDRV_O_RDWR | BDRV_O_AUTO_RDONLY);
|
2015-05-08 18:49:53 +03:00
|
|
|
|
2018-09-06 12:37:06 +03:00
|
|
|
if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) {
|
2015-05-08 18:49:53 +03:00
|
|
|
*flags |= BDRV_O_NO_FLUSH;
|
|
|
|
}
|
|
|
|
|
2018-09-06 12:37:06 +03:00
|
|
|
if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_DIRECT, false)) {
|
2015-05-08 18:49:53 +03:00
|
|
|
*flags |= BDRV_O_NOCACHE;
|
|
|
|
}
|
2016-09-15 17:53:02 +03:00
|
|
|
|
2018-09-06 12:37:06 +03:00
|
|
|
if (!qemu_opt_get_bool_del(opts, BDRV_OPT_READ_ONLY, false)) {
|
2016-09-15 17:53:02 +03:00
|
|
|
*flags |= BDRV_O_RDWR;
|
|
|
|
}
|
|
|
|
|
block: Add auto-read-only option
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.
Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).
A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.
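The "try read-write, degrade to read-only" behaviour can be sketched independently of QEMU. Everything in this sketch is an assumption made for illustration: the open_fn callback type, the helper names, and in particular the choice to fall back only on -EACCES and -EROFS. The commit message does not say which errors trigger the fallback.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical opener: returns 0 on success or a negative errno value. */
typedef int (*open_fn)(bool writable);

/*
 * Try to open read-write first. If that fails with a permission-style error
 * and auto_read_only is set, retry read-only instead of failing.
 * On success, *read_only reports the mode that was actually obtained.
 */
static int open_auto_read_only(open_fn do_open, bool auto_read_only,
                               bool *read_only)
{
    int ret = do_open(true);

    if (ret == 0) {
        *read_only = false;
        return 0;
    }
    if (!auto_read_only || (ret != -EACCES && ret != -EROFS)) {
        return ret;
    }

    ret = do_open(false);
    if (ret == 0) {
        *read_only = true;
    }
    return ret;
}

/* Stand-in for storage that only accepts read-only opens. */
static int open_on_ro_storage(bool writable)
{
    return writable ? -EROFS : 0;
}
```

With auto_read_only set, the open on read-only storage succeeds and reports the degraded mode. Without it, the original error is returned unchanged.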
The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.
Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.
Instead of adding a new auto-read-only option, turning the existing
read-only into an enum (with a bool alternate for compatibility) was
considered, but it complicated the implementation to the point that it
didn't seem to be worth it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05 19:57:40 +03:00
|
|
|
if (qemu_opt_get_bool_del(opts, BDRV_OPT_AUTO_READ_ONLY, false)) {
|
|
|
|
*flags |= BDRV_O_AUTO_RDONLY;
|
|
|
|
}
|
2015-05-08 18:49:53 +03:00
|
|
|
}
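update_flags_from_options() follows a clear-then-set pattern: the bits it owns are masked out first, then each one is set again from the corresponding option. Here is a minimal standalone sketch of that pattern, using a hypothetical NodeOpts struct in place of QemuOpts and made-up FLAG_* values in place of the BDRV_O_* constants:

```c
#include <stdbool.h>

/* Hypothetical flag bits, not QEMU's real BDRV_O_* values. */
enum {
    FLAG_RDWR        = 1 << 0,
    FLAG_NOCACHE     = 1 << 1,
    FLAG_NO_FLUSH    = 1 << 2,
    FLAG_AUTO_RDONLY = 1 << 3,
    FLAG_OTHER       = 1 << 4,   /* a bit this function does not manage */
};

/* Hypothetical stand-in for the parsed options. */
typedef struct NodeOpts {
    bool cache_direct;
    bool cache_no_flush;
    bool read_only;
    bool auto_read_only;
} NodeOpts;

static int flags_from_opts(int flags, const NodeOpts *o)
{
    /* Forget whatever the managed bits said before. */
    flags &= ~(FLAG_RDWR | FLAG_NOCACHE | FLAG_NO_FLUSH | FLAG_AUTO_RDONLY);

    if (o->cache_no_flush) {
        flags |= FLAG_NO_FLUSH;
    }
    if (o->cache_direct) {
        flags |= FLAG_NOCACHE;
    }
    if (!o->read_only) {
        /* read-only is stored inverted: the flag means "writable". */
        flags |= FLAG_RDWR;
    }
    if (o->auto_read_only) {
        flags |= FLAG_AUTO_RDONLY;
    }
    return flags;
}
```

Bits outside the mask pass through unchanged, so options only override the flags they correspond to.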
|
|
|
|
|
|
|
|
static void update_options_from_flags(QDict *options, int flags)
|
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-05-08 18:49:53 +03:00
|
|
|
if (!qdict_haskey(options, BDRV_OPT_CACHE_DIRECT)) {
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_bool(options, BDRV_OPT_CACHE_DIRECT, flags & BDRV_O_NOCACHE);
|
2015-05-08 18:49:53 +03:00
|
|
|
}
|
|
|
|
if (!qdict_haskey(options, BDRV_OPT_CACHE_NO_FLUSH)) {
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_bool(options, BDRV_OPT_CACHE_NO_FLUSH,
|
|
|
|
flags & BDRV_O_NO_FLUSH);
|
2015-05-08 18:49:53 +03:00
|
|
|
}
|
2016-09-15 17:53:02 +03:00
|
|
|
if (!qdict_haskey(options, BDRV_OPT_READ_ONLY)) {
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_bool(options, BDRV_OPT_READ_ONLY, !(flags & BDRV_O_RDWR));
|
2016-09-15 17:53:02 +03:00
|
|
|
}
|
2018-10-05 19:57:40 +03:00
|
|
|
if (!qdict_haskey(options, BDRV_OPT_AUTO_READ_ONLY)) {
|
|
|
|
qdict_put_bool(options, BDRV_OPT_AUTO_READ_ONLY,
|
|
|
|
flags & BDRV_O_AUTO_RDONLY);
|
|
|
|
}
|
2015-05-08 18:49:53 +03:00
|
|
|
}
|
|
|
|
|
2014-01-24 17:11:52 +04:00
|
|
|
static void bdrv_assign_node_name(BlockDriverState *bs,
|
|
|
|
const char *node_name,
|
|
|
|
Error **errp)
|
2014-01-24 00:31:33 +04:00
|
|
|
{
|
2015-10-13 02:36:50 +03:00
|
|
|
char *gen_node_name = NULL;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-01-24 00:31:33 +04:00
|
|
|
|
2015-10-13 02:36:50 +03:00
|
|
|
if (!node_name) {
|
|
|
|
node_name = gen_node_name = id_generate(ID_BLOCK);
|
|
|
|
} else if (!id_wellformed(node_name)) {
|
|
|
|
/*
|
|
|
|
* Check for empty string or invalid characters, but not if it is
|
|
|
|
* generated (generated names use characters not available to the user)
|
|
|
|
*/
|
2021-03-05 18:19:28 +03:00
|
|
|
error_setg(errp, "Invalid node-name: '%s'", node_name);
|
2014-01-24 17:11:52 +04:00
|
|
|
return;
|
2014-01-24 00:31:33 +04:00
|
|
|
}
|
|
|
|
|
2014-02-12 20:15:07 +04:00
|
|
|
/* takes care of avoiding namespace collisions */
|
2014-10-07 15:59:12 +04:00
|
|
|
if (blk_by_name(node_name)) {
|
2014-02-12 20:15:07 +04:00
|
|
|
error_setg(errp, "node-name=%s is conflicting with a device id",
|
|
|
|
node_name);
|
2015-10-13 02:36:50 +03:00
|
|
|
goto out;
|
2014-02-12 20:15:07 +04:00
|
|
|
}
|
|
|
|
|
2014-01-24 00:31:33 +04:00
|
|
|
/* takes care of avoiding duplicate node names */
|
|
|
|
if (bdrv_find_node(node_name)) {
|
2021-03-05 18:19:28 +03:00
|
|
|
error_setg(errp, "Duplicate nodes with node-name='%s'", node_name);
|
2015-10-13 02:36:50 +03:00
|
|
|
goto out;
|
2014-01-24 00:31:33 +04:00
|
|
|
}
|
|
|
|
|
2018-07-04 14:28:29 +03:00
|
|
|
/* Make sure that the node name isn't truncated */
|
|
|
|
if (strlen(node_name) >= sizeof(bs->node_name)) {
|
|
|
|
error_setg(errp, "Node name too long");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2014-01-24 00:31:33 +04:00
|
|
|
/* copy node name into the bs and insert it into the graph list */
|
|
|
|
pstrcpy(bs->node_name, sizeof(bs->node_name), node_name);
|
|
|
|
QTAILQ_INSERT_TAIL(&graph_bdrv_states, bs, node_list);
|
2015-10-13 02:36:50 +03:00
|
|
|
out:
|
|
|
|
g_free(gen_node_name);
|
2014-01-24 00:31:33 +04:00
|
|
|
}
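The length check in bdrv_assign_node_name() runs before the copy so that an over-long name is rejected rather than silently truncated. The same idea can be sketched in isolation. NODE_NAME_MAX and store_node_name() are hypothetical, and plain memcpy() stands in for QEMU's pstrcpy():

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer size; the real one is sizeof(bs->node_name). */
#define NODE_NAME_MAX 32

/*
 * Copy name into dst only if it fits together with its terminating NUL.
 * Returns false, leaving dst untouched, when the name would be truncated.
 */
static bool store_node_name(char dst[NODE_NAME_MAX], const char *name)
{
    size_t len = strlen(name);

    if (len >= NODE_NAME_MAX) {
        return false;
    }
    memcpy(dst, name, len + 1);   /* len + 1 includes the NUL */
    return true;
}
```

The comparison uses >= because a name of exactly NODE_NAME_MAX characters leaves no room for the terminator.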
|
|
|
|
|
2023-01-13 23:42:04 +03:00
|
|
|
/*
|
|
|
|
* The caller must always hold @bs AioContext lock, because this function calls
|
|
|
|
* bdrv_refresh_total_sectors() which polls when called from non-coroutine
|
|
|
|
* context.
|
|
|
|
*/
|
2023-05-04 14:57:38 +03:00
|
|
|
static int no_coroutine_fn GRAPH_UNLOCKED
|
|
|
|
bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
|
|
|
|
QDict *options, int open_flags, Error **errp)
|
2017-01-18 17:51:56 +03:00
|
|
|
{
|
|
|
|
Error *local_err = NULL;
|
2018-03-28 19:29:18 +03:00
|
|
|
int i, ret;
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2017-01-18 17:51:56 +03:00
|
|
|
|
|
|
|
bdrv_assign_node_name(bs, node_name, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
bs->drv = drv;
|
|
|
|
bs->opaque = g_malloc0(drv->instance_size);
|
|
|
|
|
|
|
|
if (drv->bdrv_file_open) {
|
|
|
|
assert(!drv->bdrv_needs_filename || bs->filename[0]);
|
|
|
|
ret = drv->bdrv_file_open(bs, options, open_flags, &local_err);
|
2017-01-18 19:16:41 +03:00
|
|
|
} else if (drv->bdrv_open) {
|
2017-01-18 17:51:56 +03:00
|
|
|
ret = drv->bdrv_open(bs, options, open_flags, &local_err);
|
2017-01-18 19:16:41 +03:00
|
|
|
} else {
|
|
|
|
ret = 0;
|
2017-01-18 17:51:56 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret < 0) {
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
} else if (bs->filename[0]) {
|
|
|
|
error_setg_errno(errp, -ret, "Could not open '%s'", bs->filename);
|
|
|
|
} else {
|
|
|
|
error_setg_errno(errp, -ret, "Could not open image");
|
|
|
|
}
|
2017-07-14 17:35:48 +03:00
|
|
|
goto open_failed;
|
2017-01-18 17:51:56 +03:00
|
|
|
}
|
|
|
|
|
2022-10-13 21:59:01 +03:00
|
|
|
assert(!(bs->supported_read_flags & ~BDRV_REQ_MASK));
|
|
|
|
assert(!(bs->supported_write_flags & ~BDRV_REQ_MASK));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Always allow the BDRV_REQ_REGISTERED_BUF optimization hint. This saves
|
|
|
|
* drivers that pass read/write requests through to a child the trouble of
|
|
|
|
* declaring support explicitly.
|
|
|
|
*
|
|
|
|
* Drivers must not propagate this flag accidentally when they initiate I/O
|
|
|
|
* to a bounce buffer. That case should be rare though.
|
|
|
|
*/
|
|
|
|
bs->supported_read_flags |= BDRV_REQ_REGISTERED_BUF;
|
|
|
|
bs->supported_write_flags |= BDRV_REQ_REGISTERED_BUF;
|
|
|
|
|
2023-01-13 23:42:03 +03:00
|
|
|
ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
|
2017-01-18 17:51:56 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_setg_errno(errp, -ret, "Could not refresh total sector count");
|
2017-07-14 17:35:48 +03:00
|
|
|
return ret;
|
2017-01-18 17:51:56 +03:00
|
|
|
}
|
|
|
|
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdlock_main_loop();
|
2021-04-28 18:17:55 +03:00
|
|
|
bdrv_refresh_limits(bs, NULL, &local_err);
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
|
|
|
|
2017-01-18 17:51:56 +03:00
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
2017-07-14 17:35:48 +03:00
|
|
|
return -EINVAL;
|
2017-01-18 17:51:56 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
assert(bdrv_opt_mem_align(bs) != 0);
|
|
|
|
assert(bdrv_min_mem_align(bs) != 0);
|
|
|
|
assert(is_power_of_2(bs->bl.request_alignment));
|
|
|
|
|
2018-03-28 19:29:18 +03:00
|
|
|
for (i = 0; i < bs->quiesce_counter; i++) {
|
2022-11-18 20:40:58 +03:00
|
|
|
if (drv->bdrv_drain_begin) {
|
|
|
|
drv->bdrv_drain_begin(bs);
|
2018-03-28 19:29:18 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-01-18 17:51:56 +03:00
|
|
|
return 0;
|
2017-07-14 17:35:48 +03:00
|
|
|
open_failed:
|
|
|
|
bs->drv = NULL;
|
|
|
|
if (bs->file != NULL) {
|
|
|
|
bdrv_unref_child(bs, bs->file);
|
2022-07-26 23:11:32 +03:00
|
|
|
assert(!bs->file);
|
2017-07-14 17:35:48 +03:00
|
|
|
}
|
2017-01-18 17:51:56 +03:00
|
|
|
g_free(bs->opaque);
|
|
|
|
bs->opaque = NULL;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-09-20 14:55:34 +03:00
|
|
|
/*
|
|
|
|
* Create and open a block node.
|
|
|
|
*
|
|
|
|
* @options is a QDict of options to pass to the block drivers, or NULL for an
|
|
|
|
* empty set of options. The reference to the QDict belongs to the block layer
|
|
|
|
* after the call (even on failure), so if the caller intends to reuse the
|
|
|
|
* dictionary, it needs to use qobject_ref() before calling bdrv_open.
|
|
|
|
*/
|
|
|
|
BlockDriverState *bdrv_new_open_driver_opts(BlockDriver *drv,
|
|
|
|
const char *node_name,
|
|
|
|
QDict *options, int flags,
|
|
|
|
Error **errp)
|
2017-01-18 19:16:41 +03:00
|
|
|
{
|
|
|
|
BlockDriverState *bs;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2017-01-18 19:16:41 +03:00
|
|
|
bs = bdrv_new();
|
|
|
|
bs->open_flags = flags;
|
2021-09-20 14:55:34 +03:00
|
|
|
bs->options = options ?: qdict_new();
|
|
|
|
bs->explicit_options = qdict_clone_shallow(bs->options);
|
2017-01-18 19:16:41 +03:00
|
|
|
bs->opaque = NULL;
|
|
|
|
|
|
|
|
update_options_from_flags(bs->options, flags);
|
|
|
|
|
|
|
|
ret = bdrv_open_driver(bs, drv, node_name, bs->options, flags, errp);
|
|
|
|
if (ret < 0) {
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs->explicit_options);
|
2017-07-14 17:35:48 +03:00
|
|
|
bs->explicit_options = NULL;
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs->options);
|
2017-07-14 17:35:48 +03:00
|
|
|
bs->options = NULL;
|
2017-01-18 19:16:41 +03:00
|
|
|
bdrv_unref(bs);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return bs;
|
|
|
|
}
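The comment on bdrv_new_open_driver_opts() states an ownership rule: the callee takes over the reference to @options even on failure, so a caller that wants to keep the dictionary takes an extra reference first. Below is a toy model of that rule with a bare reference counter. Dict, Node, dict_ref(), dict_unref() and node_open() are hypothetical stand-ins for QDict, BlockDriverState, qobject_ref(), qobject_unref() and the open function, and nothing is actually freed.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Dict { int refcnt; } Dict;
typedef struct Node { Dict *options; } Node;

static void dict_ref(Dict *d)   { d->refcnt++; }
static void dict_unref(Dict *d) { d->refcnt--; }

/*
 * Takes over the caller's reference to opts, on success and on failure.
 * On success the node keeps it; on failure it is dropped here.
 */
static int node_open(Node *n, Dict *opts, bool simulate_failure)
{
    n->options = opts;
    if (simulate_failure) {
        dict_unref(n->options);
        n->options = NULL;
        return -1;
    }
    return 0;
}
```

A caller that still needs the dictionary afterwards calls dict_ref() before node_open(). Whatever the outcome, exactly one reference has moved to the callee and the caller's own reference is still valid.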
|
|
|
|
|
2021-09-20 14:55:34 +03:00
|
|
|
/* Create and open a block node. */
|
|
|
|
BlockDriverState *bdrv_new_open_driver(BlockDriver *drv, const char *node_name,
|
|
|
|
int flags, Error **errp)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-09-20 14:55:34 +03:00
|
|
|
return bdrv_new_open_driver_opts(drv, node_name, NULL, flags, errp);
|
|
|
|
}
|
|
|
|
|
2016-10-06 12:33:17 +03:00
|
|
|
QemuOptsList bdrv_runtime_opts = {
|
2015-04-07 18:12:56 +03:00
|
|
|
.name = "bdrv_common",
|
|
|
|
.head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
|
|
|
|
.desc = {
|
|
|
|
{
|
|
|
|
.name = "node-name",
|
|
|
|
.type = QEMU_OPT_STRING,
|
|
|
|
.help = "Node name of the block device node",
|
|
|
|
},
|
2015-04-24 17:38:02 +03:00
|
|
|
{
|
|
|
|
.name = "driver",
|
|
|
|
.type = QEMU_OPT_STRING,
|
|
|
|
.help = "Block driver to use for the node",
|
|
|
|
},
|
2015-05-08 18:49:53 +03:00
|
|
|
{
|
|
|
|
.name = BDRV_OPT_CACHE_DIRECT,
|
|
|
|
.type = QEMU_OPT_BOOL,
|
|
|
|
.help = "Bypass software writeback cache on the host",
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.name = BDRV_OPT_CACHE_NO_FLUSH,
|
|
|
|
.type = QEMU_OPT_BOOL,
|
|
|
|
.help = "Ignore flush requests",
|
|
|
|
},
|
2016-09-15 17:53:02 +03:00
|
|
|
{
|
|
|
|
.name = BDRV_OPT_READ_ONLY,
|
|
|
|
.type = QEMU_OPT_BOOL,
|
|
|
|
.help = "Node is opened in read-only mode",
|
|
|
|
},
|
2018-10-05 19:57:40 +03:00
|
|
|
{
|
|
|
|
.name = BDRV_OPT_AUTO_READ_ONLY,
|
|
|
|
.type = QEMU_OPT_BOOL,
|
|
|
|
.help = "Node can become read-only if opening read-write fails",
|
|
|
|
},
|
2016-09-12 22:00:41 +03:00
|
|
|
{
|
|
|
|
.name = "detect-zeroes",
|
|
|
|
.type = QEMU_OPT_STRING,
|
|
|
|
.help = "try to optimize zero writes (off, on, unmap)",
|
|
|
|
},
|
2016-09-12 19:03:18 +03:00
|
|
|
{
|
2018-10-03 13:23:13 +03:00
|
|
|
.name = BDRV_OPT_DISCARD,
|
2016-09-12 19:03:18 +03:00
|
|
|
.type = QEMU_OPT_STRING,
|
|
|
|
.help = "discard operation (ignore/off, unmap/on)",
|
|
|
|
},
|
2017-05-02 19:35:37 +03:00
|
|
|
{
|
|
|
|
.name = BDRV_OPT_FORCE_SHARE,
|
|
|
|
.type = QEMU_OPT_BOOL,
|
|
|
|
.help = "always accept other writers (default: off)",
|
|
|
|
},
|
2015-04-07 18:12:56 +03:00
|
|
|
{ /* end of list */ }
|
|
|
|
},
|
|
|
|
};
|
|
|
|
|
2020-03-26 04:12:18 +03:00
|
|
|
QemuOptsList bdrv_create_opts_simple = {
|
|
|
|
.name = "simple-create-opts",
|
|
|
|
.head = QTAILQ_HEAD_INITIALIZER(bdrv_create_opts_simple.head),
|
2020-01-22 19:45:29 +03:00
|
|
|
.desc = {
|
|
|
|
{
|
|
|
|
.name = BLOCK_OPT_SIZE,
|
|
|
|
.type = QEMU_OPT_SIZE,
|
|
|
|
.help = "Virtual disk size"
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.name = BLOCK_OPT_PREALLOC,
|
|
|
|
.type = QEMU_OPT_STRING,
|
|
|
|
.help = "Preallocation mode (allowed values: off)"
|
|
|
|
},
|
|
|
|
{ /* end of list */ }
|
|
|
|
}
|
|
|
|
};
|
|
|
|
|
2010-04-14 17:24:50 +04:00
|
|
|
/*
|
|
|
|
* Common part for opening disk images and files
|
2013-03-15 13:35:04 +04:00
|
|
|
*
|
|
|
|
* Removes all processed options from *options.
|
2010-04-14 17:24:50 +04:00
|
|
|
*/
|
2017-02-17 20:39:24 +03:00
|
|
|
static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file,
|
2016-01-11 21:07:50 +03:00
|
|
|
QDict *options, Error **errp)
|
2010-04-14 17:24:50 +04:00
|
|
|
{
|
|
|
|
int ret, open_flags;
|
2013-04-09 16:34:19 +04:00
|
|
|
const char *filename;
|
2015-04-24 17:38:02 +03:00
|
|
|
const char *driver_name = NULL;
|
2014-01-24 00:31:33 +04:00
|
|
|
const char *node_name = NULL;
|
2016-09-12 19:03:18 +03:00
|
|
|
const char *discard;
|
2015-04-07 18:12:56 +03:00
|
|
|
QemuOpts *opts;
|
2015-04-24 17:38:02 +03:00
|
|
|
BlockDriver *drv;
|
2013-09-05 16:45:29 +04:00
|
|
|
Error *local_err = NULL;
|
2021-05-27 18:40:54 +03:00
|
|
|
bool ro;
|
2010-04-14 17:24:50 +04:00
|
|
|
|
2012-05-08 18:51:49 +04:00
|
|
|
assert(bs->file == NULL);
|
2013-03-06 15:20:31 +04:00
|
|
|
assert(options != NULL && bs->options != options);
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2010-04-14 17:24:50 +04:00
|
|
|
|
2015-04-24 17:38:02 +03:00
|
|
|
opts = qemu_opts_create(&bdrv_runtime_opts, NULL, 0, &error_abort);
|
2020-07-07 19:06:03 +03:00
|
|
|
if (!qemu_opts_absorb_qdict(opts, options, errp)) {
|
2015-04-24 17:38:02 +03:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
|
|
|
}
|
|
|
|
|
2016-09-15 17:53:01 +03:00
|
|
|
update_flags_from_options(&bs->open_flags, opts);
|
|
|
|
|
2015-04-24 17:38:02 +03:00
|
|
|
driver_name = qemu_opt_get(opts, "driver");
|
|
|
|
drv = bdrv_find_format(driver_name);
|
|
|
|
assert(drv != NULL);
|
|
|
|
|
2017-05-02 19:35:37 +03:00
|
|
|
bs->force_share = qemu_opt_get_bool(opts, BDRV_OPT_FORCE_SHARE, false);
|
|
|
|
|
|
|
|
if (bs->force_share && (bs->open_flags & BDRV_O_RDWR)) {
|
|
|
|
error_setg(errp,
|
|
|
|
BDRV_OPT_FORCE_SHARE
|
|
|
|
"=on can only be used with read-only images");
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
|
|
|
}
|
|
|
|
|
2013-04-22 19:48:40 +04:00
|
|
|
if (file != NULL) {
|
block: Use bdrv_refresh_filename() to pull
Before this patch, bdrv_refresh_filename() is used in a pushing manner:
Whenever the BDS graph is modified, the parents of the modified edges
are supposed to be updated (recursively upwards). However, that is
nonviable, considering that we want child changes not to concern
parents.
Also, in the long run we want a pull model anyway: Here, we would have a
bdrv_filename() function which returns a BDS's filename, freshly
constructed.
This patch is an intermediate step. It adds bdrv_refresh_filename()
calls before every place a BDS.filename value is used. The only
exceptions are protocol drivers that use their own filename, which
clearly would not profit from refreshing that filename before.
Also, bdrv_get_encrypted_filename() is removed along the way (as a user
of BDS.filename), since it is completely unused.
In turn, all of the calls to bdrv_refresh_filename() before this patch
are removed, because we no longer have to call this function on graph
changes.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-01 22:29:05 +03:00
|
|
|
bdrv_refresh_filename(blk_bs(file));
|
2017-02-17 20:39:24 +03:00
|
|
|
filename = blk_bs(file)->filename;
|
2013-04-22 19:48:40 +04:00
|
|
|
} else {
|
block: Document -drive problematic code and bugs
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, a few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
|
|
|
* Caution: while qdict_get_try_str() is fine, getting
|
|
|
|
* non-string types would require more care. When @options
|
|
|
|
* comes from -blockdev or blockdev_add, its members are typed
|
|
|
|
* according to the QAPI schema, but when they come from
|
|
|
|
* -drive, they're all QString.
|
|
|
|
*/
|
2013-04-22 19:48:40 +04:00
|
|
|
filename = qdict_get_try_str(options, "filename");
|
|
|
|
}
|
|
|
|
|
2017-04-13 19:06:24 +03:00
|
|
|
if (drv->bdrv_needs_filename && (!filename || !filename[0])) {
|
2014-02-03 17:49:42 +04:00
|
|
|
error_setg(errp, "The '%s' block driver requires a file name",
|
|
|
|
drv->format_name);
|
2015-04-07 18:12:56 +03:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
2014-01-24 00:31:33 +04:00
|
|
|
}
|
|
|
|
|
2016-01-11 21:07:50 +03:00
|
|
|
trace_bdrv_open_common(bs, filename ?: "", bs->open_flags,
|
|
|
|
drv->format_name);
|
2015-04-24 17:38:02 +03:00
|
|
|
|
2021-05-27 18:40:54 +03:00
|
|
|
ro = bdrv_is_read_only(bs);
|
|
|
|
|
|
|
|
if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, ro)) {
|
|
|
|
if (!ro && bdrv_is_whitelisted(drv, true)) {
|
2019-01-22 15:15:31 +03:00
|
|
|
ret = bdrv_apply_auto_read_only(bs, NULL, NULL);
|
|
|
|
} else {
|
|
|
|
ret = -ENOTSUP;
|
|
|
|
}
|
|
|
|
if (ret < 0) {
|
|
|
|
error_setg(errp,
|
2021-05-27 18:40:54 +03:00
|
|
|
!ro && bdrv_is_whitelisted(drv, true)
|
2019-01-22 15:15:31 +03:00
|
|
|
? "Driver '%s' can only be used for read-only devices"
|
|
|
|
: "Driver '%s' is not whitelisted",
|
|
|
|
drv->format_name);
|
|
|
|
goto fail_opts;
|
|
|
|
}
|
2013-05-29 15:35:40 +04:00
|
|
|
}
|
2010-04-14 17:24:50 +04:00
|
|
|
|
2017-06-05 15:38:50 +03:00
|
|
|
/* bdrv_new() and bdrv_close() make it so */
|
2020-09-23 13:56:46 +03:00
|
|
|
assert(qatomic_read(&bs->copy_on_read) == 0);
|
2017-06-05 15:38:50 +03:00
|
|
|
|
2016-01-11 21:07:50 +03:00
|
|
|
if (bs->open_flags & BDRV_O_COPY_ON_READ) {
|
2021-05-27 18:40:54 +03:00
|
|
|
if (!ro) {
|
2013-09-19 17:12:18 +04:00
|
|
|
bdrv_enable_copy_on_read(bs);
|
|
|
|
} else {
|
|
|
|
error_setg(errp, "Can't use copy-on-read on read-only device");
|
2015-04-07 18:12:56 +03:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
2013-09-19 17:12:18 +04:00
|
|
|
}
|
2011-11-28 20:08:47 +04:00
|
|
|
}
|
|
|
|
|
2018-10-03 13:23:13 +03:00
|
|
|
discard = qemu_opt_get(opts, BDRV_OPT_DISCARD);
|
2016-09-12 19:03:18 +03:00
|
|
|
if (discard != NULL) {
|
|
|
|
if (bdrv_parse_discard_flags(discard, &bs->open_flags) != 0) {
|
|
|
|
error_setg(errp, "Invalid discard option");
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-06 12:37:09 +03:00
|
|
|
bs->detect_zeroes =
|
|
|
|
bdrv_parse_detect_zeroes(opts, bs->open_flags, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto fail_opts;
|
2016-09-12 22:00:41 +03:00
|
|
|
}
|
|
|
|
|
2013-03-18 19:40:51 +04:00
|
|
|
if (filename != NULL) {
|
|
|
|
pstrcpy(bs->filename, sizeof(bs->filename), filename);
|
|
|
|
} else {
|
|
|
|
bs->filename[0] = '\0';
|
|
|
|
}
|
2014-07-18 22:24:56 +04:00
|
|
|
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename), bs->filename);
|
2010-04-14 17:24:50 +04:00
|
|
|
|
2010-04-14 16:17:38 +04:00
|
|
|
/* Open the image, either directly or using a protocol */
|
2016-01-11 21:07:50 +03:00
|
|
|
open_flags = bdrv_open_flags(bs, bs->open_flags);
|
2017-01-18 17:51:56 +03:00
|
|
|
node_name = qemu_opt_get(opts, "node-name");
|
2010-04-14 17:24:50 +04:00
|
|
|
|
2017-01-18 17:51:56 +03:00
|
|
|
assert(!drv->bdrv_file_open || file == NULL);
|
|
|
|
ret = bdrv_open_driver(bs, drv, node_name, options, open_flags, errp);
|
2010-04-19 19:56:41 +04:00
|
|
|
if (ret < 0) {
|
2017-01-18 17:51:56 +03:00
|
|
|
goto fail_opts;
|
2014-07-16 19:48:16 +04:00
|
|
|
}
|
|
|
|
|
2015-04-07 18:12:56 +03:00
|
|
|
qemu_opts_del(opts);
|
2010-04-14 17:24:50 +04:00
|
|
|
return 0;
|
|
|
|
|
2015-04-07 18:12:56 +03:00
|
|
|
fail_opts:
|
|
|
|
qemu_opts_del(opts);
|
2010-04-14 17:24:50 +04:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-05-26 13:45:08 +04:00
|
|
|
static QDict *parse_json_filename(const char *filename, Error **errp)
|
|
|
|
{
|
|
|
|
QObject *options_obj;
|
|
|
|
QDict *options;
|
|
|
|
int ret;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-26 13:45:08 +04:00
|
|
|
|
|
|
|
ret = strstart(filename, "json:", &filename);
|
|
|
|
assert(ret);
|
|
|
|
|
2017-03-01 00:26:59 +03:00
|
|
|
options_obj = qobject_from_json(filename, errp);
|
2014-05-26 13:45:08 +04:00
|
|
|
if (!options_obj) {
|
2017-03-01 00:26:59 +03:00
|
|
|
error_prepend(errp, "Could not parse the JSON options: ");
|
2014-05-26 13:45:08 +04:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-02-24 18:40:29 +03:00
|
|
|
options = qobject_to(QDict, options_obj);
|
2017-02-17 23:38:18 +03:00
|
|
|
if (!options) {
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options_obj);
|
2014-05-26 13:45:08 +04:00
|
|
|
error_setg(errp, "Invalid JSON object given");
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
qdict_flatten(options);
|
|
|
|
|
|
|
|
return options;
|
|
|
|
}
|
|
|
|
|
2015-10-29 17:24:41 +03:00
|
|
|
static void parse_json_protocol(QDict *options, const char **pfilename,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
QDict *json_options;
|
|
|
|
Error *local_err = NULL;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-10-29 17:24:41 +03:00
|
|
|
|
|
|
|
/* Parse json: pseudo-protocol */
|
|
|
|
if (!*pfilename || !g_str_has_prefix(*pfilename, "json:")) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
json_options = parse_json_filename(*pfilename, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Options given in the filename have lower priority than options
|
|
|
|
* specified directly */
|
|
|
|
qdict_join(options, json_options, false);
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(json_options);
|
2015-10-29 17:24:41 +03:00
|
|
|
*pfilename = NULL;
|
|
|
|
}
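The priority rule stated in the comment above (options given in the filename lose against options specified directly) can be sketched in isolation. This is a minimal illustration, not QEMU code: `OptSet`, `opt_get()` and `opt_join_keep_existing()` are hypothetical stand-ins for a flattened QDict and a join that never overwrites existing keys.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 8

/* Hypothetical flat option set; stands in for a flattened QDict. */
typedef struct {
    const char *key[MAX_OPTS];
    const char *val[MAX_OPTS];
    size_t n;
} OptSet;

/* Return the value stored under @key, or NULL if the key is absent. */
static const char *opt_get(const OptSet *s, const char *key)
{
    for (size_t i = 0; i < s->n; i++) {
        if (strcmp(s->key[i], key) == 0) {
            return s->val[i];
        }
    }
    return NULL;
}

/* Copy entries of @src into @dest, keeping @dest's value on a key conflict. */
static void opt_join_keep_existing(OptSet *dest, const OptSet *src)
{
    for (size_t i = 0; i < src->n; i++) {
        if (opt_get(dest, src->key[i]) == NULL) {
            assert(dest->n < MAX_OPTS);
            dest->key[dest->n] = src->key[i];
            dest->val[dest->n] = src->val[i];
            dest->n++;
        }
    }
}
```

With a directly specified `driver` and a filename-derived set that also carries `driver`, the direct value survives and only the keys missing from the direct set are taken over.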
|
|
|
|
|
2010-04-12 18:37:13 +04:00
|
|
|
/*
|
2014-05-26 13:09:59 +04:00
|
|
|
* Fills in default options for opening images and converts the legacy
|
|
|
|
* filename/flags pair to option QDict entries.
|
block: driver should override flags in bdrv_open()
The BDRV_O_PROTOCOL flag should have an impact only if no driver is
specified explicitly. Therefore, if bdrv_open() is called with an
explicit block driver argument (either through the options QDict or
through the drv parameter) and that block driver is a protocol block
driver, BDRV_O_PROTOCOL should be set; if it is a format block driver,
BDRV_O_PROTOCOL should be unset.
While there was code to unset the flag in case a format block driver
has been selected, it only followed the bdrv_fill_options() function
call whereas the flag in fact needs to be adjusted before it is used
there.
With that change, BDRV_O_PROTOCOL will always be set if the BDS should
be a protocol driver; if the driver has been specified explicitly, the
new code will set it; and bdrv_fill_options() will only "probe" a
protocol driver if BDRV_O_PROTOCOL is set. The probing after
bdrv_fill_options() cannot select a protocol driver.
Thus, bdrv_open_image() to open BDS.file is never called if a protocol
BDS is about to be created. With that change in turn it is impossible to
call bdrv_open_common() with a protocol drv and file != NULL, which
allows us to remove the bdrv_swap() call.
This change breaks a test case in qemu-iotest 051:
"-drive file=t.qcow2,file.driver=qcow2" now works because the explicitly
specified "qcow2" overrides the BDRV_O_PROTOCOL which is automatically
set for the "file" BDS (and the filename is just passed down).
Therefore, this patch removes that test case.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-19 21:53:16 +03:00
|
|
|
* The BDRV_O_PROTOCOL flag in *flags will be set or cleared accordingly if a
|
|
|
|
* block driver has been specified explicitly.
|
2010-04-12 18:37:13 +04:00
|
|
|
*/
|
2015-10-29 17:24:41 +03:00
|
|
|
static int bdrv_fill_options(QDict **options, const char *filename,
|
2015-08-26 20:47:51 +03:00
|
|
|
int *flags, Error **errp)
|
2004-08-02 01:59:26 +04:00
|
|
|
{
|
2013-03-18 19:40:51 +04:00
|
|
|
const char *drvname;
|
2015-03-19 21:53:16 +03:00
|
|
|
bool protocol = *flags & BDRV_O_PROTOCOL;
|
2014-04-03 14:45:51 +04:00
|
|
|
bool parse_filename = false;
|
2015-08-26 20:47:51 +03:00
|
|
|
BlockDriver *drv = NULL;
|
2013-09-05 16:45:29 +04:00
|
|
|
Error *local_err = NULL;
|
2006-08-01 20:21:11 +04:00
|
|
|
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
|
|
|
* Caution: while qdict_get_try_str() is fine, getting non-string
|
|
|
|
* types would require more care. When @options comes from
|
|
|
|
* -blockdev or blockdev_add, its members are typed according to
|
|
|
|
* the QAPI schema, but when they come from -drive, they're all
|
|
|
|
* QString.
|
|
|
|
*/
|
2015-03-19 21:53:16 +03:00
|
|
|
drvname = qdict_get_try_str(*options, "driver");
|
2015-08-26 20:47:51 +03:00
|
|
|
if (drvname) {
|
|
|
|
drv = bdrv_find_format(drvname);
|
|
|
|
if (!drv) {
|
|
|
|
error_setg(errp, "Unknown driver '%s'", drvname);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
/* If the user has explicitly specified the driver, this choice should
|
|
|
|
* override the BDRV_O_PROTOCOL flag */
|
|
|
|
protocol = drv->bdrv_file_open;
|
2015-03-19 21:53:16 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (protocol) {
|
|
|
|
*flags |= BDRV_O_PROTOCOL;
|
|
|
|
} else {
|
|
|
|
*flags &= ~BDRV_O_PROTOCOL;
|
|
|
|
}
|
|
|
|
|
2015-05-08 18:49:53 +03:00
|
|
|
/* Translate cache options from flags into options */
|
|
|
|
update_options_from_flags(*options, *flags);
|
|
|
|
|
2013-04-09 16:34:19 +04:00
|
|
|
/* Fetch the file name from the options QDict if necessary */
|
2014-05-27 12:50:29 +04:00
|
|
|
if (protocol && filename) {
|
2014-05-26 13:09:59 +04:00
|
|
|
if (!qdict_haskey(*options, "filename")) {
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(*options, "filename", filename);
|
2014-05-26 13:09:59 +04:00
|
|
|
parse_filename = true;
|
|
|
|
} else {
|
|
|
|
error_setg(errp, "Can't specify 'file' and 'filename' options at "
|
|
|
|
"the same time");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2013-04-09 16:34:19 +04:00
|
|
|
}
|
|
|
|
|
2013-03-18 19:40:51 +04:00
|
|
|
/* Find the right block driver */
|
2017-03-30 20:43:12 +03:00
|
|
|
/* See cautionary note on accessing @options above */
|
2014-05-26 13:09:59 +04:00
|
|
|
filename = qdict_get_try_str(*options, "filename");
|
|
|
|
|
2015-08-26 20:47:51 +03:00
|
|
|
if (!drvname && protocol) {
|
|
|
|
if (filename) {
|
|
|
|
drv = bdrv_find_protocol(filename, parse_filename, errp);
|
2014-05-27 12:50:29 +04:00
|
|
|
if (!drv) {
|
2015-08-26 20:47:51 +03:00
|
|
|
return -EINVAL;
|
2014-05-27 12:50:29 +04:00
|
|
|
}
|
2015-08-26 20:47:51 +03:00
|
|
|
|
|
|
|
drvname = drv->format_name;
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(*options, "driver", drvname);
|
2015-08-26 20:47:51 +03:00
|
|
|
} else {
|
|
|
|
error_setg(errp, "Must specify either driver or file");
|
|
|
|
return -EINVAL;
|
2013-07-10 17:47:39 +04:00
|
|
|
}
|
2013-03-18 19:40:51 +04:00
|
|
|
}
|
|
|
|
|
2014-05-27 12:50:29 +04:00
|
|
|
assert(drv || !protocol);
|
2013-03-18 19:40:51 +04:00
|
|
|
|
2014-05-26 13:09:59 +04:00
|
|
|
/* Driver-specific filename parsing */
|
2014-05-27 12:50:29 +04:00
|
|
|
if (drv && drv->bdrv_parse_filename && parse_filename) {
|
2014-02-18 21:33:11 +04:00
|
|
|
drv->bdrv_parse_filename(filename, *options, &local_err);
|
2014-01-30 18:07:28 +04:00
|
|
|
if (local_err) {
|
2013-09-05 16:45:29 +04:00
|
|
|
error_propagate(errp, local_err);
|
2014-05-26 13:09:59 +04:00
|
|
|
return -EINVAL;
|
2013-03-15 21:47:22 +04:00
|
|
|
}
|
2014-03-06 01:41:36 +04:00
|
|
|
|
|
|
|
if (!drv->bdrv_needs_filename) {
|
|
|
|
qdict_del(*options, "filename");
|
|
|
|
}
|
2013-03-15 21:47:22 +04:00
|
|
|
}
|
|
|
|
|
2014-05-26 13:09:59 +04:00
|
|
|
return 0;
|
|
|
|
}
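The way an explicitly named driver overrides the caller's protocol flag can be shown on its own. The sketch below is an assumption-laden illustration: `DemoDriver`, `DEMO_O_PROTOCOL` and `demo_resolve_protocol_flag()` are hypothetical names, and the bit value is not claimed to match QEMU's BDRV_O_PROTOCOL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit only; not claimed to match QEMU's BDRV_O_PROTOCOL. */
#define DEMO_O_PROTOCOL 0x1

/* Hypothetical driver descriptor: records only whether it is a protocol driver. */
typedef struct {
    const char *name;
    bool is_protocol;
} DemoDriver;

/*
 * Return @flags with the protocol bit adjusted. If @explicit_drv is
 * non-NULL, its kind overrides whatever the caller put in @flags;
 * otherwise the caller's bit is kept unchanged.
 */
static int demo_resolve_protocol_flag(int flags, const DemoDriver *explicit_drv)
{
    bool protocol = (flags & DEMO_O_PROTOCOL) != 0;

    if (explicit_drv != NULL) {
        assert(explicit_drv->name != NULL);
        protocol = explicit_drv->is_protocol;
    }
    return protocol ? (flags | DEMO_O_PROTOCOL) : (flags & ~DEMO_O_PROTOCOL);
}
```

Passing a format driver clears the bit even if the caller set it, and passing a protocol driver sets it even if the caller did not; with no explicit driver the flag passes through untouched.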
|
|
|
|
|
2017-09-14 15:32:04 +03:00
|
|
|
typedef struct BlockReopenQueueEntry {
|
|
|
|
bool prepared;
|
2019-03-05 19:18:22 +03:00
|
|
|
bool perms_checked;
|
2017-09-14 15:32:04 +03:00
|
|
|
BDRVReopenState state;
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_ENTRY(BlockReopenQueueEntry) entry;
|
2017-09-14 15:32:04 +03:00
|
|
|
} BlockReopenQueueEntry;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the flags that @bs will have after the reopens in @q have
|
|
|
|
* successfully completed. If @q is NULL (or @bs is not contained in @q),
|
|
|
|
* return the current flags.
|
|
|
|
*/
|
|
|
|
static int bdrv_reopen_get_flags(BlockReopenQueue *q, BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BlockReopenQueueEntry *entry;
|
|
|
|
|
|
|
|
if (q != NULL) {
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_FOREACH(entry, q, entry) {
|
2017-09-14 15:32:04 +03:00
|
|
|
if (entry->state.bs == bs) {
|
|
|
|
return entry->state.flags;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return bs->open_flags;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Returns whether the image file can be written to after the reopen queue @q
|
|
|
|
* has been successfully applied, or right now if @q is NULL. */
|
2018-06-06 22:37:00 +03:00
|
|
|
static bool bdrv_is_writable_after_reopen(BlockDriverState *bs,
|
|
|
|
BlockReopenQueue *q)
|
2017-09-14 15:32:04 +03:00
|
|
|
{
|
|
|
|
int flags = bdrv_reopen_get_flags(q, bs);
|
|
|
|
|
|
|
|
return (flags & (BDRV_O_RDWR | BDRV_O_INACTIVE)) == BDRV_O_RDWR;
|
|
|
|
}
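The mask test in the function above accepts a node only when the read-write bit is set and the inactive bit is clear. A minimal standalone sketch of that truth table follows; the `DEMO_O_*` bit values and `demo_is_writable()` are illustrative assumptions, not values taken from QEMU headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; assumptions, not taken from QEMU headers. */
#define DEMO_O_RDWR     0x02
#define DEMO_O_INACTIVE 0x08

/* Writable means: read-write requested AND not inactivated. */
static bool demo_is_writable(int flags)
{
    return (flags & (DEMO_O_RDWR | DEMO_O_INACTIVE)) == DEMO_O_RDWR;
}

/* Usage: all four combinations of the two bits. */
static void demo_check(void)
{
    assert(!demo_is_writable(0));
    assert(demo_is_writable(DEMO_O_RDWR));
    assert(!demo_is_writable(DEMO_O_INACTIVE));
    assert(!demo_is_writable(DEMO_O_RDWR | DEMO_O_INACTIVE));
}
```

Comparing the masked value against the read-write bit alone, rather than testing each bit separately, keeps the condition to a single expression.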
|
|
|
|
|
2018-06-06 22:37:00 +03:00
|
|
|
/*
|
|
|
|
* Return whether the BDS can be written to. This is not necessarily
|
|
|
|
* the same as !bdrv_is_read_only(bs), as inactivated images may not
|
|
|
|
* be written to but do not count as read-only images.
|
|
|
|
*/
|
|
|
|
bool bdrv_is_writable(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2018-06-06 22:37:00 +03:00
|
|
|
return bdrv_is_writable_after_reopen(bs, NULL);
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:37 +03:00
|
|
|
static char *bdrv_child_user_desc(BdrvChild *c)
|
|
|
|
{
|
2022-03-03 18:16:13 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-06-01 10:52:17 +03:00
|
|
|
return c->klass->get_parent_desc(c);
|
2021-04-28 18:17:37 +03:00
|
|
|
}
|
|
|
|
|
2021-06-01 10:52:18 +03:00
|
|
|
/*
|
|
|
|
* Check that @a allows everything that @b needs. @a and @b must reference the same
|
|
|
|
* child node.
|
|
|
|
*/
|
2021-04-28 18:17:37 +03:00
|
|
|
static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
|
|
|
|
{
|
2021-06-01 10:52:18 +03:00
|
|
|
const char *child_bs_name;
|
|
|
|
g_autofree char *a_user = NULL;
|
|
|
|
g_autofree char *b_user = NULL;
|
|
|
|
g_autofree char *perms = NULL;
|
|
|
|
|
|
|
|
assert(a->bs);
|
|
|
|
assert(a->bs == b->bs);
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:37 +03:00
|
|
|
|
|
|
|
if ((b->perm & a->shared_perm) == b->perm) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2021-06-01 10:52:18 +03:00
|
|
|
child_bs_name = bdrv_get_node_name(b->bs);
|
|
|
|
a_user = bdrv_child_user_desc(a);
|
|
|
|
b_user = bdrv_child_user_desc(b);
|
|
|
|
perms = bdrv_perm_names(b->perm & ~a->shared_perm);
|
|
|
|
|
|
|
|
error_setg(errp, "Permission conflict on node '%s': permissions '%s' are "
|
|
|
|
"both required by %s (uses node '%s' as '%s' child) and "
|
|
|
|
"unshared by %s (uses node '%s' as '%s' child).",
|
|
|
|
child_bs_name, perms,
|
|
|
|
b_user, child_bs_name, b->name,
|
|
|
|
a_user, child_bs_name, a->name);
|
2021-04-28 18:17:37 +03:00
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:53 +03:00
|
|
|
static bool bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
|
2021-04-28 18:17:37 +03:00
|
|
|
{
|
|
|
|
BdrvChild *a, *b;
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:37 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* During the loop we'll look at each pair twice. That's correct because
|
|
|
|
* bdrv_a_allow_b() is asymmetric and we should check each pair in both
|
|
|
|
* directions.
|
|
|
|
*/
|
|
|
|
QLIST_FOREACH(a, &bs->parents, next_parent) {
|
|
|
|
QLIST_FOREACH(b, &bs->parents, next_parent) {
|
2021-04-28 18:17:53 +03:00
|
|
|
if (a == b) {
|
2021-04-28 18:17:37 +03:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!bdrv_a_allow_b(a, b, errp)) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
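The asymmetry that makes both directions of every pair worth checking can be seen with two parents of one node. The sketch below assumes simplified stand-ins: `DemoEdge`, the `DEMO_PERM_*` bits and the `demo_*` functions are hypothetical and only mirror the bit arithmetic used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative permission bits; assumptions, not QEMU's BLK_PERM_* values. */
#define DEMO_PERM_READ  (1u << 0)
#define DEMO_PERM_WRITE (1u << 1)

/* Hypothetical parent edge: what it needs and what it lets others do. */
typedef struct {
    uint64_t perm;
    uint64_t shared_perm;
} DemoEdge;

/* True if @a shares every permission that @b requires. */
static bool demo_a_allows_b(const DemoEdge *a, const DemoEdge *b)
{
    return (b->perm & a->shared_perm) == b->perm;
}

/* Permissions required by @b that @a refuses to share. */
static uint64_t demo_missing(const DemoEdge *a, const DemoEdge *b)
{
    return b->perm & ~a->shared_perm;
}

/* Check every ordered pair, since the relation is not symmetric. */
static bool demo_conflict(const DemoEdge *edges, size_t n)
{
    assert(edges != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i != j && !demo_a_allows_b(&edges[i], &edges[j])) {
                return true;
            }
        }
    }
    return false;
}
```

A parent that needs read and write but shares both tolerates a read-only sibling, while that sibling, sharing only read, does not tolerate the writer. Checking a single direction per pair could miss exactly this case.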
|
|
|
|
|
2017-05-02 19:35:38 +03:00
|
|
|
static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
|
2020-05-13 14:05:44 +03:00
|
|
|
BdrvChild *c, BdrvChildRole role,
|
|
|
|
BlockReopenQueue *reopen_queue,
|
2017-05-02 19:35:38 +03:00
|
|
|
uint64_t parent_perm, uint64_t parent_shared,
|
|
|
|
uint64_t *nperm, uint64_t *nshared)
|
|
|
|
{
|
2019-04-04 14:29:53 +03:00
|
|
|
assert(bs->drv && bs->drv->bdrv_child_perm);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2020-05-13 14:05:44 +03:00
|
|
|
bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
|
2019-04-04 14:29:53 +03:00
|
|
|
parent_perm, parent_shared,
|
|
|
|
nperm, nshared);
|
2017-09-14 13:47:11 +03:00
|
|
|
/* TODO Take force_share from reopen_queue */
|
2017-05-02 19:35:38 +03:00
|
|
|
if (child_bs && child_bs->force_share) {
|
|
|
|
*nshared = BLK_PERM_ALL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
block: use topological sort for permission update
Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
to update nodes in topological sort order instead of simple DFS. With
topologically sorted nodes, we update a node only when all its parents
are already updated. With DFS it's not so.
Consider the following example:
A -+
|  |
|  v
|  B
|  |
v  |
C<-+
A is parent for B and C, B is parent for C.
Obviously, to update permissions, we should go in the order A B C, so that
when we update C, all parent permissions are already updated. But with the
current approach (simple recursion) we can update in the sequence A C B C (C
is updated twice). On the first update of C, we consider the old B
permissions, which is the wrong thing to do. If it succeeds, all is OK: on
the second C update we will finish with the correct graph. But if the wrong
check fails, we break the whole process for no reason (it's possible that
the updated B permissions will be less strict, but we will never check them).
Also, the new approach gives a way to update several nodes simultaneously
and correctly: we just need to run bdrv_topological_dfs() several times
to add all nodes and their subtrees into one topologically sorted list
(the next patch will update bdrv_replace_node() in this manner).
Test test_parallel_perm_update() is now passing, so move it out of the
debugging "if".
We also need to support ignore_children in
bdrv_parent_perms_conflict().
For test 283, the order of the conflicting parents check is changed.
Note also that in bdrv_check_perm() we don't check for parents conflict
at root bs, as we may be in the middle of permission update in
bdrv_reopen_multiple(). bdrv_reopen_multiple() will be updated soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-14-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-28 18:17:41 +03:00
|
|
|
/*
|
|
|
|
* Adds the whole subtree of @bs (including @bs itself) to the @list (except for
|
|
|
|
* nodes that are already in the @list, of course) so that final list is
|
|
|
|
* topologically sorted. Return the result (GSList @list object is updated, so
|
|
|
|
* don't use old reference after function call).
|
|
|
|
*
|
|
|
|
* On function start @list must be already topologically sorted and for any node
|
|
|
|
* in the @list the whole subtree of the node must be in the @list as well. The
|
|
|
|
 * simplest way to satisfy this criterion: use only the result of
|
|
|
|
* bdrv_topological_dfs() or NULL as @list parameter.
|
|
|
|
*/
|
|
|
|
static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
|
|
|
|
BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BdrvChild *child;
|
|
|
|
g_autoptr(GHashTable) local_found = NULL;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:17:41 +03:00
|
|
|
if (!found) {
|
|
|
|
assert(!list);
|
|
|
|
found = local_found = g_hash_table_new(NULL, NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (g_hash_table_contains(found, bs)) {
|
|
|
|
return list;
|
|
|
|
}
|
|
|
|
g_hash_table_add(found, bs);
|
|
|
|
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
list = bdrv_topological_dfs(list, found, child->bs);
|
|
|
|
}
|
|
|
|
|
|
|
|
return g_slist_prepend(list, bs);
|
|
|
|
}
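
The ordering produced by bdrv_topological_dfs() can be reproduced outside QEMU with a small stand-alone sketch. Everything below (Node, Order, topo_dfs, topo_example) is hypothetical illustration code, not QEMU API: a per-node flag stands in for the @found hash table, and an array filled from the back stands in for g_slist_prepend(). It builds the A/B/C graph from the commit message and should yield the order A, B, C regardless of the order in which A's children are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_NODES 16

/* Hypothetical stand-in for a graph node; not a QEMU type. */
typedef struct Node {
    char name;
    bool found;                         /* plays the role of the @found set */
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
} Node;

/* Output order, filled from the back so each write acts as a "prepend". */
typedef struct Order {
    Node *slot[MAX_NODES];
    size_t head;                        /* index of the first valid entry */
} Order;

static void order_init(Order *o)
{
    o->head = MAX_NODES;
}

/* Post-order DFS: a node is prepended only after its whole subtree. */
static void topo_dfs(Order *o, Node *n)
{
    if (n->found) {
        return;
    }
    n->found = true;

    for (size_t i = 0; i < n->n_children; i++) {
        topo_dfs(o, n->children[i]);
    }

    assert(o->head > 0);
    o->slot[--o->head] = n;
}

/* A is parent of B and C, B is parent of C; writes the order as a string. */
static void topo_example(char out[4])
{
    Node c = { .name = 'C' };
    Node b = { .name = 'B', .children = { &c }, .n_children = 1 };
    Node a = { .name = 'A', .children = { &c, &b }, .n_children = 2 };
    Order o;
    size_t k = 0;

    order_init(&o);
    topo_dfs(&o, &a);

    for (size_t i = o.head; i < MAX_NODES; i++) {
        out[k++] = o.slot[i]->name;
    }
    out[k] = '\0';
}
```

Because a node is placed in front of everything its subtree already contributed, every parent ends up before all of its children, which is the property the permission update relies on.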
|
|
|
|
|
2021-04-28 18:18:02 +03:00
|
|
|
typedef struct BdrvChildSetPermState {
|
|
|
|
BdrvChild *child;
|
|
|
|
uint64_t old_perm;
|
|
|
|
uint64_t old_shared_perm;
|
|
|
|
} BdrvChildSetPermState;
|
2021-04-28 18:17:38 +03:00
|
|
|
|
|
|
|
static void bdrv_child_set_perm_abort(void *opaque)
|
|
|
|
{
|
2021-04-28 18:18:02 +03:00
|
|
|
BdrvChildSetPermState *s = opaque;
|
|
|
|
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:18:02 +03:00
|
|
|
s->child->perm = s->old_perm;
|
|
|
|
s->child->shared_perm = s->old_shared_perm;
|
2021-04-28 18:17:38 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv bdrv_child_set_pem_drv = {
|
|
|
|
.abort = bdrv_child_set_perm_abort,
|
2021-04-28 18:18:02 +03:00
|
|
|
.clean = g_free,
|
2021-04-28 18:17:38 +03:00
|
|
|
};
|
|
|
|
|
2021-04-28 18:18:02 +03:00
|
|
|
static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm,
|
|
|
|
uint64_t shared, Transaction *tran)
|
2021-04-28 18:17:38 +03:00
|
|
|
{
|
2021-04-28 18:18:02 +03:00
|
|
|
BdrvChildSetPermState *s = g_new(BdrvChildSetPermState, 1);
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:18:02 +03:00
|
|
|
|
|
|
|
*s = (BdrvChildSetPermState) {
|
|
|
|
.child = c,
|
|
|
|
.old_perm = c->perm,
|
|
|
|
.old_shared_perm = c->shared_perm,
|
|
|
|
};
|
2021-04-28 18:17:38 +03:00
|
|
|
|
|
|
|
c->perm = perm;
|
|
|
|
c->shared_perm = shared;
|
|
|
|
|
2021-04-28 18:18:02 +03:00
|
|
|
tran_add(tran, &bdrv_child_set_pem_drv, s);
|
2021-04-28 18:17:38 +03:00
|
|
|
}
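
The pattern used by bdrv_child_set_perm() (apply the new values at once, remember the old ones, restore them on abort) can be sketched in isolation. The types and functions below (Action, Perms, SetPermState, set_perm, finalize) are hypothetical and only imitate the idea; this is not QEMU's Transaction implementation, and unlike the real API this sketch runs abort and clean for each action together rather than in separate passes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature of a transaction action list; not QEMU's API. */
typedef struct Action {
    void (*abort)(void *opaque);
    void (*clean)(void *opaque);
    void *opaque;
    struct Action *next;
} Action;

typedef struct Perms {
    uint64_t perm;
    uint64_t shared_perm;
} Perms;

typedef struct SetPermState {
    Perms *target;
    uint64_t old_perm;
    uint64_t old_shared_perm;
} SetPermState;

/* Undo step: put the saved values back. */
static void set_perm_abort(void *opaque)
{
    SetPermState *s = opaque;

    s->target->perm = s->old_perm;
    s->target->shared_perm = s->old_shared_perm;
}

/* Apply the new values immediately and record how to undo them. */
static void set_perm(Action **tran, Perms *p, uint64_t perm, uint64_t shared)
{
    SetPermState *s = malloc(sizeof(*s));
    Action *a = malloc(sizeof(*a));

    assert(s && a);
    *s = (SetPermState) {
        .target = p,
        .old_perm = p->perm,
        .old_shared_perm = p->shared_perm,
    };
    p->perm = perm;
    p->shared_perm = shared;

    *a = (Action) {
        .abort = set_perm_abort,
        .clean = free,
        .opaque = s,
        .next = *tran,
    };
    *tran = a;                  /* newest first, so undo runs in reverse */
}

/* ret < 0 rolls back; in both cases every action is cleaned up. */
static void finalize(Action **tran, int ret)
{
    while (*tran) {
        Action *a = *tran;

        *tran = a->next;
        if (ret < 0 && a->abort) {
            a->abort(a->opaque);
        }
        if (a->clean) {
            a->clean(a->opaque);
        }
        free(a);
    }
}
```

Undoing in reverse order matters when the same object is changed twice in one transaction: the last abort to run is the one that saved the original values.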
|
|
|
|
|
2021-04-28 18:17:42 +03:00
|
|
|
static void bdrv_drv_set_perm_commit(void *opaque)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs = opaque;
|
|
|
|
uint64_t cumulative_perms, cumulative_shared_perms;
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:42 +03:00
|
|
|
|
|
|
|
if (bs->drv->bdrv_set_perm) {
|
|
|
|
bdrv_get_cumulative_perm(bs, &cumulative_perms,
|
|
|
|
&cumulative_shared_perms);
|
|
|
|
bs->drv->bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bdrv_drv_set_perm_abort(void *opaque)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs = opaque;
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:42 +03:00
|
|
|
|
|
|
|
if (bs->drv->bdrv_abort_perm_update) {
|
|
|
|
bs->drv->bdrv_abort_perm_update(bs);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
TransactionActionDrv bdrv_drv_set_perm_drv = {
|
|
|
|
.abort = bdrv_drv_set_perm_abort,
|
|
|
|
.commit = bdrv_drv_set_perm_commit,
|
|
|
|
};
|
|
|
|
|
|
|
|
static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
|
|
|
|
uint64_t shared_perm, Transaction *tran,
|
|
|
|
Error **errp)
|
|
|
|
{
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:42 +03:00
|
|
|
if (!bs->drv) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bs->drv->bdrv_check_perm) {
|
|
|
|
int ret = bs->drv->bdrv_check_perm(bs, perm, shared_perm, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (tran) {
|
|
|
|
tran_add(tran, &bdrv_drv_set_perm_drv, bs);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:44 +03:00
|
|
|
typedef struct BdrvReplaceChildState {
|
|
|
|
BdrvChild *child;
|
|
|
|
BlockDriverState *old_bs;
|
|
|
|
} BdrvReplaceChildState;
|
|
|
|
|
|
|
|
static void bdrv_replace_child_commit(void *opaque)
|
|
|
|
{
|
|
|
|
BdrvReplaceChildState *s = opaque;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:44 +03:00
|
|
|
|
|
|
|
bdrv_unref(s->old_bs);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bdrv_replace_child_abort(void *opaque)
|
|
|
|
{
|
|
|
|
BdrvReplaceChildState *s = opaque;
|
|
|
|
BlockDriverState *new_bs = s->child->bs;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2022-07-26 23:11:29 +03:00
|
|
|
/* old_bs reference is transparently moved from @s to @s->child */
|
2022-11-18 20:41:09 +03:00
|
|
|
if (!s->child->bs) {
|
|
|
|
/*
|
|
|
|
* The parents were undrained when removing old_bs from the child. New
|
|
|
|
* requests can't have been made, though, because the child was empty.
|
|
|
|
*
|
|
|
|
* TODO Make bdrv_replace_child_noperm() transactionable to avoid
|
|
|
|
* undraining the parent in the first place. Once this is done, having
|
|
|
|
* new_bs drained when calling bdrv_replace_child_tran() is not a
|
|
|
|
* requirement any more.
|
|
|
|
*/
|
2022-11-18 20:41:10 +03:00
|
|
|
bdrv_parent_drained_begin_single(s->child);
|
2022-11-18 20:41:09 +03:00
|
|
|
assert(!bdrv_parent_drained_poll_single(s->child));
|
|
|
|
}
|
|
|
|
assert(s->child->quiesced_parent);
|
2022-07-26 23:11:31 +03:00
|
|
|
bdrv_replace_child_noperm(s->child, s->old_bs);
|
2021-04-28 18:17:44 +03:00
|
|
|
bdrv_unref(new_bs);
|
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv bdrv_replace_child_drv = {
|
|
|
|
.commit = bdrv_replace_child_commit,
|
|
|
|
.abort = bdrv_replace_child_abort,
|
|
|
|
.clean = g_free,
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
2021-06-10 14:25:44 +03:00
|
|
|
* bdrv_replace_child_tran
|
2021-04-28 18:17:44 +03:00
|
|
|
*
|
|
|
|
* Note: real unref of old_bs is done only on commit.
|
2021-06-10 14:25:44 +03:00
|
|
|
*
|
2022-11-18 20:41:09 +03:00
|
|
|
* Both @child->bs and @new_bs (if non-NULL) must be drained. @new_bs must be
|
|
|
|
* kept drained until the transaction is completed.
|
|
|
|
*
|
2021-06-10 14:25:44 +03:00
|
|
|
* The function doesn't update permissions, caller is responsible for this.
|
2021-04-28 18:17:44 +03:00
|
|
|
*/
|
2022-07-26 23:11:29 +03:00
|
|
|
static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
|
2022-07-26 23:11:28 +03:00
|
|
|
Transaction *tran)
|
2021-04-28 18:17:44 +03:00
|
|
|
{
|
|
|
|
BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
|
2022-11-18 20:41:09 +03:00
|
|
|
|
|
|
|
assert(child->quiesced_parent);
|
|
|
|
assert(!new_bs || new_bs->quiesce_counter);
|
|
|
|
|
2021-04-28 18:17:44 +03:00
|
|
|
*s = (BdrvReplaceChildState) {
|
2022-07-26 23:11:29 +03:00
|
|
|
.child = child,
|
|
|
|
.old_bs = child->bs,
|
2021-04-28 18:17:44 +03:00
|
|
|
};
|
|
|
|
tran_add(tran, &bdrv_replace_child_drv, s);
|
|
|
|
|
|
|
|
if (new_bs) {
|
|
|
|
bdrv_ref(new_bs);
|
|
|
|
}
|
2022-07-26 23:11:31 +03:00
|
|
|
bdrv_replace_child_noperm(child, new_bs);
|
2022-07-26 23:11:29 +03:00
|
|
|
/* old_bs reference is transparently moved from @child to @s */
|
2021-04-28 18:17:44 +03:00
|
|
|
}
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
/*
|
2021-04-28 18:18:04 +03:00
|
|
|
* Refresh permissions in @bs subtree. The function is intended to be called
|
|
|
|
* after some graph modification that was done without permission update.
|
2016-12-15 15:04:20 +03:00
|
|
|
*/
|
2021-04-28 18:18:04 +03:00
|
|
|
static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
|
|
|
|
Transaction *tran, Error **errp)
|
2016-12-15 15:04:20 +03:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
|
|
|
BdrvChild *c;
|
|
|
|
int ret;
|
2021-04-28 18:18:04 +03:00
|
|
|
uint64_t cumulative_perms, cumulative_shared_perms;
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:18:04 +03:00
|
|
|
|
|
|
|
bdrv_get_cumulative_perm(bs, &cumulative_perms, &cumulative_shared_perms);
|
2016-12-15 15:04:20 +03:00
|
|
|
|
|
|
|
/* Write permissions never work with read-only images */
|
|
|
|
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
|
2018-06-06 22:37:00 +03:00
|
|
|
!bdrv_is_writable_after_reopen(bs, q))
|
2016-12-15 15:04:20 +03:00
|
|
|
{
|
2019-05-15 23:15:00 +03:00
|
|
|
if (!bdrv_is_writable_after_reopen(bs, NULL)) {
|
|
|
|
error_setg(errp, "Block node is read-only");
|
|
|
|
} else {
|
2021-04-28 18:18:04 +03:00
|
|
|
error_setg(errp, "Read-only block node '%s' cannot support "
|
|
|
|
"read-write users", bdrv_get_node_name(bs));
|
2019-05-15 23:15:00 +03:00
|
|
|
}
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
|
2020-07-16 17:26:00 +03:00
|
|
|
/*
|
|
|
|
* Unaligned requests will automatically be aligned to bl.request_alignment
|
|
|
|
* and without RESIZE we can't extend requests to write to space beyond the
|
|
|
|
* end of the image, so it's required that the image size is aligned.
|
|
|
|
*/
|
|
|
|
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
|
|
|
|
!(cumulative_perms & BLK_PERM_RESIZE))
|
|
|
|
{
|
|
|
|
if ((bs->total_sectors * BDRV_SECTOR_SIZE) % bs->bl.request_alignment) {
|
|
|
|
error_setg(errp, "Cannot get 'write' permission without 'resize': "
|
|
|
|
"Image size is not a multiple of request "
|
|
|
|
"alignment");
|
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
/* Check this node */
|
|
|
|
if (!drv) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:43 +03:00
|
|
|
ret = bdrv_drv_set_perm(bs, cumulative_perms, cumulative_shared_perms, tran,
|
2021-04-28 18:17:42 +03:00
|
|
|
errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
2016-12-15 15:04:20 +03:00
|
|
|
}
|
|
|
|
|
2016-12-21 01:25:12 +03:00
|
|
|
/* Drivers that never have children can omit .bdrv_child_perm() */
|
2016-12-15 15:04:20 +03:00
|
|
|
if (!drv->bdrv_child_perm) {
|
2016-12-21 01:25:12 +03:00
|
|
|
assert(QLIST_EMPTY(&bs->children));
|
2016-12-15 15:04:20 +03:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check all children */
|
|
|
|
QLIST_FOREACH(c, &bs->children, next) {
|
|
|
|
uint64_t cur_perm, cur_shared;
|
2019-05-22 20:03:50 +03:00
|
|
|
|
2020-05-13 14:05:44 +03:00
|
|
|
bdrv_child_perm(bs, c->bs, c, c->role, q,
|
2017-05-02 19:35:38 +03:00
|
|
|
cumulative_perms, cumulative_shared_perms,
|
|
|
|
&cur_perm, &cur_shared);
|
2021-04-28 18:18:02 +03:00
|
|
|
bdrv_child_set_perm(c, cur_perm, cur_shared, tran);
|
2021-04-28 18:17:41 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2021-04-28 18:17:40 +03:00
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
/*
|
|
|
|
* @list is a product of bdrv_topological_dfs() (may be called several times) -
|
|
|
|
* a topologically sorted subgraph.
|
|
|
|
*/
|
|
|
|
static int bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q,
|
|
|
|
Transaction *tran, Error **errp)
|
2021-04-28 18:17:41 +03:00
|
|
|
{
|
|
|
|
int ret;
|
2021-04-28 18:17:43 +03:00
|
|
|
BlockDriverState *bs;
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:41 +03:00
|
|
|
|
2021-04-28 18:17:43 +03:00
|
|
|
for ( ; list; list = list->next) {
|
|
|
|
bs = list->data;
|
|
|
|
|
2021-04-28 18:17:53 +03:00
|
|
|
if (bdrv_parent_perms_conflict(bs, errp)) {
|
2021-04-28 18:17:43 +03:00
|
|
|
return -EINVAL;
|
2021-04-28 18:17:41 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:18:04 +03:00
|
|
|
ret = bdrv_node_refresh_perm(bs, q, tran, errp);
|
2016-12-15 15:04:20 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
/*
|
|
|
|
* @list is any list of nodes. List is completed by all subtrees and
|
|
|
|
* topologically sorted. It's not a problem if some node occurs in the @list
|
|
|
|
* several times.
|
|
|
|
*/
|
|
|
|
static int bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q,
|
|
|
|
Transaction *tran, Error **errp)
|
|
|
|
{
|
|
|
|
g_autoptr(GHashTable) found = g_hash_table_new(NULL, NULL);
|
|
|
|
g_autoptr(GSList) refresh_list = NULL;
|
|
|
|
|
|
|
|
for ( ; list; list = list->next) {
|
|
|
|
refresh_list = bdrv_topological_dfs(refresh_list, found, list->data);
|
|
|
|
}
|
|
|
|
|
|
|
|
return bdrv_do_refresh_perms(refresh_list, q, tran, errp);
|
|
|
|
}
|
|
|
|
|
2020-03-10 14:38:25 +03:00
|
|
|
void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
|
|
|
|
uint64_t *shared_perm)
|
2016-12-15 15:04:20 +03:00
|
|
|
{
|
|
|
|
BdrvChild *c;
|
|
|
|
uint64_t cumulative_perms = 0;
|
|
|
|
uint64_t cumulative_shared_perms = BLK_PERM_ALL;
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
QLIST_FOREACH(c, &bs->parents, next_parent) {
|
|
|
|
cumulative_perms |= c->perm;
|
|
|
|
cumulative_shared_perms &= c->shared_perm;
|
|
|
|
}
|
|
|
|
|
|
|
|
*perm = cumulative_perms;
|
|
|
|
*shared_perm = cumulative_shared_perms;
|
|
|
|
}
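
The accumulation rule in bdrv_get_cumulative_perm() is easy to check on concrete numbers: required permissions are OR-ed over all parents, shared permissions are AND-ed. The sketch below is a stand-alone illustration; the PERM_* bit values and the ParentEdge type are assumptions made for the example and are not the real BLK_PERM_* definitions or BdrvChild.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; the real BLK_PERM_* live elsewhere in QEMU. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
    PERM_ALL             = 0x0f,
};

/* Hypothetical stand-in for one parent edge of a node. */
typedef struct ParentEdge {
    uint64_t perm;          /* what this parent needs */
    uint64_t shared_perm;   /* what this parent tolerates from others */
} ParentEdge;

static void cumulative_perm(const ParentEdge *parents, size_t n,
                            uint64_t *perm, uint64_t *shared_perm)
{
    uint64_t p = 0;             /* nothing required until a parent asks */
    uint64_t s = PERM_ALL;      /* everything shared until a parent objects */

    assert(perm && shared_perm);
    for (size_t i = 0; i < n; i++) {
        p |= parents[i].perm;
        s &= parents[i].shared_perm;
    }

    *perm = p;
    *shared_perm = s;
}
```

With no parents at all the result is "needs nothing, shares everything", which matches the initial values used in the function above.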
|
|
|
|
|
2017-05-02 19:35:36 +03:00
|
|
|
char *bdrv_perm_names(uint64_t perm)
|
2017-01-16 20:26:20 +03:00
|
|
|
{
|
|
|
|
struct perm_name {
|
|
|
|
uint64_t perm;
|
|
|
|
const char *name;
|
|
|
|
} permissions[] = {
|
|
|
|
{ BLK_PERM_CONSISTENT_READ, "consistent read" },
|
|
|
|
{ BLK_PERM_WRITE, "write" },
|
|
|
|
{ BLK_PERM_WRITE_UNCHANGED, "write unchanged" },
|
|
|
|
{ BLK_PERM_RESIZE, "resize" },
|
|
|
|
{ 0, NULL }
|
|
|
|
};
|
|
|
|
|
2020-01-10 20:15:18 +03:00
|
|
|
GString *result = g_string_sized_new(30);
|
2017-01-16 20:26:20 +03:00
|
|
|
struct perm_name *p;
|
|
|
|
|
|
|
|
for (p = permissions; p->name; p++) {
|
|
|
|
if (perm & p->perm) {
|
2020-01-10 20:15:18 +03:00
|
|
|
if (result->len > 0) {
|
|
|
|
g_string_append(result, ", ");
|
|
|
|
}
|
|
|
|
g_string_append(result, p->name);
|
2017-01-16 20:26:20 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-01-10 20:15:18 +03:00
|
|
|
return g_string_free(result, FALSE);
|
2017-01-16 20:26:20 +03:00
|
|
|
}
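
For readers without GLib at hand, the table-driven formatting in bdrv_perm_names() can be approximated with plain C. This is a sketch under assumptions: the PERM_* bit values are invented for the example, and perm_names() here writes into a caller-supplied buffer (truncating if it is too small) instead of returning a heap-allocated string as the original does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative bit values only; not the real BLK_PERM_* constants. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
};

static const struct {
    uint64_t bit;
    const char *name;
} perm_table[] = {
    { PERM_CONSISTENT_READ, "consistent read" },
    { PERM_WRITE,           "write" },
    { PERM_WRITE_UNCHANGED, "write unchanged" },
    { PERM_RESIZE,          "resize" },
};

/* Writes a comma-separated list of the set permissions into @buf. */
static void perm_names(uint64_t perm, char *buf, size_t size)
{
    assert(buf && size > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(perm_table) / sizeof(perm_table[0]); i++) {
        size_t len = strlen(buf);

        if (perm & perm_table[i].bit) {
            /* Separator only between entries; snprintf truncates safely. */
            snprintf(buf + len, size - len, "%s%s",
                     len ? ", " : "", perm_table[i].name);
        }
    }
}
```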
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
/* @tran is allowed to be NULL. In this case no rollback is possible */
|
|
|
|
static int bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran,
|
|
|
|
Error **errp)
|
2020-11-06 15:42:38 +03:00
|
|
|
{
|
|
|
|
int ret;
|
2022-11-07 19:35:57 +03:00
|
|
|
Transaction *local_tran = NULL;
|
2021-04-28 18:17:43 +03:00
|
|
|
g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, bs);
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2020-11-06 15:42:38 +03:00
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
if (!tran) {
|
|
|
|
tran = local_tran = tran_new();
|
|
|
|
}
|
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
ret = bdrv_do_refresh_perms(list, NULL, tran, errp);
|
2022-11-07 19:35:57 +03:00
|
|
|
|
|
|
|
if (local_tran) {
|
|
|
|
tran_finalize(local_tran, ret);
|
|
|
|
}
|
2020-11-06 15:42:38 +03:00
|
|
|
|
2021-04-28 18:17:43 +03:00
|
|
|
return ret;
|
2020-11-06 15:42:38 +03:00
|
|
|
}
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
|
|
|
|
Error **errp)
|
|
|
|
{
|
block: Ignore loosening perm restrictions failures
We generally assume that loosening permission restrictions can never
fail. We have seen in the past that this assumption is wrong. This has
led to crashes because we generally pass &error_abort when loosening
permissions.
However, a failure in such a case should actually be handled in quite
the opposite way: It is very much not fatal, so qemu may report it, but
still consider the operation successful. The only realistic problem is
that qemu may then retain permissions and thus locks on images it
actually does not require. But again, that is not fatal.
To implement this behavior, we make all functions that change
permissions and that pass &error_abort to the initiating function
(bdrv_check_perm() or bdrv_child_check_perm()) evaluate the
@loosen_restrictions value introduced in the previous patch. If it is
true and an error did occur, we abort the permission update, discard the
error, and instead report success to the caller.
bdrv_child_try_set_perm() itself does not pass &error_abort, but it is
the only public function to change permissions. As such, callers may
pass &error_abort to it, expecting dropping permission restrictions to
never fail.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-22 20:03:51 +03:00
|
|
|
Error *local_err = NULL;
|
2021-04-28 18:17:39 +03:00
|
|
|
Transaction *tran = tran_new();
|
2016-12-15 15:04:20 +03:00
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:18:02 +03:00
|
|
|
bdrv_child_set_perm(c, perm, shared, tran);
|
2021-04-28 18:17:39 +03:00
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(c->bs, tran, &local_err);
|
2021-04-28 18:17:39 +03:00
|
|
|
|
|
|
|
tran_finalize(tran, ret);
|
|
|
|
|
2016-12-15 15:04:20 +03:00
|
|
|
if (ret < 0) {
|
block: drop tighten_restrictions
The only users of this thing are:
1. bdrv_child_try_set_perm, to ignore failures on loosen restrictions
2. assertion in bdrv_replace_child
3. assertion in bdrv_inactivate_recurse
Assertions are not enough reason for overcomplication the permission
update system. So, look at bdrv_child_try_set_perm.
We are interested in tighten_restrictions only on failure. But on
failure this field is not reliable: we may fail in the middle of
permission update, some nodes are not touched and we don't know should
their permissions be tighten or not. So, we rely on the fact that if we
loose restrictions on some node (or BdrvChild), we'll not tighten
restriction in the whole subtree as part of this update (assertions 2
and 3 rely on this fact as well). And, if we rely on this fact anyway,
we can just check it on top, and don't pass additional pointer through
the whole recursive infrastructure.
Note also, that further patches will fix real bugs in permission update
system, so now is good time to simplify it, as a help for further
refactorings.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-8-vsementsov@virtuozzo.com>
[mreitz: Fixed rebase conflict]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-06 15:42:41 +03:00
|
|
|
if ((perm & ~c->perm) || (c->shared_perm & ~shared)) {
|
|
|
|
/* tighten permissions */
|
2019-05-22 20:03:51 +03:00
|
|
|
error_propagate(errp, local_err);
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Our caller may intend to only loosen restrictions and
|
|
|
|
* does not expect this function to fail. Errors are not
|
|
|
|
* fatal in such a case, so we can just hide them from our
|
|
|
|
* caller.
|
|
|
|
*/
|
|
|
|
error_free(local_err);
|
|
|
|
ret = 0;
|
|
|
|
}
|
2016-12-15 15:04:20 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:39 +03:00
|
|
|
return ret;
|
2016-12-14 19:24:36 +03:00
|
|
|
}
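
The error path of bdrv_child_try_set_perm() hinges on one predicate: after the failed transaction has been aborted, c->perm and c->shared_perm hold the old values again, so comparing them with the requested values tells whether the caller asked for anything stricter. A minimal stand-alone sketch of that test follows; the function name tightens() and the PERM_* values are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the real BLK_PERM_* constants. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_ALL             = 0x0f,
};

/*
 * True if the request asks for a permission that was not held before,
 * or stops sharing a permission that was shared before.
 */
static bool tightens(uint64_t old_perm, uint64_t old_shared,
                     uint64_t new_perm, uint64_t new_shared)
{
    assert((new_perm & ~(uint64_t)PERM_ALL) == 0);
    return (new_perm & ~old_perm) || (old_shared & ~new_shared);
}
```

Only when this predicate is true is the error propagated; a request that merely loosens restrictions is reported as success even if the refresh failed.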
|
|
|
|
|
2019-05-22 20:03:46 +03:00
|
|
|
int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp)
|
|
|
|
{
|
|
|
|
uint64_t parent_perms, parent_shared;
|
|
|
|
uint64_t perms, shared;
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2019-05-22 20:03:46 +03:00
|
|
|
bdrv_get_cumulative_perm(bs, &parent_perms, &parent_shared);
|
2020-05-13 14:05:44 +03:00
|
|
|
bdrv_child_perm(bs, c->bs, c, c->role, NULL,
|
2020-05-13 14:05:16 +03:00
|
|
|
parent_perms, parent_shared, &perms, &shared);
|
2019-05-22 20:03:46 +03:00
|
|
|
|
|
|
|
return bdrv_child_try_set_perm(c, perms, shared, errp);
|
|
|
|
}
|
|
|
|
|
2020-05-13 14:05:40 +03:00
|
|
|
/*
|
|
|
|
* Default implementation for .bdrv_child_perm() for block filters:
|
|
|
|
* Forward CONSISTENT_READ, WRITE, WRITE_UNCHANGED, and RESIZE to the
|
|
|
|
* filtered child.
|
|
|
|
*/
|
|
|
|
static void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
|
|
|
|
BdrvChildRole role,
|
|
|
|
BlockReopenQueue *reopen_queue,
|
|
|
|
uint64_t perm, uint64_t shared,
|
|
|
|
uint64_t *nperm, uint64_t *nshared)
|
2016-12-15 13:27:32 +03:00
|
|
|
{
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-08-02 16:59:41 +03:00
|
|
|
*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
|
|
|
|
*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
|
2016-12-15 13:27:32 +03:00
|
|
|
}

static void bdrv_default_perms_for_cow(BlockDriverState *bs, BdrvChild *c,
                                       BdrvChildRole role,
                                       BlockReopenQueue *reopen_queue,
                                       uint64_t perm, uint64_t shared,
                                       uint64_t *nperm, uint64_t *nshared)
{
    assert(role & BDRV_CHILD_COW);
    GLOBAL_STATE_CODE();

    /*
     * We want consistent read from backing files if the parent needs it.
     * No other operations are performed on backing files.
     */
    perm &= BLK_PERM_CONSISTENT_READ;

    /*
     * If the parent can deal with changing data, we're okay with a
     * writable and resizable backing file.
     * TODO Require !(perm & BLK_PERM_CONSISTENT_READ), too?
     */
    if (shared & BLK_PERM_WRITE) {
        shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
    } else {
        shared = 0;
    }
block: drop BLK_PERM_GRAPH_MOD
First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.
Second, it's a strange thing: it presents a permission of parent node
to change its child. But generally, children are replaced by different
mechanisms, like jobs or qmp commands, not by nodes.
Graph-mod permission is hard to understand. All other permissions
describe operations which are done by parent node on its child: read,
write, resize. Graph modification operations are something completely
different.
The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared
perm) is mirror_start_job, for s->target. Still modern code should use
bdrv_freeze_backing_chain() to protect from graph modification, if we
don't do it somewhere it may be considered as a bug. So, it's a bit
risky to drop GRAPH_MOD, and analyzing of possible loss of protection
is hard. But one day we should do it, let's do it now.
One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-09-02 12:37:54 +03:00
    shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;

    if (bs->open_flags & BDRV_O_INACTIVE) {
        shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
    }

    *nperm = perm;
    *nshared = shared;
}
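
The copy-on-write rules above reduce to a few bit operations. The sketch below restates them outside QEMU so the resulting masks can be checked by hand. The `SK_*` names, the bit values, and the `inactive` parameter (standing in for the BDRV_O_INACTIVE test) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BLK_PERM_* flags (bit values assumed). */
enum {
    SK_PERM_CONSISTENT_READ = 1u << 0,
    SK_PERM_WRITE           = 1u << 1,
    SK_PERM_WRITE_UNCHANGED = 1u << 2,
    SK_PERM_RESIZE          = 1u << 3,
};

static void sk_cow_perms(uint64_t perm, uint64_t shared, bool inactive,
                         uint64_t *nperm, uint64_t *nshared)
{
    /* A backing file is only ever read. */
    perm &= SK_PERM_CONSISTENT_READ;

    /* Writers on the backing file are tolerated only if the parent shares WRITE. */
    if (shared & SK_PERM_WRITE) {
        shared = SK_PERM_WRITE | SK_PERM_RESIZE;
    } else {
        shared = 0;
    }
    shared |= SK_PERM_CONSISTENT_READ | SK_PERM_WRITE_UNCHANGED;

    /* An inactive node does not restrict writers at all. */
    if (inactive) {
        shared |= SK_PERM_WRITE | SK_PERM_RESIZE;
    }

    *nperm = perm;
    *nshared = shared;
}
```

For a parent that requests CONSISTENT_READ and WRITE but shares only CONSISTENT_READ, the backing child ends up with CONSISTENT_READ requested and CONSISTENT_READ plus WRITE_UNCHANGED shared.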

static void bdrv_default_perms_for_storage(BlockDriverState *bs, BdrvChild *c,
                                           BdrvChildRole role,
                                           BlockReopenQueue *reopen_queue,
                                           uint64_t perm, uint64_t shared,
                                           uint64_t *nperm, uint64_t *nshared)
{
    int flags;

    GLOBAL_STATE_CODE();
    assert(role & (BDRV_CHILD_METADATA | BDRV_CHILD_DATA));

    flags = bdrv_reopen_get_flags(reopen_queue, bs);

    /*
     * Apart from the modifications below, the same permissions are
     * forwarded and left alone as for filters
     */
    bdrv_filter_default_perms(bs, c, role, reopen_queue,
                              perm, shared, &perm, &shared);

    if (role & BDRV_CHILD_METADATA) {
        /* Format drivers may touch metadata even if the guest doesn't write */
        if (bdrv_is_writable_after_reopen(bs, reopen_queue)) {
            perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
        }

        /*
         * bs->file always needs to be consistent because of the
         * metadata. We can never allow other users to resize or write
         * to it.
         */
        if (!(flags & BDRV_O_NO_IO)) {
            perm |= BLK_PERM_CONSISTENT_READ;
        }
        shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
    }

    if (role & BDRV_CHILD_DATA) {
        /*
         * Technically, everything in this block is a subset of the
         * BDRV_CHILD_METADATA path taken above, and so this could
         * be an "else if" branch. However, that is not obvious, and
         * this function is not performance critical, therefore we let
         * this be an independent "if".
         */

        /*
         * We cannot allow other users to resize the file because the
         * format driver might have some assumptions about the size
         * (e.g. because it is stored in metadata, or because the file
         * is split into fixed-size data files).
         */
        shared &= ~BLK_PERM_RESIZE;

        /*
         * WRITE_UNCHANGED often cannot be performed as such on the
         * data file. For example, the qcow2 driver may still need to
         * write copied clusters on copy-on-read.
         */
        if (perm & BLK_PERM_WRITE_UNCHANGED) {
            perm |= BLK_PERM_WRITE;
        }

        /*
         * If the data file is written to, the format driver may
         * expect to be able to resize it by writing beyond the EOF.
         */
        if (perm & BLK_PERM_WRITE) {
            perm |= BLK_PERM_RESIZE;
        }
    }

    if (bs->open_flags & BDRV_O_INACTIVE) {
        shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
    }

    *nperm = perm;
    *nshared = shared;
}
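
To see how the metadata and data branches differ, here is a reduced sketch of the storage rules. It is an illustration under assumptions, not the QEMU implementation: the `SK_*` flags and role bits are invented, the `writable` and `no_io` parameters replace the reopen-queue lookups, and both the filter pass-through step and the BDRV_O_INACTIVE step are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BLK_PERM_* flags (bit values assumed). */
enum {
    SK_PERM_CONSISTENT_READ = 1u << 0,
    SK_PERM_WRITE           = 1u << 1,
    SK_PERM_WRITE_UNCHANGED = 1u << 2,
    SK_PERM_RESIZE          = 1u << 3,
    SK_PERM_ALL             = 0x0f,
};

/* Hypothetical stand-ins for BDRV_CHILD_DATA and BDRV_CHILD_METADATA. */
enum {
    SK_ROLE_DATA     = 1u << 0,
    SK_ROLE_METADATA = 1u << 1,
};

static void sk_storage_perms(unsigned role, bool writable, bool no_io,
                             uint64_t perm, uint64_t shared,
                             uint64_t *nperm, uint64_t *nshared)
{
    if (role & SK_ROLE_METADATA) {
        /* The format driver may update metadata without a guest write. */
        if (writable) {
            perm |= SK_PERM_WRITE | SK_PERM_RESIZE;
        }
        if (!no_io) {
            perm |= SK_PERM_CONSISTENT_READ;
        }
        /* Nobody else may write to or resize a metadata file. */
        shared &= ~(uint64_t)(SK_PERM_WRITE | SK_PERM_RESIZE);
    }

    if (role & SK_ROLE_DATA) {
        /* Other users may write data, but must not change the file size. */
        shared &= ~(uint64_t)SK_PERM_RESIZE;
        /* WRITE_UNCHANGED is escalated to WRITE, and WRITE to RESIZE. */
        if (perm & SK_PERM_WRITE_UNCHANGED) {
            perm |= SK_PERM_WRITE;
        }
        if (perm & SK_PERM_WRITE) {
            perm |= SK_PERM_RESIZE;
        }
    }

    *nperm = perm;
    *nshared = shared;
}
```

A data-only child whose parent asks for WRITE_UNCHANGED therefore ends up requesting WRITE and RESIZE as well, while a read-only metadata child requests only CONSISTENT_READ and stops sharing WRITE and RESIZE.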

void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
                        BdrvChildRole role, BlockReopenQueue *reopen_queue,
                        uint64_t perm, uint64_t shared,
                        uint64_t *nperm, uint64_t *nshared)
{
    GLOBAL_STATE_CODE();
    if (role & BDRV_CHILD_FILTERED) {
        assert(!(role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
                         BDRV_CHILD_COW)));
        bdrv_filter_default_perms(bs, c, role, reopen_queue,
                                  perm, shared, nperm, nshared);
    } else if (role & BDRV_CHILD_COW) {
        assert(!(role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA)));
        bdrv_default_perms_for_cow(bs, c, role, reopen_queue,
                                   perm, shared, nperm, nshared);
    } else if (role & (BDRV_CHILD_METADATA | BDRV_CHILD_DATA)) {
        bdrv_default_perms_for_storage(bs, c, role, reopen_queue,
                                       perm, shared, nperm, nshared);
    } else {
        g_assert_not_reached();
    }
}

uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
{
    static const uint64_t permissions[] = {
        [BLOCK_PERMISSION_CONSISTENT_READ] = BLK_PERM_CONSISTENT_READ,
        [BLOCK_PERMISSION_WRITE] = BLK_PERM_WRITE,
        [BLOCK_PERMISSION_WRITE_UNCHANGED] = BLK_PERM_WRITE_UNCHANGED,
        [BLOCK_PERMISSION_RESIZE] = BLK_PERM_RESIZE,
    };

    QEMU_BUILD_BUG_ON(ARRAY_SIZE(permissions) != BLOCK_PERMISSION__MAX);
    QEMU_BUILD_BUG_ON(1UL << ARRAY_SIZE(permissions) != BLK_PERM_ALL + 1);

    assert(qapi_perm < BLOCK_PERMISSION__MAX);

    return permissions[qapi_perm];
}
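
The table-plus-build-check pattern used here can be reproduced in plain C11. The sketch below is an illustration only: the `Sk*` enum, its ordering, and the bit values are assumptions, and `_Static_assert` takes the place of QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the QAPI BlockPermission enum (order assumed). */
typedef enum {
    SK_QAPI_CONSISTENT_READ,
    SK_QAPI_WRITE,
    SK_QAPI_WRITE_UNCHANGED,
    SK_QAPI_RESIZE,
    SK_QAPI__MAX,
} SkQapiPerm;

/* Hypothetical stand-ins for the BLK_PERM_* flags (bit values assumed). */
enum {
    SK_PERM_CONSISTENT_READ = 1u << 0,
    SK_PERM_WRITE           = 1u << 1,
    SK_PERM_WRITE_UNCHANGED = 1u << 2,
    SK_PERM_RESIZE          = 1u << 3,
    SK_PERM_ALL             = 0x0f,
};

#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated initializers keep the mapping correct even if the enum is reordered. */
static const uint64_t sk_permissions[] = {
    [SK_QAPI_CONSISTENT_READ] = SK_PERM_CONSISTENT_READ,
    [SK_QAPI_WRITE]           = SK_PERM_WRITE,
    [SK_QAPI_WRITE_UNCHANGED] = SK_PERM_WRITE_UNCHANGED,
    [SK_QAPI_RESIZE]          = SK_PERM_RESIZE,
};

/* Fail the build if a permission is added on one side but not the other. */
_Static_assert(SK_ARRAY_SIZE(sk_permissions) == SK_QAPI__MAX,
               "table must cover every QAPI permission");
_Static_assert((1UL << SK_ARRAY_SIZE(sk_permissions)) == SK_PERM_ALL + 1,
               "table must cover every permission bit");

static uint64_t sk_qapi_perm_to_blk_perm(SkQapiPerm qapi_perm)
{
    assert(qapi_perm < SK_QAPI__MAX);
    return sk_permissions[qapi_perm];
}
```

The second static assertion relies on the permission bits being contiguous from bit 0, which is what makes "all permissions plus one" a power of two.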

/*
 * Replaces the node that a BdrvChild points to without updating permissions.
 *
 * If @new_bs is non-NULL, the parent of @child must already be drained through
 * @child.
 */
static void bdrv_replace_child_noperm(BdrvChild *child,
                                      BlockDriverState *new_bs)
{
    BlockDriverState *old_bs = child->bs;
block: Reduce (un)drains when replacing a child
Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.
This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way. bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.
In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained. So if anything, we have to drain the
parent before detaching the old child node. Conversely, we have to
undrain it only after attaching the new child node.
Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node then drain it (if necessary), replace the child node, and then
undrain it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-22 16:33:44 +03:00
    int new_bs_quiesce_counter;

    assert(!child->frozen);

    /*
     * If we want to change the BdrvChild to point to a drained node as its new
     * child->bs, we need to make sure that its new parent is drained, too. In
     * other words, either child->quiesced_parent must already be true or we
     * must be able to set it and keep the parent's quiesce_counter consistent
     * with that, but without polling or starting new requests (this function
     * guarantees that it doesn't poll, and starting new requests would be
     * against the invariants of drain sections).
     *
     * To keep things simple, we pick the first option (child->quiesced_parent
     * must already be true). We also generalise the rule a bit to make it
     * easier to verify in callers and more likely to be covered in test cases:
     * The parent must be quiesced through this child even if new_bs isn't
     * currently drained.
     *
     * The only exception is for callers that always pass new_bs == NULL. In
     * this case, we obviously never need to consider the case of a drained
     * new_bs, so we can keep the callers simpler by allowing them not to drain
     * the parent.
     */
    assert(!new_bs || child->quiesced_parent);
    assert(old_bs != new_bs);
    GLOBAL_STATE_CODE();

    if (old_bs && new_bs) {
        assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
    }

    /* TODO Pull this up into the callers to avoid polling here */
    bdrv_graph_wrlock();
    if (old_bs) {
        if (child->klass->detach) {
            child->klass->detach(child);
        }
        QLIST_REMOVE(child, next_parent);
    }

    child->bs = new_bs;

    if (new_bs) {
        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
        if (child->klass->attach) {
            child->klass->attach(child);
        }
    }
    bdrv_graph_wrunlock();

    /*
     * If the parent was drained through this BdrvChild previously, but new_bs
     * is not drained, allow requests to come in only after the new node has
     * been attached.
     */
block: Call drain callbacks only once
We only need to call both the BlockDriver's callback and the parent
callbacks when going from undrained to drained or vice versa. A second
drain section doesn't make a difference for the driver or the parent,
they weren't supposed to send new requests before and after the second
drain.
One thing that gets in the way is the 'ignore_bds_parents' parameter in
bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that
bdrv_drain_all_begin() increases bs->quiesce_counter, but does not
quiesce the parent through BdrvChildClass callbacks. If an additional
drain section is started now, bs->quiesce_counter will be non-zero, but
we would still need to quiesce the parent through BdrvChildClass in
order to keep things consistent (and unquiesce it on the matching
bdrv_drained_end(), even though the counter would not reach 0 yet as
long as the bdrv_drain_all() section is still active).
Instead of keeping track of this, let's just get rid of the parameter.
It was introduced in commit 6cd5c9d7b2d as an optimisation so that
during bdrv_drain_all(), we wouldn't recursively drain all parents up to
the root for each node, resulting in quadratic complexity. As it happens,
calling the callbacks only once solves the same problem, so as of this
patch, we'll still have O(n) complexity and ignore_bds_parents is not
needed any more.
This patch only ignores the 'ignore_bds_parents' parameter. It will be
removed in a separate patch.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-12-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-11-18 20:41:06 +03:00
    new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
    if (!new_bs_quiesce_counter && child->quiesced_parent) {
        bdrv_parent_drained_end_single(child);
    }
}
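
The ordering that this function enforces (swap the node first, undrain the parent only afterwards and only if the new node is not drained) can be modelled with two small structs. This is a toy sketch under assumptions: `SkNode`, `SkChild`, and the counters are invented for illustration, and clearing `quiesced_parent` stands in for what bdrv_parent_drained_end_single() is assumed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a node: how often it is drained and how many parents it has. */
typedef struct SkNode {
    int quiesce_counter;
    int parents;
} SkNode;

/* Toy model of a parent-to-child edge. */
typedef struct SkChild {
    SkNode *bs;
    bool quiesced_parent;   /* parent is currently drained through this edge */
    int undrain_calls;      /* counts simulated "drained end" notifications */
} SkChild;

static void sk_replace_child(SkChild *child, SkNode *new_bs)
{
    SkNode *old_bs = child->bs;

    /* Same preconditions as in the function above. */
    assert(!new_bs || child->quiesced_parent);
    assert(old_bs != new_bs);

    if (old_bs) {
        old_bs->parents--;          /* detach from the old node */
    }
    child->bs = new_bs;
    if (new_bs) {
        new_bs->parents++;          /* attach to the new node */
    }

    /* Let the parent resume only once the new, undrained node is in place. */
    int new_bs_quiesce_counter = new_bs ? new_bs->quiesce_counter : 0;
    if (!new_bs_quiesce_counter && child->quiesced_parent) {
        child->quiesced_parent = false;
        child->undrain_calls++;
    }
}
```

In this model, replacing a child with a drained node leaves the parent quiesced, while replacing it with an undrained node (or NULL) releases the parent exactly once.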

/**
 * Free the given @child.
 *
 * The child must be empty (i.e. `child->bs == NULL`) and it must be
 * unused (i.e. not in a children list).
 */
static void bdrv_child_free(BdrvChild *child)
{
    assert(!child->bs);
    GLOBAL_STATE_CODE();
    assert(!child->next.le_prev); /* not in children list */

    g_free(child->name);
    g_free(child);
}

typedef struct BdrvAttachChildCommonState {
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But very useful duplication, so let's not drop them at
all:)
We should manage bs->file and bs->backing in same place, where we
manage bs->children, to keep them in sync.
Moreover, generic io paths are unprepared to BdrvChild without a bs, so
it's double good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach bs->file or bs->backing child, just
set corresponding field to NULL.
Attach is a bit more complicated. But we still can precisely detect
should we set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must be also
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores BdrvChild** into transaction to clear
it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just pass-through this feature, bdrv_root_attach_child() doesn't need
the feature.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-07-26 23:11:32 +03:00
    BdrvChild *child;
    AioContext *old_parent_ctx;
    AioContext *old_child_ctx;
} BdrvAttachChildCommonState;

static void bdrv_attach_child_common_abort(void *opaque)
{
    BdrvAttachChildCommonState *s = opaque;
    BlockDriverState *bs = s->child->bs;

    GLOBAL_STATE_CODE();
    bdrv_replace_child_noperm(s->child, NULL);

    if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
        bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
    }

    if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
        Transaction *tran;
        GHashTable *visited;
        bool ret;

        tran = tran_new();

        /* No need to visit `child`, because it has been detached already */
        visited = g_hash_table_new(NULL, NULL);
        ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
                                              visited, tran, &error_abort);
        g_hash_table_destroy(visited);

        /* transaction is supposed to always succeed */
        assert(ret == true);
        tran_commit(tran);
    }

    bdrv_unref(bs);
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But it is a very useful duplication, so let's not drop
them at all:)
We should manage bs->file and bs->backing in the same place where we
manage bs->children, to keep them in sync.
Moreover, generic I/O paths are unprepared for a BdrvChild without a bs,
so it's doubly good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach the bs->file or bs->backing child, just
set the corresponding field to NULL.
Attach is a bit more complicated. But we can still precisely detect
whether we should set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must also be
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores BdrvChild** into the transaction to
clear it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just passes this feature through, bdrv_root_attach_child() doesn't need
the feature.
Look at the bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep the BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-07-26 23:11:32 +03:00
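The attach rule described in the commit message above can be sketched as a small decision function. This is only an illustration: the `SKETCH_*` flags, `SketchSlot`, and `sketch_pick_slot()` are hypothetical stand-ins, not QEMU's real `BdrvChildRole` values or the code that the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role flags; the real values live in QEMU's block headers. */
enum {
    SKETCH_CHILD_FILTERED = 1 << 0,
    SKETCH_CHILD_COW      = 1 << 1,
    SKETCH_CHILD_PRIMARY  = 1 << 2,
};

/* Which convenience pointer of the parent (if any) should track the child. */
typedef enum { SLOT_NONE, SLOT_FILE, SLOT_BACKING } SketchSlot;

static SketchSlot sketch_pick_slot(unsigned role, bool filtered_child_is_backing)
{
    if (role & SKETCH_CHILD_COW) {
        /* A COW child is always the backing child. */
        return SLOT_BACKING;
    }
    if (role & SKETCH_CHILD_FILTERED) {
        /* Per the commit message, a filtered child must also be primary. */
        assert(role & SKETCH_CHILD_PRIMARY);
        return filtered_child_is_backing ? SLOT_BACKING : SLOT_FILE;
    }
    if (role & SKETCH_CHILD_PRIMARY) {
        return SLOT_FILE;
    }
    /* Some other child: neither file nor backing. */
    return SLOT_NONE;
}
```

On detach the inverse is simpler: whichever slot currently points at the child being removed is reset to NULL.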
|
|
|
bdrv_child_free(s->child);
|
2021-04-28 18:17:46 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv bdrv_attach_child_common_drv = {
|
|
|
|
.abort = bdrv_attach_child_common_abort,
|
|
|
|
.clean = g_free,
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Common part of attaching a bdrv child to a bs, to a blk or to a job
|
2021-06-01 10:52:13 +03:00
|
|
|
*
|
2021-06-10 14:25:45 +03:00
|
|
|
* This function doesn't update permissions; the caller is responsible for this.
|
2022-07-26 23:11:32 +03:00
|
|
|
*
|
|
|
|
* Returns the newly created child.
|
2021-04-28 18:17:46 +03:00
|
|
|
*/
|
2022-07-26 23:11:32 +03:00
|
|
|
static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
|
|
|
|
const char *child_name,
|
|
|
|
const BdrvChildClass *child_class,
|
|
|
|
BdrvChildRole child_role,
|
|
|
|
uint64_t perm, uint64_t shared_perm,
|
|
|
|
void *opaque,
|
|
|
|
Transaction *tran, Error **errp)
|
2021-04-28 18:17:46 +03:00
|
|
|
{
|
|
|
|
BdrvChild *new_child;
|
|
|
|
AioContext *parent_ctx;
|
|
|
|
AioContext *child_ctx = bdrv_get_aio_context(child_bs);
|
|
|
|
|
2021-06-01 10:52:17 +03:00
|
|
|
assert(child_class->get_parent_desc);
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:46 +03:00
|
|
|
|
|
|
|
new_child = g_new(BdrvChild, 1);
|
|
|
|
*new_child = (BdrvChild) {
|
2016-12-14 19:24:36 +03:00
|
|
|
.bs = NULL,
|
|
|
|
.name = g_strdup(child_name),
|
2020-05-13 14:05:13 +03:00
|
|
|
.klass = child_class,
|
2020-05-13 14:05:15 +03:00
|
|
|
.role = child_role,
|
2016-12-14 19:24:36 +03:00
|
|
|
.perm = perm,
|
|
|
|
.shared_perm = shared_perm,
|
|
|
|
.opaque = opaque,
|
2015-06-15 12:53:47 +03:00
|
|
|
};
|
|
|
|
|
2021-04-28 18:17:46 +03:00
|
|
|
/*
|
|
|
|
* If the AioContexts don't match, first try to move the subtree of
|
2019-04-24 18:41:46 +03:00
|
|
|
* child_bs into the AioContext of the new parent. If this doesn't work,
|
2021-04-28 18:17:46 +03:00
|
|
|
* try moving the parent into the AioContext of child_bs instead.
|
|
|
|
*/
|
|
|
|
parent_ctx = bdrv_child_get_parent_aio_context(new_child);
|
|
|
|
if (child_ctx != parent_ctx) {
|
|
|
|
Error *local_err = NULL;
|
2022-10-25 11:49:52 +03:00
|
|
|
int ret = bdrv_try_change_aio_context(child_bs, parent_ctx, NULL,
|
|
|
|
&local_err);
|
2021-04-28 18:17:46 +03:00
|
|
|
|
2022-10-25 11:49:49 +03:00
|
|
|
if (ret < 0 && child_class->change_aio_ctx) {
|
|
|
|
Transaction *tran = tran_new();
|
|
|
|
GHashTable *visited = g_hash_table_new(NULL, NULL);
|
|
|
|
bool ret_child;
|
|
|
|
|
|
|
|
g_hash_table_add(visited, new_child);
|
|
|
|
ret_child = child_class->change_aio_ctx(new_child, child_ctx,
|
|
|
|
visited, tran, NULL);
|
|
|
|
if (ret_child == true) {
|
2019-04-24 18:41:46 +03:00
|
|
|
error_free(local_err);
|
|
|
|
ret = 0;
|
|
|
|
}
|
2022-10-25 11:49:49 +03:00
|
|
|
tran_finalize(tran, ret_child == true ? 0 : -1);
|
|
|
|
g_hash_table_destroy(visited);
|
2019-04-24 18:41:46 +03:00
|
|
|
}
|
2021-04-28 18:17:46 +03:00
|
|
|
|
2019-04-24 18:41:46 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_propagate(errp, local_err);
|
2021-11-15 17:53:59 +03:00
|
|
|
bdrv_child_free(new_child);
|
2022-07-26 23:11:32 +03:00
|
|
|
return NULL;
|
2019-04-24 18:41:46 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:46 +03:00
|
|
|
bdrv_ref(child_bs);
|
2022-11-18 20:41:09 +03:00
|
|
|
/*
|
|
|
|
* Let every new BdrvChild start with a drained parent. Inserting the child
|
|
|
|
* in the graph with bdrv_replace_child_noperm() will undrain it if
|
|
|
|
* @child_bs is not drained.
|
|
|
|
*
|
|
|
|
* The child was only just created and is not yet visible in global state
|
|
|
|
* until bdrv_replace_child_noperm() inserts it into the graph, so nobody
|
|
|
|
* could have sent requests and polling is not necessary.
|
|
|
|
*
|
|
|
|
* Note that this means that the parent isn't fully drained yet, we only
|
|
|
|
* stop new requests from coming in. This is fine, we don't care about the
|
|
|
|
* old requests here, they are not for this child. If another place enters a
|
|
|
|
* drain section for the same parent, but wants it to be fully quiesced, it
|
|
|
|
* will not run most of the code in .drained_begin() again (which is not
|
|
|
|
* a problem, we already did this), but it will still poll until the parent
|
|
|
|
* is fully quiesced, so it will not be negatively affected either.
|
|
|
|
*/
|
2022-11-18 20:41:10 +03:00
|
|
|
bdrv_parent_drained_begin_single(new_child);
|
2022-07-26 23:11:31 +03:00
|
|
|
bdrv_replace_child_noperm(new_child, child_bs);
|
2015-06-15 14:24:19 +03:00
|
|
|
|
2021-04-28 18:17:46 +03:00
|
|
|
BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
|
|
|
|
*s = (BdrvAttachChildCommonState) {
|
2022-07-26 23:11:32 +03:00
|
|
|
.child = new_child,
|
2021-04-28 18:17:46 +03:00
|
|
|
.old_parent_ctx = parent_ctx,
|
|
|
|
.old_child_ctx = child_ctx,
|
|
|
|
};
|
|
|
|
tran_add(tran, &bdrv_attach_child_common_drv, s);
|
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
return new_child;
|
2021-04-28 18:17:46 +03:00
|
|
|
}
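The function above registers an undo record with `tran_add()` so that the attach can be rolled back later. The general pattern can be sketched with a tiny stand-alone transaction log. Everything named `Sketch*` / `sketch_*` below is hypothetical and only loosely modelled on the `.abort` / `.clean` callbacks and the `tran_finalize(tran, ret)` call visible in this file; it is not QEMU's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-action callbacks; any of them may be NULL. */
typedef struct SketchActionDrv {
    void (*abort)(void *opaque);   /* undo the action */
    void (*commit)(void *opaque);  /* make the action permanent */
    void (*clean)(void *opaque);   /* release per-action state in both cases */
} SketchActionDrv;

typedef struct SketchAction {
    const SketchActionDrv *drv;
    void *opaque;
    struct SketchAction *next;
} SketchAction;

typedef struct SketchTran {
    SketchAction *head;            /* most recently added action first */
} SketchTran;

static SketchTran *sketch_tran_new(void)
{
    SketchTran *tran = calloc(1, sizeof(*tran));
    assert(tran);
    return tran;
}

static void sketch_tran_add(SketchTran *tran, const SketchActionDrv *drv,
                            void *opaque)
{
    SketchAction *act = malloc(sizeof(*act));
    assert(act);
    act->drv = drv;
    act->opaque = opaque;
    act->next = tran->head;
    tran->head = act;
}

/* ret == 0 commits every action, any other value aborts them (newest first). */
static void sketch_tran_finalize(SketchTran *tran, int ret)
{
    SketchAction *act = tran->head;

    while (act) {
        SketchAction *next = act->next;
        void (*cb)(void *) = ret == 0 ? act->drv->commit : act->drv->abort;

        if (cb) {
            cb(act->opaque);
        }
        if (act->drv->clean) {
            act->drv->clean(act->opaque);
        }
        free(act);
        act = next;
    }
    free(tran);
}

/* Example action: an increment that is undone on abort. */
static void sketch_undo_increment(void *opaque)
{
    int *counter = opaque;
    (*counter)--;
}

static const SketchActionDrv sketch_increment_drv = {
    .abort = sketch_undo_increment,
};

/* Performs the increment, then commits or aborts; returns the final value. */
static int sketch_demo(int fail)
{
    int counter = 0;
    SketchTran *tran = sketch_tran_new();

    counter++;
    sketch_tran_add(tran, &sketch_increment_drv, &counter);
    sketch_tran_finalize(tran, fail ? -1 : 0);
    return counter;
}
```

The point of the pattern is that the code performing a change only has to describe how to undo it; the decision to commit or abort is made once, by whoever owns the transaction.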
|
|
|
|
|
2021-06-01 10:52:13 +03:00
|
|
|
/*
|
2021-06-10 14:25:45 +03:00
|
|
|
* This function doesn't update permissions; the caller is responsible for this.
|
2021-06-01 10:52:13 +03:00
|
|
|
*/
|
2022-07-26 23:11:32 +03:00
|
|
|
static BdrvChild *bdrv_attach_child_noperm(BlockDriverState *parent_bs,
|
|
|
|
BlockDriverState *child_bs,
|
|
|
|
const char *child_name,
|
|
|
|
const BdrvChildClass *child_class,
|
|
|
|
BdrvChildRole child_role,
|
|
|
|
Transaction *tran,
|
|
|
|
Error **errp)
|
2021-04-28 18:17:47 +03:00
|
|
|
{
|
|
|
|
uint64_t perm, shared_perm;
|
|
|
|
|
|
|
|
assert(parent_bs->drv);
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-04-28 18:17:47 +03:00
|
|
|
|
2021-10-18 16:47:14 +03:00
|
|
|
if (bdrv_recurse_has_child(child_bs, parent_bs)) {
|
|
|
|
error_setg(errp, "Making '%s' a %s child of '%s' would create a cycle",
|
|
|
|
child_bs->node_name, child_name, parent_bs->node_name);
|
2022-07-26 23:11:32 +03:00
|
|
|
return NULL;
|
2021-10-18 16:47:14 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:47 +03:00
|
|
|
bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
|
|
|
|
bdrv_child_perm(parent_bs, child_bs, NULL, child_role, NULL,
|
|
|
|
perm, shared_perm, &perm, &shared_perm);
|
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
return bdrv_attach_child_common(child_bs, child_name, child_class,
|
|
|
|
child_role, perm, shared_perm, parent_bs,
|
|
|
|
tran, errp);
|
2021-04-28 18:17:47 +03:00
|
|
|
}
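The cycle check at the top of the function above asks whether the prospective parent is already reachable from the prospective child. A minimal sketch of that reachability test follows, assuming a simplified node type; `SketchNode`, `sketch_has_child()`, and `sketch_would_create_cycle()` are hypothetical names, and the real `bdrv_recurse_has_child()` walks QEMU's own child lists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature graph node; BlockDriverState is far richer. */
typedef struct SketchNode {
    struct SketchNode **children;  /* array of child pointers, may be NULL */
    size_t n_children;
} SketchNode;

/* True if @target is @node itself or reachable from @node through children. */
static bool sketch_has_child(const SketchNode *node, const SketchNode *target)
{
    if (node == target) {
        return true;
    }
    for (size_t i = 0; i < node->n_children; i++) {
        if (sketch_has_child(node->children[i], target)) {
            return true;
        }
    }
    return false;
}

/*
 * Attaching @child below @parent closes a loop exactly when @parent can
 * already be reached by walking down from @child.
 */
static bool sketch_would_create_cycle(const SketchNode *parent,
                                      const SketchNode *child)
{
    return sketch_has_child(child, parent);
}
```

Note the argument order: the walk starts at the child and looks for the parent, which mirrors the `bdrv_recurse_has_child(child_bs, parent_bs)` call above.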
|
|
|
|
|
2021-04-28 18:17:46 +03:00
|
|
|
/*
|
|
|
|
* This function steals the reference to child_bs from the caller.
|
|
|
|
* That reference is later dropped by bdrv_root_unref_child().
|
|
|
|
*
|
|
|
|
* On failure NULL is returned, errp is set and the reference to
|
|
|
|
* child_bs is also dropped.
|
|
|
|
*
|
|
|
|
* The caller must hold the AioContext lock for @child_bs, but not that of @ctx
|
|
|
|
* (unless @child_bs is already in @ctx).
|
|
|
|
*/
|
|
|
|
BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
|
|
|
|
const char *child_name,
|
|
|
|
const BdrvChildClass *child_class,
|
|
|
|
BdrvChildRole child_role,
|
|
|
|
uint64_t perm, uint64_t shared_perm,
|
|
|
|
void *opaque, Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
2022-07-26 23:11:32 +03:00
|
|
|
BdrvChild *child;
|
2021-04-28 18:17:46 +03:00
|
|
|
Transaction *tran = tran_new();
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But very useful diplication, so let's not drop them at
all:)
We should manage bs->file and bs->backing in same place, where we
manage bs->children, to keep them in sync.
Moreover, generic io paths are unprepared to BdrvChild without a bs, so
it's double good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach bs->file or bs->backing child, just
set corresponding field to NULL.
Attach is a bit more complicated. But we still can precisely detect
should we set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must be also
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to chose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores BdrvChild** into transaction to clear
it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just pass-through this feature, bdrv_root_attach_child() doesn't need
the feature.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep the BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
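The attach/detach rules described in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from block.c: the `Sk*` types, the `SK_CHILD_*` flag values and the helper names are hypothetical stand-ins, and `filtered_child_is_backing` is placed directly on the node here for brevity (the commit message says it lives in bs->drv).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role flags; the numeric values are illustrative only. */
enum {
    SK_CHILD_FILTERED = 1 << 0,
    SK_CHILD_COW      = 1 << 1,
    SK_CHILD_PRIMARY  = 1 << 2,
};

typedef struct SkChild {
    unsigned role;
} SkChild;

typedef struct SkNode {
    bool filtered_child_is_backing; /* assumption: kept on the node here */
    SkChild *file;
    SkChild *backing;
} SkNode;

/*
 * Return the address of the shortcut pointer that mirrors a child with
 * the given role, or NULL if the child is neither file nor backing.
 */
static SkChild **sk_mirror_field(SkNode *bs, unsigned role)
{
    if (role & SK_CHILD_COW) {
        return &bs->backing;
    }
    if (role & SK_CHILD_FILTERED) {
        /* A filtered child must also be the primary child. */
        assert(role & SK_CHILD_PRIMARY);
        return bs->filtered_child_is_backing ? &bs->backing : &bs->file;
    }
    if (role & SK_CHILD_PRIMARY) {
        return &bs->file;
    }
    return NULL; /* some other child, nothing to mirror */
}

/* Attach: set the matching shortcut pointer, if there is one. */
static void sk_attach(SkNode *bs, SkChild *child)
{
    SkChild **field = sk_mirror_field(bs, child->role);

    if (field) {
        assert(*field == NULL);
        *field = child;
    }
}

/* Detach: clear whichever shortcut pointer refers to this child. */
static void sk_detach(SkNode *bs, SkChild *child)
{
    if (bs->file == child) {
        bs->file = NULL;
    } else if (bs->backing == child) {
        bs->backing = NULL;
    }
}
```

Keeping the decision in one helper means the shortcut pointers can only change where the children list changes, which is the invariant the commit message asks for.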
2022-07-26 23:11:32 +03:00
|
|
|
child = bdrv_attach_child_common(child_bs, child_name, child_class,
|
2021-04-28 18:17:46 +03:00
|
|
|
child_role, perm, shared_perm, opaque,
|
2022-07-26 23:11:32 +03:00
|
|
|
tran, errp);
|
|
|
|
if (!child) {
|
|
|
|
ret = -EINVAL;
|
2021-05-03 14:05:54 +03:00
|
|
|
goto out;
|
2021-04-28 18:17:46 +03:00
|
|
|
}
|
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(child_bs, tran, errp);
|
2021-04-28 18:17:46 +03:00
|
|
|
|
2021-05-03 14:05:54 +03:00
|
|
|
out:
|
|
|
|
tran_finalize(tran, ret);
|
2021-06-01 10:52:13 +03:00
|
|
|
|
2021-04-28 18:17:46 +03:00
|
|
|
bdrv_unref(child_bs);
|
2022-07-26 23:11:32 +03:00
|
|
|
|
|
|
|
return ret < 0 ? NULL : child;
|
2015-06-15 12:53:47 +03:00
|
|
|
}
|
|
|
|
|
2019-05-13 16:46:18 +03:00
|
|
|
/*
|
|
|
|
* This function transfers the reference to child_bs from the caller
|
|
|
|
* to parent_bs. That reference is later dropped by parent_bs on
|
|
|
|
* bdrv_close() or if someone calls bdrv_unref_child().
|
|
|
|
*
|
|
|
|
* On failure NULL is returned, errp is set and the reference to
|
|
|
|
* child_bs is also dropped.
|
2019-04-24 18:41:46 +03:00
|
|
|
*
|
|
|
|
* If @parent_bs and @child_bs are in different AioContexts, the caller must
|
|
|
|
* hold the AioContext lock for @child_bs, but not for @parent_bs.
|
2019-05-13 16:46:18 +03:00
|
|
|
*/
|
2016-05-10 10:36:38 +03:00
|
|
|
BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
|
|
|
|
BlockDriverState *child_bs,
|
|
|
|
const char *child_name,
|
2020-05-13 14:05:13 +03:00
|
|
|
const BdrvChildClass *child_class,
|
2020-05-13 14:05:15 +03:00
|
|
|
BdrvChildRole child_role,
|
2016-12-21 00:21:17 +03:00
|
|
|
Error **errp)
|
2016-03-08 15:47:46 +03:00
|
|
|
{
|
2021-04-28 18:17:47 +03:00
|
|
|
int ret;
|
2022-07-26 23:11:32 +03:00
|
|
|
BdrvChild *child;
|
2021-04-28 18:17:47 +03:00
|
|
|
Transaction *tran = tran_new();
|
2016-12-20 17:51:12 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
|
|
|
|
child_class, child_role, tran, errp);
|
|
|
|
if (!child) {
|
|
|
|
ret = -EINVAL;
|
2021-04-28 18:17:47 +03:00
|
|
|
goto out;
|
|
|
|
}
|
2016-12-14 19:24:36 +03:00
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(parent_bs, tran, errp);
|
2021-04-28 18:17:47 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
goto out;
|
2016-12-14 19:24:36 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:47 +03:00
|
|
|
out:
|
|
|
|
tran_finalize(tran, ret);
|
|
|
|
|
|
|
|
bdrv_unref(child_bs);
|
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
return ret < 0 ? NULL : child;
|
2016-03-08 15:47:46 +03:00
|
|
|
}
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
/* Callers must ensure that child->frozen is false. */
|
2016-03-08 15:47:46 +03:00
|
|
|
void bdrv_root_unref_child(BdrvChild *child)
|
2015-06-15 14:51:04 +03:00
|
|
|
{
|
2022-11-07 19:35:55 +03:00
|
|
|
BlockDriverState *child_bs = child->bs;
|
2015-10-13 15:09:44 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2022-11-07 19:35:55 +03:00
|
|
|
bdrv_replace_child_noperm(child, NULL);
|
|
|
|
bdrv_child_free(child);
|
|
|
|
|
|
|
|
if (child_bs) {
|
|
|
|
/*
|
|
|
|
* Update permissions for old node. We're just taking a parent away, so
|
|
|
|
* we're loosening restrictions. Errors of permission update are not
|
|
|
|
* fatal in this case, ignore them.
|
|
|
|
*/
|
2022-11-07 19:35:57 +03:00
|
|
|
bdrv_refresh_perms(child_bs, NULL, NULL);
|
2022-11-07 19:35:55 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* When the parent requiring a non-default AioContext is removed, the
|
|
|
|
* node moves back to the main AioContext
|
|
|
|
*/
|
|
|
|
bdrv_try_change_aio_context(child_bs, qemu_get_aio_context(), NULL,
|
|
|
|
NULL);
|
|
|
|
}
|
2022-03-03 18:15:49 +03:00
|
|
|
|
2016-03-08 15:47:46 +03:00
|
|
|
bdrv_unref(child_bs);
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:54 +03:00
|
|
|
typedef struct BdrvSetInheritsFrom {
|
|
|
|
BlockDriverState *bs;
|
|
|
|
BlockDriverState *old_inherits_from;
|
|
|
|
} BdrvSetInheritsFrom;
|
|
|
|
|
|
|
|
static void bdrv_set_inherits_from_abort(void *opaque)
|
|
|
|
{
|
|
|
|
BdrvSetInheritsFrom *s = opaque;
|
|
|
|
|
|
|
|
s->bs->inherits_from = s->old_inherits_from;
|
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv bdrv_set_inherits_from_drv = {
|
|
|
|
.abort = bdrv_set_inherits_from_abort,
|
|
|
|
.clean = g_free,
|
|
|
|
};
|
|
|
|
|
|
|
|
/* @tran is allowed to be NULL. In this case no rollback is possible */
|
|
|
|
static void bdrv_set_inherits_from(BlockDriverState *bs,
|
|
|
|
BlockDriverState *new_inherits_from,
|
|
|
|
Transaction *tran)
|
|
|
|
{
|
|
|
|
if (tran) {
|
|
|
|
BdrvSetInheritsFrom *s = g_new(BdrvSetInheritsFrom, 1);
|
|
|
|
|
|
|
|
*s = (BdrvSetInheritsFrom) {
|
|
|
|
.bs = bs,
|
|
|
|
.old_inherits_from = bs->inherits_from,
|
|
|
|
};
|
|
|
|
|
|
|
|
tran_add(tran, &bdrv_set_inherits_from_drv, s);
|
|
|
|
}
|
|
|
|
|
|
|
|
bs->inherits_from = new_inherits_from;
|
|
|
|
}
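The record-then-undo pattern used by bdrv_set_inherits_from() can be sketched in isolation. The sketch below is an assumption-laden stand-in, not QEMU's transaction implementation: `SkTran`, `sk_tran_add()` and `sk_tran_finalize()` are hypothetical names that only mimic the shape of the calls seen above (an action with `.abort` and `.clean` callbacks, finalized with a return code), and undoing newest-first is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical action driver: how to undo and how to free one action. */
typedef struct SkActionDrv {
    void (*abort)(void *opaque);
    void (*clean)(void *opaque);
} SkActionDrv;

typedef struct SkAction {
    const SkActionDrv *drv;
    void *opaque;
    struct SkAction *next;
} SkAction;

typedef struct SkTran {
    SkAction *head; /* most recently added action first */
} SkTran;

static void sk_tran_add(SkTran *t, const SkActionDrv *drv, void *opaque)
{
    SkAction *a = malloc(sizeof(*a));

    assert(a);
    a->drv = drv;
    a->opaque = opaque;
    a->next = t->head;
    t->head = a;
}

/* ret < 0 rolls every action back (newest first); clean always runs. */
static void sk_tran_finalize(SkTran *t, int ret)
{
    while (t->head) {
        SkAction *a = t->head;

        t->head = a->next;
        if (ret < 0 && a->drv->abort) {
            a->drv->abort(a->opaque);
        }
        if (a->drv->clean) {
            a->drv->clean(a->opaque);
        }
        free(a);
    }
}

typedef struct SkNode {
    struct SkNode *inherits_from;
} SkNode;

typedef struct SkSetInherits {
    SkNode *bs;
    SkNode *old_inherits_from;
} SkSetInherits;

static void sk_set_inherits_abort(void *opaque)
{
    SkSetInherits *s = opaque;

    s->bs->inherits_from = s->old_inherits_from;
}

static const SkActionDrv sk_set_inherits_drv = {
    .abort = sk_set_inherits_abort,
    .clean = free,
};

/* tran may be NULL; in that case the change cannot be rolled back. */
static void sk_set_inherits_from(SkNode *bs, SkNode *new_parent, SkTran *tran)
{
    if (tran) {
        SkSetInherits *s = malloc(sizeof(*s));

        assert(s);
        s->bs = bs;
        s->old_inherits_from = bs->inherits_from;
        sk_tran_add(tran, &sk_set_inherits_drv, s);
    }
    bs->inherits_from = new_parent;
}
```

Saving the old value before overwriting it is what makes the change reversible; passing a NULL transaction skips the bookkeeping and commits immediately.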
|
|
|
|
|
2019-07-03 20:28:07 +03:00
|
|
|
/**
|
|
|
|
* Clear all inherits_from pointers from children and grandchildren of
|
|
|
|
* @root that point to @root, where necessary.
|
2021-04-28 18:17:54 +03:00
|
|
|
* @tran is allowed to be NULL. In this case no rollback is possible
|
2019-07-03 20:28:07 +03:00
|
|
|
*/
|
2021-04-28 18:17:54 +03:00
|
|
|
static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
|
|
|
|
Transaction *tran)
|
2016-03-08 15:47:46 +03:00
|
|
|
{
|
2019-07-03 20:28:07 +03:00
|
|
|
BdrvChild *c;
|
2016-12-16 20:52:37 +03:00
|
|
|
|
2019-07-03 20:28:07 +03:00
|
|
|
if (child->bs->inherits_from == root) {
|
|
|
|
/*
|
|
|
|
* Remove inherits_from only when the last reference between root and
|
|
|
|
* child->bs goes away.
|
|
|
|
*/
|
|
|
|
QLIST_FOREACH(c, &root->children, next) {
|
2016-12-16 20:52:37 +03:00
|
|
|
if (c != child && c->bs == child->bs) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (c == NULL) {
|
2021-04-28 18:17:54 +03:00
|
|
|
bdrv_set_inherits_from(child->bs, NULL, tran);
|
2016-12-16 20:52:37 +03:00
|
|
|
}
|
2015-06-15 14:51:04 +03:00
|
|
|
}
|
|
|
|
|
2019-07-03 20:28:07 +03:00
|
|
|
QLIST_FOREACH(c, &child->bs->children, next) {
|
2021-04-28 18:17:54 +03:00
|
|
|
bdrv_unset_inherits_from(root, c, tran);
|
2019-07-03 20:28:07 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
/* Callers must ensure that child->frozen is false. */
|
2019-07-03 20:28:07 +03:00
|
|
|
void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-07-03 20:28:07 +03:00
|
|
|
if (child == NULL) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:54 +03:00
|
|
|
bdrv_unset_inherits_from(parent, child, NULL);
|
2016-03-08 15:47:46 +03:00
|
|
|
bdrv_root_unref_child(child);
|
2015-06-15 14:51:04 +03:00
|
|
|
}
|
|
|
|
|
2016-02-24 17:13:35 +03:00
|
|
|
|
|
|
|
static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
|
|
|
|
{
|
|
|
|
BdrvChild *c;
|
2022-03-03 18:16:13 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-02-24 17:13:35 +03:00
|
|
|
QLIST_FOREACH(c, &bs->parents, next_parent) {
|
2020-05-13 14:05:13 +03:00
|
|
|
if (c->klass->change_media) {
|
|
|
|
c->klass->change_media(c, load);
|
2016-02-24 17:13:35 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
block: Update BlockDriverState.inherits_from on bdrv_set_backing_hd()
When a BlockDriverState's child is opened (be it a backing file, the
protocol layer, or any other) inherits_from is set to point to the
parent node. Children opened separately and then attached to a parent
don't have this pointer set.
bdrv_reopen_queue_child() uses this to determine whether a node's
children must also be reopened inheriting the options from the parent
or not. If inherits_from points to the parent then the child is
reopened and its options can be changed, like in this example:
$ qemu-img create -f qcow2 hd0.qcow2 1M
$ qemu-img create -f qcow2 hd1.qcow2 1M
$ $QEMU -drive if=none,node-name=hd0,file=hd0.qcow2,\
backing.driver=qcow2,backing.file.filename=hd1.qcow2
(qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"
If the child does not inherit from the parent then it does not get
reopened and its options cannot be changed:
$ $QEMU -drive if=none,node-name=hd1,file=hd1.qcow2 \
-drive if=none,node-name=hd0,file=hd0.qcow2,backing=hd1
(qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"
Cannot change the option 'backing.l2-cache-size'
If a disk image has a chain of backing files then all of them are also
connected through their inherits_from pointers (i.e. it's possible to
walk the chain in reverse order from base to top).
However this is broken if the intermediate nodes are removed using
e.g. block-stream because the inherits_from pointer from the base node
becomes NULL:
$ qemu-img create -f qcow2 hd0.qcow2 1M
$ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
$ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
$ $QEMU -drive if=none,file=hd2.qcow2
(qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
(qemu) block_stream none0 0 hd0.qcow2
(qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
Cannot change the option 'backing.l2-cache-size'
This patch updates the inherits_from pointer if the intermediate nodes
of a backing chain are removed using bdrv_set_backing_hd(), and adds a
test case for this scenario.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-31 19:16:37 +03:00
|
|
|
/* Return true if you can reach parent going through child->inherits_from
|
|
|
|
* recursively. If parent or child are NULL, return false */
|
|
|
|
static bool bdrv_inherits_from_recursive(BlockDriverState *child,
|
|
|
|
BlockDriverState *parent)
|
|
|
|
{
|
|
|
|
while (child && child != parent) {
|
|
|
|
child = child->inherits_from;
|
|
|
|
}
|
|
|
|
|
|
|
|
return child != NULL;
|
|
|
|
}
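As a rough illustration of why the recursive check matters when intermediate backing nodes are dropped, the following sketch models a three-node chain with hypothetical `Sk*` names. It only models the inherits_from bookkeeping and assumes the actual graph change happens elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SkNode {
    struct SkNode *inherits_from;
} SkNode;

/*
 * True if parent is reachable from child by following inherits_from.
 * Returns false if either pointer is NULL.
 */
static bool sk_inherits_from_recursive(const SkNode *child,
                                       const SkNode *parent)
{
    while (child && child != parent) {
        child = child->inherits_from;
    }
    return child != NULL;
}

/*
 * Hypothetical helper: make base the direct backing node of top. If base
 * already inherited from top through intermediate nodes, re-point it at
 * top directly so the link survives removal of the intermediates.
 */
static void sk_set_backing(SkNode *top, SkNode *base)
{
    bool update_inherits_from = sk_inherits_from_recursive(base, top);

    /* (the real graph change would happen here) */

    if (update_inherits_from) {
        base->inherits_from = top;
    }
}
```

A node that was opened separately never reaches the parent through the chain, so its inherits_from pointer is left untouched.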
|
|
|
|
|
2020-05-13 14:05:33 +03:00
|
|
|
/*
|
|
|
|
* Return the BdrvChildRole for @bs's backing child. bs->backing is
|
|
|
|
* mostly used for COW backing children (role = COW), but also for
|
|
|
|
* filtered children (role = FILTERED | PRIMARY).
|
|
|
|
*/
|
|
|
|
static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
if (bs->drv && bs->drv->is_filter) {
|
|
|
|
return BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY;
|
|
|
|
} else {
|
|
|
|
return BDRV_CHILD_COW;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-09-14 16:33:33 +03:00
|
|
|
/*
|
2021-06-10 15:05:30 +03:00
|
|
|
* Sets the bs->backing or bs->file link of a BDS. A new reference is created;
|
|
|
|
* callers which don't need their own reference any more must call bdrv_unref().
|
2021-06-10 14:25:45 +03:00
|
|
|
*
|
|
|
|
* Function doesn't update permissions, caller is responsible for this.
|
2015-09-14 16:33:33 +03:00
|
|
|
*/
|
2021-06-10 15:05:30 +03:00
|
|
|
static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
|
|
|
|
BlockDriverState *child_bs,
|
|
|
|
bool is_backing,
|
|
|
|
Transaction *tran, Error **errp)
|
2014-05-23 17:29:45 +04:00
|
|
|
{
|
2021-06-10 15:05:30 +03:00
|
|
|
bool update_inherits_from =
|
|
|
|
bdrv_inherits_from_recursive(child_bs, parent_bs);
|
|
|
|
BdrvChild *child = is_backing ? parent_bs->backing : parent_bs->file;
|
|
|
|
BdrvChildRole role;
|
2018-10-31 19:16:37 +03:00
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-06-10 15:05:30 +03:00
|
|
|
if (!parent_bs->drv) {
|
|
|
|
/*
|
|
|
|
* Node without drv is an object without a class :/. TODO: finally fix
|
|
|
|
* qcow2 driver to never clear bs->drv and implement format corruption
|
|
|
|
* handling in another way.
|
|
|
|
*/
|
|
|
|
error_setg(errp, "Node corrupted");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (child && child->frozen) {
|
|
|
|
error_setg(errp, "Cannot change frozen '%s' link from '%s' to '%s'",
|
|
|
|
child->name, parent_bs->node_name, child->bs->node_name);
|
2021-02-02 15:49:43 +03:00
|
|
|
return -EPERM;
|
2019-03-12 19:48:40 +03:00
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:34 +03:00
|
|
|
if (is_backing && !parent_bs->drv->is_filter &&
|
|
|
|
!parent_bs->drv->supports_backing)
|
|
|
|
{
|
|
|
|
error_setg(errp, "Driver '%s' of node '%s' does not support backing "
|
|
|
|
"files", parent_bs->drv->format_name, parent_bs->node_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:30 +03:00
|
|
|
if (parent_bs->drv->is_filter) {
|
|
|
|
role = BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY;
|
|
|
|
} else if (is_backing) {
|
|
|
|
role = BDRV_CHILD_COW;
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* We can only reuse the role of the existing child. We don't have
|
|
|
|
* infrastructure to determine the role of a file child in a generic way.
|
|
|
|
*/
|
|
|
|
if (!child) {
|
|
|
|
error_setg(errp, "Cannot set file child to format node without "
|
|
|
|
"file child");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
role = child->role;
|
2014-05-23 17:29:47 +04:00
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:30 +03:00
|
|
|
if (child) {
|
|
|
|
bdrv_unset_inherits_from(parent_bs, child, tran);
|
2022-07-26 23:11:34 +03:00
|
|
|
bdrv_remove_child(child, tran);
|
2021-06-10 15:05:30 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!child_bs) {
|
2014-05-23 17:29:45 +04:00
|
|
|
goto out;
|
|
|
|
}
|
2017-02-17 22:42:32 +03:00
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
child = bdrv_attach_child_noperm(parent_bs, child_bs,
|
|
|
|
is_backing ? "backing" : "file",
|
|
|
|
&child_of_bds, role,
|
|
|
|
tran, errp);
|
|
|
|
if (!child) {
|
|
|
|
return -EINVAL;
|
2021-02-02 15:49:43 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:56 +03:00
|
|
|
|
|
|
|
/*
|
2021-06-10 15:05:30 +03:00
|
|
|
* If inherits_from pointed recursively to bs then let's update it to
|
2021-04-28 18:17:56 +03:00
|
|
|
* point directly to bs (else it will become NULL).
|
|
|
|
*/
|
2021-02-02 15:49:43 +03:00
|
|
|
if (update_inherits_from) {
|
2021-06-10 15:05:30 +03:00
|
|
|
bdrv_set_inherits_from(child_bs, parent_bs, tran);
|
block: Update BlockDriverState.inherits_from on bdrv_set_backing_hd()
When a BlockDriverState's child is opened (be it a backing file, the
protocol layer, or any other) inherits_from is set to point to the
parent node. Children opened separately and then attached to a parent
don't have this pointer set.
bdrv_reopen_queue_child() uses this to determine whether a node's
children must also be reopened inheriting the options from the parent
or not. If inherits_from points to the parent then the child is
reopened and its options can be changed, like in this example:
$ qemu-img create -f qcow2 hd0.qcow2 1M
$ qemu-img create -f qcow2 hd1.qcow2 1M
$ $QEMU -drive if=none,node-name=hd0,file=hd0.qcow2,\
backing.driver=qcow2,backing.file.filename=hd1.qcow2
(qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"
If the child does not inherit from the parent then it does not get
reopened and its options cannot be changed:
$ $QEMU -drive if=none,node-name=hd1,file=hd1.qcow2
-drive if=none,node-name=hd0,file=hd0.qcow2,backing=hd1
(qemu) qemu-io hd0 "reopen -o backing.l2-cache-size=2M"
Cannot change the option 'backing.l2-cache-size'
If a disk image has a chain of backing files then all of them are also
connected through their inherits_from pointers (i.e. it's possible to
walk the chain in reverse order from base to top).
However this is broken if the intermediate nodes are removed using
e.g. block-stream because the inherits_from pointer from the base node
becomes NULL:
$ qemu-img create -f qcow2 hd0.qcow2 1M
$ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
$ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
$ $QEMU -drive if=none,file=hd2.qcow2
(qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
(qemu) block_stream none0 0 hd0.qcow2
(qemu) qemu-io none0 "reopen -o backing.l2-cache-size=2M"
Cannot change the option 'backing.l2-cache-size'
This patch updates the inherits_from pointer if the intermediate nodes
of a backing chain are removed using bdrv_set_backing_hd(), and adds a
test case for this scenario.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
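The pointer update described in this commit message can be sketched in isolation. This is a minimal illustration, not the QEMU implementation: `Node`, `inherits_from_recursive()` and `set_backing()` are hypothetical stand-ins, and only the `backing` and `inherits_from` links are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState; only the two links matter. */
typedef struct Node {
    struct Node *backing;        /* next node towards the base */
    struct Node *inherits_from;  /* parent whose options this node inherits */
} Node;

/* Does 'child' inherit from 'parent', directly or through other nodes? */
static bool inherits_from_recursive(const Node *child, const Node *parent)
{
    while (child && child != parent) {
        child = child->inherits_from;
    }
    return child != NULL;
}

/*
 * Replace the backing node of 'bs'. If the new backing node already
 * inherited from 'bs' through nodes that are now being skipped, point
 * it directly at 'bs' so the chain stays walkable from base to top.
 */
static void set_backing(Node *bs, Node *new_backing)
{
    bool update = new_backing && inherits_from_recursive(new_backing, bs);

    bs->backing = new_backing;
    if (update) {
        new_backing->inherits_from = bs;
    }
}
```

With a three-node chain top -> mid -> base, calling `set_backing(&top, &base)` leaves `base.inherits_from` pointing at `top`. A node that was opened separately keeps its `inherits_from` unchanged.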
2018-10-31 19:16:37 +03:00
|
|
|
}
|
2014-05-23 17:29:47 +04:00
|
|
|
|
2014-05-23 17:29:45 +04:00
|
|
|
out:
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdlock_main_loop();
|
2021-06-10 15:05:30 +03:00
|
|
|
bdrv_refresh_limits(parent_bs, tran, NULL);
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
2021-04-28 18:17:56 +03:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:30 +03:00
|
|
|
static int bdrv_set_backing_noperm(BlockDriverState *bs,
|
|
|
|
BlockDriverState *backing_hd,
|
|
|
|
Transaction *tran, Error **errp)
|
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-06-10 15:05:30 +03:00
|
|
|
return bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
|
|
|
|
}
|
|
|
|
|
2022-11-18 20:41:04 +03:00
|
|
|
int bdrv_set_backing_hd_drained(BlockDriverState *bs,
|
|
|
|
BlockDriverState *backing_hd,
|
|
|
|
Error **errp)
|
2021-04-28 18:17:56 +03:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
Transaction *tran = tran_new();
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2022-11-18 20:41:04 +03:00
|
|
|
assert(bs->quiesce_counter > 0);
|
2022-01-24 20:37:41 +03:00
|
|
|
|
2021-04-28 18:17:56 +03:00
|
|
|
ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(bs, tran, errp);
|
2021-04-28 18:17:56 +03:00
|
|
|
out:
|
|
|
|
tran_finalize(tran, ret);
|
2022-11-18 20:41:04 +03:00
|
|
|
return ret;
|
|
|
|
}
|
2021-02-02 15:49:43 +03:00
|
|
|
|
2022-11-18 20:41:04 +03:00
|
|
|
int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
|
|
|
bdrv_drained_begin(bs);
|
|
|
|
ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
|
2022-01-24 20:37:41 +03:00
|
|
|
bdrv_drained_end(bs);
|
|
|
|
|
2021-02-02 15:49:43 +03:00
|
|
|
return ret;
|
2014-05-23 17:29:45 +04:00
|
|
|
}
|
|
|
|
|
2013-03-28 18:29:24 +04:00
|
|
|
/*
|
|
|
|
* Opens the backing file for a BlockDriverState if not yet open
|
|
|
|
*
|
2015-01-16 20:23:41 +03:00
|
|
|
* bdref_key specifies the key for the image's BlockdevRef in the options QDict.
|
|
|
|
* That QDict has to be flattened; therefore, if the BlockdevRef is a QDict
|
|
|
|
* itself, all options starting with "${bdref_key}." are considered part of the
|
|
|
|
* BlockdevRef.
|
|
|
|
*
|
|
|
|
* TODO Can this be unified with bdrv_open_image()?
|
2013-03-28 18:29:24 +04:00
|
|
|
*/
|
2015-01-16 20:23:41 +03:00
|
|
|
int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
|
|
|
|
const char *bdref_key, Error **errp)
|
2012-10-18 18:49:17 +04:00
|
|
|
{
|
2019-02-01 22:29:15 +03:00
|
|
|
char *backing_filename = NULL;
|
2015-01-16 20:23:41 +03:00
|
|
|
char *bdref_key_dot;
|
|
|
|
const char *reference = NULL;
|
2014-04-25 15:27:34 +04:00
|
|
|
int ret = 0;
|
block: Add BDS.auto_backing_file
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS. Therefore, we will need to consider this
in bdrv_refresh_filename().
To see whether it has been overridden, we might want to compare
bs->backing_file and bs->backing->bs->filename. However,
bs->backing_file is changed by bdrv_set_backing_hd() (which is just used
to change the backing child at runtime, without modifying the image
header), so bs->backing_file most of the time simply contains a copy of
bs->backing->bs->filename anyway, so it is useless for such a
comparison.
This patch adds an auto_backing_file BDS field which contains the
backing file path as indicated by the image header, which is not changed
by bdrv_set_backing_hd().
Because of bdrv_refresh_filename() magic, however, a BDS's filename may
differ from what has been specified during bdrv_open(). Then, the
comparison between bs->auto_backing_file and bs->backing->bs->filename
may fail even though bs->backing was opened from bs->auto_backing_file.
To mitigate this, we can copy the real BDS's filename (after the whole
bdrv_open() and bdrv_refresh_filename() process) into
bs->auto_backing_file, if we know the former has been opened based on
the latter. This is only possible if no options modifying the backing
file's behavior have been specified, though. To simplify things, this
patch only copies the filename from the backing file if no options have
been specified for it at all.
Furthermore, there are cases where an overlay is created by qemu which
already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We
do not need to worry about updating the overlay's bs->auto_backing_file
there, because we actually wrote a post-bdrv_refresh_filename() filename
into the image header.
So all in all, there will be false negatives where (as of a future
patch) bdrv_refresh_filename() will assume that the backing file differs
from what was specified in the image header, even though it really does
not. However, these cases should be limited to where (1) the user
actually did override something in the backing chain (e.g. by specifying
options for the backing file), or (2) the user executed a QMP command to
change some node's backing file (e.g. change-backing-file or
block-commit with @backing-file given) where the given filename does not
happen to coincide with qemu's idea of the backing BDS's filename.
Then again, (1) really is limited to -drive. With -blockdev or
blockdev-add, you have to adhere to the schema, so a user cannot give
partial "unimportant" options (e.g. by just setting backing.node-name
and leaving the rest to the image header). Therefore, trying to fix
this would mean trying to fix something for -drive only.
To improve on (2), we would need a full infrastructure to "canonicalize"
an arbitrary filename (+ options), so it can be compared against
another. That seems a bit over the top, considering that filenames
nowadays are there mostly for the user's entertainment.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-01 22:29:08 +03:00
|
|
|
bool implicit_backing = false;
|
2014-05-23 17:29:45 +04:00
|
|
|
BlockDriverState *backing_hd;
|
2015-01-16 20:23:41 +03:00
|
|
|
QDict *options;
|
|
|
|
QDict *tmp_parent_options = NULL;
|
2013-09-05 16:45:29 +04:00
|
|
|
Error *local_err = NULL;
|
2012-10-18 18:49:17 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-06-17 15:55:21 +03:00
|
|
|
if (bs->backing != NULL) {
|
2014-04-22 19:05:27 +04:00
|
|
|
goto free_exit;
|
2012-10-18 18:49:17 +04:00
|
|
|
}
|
|
|
|
|
2013-03-28 18:29:24 +04:00
|
|
|
/* NULL means an empty set of options */
|
2015-01-16 20:23:41 +03:00
|
|
|
if (parent_options == NULL) {
|
|
|
|
tmp_parent_options = qdict_new();
|
|
|
|
parent_options = tmp_parent_options;
|
2013-03-28 18:29:24 +04:00
|
|
|
}
|
|
|
|
|
2012-10-18 18:49:17 +04:00
|
|
|
bs->open_flags &= ~BDRV_O_NO_BACKING;
|
2015-01-16 20:23:41 +03:00
|
|
|
|
|
|
|
bdref_key_dot = g_strdup_printf("%s.", bdref_key);
|
|
|
|
qdict_extract_subqdict(parent_options, &options, bdref_key_dot);
|
|
|
|
g_free(bdref_key_dot);
|
|
|
|
|
block: Document -drive problematic code and bugs
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, a few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
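The typing problem this commit message describes can be illustrated with a small sketch. Everything here is hypothetical: `Value` is a toy stand-in for a QObject, `value_get_bool()` is not a QEMU function, and the "on"/"off" spellings are an assumption made for the example. It only shows why a reader of the flat dictionary has to accept both a typed boolean and a string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a QObject: either a typed bool or a string. */
typedef enum { VAL_BOOL, VAL_STRING } ValType;

typedef struct {
    ValType type;
    bool b;          /* valid when type == VAL_BOOL */
    const char *s;   /* valid when type == VAL_STRING */
} Value;

/*
 * Read a boolean that may arrive typed (as from -blockdev) or as a
 * string (as from -drive). Returns 0 on success, -1 if the value is
 * neither a bool nor one of the assumed spellings "on" / "off".
 */
static int value_get_bool(const Value *v, bool *out)
{
    if (v->type == VAL_BOOL) {
        *out = v->b;
        return 0;
    }
    if (v->type == VAL_STRING && v->s) {
        if (!strcmp(v->s, "on")) {
            *out = true;
            return 0;
        }
        if (!strcmp(v->s, "off")) {
            *out = false;
            return 0;
        }
    }
    return -1;
}
```

A reader that handled only `VAL_BOOL` would reject every string value, which is the kind of failure the reproducers in the commit message show.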
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
|
|
|
* Caution: while qdict_get_try_str() is fine, getting non-string
|
|
|
|
* types would require more care. When @parent_options come from
|
|
|
|
* -blockdev or blockdev_add, their members are typed according to
|
|
|
|
* the QAPI schema, but when they come from -drive, they're all
|
|
|
|
* QString.
|
|
|
|
*/
|
2015-01-16 20:23:41 +03:00
|
|
|
reference = qdict_get_try_str(parent_options, bdref_key);
|
|
|
|
if (reference || qdict_haskey(options, "file.filename")) {
|
2019-02-01 22:29:15 +03:00
|
|
|
/* keep backing_filename NULL */
|
2013-04-12 22:27:07 +04:00
|
|
|
} else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options);
|
2014-04-22 19:05:27 +04:00
|
|
|
goto free_exit;
|
2013-09-22 16:05:06 +04:00
|
|
|
} else {
|
2019-02-01 22:29:08 +03:00
|
|
|
if (qdict_size(options) == 0) {
|
|
|
|
/* If the user specifies options that do not modify the
|
|
|
|
* backing file's behavior, we might still consider it the
|
|
|
|
* implicit backing file. But it's easier this way, and
|
|
|
|
* just specifying some of the backing BDS's options is
|
|
|
|
* only possible with -drive anyway (otherwise the QAPI
|
|
|
|
* schema forces the user to specify everything). */
|
|
|
|
implicit_backing = !strcmp(bs->auto_backing_file, bs->backing_file);
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:15 +03:00
|
|
|
backing_filename = bdrv_get_full_backing_filename(bs, &local_err);
|
2014-11-26 19:20:26 +03:00
|
|
|
if (local_err) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
error_propagate(errp, local_err);
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options);
|
2014-11-26 19:20:26 +03:00
|
|
|
goto free_exit;
|
|
|
|
}
|
2012-10-18 18:49:17 +04:00
|
|
|
}
|
|
|
|
|
2014-06-04 17:09:35 +04:00
|
|
|
if (!bs->drv || !bs->drv->supports_backing) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
error_setg(errp, "Driver doesn't support backing files");
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options);
|
2014-06-04 17:09:35 +04:00
|
|
|
goto free_exit;
|
|
|
|
}
|
|
|
|
|
block: don't add 'driver' to options when referring to backing via node name
When referring to a backing file of an image via node name
bdrv_open_backing_file would add the 'driver' option to the option list
filling it with the backing format driver. This breaks construction of
the backing chain via -blockdev, as bdrv_open_inherit reports an error
if both 'reference' and 'options' are provided.
$ qemu-img create -f raw /tmp/backing.raw 64M
$ qemu-img create -f qcow2 -F raw -b /tmp/backing.raw /tmp/test.qcow2
$ qemu-system-x86_64 \
-blockdev driver=file,filename=/tmp/backing.raw,node-name=backing \
-blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing
qemu-system-x86_64: -blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-12 17:14:10 +03:00
|
|
|
if (!reference &&
|
|
|
|
bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(options, "driver", bs->backing_format);
|
2012-10-18 18:49:17 +04:00
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:15 +03:00
|
|
|
backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
|
2020-05-13 14:05:33 +03:00
|
|
|
&child_of_bds, bdrv_backing_role(bs), errp);
|
2016-05-17 17:41:31 +03:00
|
|
|
if (!backing_hd) {
|
2012-10-18 18:49:17 +04:00
|
|
|
bs->open_flags |= BDRV_O_NO_BACKING;
|
error: Use error_prepend() where it makes obvious sense
Done with this Coccinelle semantic patch
@@
expression FMT, E1, E2;
expression list ARGS;
@@
- error_setg(E1, FMT, ARGS, error_get_pretty(E2));
+ error_propagate(E1, E2);/*###*/
+ error_prepend(E1, FMT/*@@@*/, ARGS);
followed by manual cleanup, first because I can't figure out how to
make Coccinelle transform strings, and second to get rid of now
superfluous error_propagate().
We now use or propagate the original error whole instead of just its
message obtained with error_get_pretty(). This avoids suppressing its
hint (see commit 50b7b00), but I can't see how the errors touched in
this commit could come with hints. It also improves the message
printed with &error_abort when we screw up (see commit 1e9b65b).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-12-18 18:35:15 +03:00
|
|
|
error_prepend(errp, "Could not open backing file: ");
|
2016-05-17 17:41:31 +03:00
|
|
|
ret = -EINVAL;
|
2014-04-22 19:05:27 +04:00
|
|
|
goto free_exit;
|
2012-10-18 18:49:17 +04:00
|
|
|
}
|
2015-06-15 12:53:47 +03:00
|
|
|
|
2019-02-01 22:29:08 +03:00
|
|
|
if (implicit_backing) {
|
|
|
|
bdrv_refresh_filename(backing_hd);
|
|
|
|
pstrcpy(bs->auto_backing_file, sizeof(bs->auto_backing_file),
|
|
|
|
backing_hd->filename);
|
|
|
|
}
|
|
|
|
|
2015-09-14 16:33:33 +03:00
|
|
|
/* Hook up the backing file link; drop our reference, bs owns the
|
|
|
|
* backing_hd reference now */
|
2021-02-02 15:49:47 +03:00
|
|
|
ret = bdrv_set_backing_hd(bs, backing_hd, errp);
|
2015-09-14 16:33:33 +03:00
|
|
|
bdrv_unref(backing_hd);
|
2021-02-02 15:49:47 +03:00
|
|
|
if (ret < 0) {
|
2017-02-17 22:42:32 +03:00
|
|
|
goto free_exit;
|
|
|
|
}
|
2014-01-08 23:43:25 +04:00
|
|
|
|
2015-01-16 20:23:41 +03:00
|
|
|
qdict_del(parent_options, bdref_key);
|
|
|
|
|
2014-04-22 19:05:27 +04:00
|
|
|
free_exit:
|
|
|
|
g_free(backing_filename);
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(tmp_parent_options);
|
2014-04-22 19:05:27 +04:00
|
|
|
return ret;
|
2012-10-18 18:49:17 +04:00
|
|
|
}
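The "${bdref_key}." convention from the comment above bdrv_open_backing_file() can be sketched with plain arrays. This is an illustration under stated assumptions, not the QDict API: `Pair` and `extract_prefixed()` are hypothetical, and the sketch only copies matching entries rather than modelling what happens to the source dictionary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat option entry; values are kept as strings for brevity. */
typedef struct {
    const char *key;
    const char *value;
} Pair;

/*
 * Copy every entry of 'src' whose key starts with 'prefix' into 'dst',
 * with the prefix stripped from the key. Returns the number of entries
 * copied, which is at most 'dst_cap'.
 */
static size_t extract_prefixed(const Pair *src, size_t n, const char *prefix,
                               Pair *dst, size_t dst_cap)
{
    size_t plen = strlen(prefix);
    size_t out = 0;

    for (size_t i = 0; i < n && out < dst_cap; i++) {
        if (strncmp(src[i].key, prefix, plen) == 0) {
            dst[out].key = src[i].key + plen;   /* drop "backing." etc. */
            dst[out].value = src[i].value;
            out++;
        }
    }
    return out;
}
```

With the prefix "backing.", the key "backing.file.filename" becomes "file.filename" in the extracted set. A bare "backing" key (a reference to an existing node) does not match the prefix and is left for the caller to look up separately.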
|
|
|
|
|
2017-02-17 19:43:59 +03:00
|
|
|
static BlockDriverState *
|
|
|
|
bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
|
2020-05-13 14:05:13 +03:00
|
|
|
BlockDriverState *parent, const BdrvChildClass *child_class,
|
2020-05-13 14:05:17 +03:00
|
|
|
BdrvChildRole child_role, bool allow_none, Error **errp)
|
2013-12-20 22:28:11 +04:00
|
|
|
{
|
2017-02-17 19:43:59 +03:00
|
|
|
BlockDriverState *bs = NULL;
|
2013-12-20 22:28:11 +04:00
|
|
|
QDict *image_options;
|
|
|
|
char *bdref_key_dot;
|
|
|
|
const char *reference;
|
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
assert(child_class != NULL);
|
2014-02-18 21:33:05 +04:00
|
|
|
|
2013-12-20 22:28:11 +04:00
|
|
|
bdref_key_dot = g_strdup_printf("%s.", bdref_key);
|
|
|
|
qdict_extract_subqdict(options, &image_options, bdref_key_dot);
|
|
|
|
g_free(bdref_key_dot);
|
|
|
|
|
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
|
|
|
* Caution: while qdict_get_try_str() is fine, getting non-string
|
|
|
|
* types would require more care. When @options come from
|
|
|
|
* -blockdev or blockdev_add, their members are typed according to
|
|
|
|
* the QAPI schema, but when they come from -drive, they're all
|
|
|
|
* QString.
|
|
|
|
*/
|
2013-12-20 22:28:11 +04:00
|
|
|
reference = qdict_get_try_str(options, bdref_key);
|
|
|
|
if (!filename && !reference && !qdict_size(image_options)) {
|
2015-06-15 14:24:19 +03:00
|
|
|
if (!allow_none) {
|
2013-12-20 22:28:11 +04:00
|
|
|
error_setg(errp, "A block device must be specified for \"%s\"",
|
|
|
|
bdref_key);
|
|
|
|
}
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(image_options);
|
2013-12-20 22:28:11 +04:00
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
|
2016-05-17 17:41:31 +03:00
|
|
|
bs = bdrv_open_inherit(filename, reference, image_options, 0,
|
2020-05-13 14:05:17 +03:00
|
|
|
parent, child_class, child_role, errp);
|
2016-05-17 17:41:31 +03:00
|
|
|
if (!bs) {
|
2015-06-15 12:53:47 +03:00
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
|
2013-12-20 22:28:11 +04:00
|
|
|
done:
|
|
|
|
qdict_del(options, bdref_key);
|
2017-02-17 19:43:59 +03:00
|
|
|
return bs;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Opens a disk image whose options are given as BlockdevRef in another block
|
|
|
|
* device's options.
|
|
|
|
*
|
|
|
|
* If allow_none is true, no image will be opened if filename is NULL and no
|
|
|
|
* BlockdevRef is given. NULL will be returned, but errp remains unset.
|
|
|
|
*
|
|
|
|
* bdref_key specifies the key for the image's BlockdevRef in the options QDict.
|
|
|
|
* That QDict has to be flattened; therefore, if the BlockdevRef is a QDict
|
|
|
|
* itself, all options starting with "${bdref_key}." are considered part of the
|
|
|
|
* BlockdevRef.
|
|
|
|
*
|
|
|
|
* The BlockdevRef will be removed from the options QDict.
|
|
|
|
*/
|
|
|
|
BdrvChild *bdrv_open_child(const char *filename,
|
|
|
|
QDict *options, const char *bdref_key,
|
|
|
|
BlockDriverState *parent,
|
2020-05-13 14:05:13 +03:00
|
|
|
const BdrvChildClass *child_class,
|
2020-05-13 14:05:15 +03:00
|
|
|
BdrvChildRole child_role,
|
2017-02-17 19:43:59 +03:00
|
|
|
bool allow_none, Error **errp)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
|
2020-05-13 14:05:17 +03:00
|
|
|
child_role, allow_none, errp);
|
2017-02-17 19:43:59 +03:00
|
|
|
if (bs == NULL) {
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2020-05-13 14:05:15 +03:00
|
|
|
return bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
|
|
|
|
errp);
|
2015-06-15 14:24:19 +03:00
|
|
|
}
|
|
|
|
|
2022-07-26 23:11:21 +03:00
|
|
|
/*
|
|
|
|
* Wrapper for bdrv_open_child() for the common case: opening bs's primary child.
|
|
|
|
*/
|
|
|
|
int bdrv_open_file_child(const char *filename,
|
|
|
|
QDict *options, const char *bdref_key,
|
|
|
|
BlockDriverState *parent, Error **errp)
|
|
|
|
{
|
|
|
|
BdrvChildRole role;
|
|
|
|
|
|
|
|
/* commit_top and mirror_top don't use this function */
|
|
|
|
assert(!parent->drv->filtered_child_is_backing);
|
|
|
|
role = parent->drv->is_filter ?
|
|
|
|
(BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;
|
|
|
|
|
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But a very useful duplication, so let's not drop them at
all:)
We should manage bs->file and bs->backing in same place, where we
manage bs->children, to keep them in sync.
Moreover, generic io paths are unprepared to BdrvChild without a bs, so
it's double good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach bs->file or bs->backing child, just
set corresponding field to NULL.
Attach is a bit more complicated. But we still can precisely detect
should we set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must be also
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores BdrvChild** into transaction to clear
it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just pass-through this feature, bdrv_root_attach_child() doesn't need
the feature.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
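The attach-time decision listed in this commit message reads naturally as a small function. The sketch below uses hypothetical role bits and a hypothetical `Slot` enum. It does not reproduce the real BDRV_CHILD_* values or the actual .attach callback, only the order of the checks.

```c
#include <stdbool.h>

/* Hypothetical role bits; the real BDRV_CHILD_* values are not used here. */
enum {
    ROLE_COW      = 1 << 0,
    ROLE_FILTERED = 1 << 1,
    ROLE_PRIMARY  = 1 << 2,
};

/* Which parent pointer, if any, should track the attached child. */
typedef enum { SLOT_NONE, SLOT_FILE, SLOT_BACKING } Slot;

static Slot slot_for_role(unsigned role, bool filtered_child_is_backing)
{
    if (role & ROLE_COW) {
        return SLOT_BACKING;            /* COW child is always bs->backing */
    }
    if (role & ROLE_FILTERED) {
        /* Filter drivers choose which field holds their filtered child. */
        return filtered_child_is_backing ? SLOT_BACKING : SLOT_FILE;
    }
    if (role & ROLE_PRIMARY) {
        return SLOT_FILE;               /* plain primary child is bs->file */
    }
    return SLOT_NONE;                   /* some other child, neither field */
}
```

The order of the checks matters: a filtered child also carries the primary bit, so testing the filtered bit first is what lets a driver with `filtered_child_is_backing` set route it to the backing slot.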
2022-07-26 23:11:32 +03:00
|
|
|
if (!bdrv_open_child(filename, options, bdref_key, parent,
|
|
|
|
&child_of_bds, role, false, errp))
|
|
|
|
{
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2022-07-26 23:11:21 +03:00
|
|
|
|
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
2022-07-26 23:11:32 +03:00
|
|
|
return 0;
|
2022-07-26 23:11:21 +03:00
|
|
|
}
|
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
/*
|
|
|
|
* TODO Future callers may need to specify parent/child_class in order for
|
|
|
|
* option inheritance to work. Existing callers use it for the root node.
|
|
|
|
*/
|
2018-01-10 17:52:33 +03:00
|
|
|
BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs = NULL;
|
|
|
|
QObject *obj = NULL;
|
|
|
|
QDict *qdict = NULL;
|
|
|
|
const char *reference = NULL;
|
|
|
|
Visitor *v = NULL;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2018-01-10 17:52:33 +03:00
|
|
|
if (ref->type == QTYPE_QSTRING) {
|
|
|
|
reference = ref->u.reference;
|
|
|
|
} else {
|
|
|
|
BlockdevOptions *options = &ref->u.definition;
|
|
|
|
assert(ref->type == QTYPE_QDICT);
|
|
|
|
|
|
|
|
v = qobject_output_visitor_new(&obj);
|
2020-04-24 11:43:35 +03:00
|
|
|
visit_type_BlockdevOptions(v, NULL, &options, &error_abort);
|
2018-01-10 17:52:33 +03:00
|
|
|
visit_complete(v, &obj);
|
|
|
|
|
2018-02-24 18:40:29 +03:00
|
|
|
qdict = qobject_to(QDict, obj);
|
2018-01-10 17:52:33 +03:00
|
|
|
qdict_flatten(qdict);
|
|
|
|
|
|
|
|
/* bdrv_open_inherit() defaults to the values in bdrv_flags (for
|
|
|
|
* compatibility with other callers) rather than what we want as the
|
|
|
|
* real defaults. Apply the defaults here instead. */
|
|
|
|
qdict_set_default_str(qdict, BDRV_OPT_CACHE_DIRECT, "off");
|
|
|
|
qdict_set_default_str(qdict, BDRV_OPT_CACHE_NO_FLUSH, "off");
|
|
|
|
qdict_set_default_str(qdict, BDRV_OPT_READ_ONLY, "off");
|
block: Add auto-read-only option
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.
Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).
A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.
The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.
Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.
Instead of adding a new auto-read-only option, turning the existing
read-only into an enum (with a bool alternate for compatibility) was
considered, but it complicated the implementation to the point that it
didn't seem to be worth it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-05 19:57:40 +03:00
|
|
|
qdict_set_default_str(qdict, BDRV_OPT_AUTO_READ_ONLY, "off");
|
|
|
|
|
2018-01-10 17:52:33 +03:00
|
|
|
}
|
|
|
|
|
2020-05-13 14:05:17 +03:00
|
|
|
bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, 0, errp);
|
2018-01-10 17:52:33 +03:00
|
|
|
obj = NULL;
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(obj);
|
2018-01-10 17:52:33 +03:00
|
|
|
visit_free(v);
|
|
|
|
return bs;
|
|
|
|
}
|
|
|
|
|
block: Let bdrv_open_inherit() return the snapshot
If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.
This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.
The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.
In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.
To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-17 17:41:27 +03:00
|
|
|
static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
|
|
|
|
int flags,
|
|
|
|
QDict *snapshot_options,
|
|
|
|
Error **errp)
|
2014-04-03 14:09:34 +04:00
|
|
|
{
|
2022-10-10 07:04:31 +03:00
|
|
|
g_autofree char *tmp_filename = NULL;
|
2014-04-03 14:09:34 +04:00
|
|
|
int64_t total_size;
|
2014-06-05 13:20:51 +04:00
|
|
|
QemuOpts *opts = NULL;
|
2017-04-28 00:58:18 +03:00
|
|
|
BlockDriverState *bs_snapshot = NULL;
|
2014-04-03 14:09:34 +04:00
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2014-04-03 14:09:34 +04:00
|
|
|
/* if snapshot, we create a temporary backing file and open it
|
|
|
|
instead of opening 'filename' directly */
|
|
|
|
|
|
|
|
/* Get the required size from the image */
|
2014-04-04 19:07:19 +04:00
|
|
|
total_size = bdrv_getlength(bs);
|
|
|
|
if (total_size < 0) {
|
|
|
|
error_setg_errno(errp, -total_size, "Could not get image size");
|
2014-04-22 19:05:27 +04:00
|
|
|
goto out;
|
2014-04-04 19:07:19 +04:00
|
|
|
}
|
2014-04-03 14:09:34 +04:00
|
|
|
|
|
|
|
/* Create the temporary image */
|
2022-10-10 07:04:31 +03:00
|
|
|
tmp_filename = create_tmp_file(errp);
|
|
|
|
if (!tmp_filename) {
|
2014-04-22 19:05:27 +04:00
|
|
|
goto out;
|
2014-04-03 14:09:34 +04:00
|
|
|
}
|
|
|
|
|
2014-12-02 20:32:42 +03:00
|
|
|
opts = qemu_opts_create(bdrv_qcow2.create_opts, NULL, 0,
|
2014-06-05 13:21:11 +04:00
|
|
|
&error_abort);
|
2015-02-12 18:46:36 +03:00
|
|
|
qemu_opt_set_number(opts, BLOCK_OPT_SIZE, total_size, &error_abort);
|
error: Use error_prepend() where it makes obvious sense
Done with this Coccinelle semantic patch
@@
expression FMT, E1, E2;
expression list ARGS;
@@
- error_setg(E1, FMT, ARGS, error_get_pretty(E2));
+ error_propagate(E1, E2);/*###*/
+ error_prepend(E1, FMT/*@@@*/, ARGS);
followed by manual cleanup, first because I can't figure out how to
make Coccinelle transform strings, and second to get rid of now
superfluous error_propagate().
We now use or propagate the original error whole instead of just its
message obtained with error_get_pretty(). This avoids suppressing its
hint (see commit 50b7b00), but I can't see how the errors touched in
this commit could come with hints. It also improves the message
printed with &error_abort when we screw up (see commit 1e9b65b).
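The difference between building a new error from the old message and prepending to the existing error can be illustrated with a toy error object. The SketchError type and its helpers below are hypothetical and much simpler than QEMU's Error API; the point is only that prepending rewrites the message in place and leaves the rest of the error (here, the hint) untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy error object: an owned message plus an optional static hint. */
typedef struct {
    char *msg;
    const char *hint;
} SketchError;

static SketchError sketch_error_new(const char *msg, const char *hint)
{
    SketchError err = { NULL, hint };
    size_t n = strlen(msg) + 1;

    err.msg = malloc(n);
    assert(err.msg != NULL);
    memcpy(err.msg, msg, n);
    return err;
}

/* Put prefix in front of the message; the hint is preserved as is. */
static void sketch_error_prepend(SketchError *err, const char *prefix)
{
    size_t n = strlen(prefix) + strlen(err->msg) + 1;
    char *joined = malloc(n);

    assert(joined != NULL);
    snprintf(joined, n, "%s%s", prefix, err->msg);
    free(err->msg);
    err->msg = joined;
}

static void sketch_error_free(SketchError *err)
{
    free(err->msg);
    err->msg = NULL;
}
```

Formatting a fresh error from only the old message text would have dropped the hint, which is the loss the commit message says is now avoided.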
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-12-18 18:35:15 +03:00
|
|
|
ret = bdrv_create(&bdrv_qcow2, tmp_filename, opts, errp);
|
2014-06-05 13:20:51 +04:00
|
|
|
qemu_opts_del(opts);
|
2014-04-03 14:09:34 +04:00
|
|
|
if (ret < 0) {
|
error: Use error_prepend() where it makes obvious sense
2015-12-18 18:35:15 +03:00
|
|
|
error_prepend(errp, "Could not create temporary overlay '%s': ",
|
|
|
|
tmp_filename);
|
2014-04-22 19:05:27 +04:00
|
|
|
goto out;
|
2014-04-03 14:09:34 +04:00
|
|
|
}
|
|
|
|
|
2016-03-07 15:02:15 +03:00
|
|
|
/* Prepare options QDict for the temporary file */
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(snapshot_options, "file.driver", "file");
|
|
|
|
qdict_put_str(snapshot_options, "file.filename", tmp_filename);
|
|
|
|
qdict_put_str(snapshot_options, "driver", "qcow2");
|
2014-04-03 14:09:34 +04:00
|
|
|
|
2016-05-17 17:41:31 +03:00
|
|
|
bs_snapshot = bdrv_open(NULL, NULL, snapshot_options, flags, errp);
|
2016-03-07 15:02:15 +03:00
|
|
|
snapshot_options = NULL;
|
2016-05-17 17:41:31 +03:00
|
|
|
if (!bs_snapshot) {
|
2014-04-22 19:05:27 +04:00
|
|
|
goto out;
|
2014-04-03 14:09:34 +04:00
|
|
|
}
|
|
|
|
|
2021-02-02 15:49:44 +03:00
|
|
|
ret = bdrv_append(bs_snapshot, bs, errp);
|
|
|
|
if (ret < 0) {
|
2017-04-28 00:58:18 +03:00
|
|
|
bs_snapshot = NULL;
|
2017-02-20 14:46:42 +03:00
|
|
|
goto out;
|
|
|
|
}
|
2014-04-22 19:05:27 +04:00
|
|
|
|
|
|
|
out:
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(snapshot_options);
|
2017-04-28 00:58:18 +03:00
|
|
|
return bs_snapshot;
|
2014-04-03 14:09:34 +04:00
|
|
|
}
|
|
|
|
|
2010-04-12 18:37:13 +04:00
|
|
|
/*
|
|
|
|
* Opens a disk image (raw, qcow2, vmdk, ...)
|
2013-03-15 13:35:02 +04:00
|
|
|
*
|
|
|
|
* options is a QDict of options to pass to the block drivers, or NULL for an
|
|
|
|
* empty set of options. The reference to the QDict belongs to the block layer
|
|
|
|
* after the call (even on failure), so if the caller intends to reuse the
|
2018-04-19 18:01:43 +03:00
|
|
|
* dictionary, it needs to use qobject_ref() before calling bdrv_open.
|
2014-02-18 21:33:05 +04:00
|
|
|
*
|
|
|
|
* If *pbs is NULL, a new BDS will be created with a pointer to it stored there.
|
|
|
|
* If it is not NULL, the referenced BDS will be reused.
|
2014-02-18 21:33:06 +04:00
|
|
|
*
|
|
|
|
* The reference parameter may be used to specify an existing block device which
|
|
|
|
* should be opened. If specified, neither options nor a filename may be given,
|
|
|
|
* nor can an existing BDS be reused (that is, *pbs has to be NULL).
|
2023-01-13 23:42:04 +03:00
|
|
|
*
|
|
|
|
* The caller must always hold @filename AioContext lock, because this
|
|
|
|
* function eventually calls bdrv_refresh_total_sectors() which polls
|
|
|
|
* when called from non-coroutine context.
|
2010-04-12 18:37:13 +04:00
|
|
|
*/
|
2023-01-26 20:24:32 +03:00
|
|
|
static BlockDriverState * no_coroutine_fn
|
|
|
|
bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
|
|
|
|
int flags, BlockDriverState *parent,
|
|
|
|
const BdrvChildClass *child_class, BdrvChildRole child_role,
|
|
|
|
Error **errp)
|
2004-08-02 01:59:26 +04:00
|
|
|
{
|
2010-04-12 18:37:13 +04:00
|
|
|
int ret;
|
2017-02-17 20:39:24 +03:00
|
|
|
BlockBackend *file = NULL;
|
2015-06-16 15:19:22 +03:00
|
|
|
BlockDriverState *bs;
|
2015-08-26 20:47:50 +03:00
|
|
|
BlockDriver *drv = NULL;
|
2018-06-29 14:37:00 +03:00
|
|
|
BdrvChild *child;
|
2013-07-09 13:09:02 +04:00
|
|
|
const char *drvname;
|
2015-10-26 15:27:15 +03:00
|
|
|
const char *backing;
|
2013-09-05 16:45:29 +04:00
|
|
|
Error *local_err = NULL;
|
2016-03-07 15:02:15 +03:00
|
|
|
QDict *snapshot_options = NULL;
|
2014-05-06 14:11:42 +04:00
|
|
|
int snapshot_flags = 0;
|
2005-04-29 01:09:32 +04:00
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
assert(!child_class || !flags);
|
|
|
|
assert(!child_class == !parent);
|
2022-03-03 18:16:13 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2023-01-26 20:24:32 +03:00
|
|
|
assert(!qemu_in_coroutine());
|
2014-02-18 21:33:05 +04:00
|
|
|
|
2014-02-18 21:33:06 +04:00
|
|
|
if (reference) {
|
|
|
|
bool options_non_empty = options ? qdict_size(options) : false;
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options);
|
2014-02-18 21:33:06 +04:00
|
|
|
|
|
|
|
if (filename || options_non_empty) {
|
|
|
|
error_setg(errp, "Cannot reference an existing block device with "
|
|
|
|
"additional options or a new filename");
|
2016-05-17 17:41:31 +03:00
|
|
|
return NULL;
|
2014-02-18 21:33:06 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
bs = bdrv_lookup_bs(reference, reference, errp);
|
|
|
|
if (!bs) {
|
2016-05-17 17:41:31 +03:00
|
|
|
return NULL;
|
2014-02-18 21:33:06 +04:00
|
|
|
}
|
2016-04-04 18:11:13 +03:00
|
|
|
|
2014-02-18 21:33:06 +04:00
|
|
|
bdrv_ref(bs);
|
2016-05-17 17:41:31 +03:00
|
|
|
return bs;
|
2014-02-18 21:33:06 +04:00
|
|
|
}
|
|
|
|
|
2016-05-17 17:41:31 +03:00
|
|
|
bs = bdrv_new();
|
2014-02-18 21:33:05 +04:00
|
|
|
|
2013-03-15 13:35:02 +04:00
|
|
|
/* NULL means an empty set of options */
|
|
|
|
if (options == NULL) {
|
|
|
|
options = qdict_new();
|
|
|
|
}
|
|
|
|
|
2015-05-08 17:15:03 +03:00
|
|
|
/* json: syntax counts as explicit options, as if in the QDict */
|
2015-10-29 17:24:41 +03:00
|
|
|
parse_json_protocol(options, &filename, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
2015-05-08 17:15:03 +03:00
|
|
|
bs->explicit_options = qdict_clone_shallow(options);
|
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
if (child_class) {
|
2020-05-13 14:05:18 +03:00
|
|
|
bool parent_is_format;
|
|
|
|
|
|
|
|
if (parent->drv) {
|
|
|
|
parent_is_format = parent->drv->is_format;
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* parent->drv is not set yet because this node is opened for
|
|
|
|
* (potential) format probing. That means that @parent is going
|
|
|
|
* to be a format node.
|
|
|
|
*/
|
|
|
|
parent_is_format = true;
|
|
|
|
}
|
|
|
|
|
2015-04-09 19:47:50 +03:00
|
|
|
bs->inherits_from = parent;
|
2020-05-13 14:05:18 +03:00
|
|
|
child_class->inherit_options(child_role, parent_is_format,
|
|
|
|
&flags, options,
|
2020-05-13 14:05:13 +03:00
|
|
|
parent->open_flags, parent->options);
|
2015-04-08 14:43:47 +03:00
|
|
|
}
|
|
|
|
|
2015-10-29 17:24:41 +03:00
|
|
|
ret = bdrv_fill_options(&options, filename, &flags, &local_err);
|
2020-04-22 16:31:44 +03:00
|
|
|
if (ret < 0) {
|
2014-05-26 13:39:55 +04:00
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
block: Document -drive problematic code and bugs
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, a few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
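What reading a boolean "carefully" means can be sketched with a tagged value that is either a typed boolean (as from -blockdev) or a string (as from -drive). The SketchValue type and sketch_is_on() are hypothetical and stand in for the QDict accessors; only the string "on" is treated as true, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* How a flat dictionary member may be stored, depending on its origin. */
typedef enum { SKETCH_ABSENT, SKETCH_BOOL, SKETCH_STRING } SketchKind;

typedef struct {
    SketchKind kind;
    bool b;          /* valid when kind == SKETCH_BOOL */
    const char *s;   /* valid when kind == SKETCH_STRING */
} SketchValue;

/* True if the member is the boolean true or the string "on". */
static bool sketch_is_on(const SketchValue *v)
{
    assert(v != NULL);

    switch (v->kind) {
    case SKETCH_BOOL:
        return v->b;
    case SKETCH_STRING:
        return v->s != NULL && strcmp(v->s, "on") == 0;
    default:
        /* Absent member: fall back to the default, false. */
        return false;
    }
}
```

A reader that handled only one of the two representations would work for one command-line syntax and fail for the other, which is the failure mode the reproducer above demonstrates for input visitors.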
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
|
|
|
* Set the BDRV_O_RDWR and BDRV_O_ALLOW_RDWR flags.
|
|
|
|
* Caution: getting a boolean member of @options requires care.
|
|
|
|
* When @options come from -blockdev or blockdev_add, members are
|
|
|
|
* typed according to the QAPI schema, but when they come from
|
|
|
|
* -drive, they're all QString.
|
|
|
|
*/
|
2016-09-15 17:53:02 +03:00
|
|
|
if (g_strcmp0(qdict_get_try_str(options, BDRV_OPT_READ_ONLY), "on") &&
|
|
|
|
!qdict_get_try_bool(options, BDRV_OPT_READ_ONLY, false)) {
|
|
|
|
flags |= (BDRV_O_RDWR | BDRV_O_ALLOW_RDWR);
|
|
|
|
} else {
|
|
|
|
flags &= ~BDRV_O_RDWR;
|
2016-09-15 17:53:00 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (flags & BDRV_O_SNAPSHOT) {
|
|
|
|
snapshot_options = qdict_new();
|
|
|
|
bdrv_temp_snapshot_options(&snapshot_flags, snapshot_options,
|
|
|
|
flags, options);
|
2016-09-15 17:53:02 +03:00
|
|
|
/* Let bdrv_backing_options() override "read-only" */
|
|
|
|
qdict_del(options, BDRV_OPT_READ_ONLY);
|
block: Use bdrv_inherited_options()
Let child_file's, child_format's, and child_backing's .inherit_options()
implementations fall back to bdrv_inherited_options() to show that it
would really work for all of these cases, if only the parents passed the
appropriate BdrvChildRole and parent_is_format values.
(Also, make bdrv_open_inherit(), the only place to explicitly call
bdrv_backing_options(), call bdrv_inherited_options() instead.)
This patch should incur only two visible changes, both for child_format
children, both of which are effectively bug fixes:
First, they no longer have discard=unmap set by default. The reason it
was set is that bdrv_inherited_fmt_options() fell through to
bdrv_protocol_options(), and that set it because "format drivers take
care to send flushes and respect unmap policy". None of the drivers
that use child_format for their children (quorum and blkverify) are
format drivers, though, so this reasoning does not apply here.
Second, they no longer have BDRV_O_NO_IO force-cleared. child_format
was used solely for children that do not store any metadata and as such
will not be accessed by their parents as long as those parents do not
receive I/O themselves. Thus, such children should inherit
BDRV_O_NO_IO.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-12-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-13 14:05:21 +03:00
|
|
|
bdrv_inherited_options(BDRV_CHILD_COW, true,
|
|
|
|
&flags, options, flags, options);
|
2016-09-15 17:53:00 +03:00
|
|
|
}
|
|
|
|
|
2015-04-24 17:38:02 +03:00
|
|
|
bs->open_flags = flags;
|
|
|
|
bs->options = options;
|
|
|
|
options = qdict_clone_shallow(options);
|
|
|
|
|
2014-06-04 16:19:44 +04:00
|
|
|
/* Find the right image format driver */
|
block: Document -drive problematic code and bugs
2017-03-30 20:43:12 +03:00
|
|
|
/* See cautionary note on accessing @options above */
|
2014-06-04 16:19:44 +04:00
|
|
|
drvname = qdict_get_try_str(options, "driver");
|
|
|
|
if (drvname) {
|
|
|
|
drv = bdrv_find_format(drvname);
|
|
|
|
if (!drv) {
|
|
|
|
error_setg(errp, "Unknown driver: '%s'", drvname);
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(drvname || !(flags & BDRV_O_PROTOCOL));
|
|
|
|
|
block: Document -drive problematic code and bugs
2017-03-30 20:43:12 +03:00
|
|
|
/* See cautionary note on accessing @options above */
|
2015-10-26 15:27:15 +03:00
|
|
|
backing = qdict_get_try_str(options, "backing");
|
2018-02-24 18:40:32 +03:00
|
|
|
if (qobject_to(QNull, qdict_get(options, "backing")) != NULL ||
|
|
|
|
(backing && *backing == '\0'))
|
|
|
|
{
|
2018-02-24 18:40:33 +03:00
|
|
|
if (backing) {
|
|
|
|
warn_report("Use of \"backing\": \"\" is deprecated; "
|
|
|
|
"use \"backing\": null instead");
|
|
|
|
}
|
2015-10-26 15:27:15 +03:00
|
|
|
flags |= BDRV_O_NO_BACKING;
|
2019-11-08 11:36:35 +03:00
|
|
|
qdict_del(bs->explicit_options, "backing");
|
|
|
|
qdict_del(bs->options, "backing");
|
2015-10-26 15:27:15 +03:00
|
|
|
qdict_del(options, "backing");
|
|
|
|
}
|
|
|
|
|
2017-02-17 20:39:24 +03:00
|
|
|
/* Open image file without format layer. This BlockBackend is only used for
|
2016-12-16 20:52:37 +03:00
|
|
|
* probing, the block drivers will do their own bdrv_open_child() for the
|
|
|
|
* same BDS, which is why we put the node name back into options. */
|
2014-06-03 18:44:19 +04:00
|
|
|
if ((flags & BDRV_O_PROTOCOL) == 0) {
|
2017-02-17 20:39:24 +03:00
|
|
|
BlockDriverState *file_bs;
|
|
|
|
|
|
|
|
file_bs = bdrv_open_child_bs(filename, options, "file", bs,
|
2020-05-13 14:05:37 +03:00
|
|
|
&child_of_bds, BDRV_CHILD_IMAGE,
|
|
|
|
true, &local_err);
|
2015-06-15 15:11:51 +03:00
|
|
|
if (local_err) {
|
2014-06-03 18:44:19 +04:00
|
|
|
goto fail;
|
|
|
|
}
|
2017-02-17 20:39:24 +03:00
|
|
|
if (file_bs != NULL) {
|
2017-11-20 16:59:13 +03:00
|
|
|
/* Not requesting BLK_PERM_CONSISTENT_READ because we're only
|
|
|
|
* looking at the header to guess the image format. This works even
|
|
|
|
* in cases where a guest would not see a consistent state. */
|
2019-04-25 15:25:10 +03:00
|
|
|
file = blk_new(bdrv_get_aio_context(file_bs), 0, BLK_PERM_ALL);
|
2017-01-13 21:02:32 +03:00
|
|
|
blk_insert_bs(file, file_bs, &local_err);
|
2017-02-17 20:39:24 +03:00
|
|
|
bdrv_unref(file_bs);
|
2017-01-13 21:02:32 +03:00
|
|
|
if (local_err) {
|
|
|
|
goto fail;
|
|
|
|
}
|
2017-02-17 20:39:24 +03:00
|
|
|
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(options, "file", bdrv_get_node_name(file_bs));
|
2016-12-16 20:52:37 +03:00
|
|
|
}
|
2012-11-12 20:35:27 +04:00
|
|
|
}
|
|
|
|
|
2014-06-04 16:19:44 +04:00
|
|
|
/* Image format probing */
|
raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.
Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest. A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.
Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing. Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing. QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.
All of these additions that allow avoiding format probing have to be
specified explicitly. The default still allows the attack.
In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.
Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.
The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).
Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.
Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.
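The restriction can be sketched as a check that runs before a write reaches a probed raw image. This is a simplified model under several assumptions: the stand-in probe recognizes only the QCOW magic bytes, only writes that start at offset 0 are examined, the rest of the sector is treated as zeros instead of being read back from the image, and the names and the -EPERM return value are the sketch's own choices.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512

/* Stand-in probe: recognizes only the 4-byte QCOW magic "QFI\xfb". */
static bool sketch_probes_as_other_format(const uint8_t *sector)
{
    static const uint8_t qcow_magic[4] = { 'Q', 'F', 'I', 0xfb };

    return memcmp(sector, qcow_magic, sizeof(qcow_magic)) == 0;
}

/*
 * Return 0 if the write may proceed, or -EPERM if it would make a probed
 * raw image look like another format on the next probe.
 */
static int sketch_check_write(bool probed, uint64_t offset,
                              const uint8_t *buf, size_t len)
{
    uint8_t sector[SKETCH_SECTOR_SIZE] = { 0 };
    size_t n = len < SKETCH_SECTOR_SIZE ? len : SKETCH_SECTOR_SIZE;

    /* An explicitly specified format lifts the restriction entirely. */
    if (!probed || offset != 0 || n == 0) {
        return 0;
    }
    assert(buf != NULL);
    memcpy(sector, buf, n);
    return sketch_probes_as_other_format(sector) ? -EPERM : 0;
}
```

Failing the request, rather than silently altering the data, mirrors the design choice stated in the message: the guest sees an error instead of having different bytes written than it asked for.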
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-20 18:27:12 +03:00
|
|
|
bs->probed = !drv;
|
2014-06-04 16:19:44 +04:00
|
|
|
if (!drv && file) {
|
2016-06-20 19:24:02 +03:00
|
|
|
ret = find_image_format(file, filename, &drv, &local_err);
|
2014-05-27 12:50:29 +04:00
|
|
|
if (ret < 0) {
|
2014-04-11 21:16:36 +04:00
|
|
|
goto fail;
|
2013-12-20 22:28:10 +04:00
|
|
|
}
|
2015-04-24 17:38:02 +03:00
|
|
|
/*
|
|
|
|
* This option update would logically belong in bdrv_fill_options(),
|
|
|
|
* but we first need to open bs->file for the probing to work, while
|
|
|
|
* opening bs->file already requires the (mostly) final set of options
|
|
|
|
* so that cache mode etc. can be inherited.
|
|
|
|
*
|
|
|
|
* Adding the driver later is somewhat ugly, but it's not an option
|
|
|
|
* that would ever be inherited, so it's correct. We just need to make
|
|
|
|
* sure to update both bs->options (which has the full effective
|
|
|
|
* options for bs) and options (which has file.* already removed).
|
|
|
|
*/
|
2017-04-28 00:58:17 +03:00
|
|
|
qdict_put_str(bs->options, "driver", drv->format_name);
|
|
|
|
qdict_put_str(options, "driver", drv->format_name);
|
2014-06-04 16:19:44 +04:00
|
|
|
} else if (!drv) {
|
2014-05-27 12:50:29 +04:00
|
|
|
error_setg(errp, "Must specify either driver or file");
|
2014-04-11 21:16:36 +04:00
|
|
|
goto fail;
|
2004-08-02 01:59:26 +04:00
|
|
|
}
|
2010-04-12 18:37:13 +04:00
|
|
|
|
block: driver should override flags in bdrv_open()
The BDRV_O_PROTOCOL flag should have an impact only if no driver is
specified explicitly. Therefore, if bdrv_open() is called with an
explicit block driver argument (either through the options QDict or
through the drv parameter) and that block driver is a protocol block
driver, BDRV_O_PROTOCOL should be set; if it is a format block driver,
BDRV_O_PROTOCOL should be unset.
While there was code to unset the flag in case a format block driver
has been selected, it only followed the bdrv_fill_options() function
call whereas the flag in fact needs to be adjusted before it is used
there.
With that change, BDRV_O_PROTOCOL will always be set if the BDS should
be a protocol driver; if the driver has been specified explicitly, the
new code will set it; and bdrv_fill_options() will only "probe" a
protocol driver if BDRV_O_PROTOCOL is set. The probing after
bdrv_fill_options() cannot select a protocol driver.
Thus, bdrv_open_image() to open BDS.file is never called if a protocol
BDS is about to be created. With that change in turn it is impossible to
call bdrv_open_common() with a protocol drv and file != NULL, which
allows us to remove the bdrv_swap() call.
This change breaks a test case in qemu-iotest 051:
"-drive file=t.qcow2,file.driver=qcow2" now works because the explicitly
specified "qcow2" overrides the BDRV_O_PROTOCOL which is automatically
set for the "file" BDS (and the filename is just passed down).
Therefore, this patch removes that test case.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-19 21:53:16 +03:00
|
|
|
/* BDRV_O_PROTOCOL must be set iff a protocol BDS is about to be created */
|
|
|
|
assert(!!(flags & BDRV_O_PROTOCOL) == !!drv->bdrv_file_open);
|
|
|
|
/* file must be NULL if a protocol BDS is about to be created
|
|
|
|
* (the inverse results in an error message from bdrv_open_common()) */
|
|
|
|
assert(!(flags & BDRV_O_PROTOCOL) || !file);
|
|
|
|
|
2010-04-12 18:37:13 +04:00
|
|
|
/* Open the image */
|
2016-01-11 21:07:50 +03:00
|
|
|
ret = bdrv_open_common(bs, file, options, &local_err);
|
2010-04-12 18:37:13 +04:00
|
|
|
if (ret < 0) {
|
2014-04-11 21:16:36 +04:00
|
|
|
goto fail;
|
2010-01-20 20:13:25 +03:00
|
|
|
}
|
|
|
|
|
2016-12-16 20:52:37 +03:00
|
|
|
if (file) {
|
2017-02-17 20:39:24 +03:00
|
|
|
blk_unref(file);
|
2012-11-12 20:35:27 +04:00
|
|
|
file = NULL;
|
|
|
|
}
|
|
|
|
|
2010-04-12 18:37:13 +04:00
|
|
|
/* If there is a backing file, use it */
|
2012-10-18 18:49:17 +04:00
|
|
|
if ((flags & BDRV_O_NO_BACKING) == 0) {
|
2015-01-16 20:23:41 +03:00
|
|
|
ret = bdrv_open_backing_file(bs, options, "backing", &local_err);
|
2010-04-12 18:37:13 +04:00
|
|
|
if (ret < 0) {
|
2013-03-15 13:35:04 +04:00
|
|
|
goto close_and_fail;
|
2010-04-12 18:37:13 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
block: Remove child references from bs->{options,explicit_options}
Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.
What is more important, these values can become wrong if the children
change:
$ qemu-img create -f qcow2 hd0.qcow2 10M
$ qemu-img create -f qcow2 hd1.qcow2 10M
$ qemu-img create -f qcow2 hd2.qcow2 10M
$ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
-drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
-drive file=hd2.qcow2,node-name=hd2,backing=hd1
After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:
(qemu) block_stream hd2 0 hd0.qcow2
Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-06 12:37:03 +03:00
|
|
|
/* Remove all children options and references
|
|
|
|
* from bs->options and bs->explicit_options */
|
2018-06-29 14:37:00 +03:00
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
char *child_key_dot;
|
|
|
|
child_key_dot = g_strdup_printf("%s.", child->name);
|
|
|
|
qdict_extract_subqdict(bs->explicit_options, NULL, child_key_dot);
|
|
|
|
qdict_extract_subqdict(bs->options, NULL, child_key_dot);
|
2018-09-06 12:37:03 +03:00
|
|
|
qdict_del(bs->explicit_options, child->name);
|
|
|
|
qdict_del(bs->options, child->name);
|
2018-06-29 14:37:00 +03:00
|
|
|
g_free(child_key_dot);
|
|
|
|
}
|
|
|
|
|
2013-03-15 13:35:04 +04:00
|
|
|
/* Check if any unknown options were used */
|
2017-01-04 17:59:14 +03:00
|
|
|
if (qdict_size(options) != 0) {
|
2013-03-15 13:35:04 +04:00
|
|
|
const QDictEntry *entry = qdict_first(options);
|
2014-02-18 21:33:11 +04:00
|
|
|
if (flags & BDRV_O_PROTOCOL) {
|
|
|
|
error_setg(errp, "Block protocol '%s' doesn't support the option "
|
|
|
|
"'%s'", drv->format_name, entry->key);
|
|
|
|
} else {
|
2016-03-16 21:54:34 +03:00
|
|
|
error_setg(errp,
|
|
|
|
"Block format '%s' does not support the option '%s'",
|
|
|
|
drv->format_name, entry->key);
|
2014-02-18 21:33:11 +04:00
|
|
|
}
|
2013-03-15 13:35:04 +04:00
|
|
|
|
|
|
|
goto close_and_fail;
|
|
|
|
}
|
|
|
|
|
2017-06-23 19:24:16 +03:00
|
|
|
bdrv_parent_cb_change_media(bs, true);
|
2010-04-12 18:37:13 +04:00
|
|
|
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(options);
|
2018-09-06 17:25:41 +03:00
|
|
|
options = NULL;
|
2015-06-18 15:09:57 +03:00
|
|
|
|
|
|
|
/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
|
|
|
|
* temporary snapshot afterwards. */
|
|
|
|
if (snapshot_flags) {
|
block: Let bdrv_open_inherit() return the snapshot
If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.
This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.
The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.
In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.
To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-17 17:41:27 +03:00
|
|
|
BlockDriverState *snapshot_bs;
|
|
|
|
snapshot_bs = bdrv_append_temp_snapshot(bs, snapshot_flags,
|
|
|
|
snapshot_options, &local_err);
|
2016-03-07 15:02:15 +03:00
|
|
|
snapshot_options = NULL;
|
2015-06-18 15:09:57 +03:00
|
|
|
if (local_err) {
|
|
|
|
goto close_and_fail;
|
|
|
|
}
|
2016-05-17 17:41:31 +03:00
|
|
|
/* We are not going to return bs but the overlay on top of it
|
|
|
|
* (snapshot_bs); thus, we have to drop the strong reference to bs
|
|
|
|
* (which we obtained by calling bdrv_new()). bs will not be deleted,
|
|
|
|
* though, because the overlay still has a reference to it. */
|
|
|
|
bdrv_unref(bs);
|
|
|
|
bs = snapshot_bs;
|
2015-06-18 15:09:57 +03:00
|
|
|
}
|
|
|
|
|
2016-05-17 17:41:31 +03:00
|
|
|
return bs;
|
2010-04-12 18:37:13 +04:00
|
|
|
|
2014-04-11 21:16:36 +04:00
|
|
|
fail:
|
2017-02-17 20:39:24 +03:00
|
|
|
blk_unref(file);
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(snapshot_options);
|
|
|
|
qobject_unref(bs->explicit_options);
|
|
|
|
qobject_unref(bs->options);
|
|
|
|
qobject_unref(options);
|
2013-03-15 13:35:02 +04:00
|
|
|
bs->options = NULL;
|
2017-07-14 17:35:47 +03:00
|
|
|
bs->explicit_options = NULL;
|
2016-05-17 17:41:31 +03:00
|
|
|
bdrv_unref(bs);
|
2016-06-14 00:57:56 +03:00
|
|
|
error_propagate(errp, local_err);
|
2016-05-17 17:41:31 +03:00
|
|
|
return NULL;
|
2013-03-15 13:35:02 +04:00
|
|
|
|
2013-03-15 13:35:04 +04:00
|
|
|
close_and_fail:
|
2016-05-17 17:41:31 +03:00
|
|
|
bdrv_unref(bs);
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(snapshot_options);
|
|
|
|
qobject_unref(options);
|
2016-06-14 00:57:56 +03:00
|
|
|
error_propagate(errp, local_err);
|
2016-05-17 17:41:31 +03:00
|
|
|
return NULL;
|
2010-04-12 18:37:13 +04:00
|
|
|
}
|
|
|
|
|
2023-01-13 23:42:04 +03:00
|
|
|
/*
|
|
|
|
* The caller must always hold @filename AioContext lock, because this
|
|
|
|
* function eventually calls bdrv_refresh_total_sectors() which polls
|
|
|
|
* when called from non-coroutine context.
|
|
|
|
*/
|
2016-05-17 17:41:31 +03:00
|
|
|
BlockDriverState *bdrv_open(const char *filename, const char *reference,
|
|
|
|
QDict *options, int flags, Error **errp)
|
2015-04-08 14:43:47 +03:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2016-05-17 17:41:31 +03:00
|
|
|
return bdrv_open_inherit(filename, reference, options, flags, NULL,
|
2020-05-13 14:05:17 +03:00
|
|
|
NULL, 0, errp);
|
2015-04-08 14:43:47 +03:00
|
|
|
}
|
|
|
|
|
2019-03-12 19:48:49 +03:00
|
|
|
/* Return true if the NULL-terminated @list contains @str */
|
|
|
|
static bool is_str_in_list(const char *str, const char *const *list)
|
|
|
|
{
|
|
|
|
if (str && list) {
|
|
|
|
int i;
|
|
|
|
for (i = 0; list[i] != NULL; i++) {
|
|
|
|
if (!strcmp(str, list[i])) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that every option set in @bs->options is also set in
|
|
|
|
* @new_opts.
|
|
|
|
*
|
|
|
|
* Options listed in the common_options list and in
|
|
|
|
* @bs->drv->mutable_opts are skipped.
|
|
|
|
*
|
|
|
|
* Return 0 on success, otherwise return -EINVAL and set @errp.
|
|
|
|
*/
|
|
|
|
static int bdrv_reset_options_allowed(BlockDriverState *bs,
|
|
|
|
const QDict *new_opts, Error **errp)
|
|
|
|
{
|
|
|
|
const QDictEntry *e;
|
|
|
|
/* These options are common to all block drivers and are handled
|
|
|
|
* in bdrv_reopen_prepare() so they can be left out of @new_opts */
|
|
|
|
const char *const common_options[] = {
|
|
|
|
"node-name", "discard", "cache.direct", "cache.no-flush",
|
|
|
|
"read-only", "auto-read-only", "detect-zeroes", NULL
|
|
|
|
};
|
|
|
|
|
|
|
|
for (e = qdict_first(bs->options); e; e = qdict_next(bs->options, e)) {
|
|
|
|
if (!qdict_haskey(new_opts, e->key) &&
|
|
|
|
!is_str_in_list(e->key, common_options) &&
|
|
|
|
!is_str_in_list(e->key, bs->drv->mutable_opts)) {
|
|
|
|
error_setg(errp, "Option '%s' cannot be reset "
|
|
|
|
"to its default value", e->key);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
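
The check performed by bdrv_reset_options_allowed() can be tried in isolation with the sketch below. It assumes options are plain NULL-terminated key arrays instead of QDicts; `str_in_list()` and `first_disallowed_reset()` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the NULL-terminated @list contains @str. */
static bool str_in_list(const char *str, const char *const *list)
{
    if (str && list) {
        for (size_t i = 0; list[i] != NULL; i++) {
            if (strcmp(str, list[i]) == 0) {
                return true;
            }
        }
    }
    return false;
}

/* Return the first key of @old_keys that is missing from @new_keys and is
 * not exempted by @common or @mutable_opts, or NULL if the reset is allowed.
 * Either exemption list may be NULL. */
static const char *first_disallowed_reset(const char *const *old_keys,
                                          const char *const *new_keys,
                                          const char *const *common,
                                          const char *const *mutable_opts)
{
    assert(old_keys != NULL);
    for (size_t i = 0; old_keys[i] != NULL; i++) {
        const char *key = old_keys[i];

        if (!str_in_list(key, new_keys) &&
            !str_in_list(key, common) &&
            !str_in_list(key, mutable_opts)) {
            return key;
        }
    }
    return NULL;
}
```

Returning the offending key rather than an error code keeps the sketch free of an error-reporting mechanism; the function above reports the key through @errp and returns -EINVAL instead.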
|
|
|
|
|
2019-03-12 19:48:47 +03:00
|
|
|
/*
|
|
|
|
* Returns true if @child can be reached recursively from @bs
|
|
|
|
*/
|
|
|
|
static bool bdrv_recurse_has_child(BlockDriverState *bs,
|
|
|
|
BlockDriverState *child)
|
|
|
|
{
|
|
|
|
BdrvChild *c;
|
|
|
|
|
|
|
|
if (bs == child) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
QLIST_FOREACH(c, &bs->children, next) {
|
|
|
|
if (bdrv_recurse_has_child(c->bs, child)) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
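
The recursive reachability test above is what later guards against cycles when a child is replaced. A minimal sketch on a toy graph follows; `Node`, `node_reaches()` and `would_create_cycle()` are hypothetical and stand in for BlockDriverState and its child list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical toy node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];   /* unused slots are NULL */
} Node;

/* Return true if @target is @node itself or reachable through its children. */
static bool node_reaches(const Node *node, const Node *target)
{
    assert(node != NULL);
    if (node == target) {
        return true;
    }
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        if (node->children[i] && node_reaches(node->children[i], target)) {
            return true;
        }
    }
    return false;
}

/* Attaching @child under @parent creates a cycle exactly when @parent is
 * already reachable from @child. */
static bool would_create_cycle(const Node *parent, const Node *child)
{
    return node_reaches(child, parent);
}
```

Note the argument order: the search starts at the prospective child and looks for the prospective parent, as in the call `bdrv_recurse_has_child(new_child_bs, bs)` further down in this file.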
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
/*
|
|
|
|
* Adds a BlockDriverState to a simple queue for an atomic, transactional
|
|
|
|
* reopen of multiple devices.
|
|
|
|
*
|
2019-09-27 15:23:47 +03:00
|
|
|
* bs_queue can either be an existing BlockReopenQueue that has had QTAILQ_INIT
|
2012-09-20 23:13:19 +04:00
|
|
|
 * already performed, or may be NULL, in which case a new BlockReopenQueue will
|
|
|
|
* be created and initialized. This newly created BlockReopenQueue should be
|
|
|
|
* passed back in for subsequent calls that are intended to be of the same
|
|
|
|
* atomic 'set'.
|
|
|
|
*
|
|
|
|
* bs is the BlockDriverState to add to the reopen queue.
|
|
|
|
*
|
2015-04-10 18:50:50 +03:00
|
|
|
* options contains the changed options for the associated bs
|
|
|
|
* (the BlockReopenQueue takes ownership)
|
|
|
|
*
|
2012-09-20 23:13:19 +04:00
|
|
|
* flags contains the open flags for the associated bs
|
|
|
|
*
|
|
|
|
* returns a pointer to bs_queue, which is either the newly allocated
|
|
|
|
* bs_queue, or the existing bs_queue being used.
|
|
|
|
*
|
2022-11-18 20:41:02 +03:00
|
|
|
* bs is drained here and undrained by bdrv_reopen_queue_free().
|
2022-11-18 20:41:01 +03:00
|
|
|
*
|
|
|
|
* To be called with bs->aio_context locked.
|
2012-09-20 23:13:19 +04:00
|
|
|
*/
|
2015-05-08 18:07:31 +03:00
|
|
|
static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
|
|
|
|
BlockDriverState *bs,
|
|
|
|
QDict *options,
|
2020-05-13 14:05:13 +03:00
|
|
|
const BdrvChildClass *klass,
|
2020-05-13 14:05:17 +03:00
|
|
|
BdrvChildRole role,
|
2020-05-13 14:05:18 +03:00
|
|
|
bool parent_is_format,
|
2015-05-08 18:07:31 +03:00
|
|
|
QDict *parent_options,
|
2019-03-12 19:48:44 +03:00
|
|
|
int parent_flags,
|
|
|
|
bool keep_old_opts)
|
2012-09-20 23:13:19 +04:00
|
|
|
{
|
|
|
|
assert(bs != NULL);
|
|
|
|
|
|
|
|
BlockReopenQueueEntry *bs_entry;
|
2015-04-09 19:54:04 +03:00
|
|
|
BdrvChild *child;
|
2018-11-12 17:00:45 +03:00
|
|
|
QDict *old_options, *explicit_options, *options_copy;
|
|
|
|
int flags;
|
|
|
|
QemuOpts *opts;
|
2015-04-09 19:54:04 +03:00
|
|
|
|
2022-03-03 18:16:13 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2017-12-06 22:24:44 +03:00
|
|
|
|
2022-11-18 20:41:02 +03:00
|
|
|
bdrv_drained_begin(bs);
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
if (bs_queue == NULL) {
|
|
|
|
bs_queue = g_new0(BlockReopenQueue, 1);
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_INIT(bs_queue);
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
|
2015-04-10 18:50:50 +03:00
|
|
|
if (!options) {
|
|
|
|
options = qdict_new();
|
|
|
|
}
|
|
|
|
|
2016-09-15 17:53:03 +03:00
|
|
|
/* Check if this BlockDriverState is already in the queue */
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
|
2016-09-15 17:53:03 +03:00
|
|
|
if (bs == bs_entry->state.bs) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-05-08 18:07:31 +03:00
|
|
|
/*
|
|
|
|
* Precedence of options:
|
|
|
|
* 1. Explicitly passed in options (highest)
|
2018-11-12 17:00:45 +03:00
|
|
|
* 2. Retained from explicitly set options of bs
|
|
|
|
* 3. Inherited from parent node
|
|
|
|
* 4. Retained from effective options of bs
|
2015-05-08 18:07:31 +03:00
|
|
|
*/
|
|
|
|
|
2015-05-08 17:15:03 +03:00
|
|
|
/* Old explicitly set values (don't overwrite by inherited value) */
|
2019-03-12 19:48:44 +03:00
|
|
|
if (bs_entry || keep_old_opts) {
|
|
|
|
old_options = qdict_clone_shallow(bs_entry ?
|
|
|
|
bs_entry->state.explicit_options :
|
|
|
|
bs->explicit_options);
|
|
|
|
bdrv_join_options(bs, options, old_options);
|
|
|
|
qobject_unref(old_options);
|
2016-09-15 17:53:03 +03:00
|
|
|
}
|
2015-05-08 17:15:03 +03:00
|
|
|
|
|
|
|
explicit_options = qdict_clone_shallow(options);
|
|
|
|
|
2015-05-08 18:07:31 +03:00
|
|
|
/* Inherit from parent node */
|
|
|
|
if (parent_options) {
|
2018-11-12 17:00:45 +03:00
|
|
|
flags = 0;
|
2020-05-13 14:05:18 +03:00
|
|
|
klass->inherit_options(role, parent_is_format, &flags, options,
|
2020-05-13 14:05:17 +03:00
|
|
|
parent_flags, parent_options);
|
2018-11-12 17:00:45 +03:00
|
|
|
} else {
|
|
|
|
flags = bdrv_get_flags(bs);
|
2015-05-08 18:07:31 +03:00
|
|
|
}
|
|
|
|
|
2019-03-12 19:48:44 +03:00
|
|
|
if (keep_old_opts) {
|
|
|
|
/* Old values are used for options that aren't set yet */
|
|
|
|
old_options = qdict_clone_shallow(bs->options);
|
|
|
|
bdrv_join_options(bs, options, old_options);
|
|
|
|
qobject_unref(old_options);
|
|
|
|
}
|
2015-04-10 18:50:50 +03:00
|
|
|
|
2018-11-12 17:00:45 +03:00
|
|
|
/* We have the final set of options so let's update the flags */
|
|
|
|
options_copy = qdict_clone_shallow(options);
|
|
|
|
opts = qemu_opts_create(&bdrv_runtime_opts, NULL, 0, &error_abort);
|
|
|
|
qemu_opts_absorb_qdict(opts, options_copy, NULL);
|
|
|
|
update_flags_from_options(&flags, opts);
|
|
|
|
qemu_opts_del(opts);
|
|
|
|
qobject_unref(options_copy);
|
|
|
|
|
2017-08-03 18:02:59 +03:00
|
|
|
/* bdrv_open_inherit() sets and clears some additional flags internally */
|
2014-04-25 21:04:55 +04:00
|
|
|
flags &= ~BDRV_O_PROTOCOL;
|
2017-08-03 18:02:59 +03:00
|
|
|
if (flags & BDRV_O_RDWR) {
|
|
|
|
flags |= BDRV_O_ALLOW_RDWR;
|
|
|
|
}
|
2014-04-25 21:04:55 +04:00
|
|
|
|
2017-09-14 15:53:46 +03:00
|
|
|
if (!bs_entry) {
|
|
|
|
bs_entry = g_new0(BlockReopenQueueEntry, 1);
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_INSERT_TAIL(bs_queue, bs_entry, entry);
|
2017-09-14 15:53:46 +03:00
|
|
|
} else {
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs_entry->state.options);
|
|
|
|
qobject_unref(bs_entry->state.explicit_options);
|
2017-09-14 15:53:46 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
bs_entry->state.bs = bs;
|
|
|
|
bs_entry->state.options = options;
|
|
|
|
bs_entry->state.explicit_options = explicit_options;
|
|
|
|
bs_entry->state.flags = flags;
|
|
|
|
|
2019-03-12 19:48:45 +03:00
|
|
|
/*
|
|
|
|
* If keep_old_opts is false then it means that unspecified
|
|
|
|
* options must be reset to their original value. We don't allow
|
|
|
|
* resetting 'backing' but we need to know if the option is
|
|
|
|
* missing in order to decide if we have to return an error.
|
|
|
|
*/
|
|
|
|
if (!keep_old_opts) {
|
|
|
|
bs_entry->state.backing_missing =
|
|
|
|
!qdict_haskey(options, "backing") &&
|
|
|
|
!qdict_haskey(options, "backing.driver");
|
|
|
|
}
|
|
|
|
|
2015-04-09 19:54:04 +03:00
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
2019-03-12 19:48:45 +03:00
|
|
|
QDict *new_child_options = NULL;
|
|
|
|
bool child_keep_old = keep_old_opts;
|
2015-04-09 19:54:04 +03:00
|
|
|
|
2015-05-08 16:14:15 +03:00
|
|
|
/* reopen can only change the options of block devices that were
|
|
|
|
* implicitly created and inherited options. For other (referenced)
|
|
|
|
* block devices, a syntax like "backing.foo" results in an error. */
|
2015-04-09 19:54:04 +03:00
|
|
|
if (child->bs->inherits_from != bs) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2019-03-12 19:48:45 +03:00
|
|
|
/* Check if the options contain a child reference */
|
|
|
|
if (qdict_haskey(options, child->name)) {
|
|
|
|
const char *childref = qdict_get_try_str(options, child->name);
|
|
|
|
/*
|
|
|
|
* The current child must not be reopened if the child
|
|
|
|
* reference is null or points to a different node.
|
|
|
|
*/
|
|
|
|
if (g_strcmp0(childref, child->bs->node_name)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If the child reference points to the current child then
|
|
|
|
* reopen it with its existing set of options (note that
|
|
|
|
* it can still inherit new options from the parent).
|
|
|
|
*/
|
|
|
|
child_keep_old = true;
|
|
|
|
} else {
|
|
|
|
/* Extract child options ("child-name.*") */
|
|
|
|
char *child_key_dot = g_strdup_printf("%s.", child->name);
|
|
|
|
qdict_extract_subqdict(explicit_options, NULL, child_key_dot);
|
|
|
|
qdict_extract_subqdict(options, &new_child_options, child_key_dot);
|
|
|
|
g_free(child_key_dot);
|
|
|
|
}
|
2015-05-08 16:14:15 +03:00
|
|
|
|
2018-11-12 17:00:45 +03:00
|
|
|
bdrv_reopen_queue_child(bs_queue, child->bs, new_child_options,
|
2020-05-13 14:05:18 +03:00
|
|
|
child->klass, child->role, bs->drv->is_format,
|
|
|
|
options, flags, child_keep_old);
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
return bs_queue;
|
|
|
|
}
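
The option precedence listed in bdrv_reopen_queue_child() can be read as a series of "fill in only what is missing" merges, applied from highest to lowest priority. The sketch below assumes a flat string-to-string option set; `OptSet`, `opt_get()` and `opt_fill_missing()` are hypothetical and only approximate what bdrv_join_options() does with QDicts. The real inheritance step goes through klass->inherit_options() and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 16

/* Hypothetical flat option set; a stand-in for a QDict of strings. */
typedef struct {
    const char *key[MAX_OPTS];
    const char *val[MAX_OPTS];
    size_t n;
} OptSet;

/* Return the value stored for @key, or NULL if @key is not set. */
static const char *opt_get(const OptSet *s, const char *key)
{
    for (size_t i = 0; i < s->n; i++) {
        if (strcmp(s->key[i], key) == 0) {
            return s->val[i];
        }
    }
    return NULL;
}

/* Copy into @dst every entry of @src whose key @dst does not have yet, so
 * that entries already present in @dst keep precedence. */
static void opt_fill_missing(OptSet *dst, const OptSet *src)
{
    for (size_t i = 0; i < src->n; i++) {
        if (opt_get(dst, src->key[i]) == NULL) {
            assert(dst->n < MAX_OPTS);
            dst->key[dst->n] = src->key[i];
            dst->val[dst->n] = src->val[i];
            dst->n++;
        }
    }
}
```

Calling opt_fill_missing() first with the old explicit options and then with lower-priority sets reproduces the ordering: a key passed in explicitly is never overwritten by a later merge.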
|
|
|
|
|
2022-11-18 20:41:01 +03:00
|
|
|
/* To be called with bs->aio_context locked */
|
2015-05-08 18:07:31 +03:00
|
|
|
BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
|
|
|
|
BlockDriverState *bs,
|
2019-03-12 19:48:44 +03:00
|
|
|
QDict *options, bool keep_old_opts)
|
2015-05-08 18:07:31 +03:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-05-13 14:05:18 +03:00
|
|
|
return bdrv_reopen_queue_child(bs_queue, bs, options, NULL, 0, false,
|
|
|
|
NULL, 0, keep_old_opts);
|
2015-05-08 18:07:31 +03:00
|
|
|
}
|
|
|
|
|
2021-07-08 14:47:05 +03:00
|
|
|
void bdrv_reopen_queue_free(BlockReopenQueue *bs_queue)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-07-08 14:47:05 +03:00
|
|
|
if (bs_queue) {
|
|
|
|
BlockReopenQueueEntry *bs_entry, *next;
|
|
|
|
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
|
2022-11-18 20:41:02 +03:00
|
|
|
AioContext *ctx = bdrv_get_aio_context(bs_entry->state.bs);
|
|
|
|
|
|
|
|
aio_context_acquire(ctx);
|
|
|
|
bdrv_drained_end(bs_entry->state.bs);
|
|
|
|
aio_context_release(ctx);
|
|
|
|
|
2021-07-08 14:47:05 +03:00
|
|
|
qobject_unref(bs_entry->state.explicit_options);
|
|
|
|
qobject_unref(bs_entry->state.options);
|
|
|
|
g_free(bs_entry);
|
|
|
|
}
|
|
|
|
g_free(bs_queue);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
/*
|
|
|
|
* Reopen multiple BlockDriverStates atomically & transactionally.
|
|
|
|
*
|
|
|
|
 * The queue passed in (bs_queue) must have been built up previously
|
|
|
|
* via bdrv_reopen_queue().
|
|
|
|
*
|
|
|
|
* Reopens all BDS specified in the queue, with the appropriate
|
|
|
|
* flags. All devices are prepared for reopen, and failure of any
|
2018-07-12 22:51:20 +03:00
|
|
|
* device will cause all device changes to be abandoned, and intermediate
|
2012-09-20 23:13:19 +04:00
|
|
|
* data cleaned up.
|
|
|
|
*
|
|
|
|
* If all devices prepare successfully, then the changes are committed
|
|
|
|
* to all devices.
|
|
|
|
*
|
2017-12-06 22:24:44 +03:00
|
|
|
* All affected nodes must be drained between bdrv_reopen_queue() and
|
|
|
|
* bdrv_reopen_multiple().
|
2021-07-08 14:47:06 +03:00
|
|
|
*
|
|
|
|
* To be called from the main thread, with all other AioContexts unlocked.
|
2012-09-20 23:13:19 +04:00
|
|
|
*/
|
2019-03-12 19:48:50 +03:00
|
|
|
int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
|
2012-09-20 23:13:19 +04:00
|
|
|
{
|
|
|
|
int ret = -1;
|
|
|
|
BlockReopenQueueEntry *bs_entry, *next;
|
2021-07-08 14:47:06 +03:00
|
|
|
AioContext *ctx;
|
2021-04-28 18:17:58 +03:00
|
|
|
Transaction *tran = tran_new();
|
|
|
|
g_autoptr(GSList) refresh_list = NULL;
|
2012-09-20 23:13:19 +04:00
|
|
|
|
2021-07-08 14:47:06 +03:00
|
|
|
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
|
2012-09-20 23:13:19 +04:00
|
|
|
assert(bs_queue != NULL);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2012-09-20 23:13:19 +04:00
|
|
|
|
2021-04-28 18:17:57 +03:00
|
|
|
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
|
2021-07-08 14:47:06 +03:00
|
|
|
ctx = bdrv_get_aio_context(bs_entry->state.bs);
|
|
|
|
aio_context_acquire(ctx);
|
2021-04-28 18:17:57 +03:00
|
|
|
ret = bdrv_flush(bs_entry->state.bs);
|
2021-07-08 14:47:06 +03:00
|
|
|
aio_context_release(ctx);
|
2021-04-28 18:17:57 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_setg_errno(errp, -ret, "Error flushing drive");
|
2021-05-03 14:05:55 +03:00
|
|
|
goto abort;
|
2021-04-28 18:17:57 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
|
2017-12-06 22:24:44 +03:00
|
|
|
assert(bs_entry->state.bs->quiesce_counter > 0);
|
2021-07-08 14:47:06 +03:00
|
|
|
ctx = bdrv_get_aio_context(bs_entry->state.bs);
|
|
|
|
aio_context_acquire(ctx);
|
2021-04-28 18:17:58 +03:00
|
|
|
ret = bdrv_reopen_prepare(&bs_entry->state, bs_queue, tran, errp);
|
2021-07-08 14:47:06 +03:00
|
|
|
aio_context_release(ctx);
|
2021-04-28 18:17:58 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
goto abort;
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
bs_entry->prepared = true;
|
|
|
|
}
|
|
|
|
|
2019-09-27 15:23:47 +03:00
|
|
|
QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
|
2019-03-05 19:18:22 +03:00
|
|
|
BDRVReopenState *state = &bs_entry->state;
|
2021-04-28 18:17:58 +03:00
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
refresh_list = g_slist_prepend(refresh_list, state->bs);
|
2021-04-28 18:17:58 +03:00
|
|
|
if (state->old_backing_bs) {
|
2022-11-07 19:35:58 +03:00
|
|
|
refresh_list = g_slist_prepend(refresh_list, state->old_backing_bs);
|
2019-03-12 19:48:47 +03:00
|
|
|
}
|
2021-06-10 15:05:36 +03:00
|
|
|
if (state->old_file_bs) {
|
2022-11-07 19:35:58 +03:00
|
|
|
refresh_list = g_slist_prepend(refresh_list, state->old_file_bs);
|
2021-06-10 15:05:36 +03:00
|
|
|
}
|
2021-04-28 18:17:58 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
     * The file-posix driver relies on the permission update done during reopen
|
|
|
|
* (even if no permission changed), because it wants "new" permissions for
|
|
|
|
* reconfiguring the fd and that's why it does it in raw_check_perm(), not
|
|
|
|
* in raw_reopen_prepare() which is called with "old" permissions.
|
|
|
|
*/
|
|
|
|
ret = bdrv_list_refresh_perms(refresh_list, bs_queue, tran, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto abort;
|
2019-03-05 19:18:22 +03:00
|
|
|
}
|
|
|
|
|
2019-09-27 15:23:48 +03:00
|
|
|
/*
|
|
|
|
* If we reach this point, we have success and just need to apply the
|
|
|
|
* changes.
|
|
|
|
*
|
|
|
|
     * Reverse order suits the qcow2 driver: on commit it needs to write the
|
|
|
|
* IN_USE flag to the image, to mark bitmaps in the image as invalid. But
|
|
|
|
     * children usually come after parents in the reopen queue, so go from last
|
|
|
|
* to first element.
|
2012-09-20 23:13:19 +04:00
|
|
|
*/
|
2019-09-27 15:23:48 +03:00
|
|
|
QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
|
2021-07-08 14:47:06 +03:00
|
|
|
ctx = bdrv_get_aio_context(bs_entry->state.bs);
|
|
|
|
aio_context_acquire(ctx);
|
2012-09-20 23:13:19 +04:00
|
|
|
bdrv_reopen_commit(&bs_entry->state);
|
2021-07-08 14:47:06 +03:00
|
|
|
aio_context_release(ctx);
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:58 +03:00
|
|
|
tran_commit(tran);
|
2019-03-05 19:18:22 +03:00
|
|
|
|
2021-04-28 18:17:58 +03:00
|
|
|
QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
|
|
|
|
BlockDriverState *bs = bs_entry->state.bs;
|
2020-11-06 15:42:39 +03:00
|
|
|
|
2021-04-28 18:17:58 +03:00
|
|
|
if (bs->drv->bdrv_reopen_commit_post) {
|
2021-07-08 14:47:06 +03:00
|
|
|
ctx = bdrv_get_aio_context(bs);
|
|
|
|
aio_context_acquire(ctx);
|
2021-04-28 18:17:58 +03:00
|
|
|
bs->drv->bdrv_reopen_commit_post(&bs_entry->state);
|
2021-07-08 14:47:06 +03:00
|
|
|
aio_context_release(ctx);
|
2019-03-05 19:18:22 +03:00
|
|
|
}
|
|
|
|
}
|
2020-02-28 15:44:46 +03:00
|
|
|
|
2021-04-28 18:17:58 +03:00
|
|
|
ret = 0;
|
|
|
|
goto cleanup;
|
2020-02-28 15:44:46 +03:00
|
|
|
|
2021-04-28 18:17:58 +03:00
|
|
|
abort:
|
|
|
|
tran_abort(tran);
|
|
|
|
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
|
|
|
|
if (bs_entry->prepared) {
|
2021-07-08 14:47:06 +03:00
|
|
|
ctx = bdrv_get_aio_context(bs_entry->state.bs);
|
|
|
|
aio_context_acquire(ctx);
|
2021-04-28 18:17:58 +03:00
|
|
|
bdrv_reopen_abort(&bs_entry->state);
|
2021-07-08 14:47:06 +03:00
|
|
|
aio_context_release(ctx);
|
2020-02-28 15:44:46 +03:00
|
|
|
}
|
|
|
|
}
|
2021-04-28 18:17:58 +03:00
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
cleanup:
|
2021-07-08 14:47:05 +03:00
|
|
|
bdrv_reopen_queue_free(bs_queue);
|
2016-10-28 10:08:03 +03:00
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
return ret;
|
|
|
|
}
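
The control flow of bdrv_reopen_multiple() follows a prepare/commit/abort pattern: nothing is applied until every entry has prepared successfully. A stripped-down sketch of that pattern is shown below; `Entry`, `entry_prepare()` and `reopen_all()` are hypothetical, and flushing, permission refresh, AioContext locking and the Transaction object are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue entry: a staged value replaces the live one only if
 * every entry in the batch prepares successfully. */
typedef struct {
    int live;       /* currently active setting */
    int staged;     /* requested new setting */
    bool prepared;  /* set once prepare succeeded for this entry */
} Entry;

/* Toy validation rule: negative settings are rejected. */
static int entry_prepare(Entry *e)
{
    return e->staged < 0 ? -1 : 0;
}

/* Apply all staged values, or none of them. Returns 0 on success and -1 if
 * any entry failed to prepare. */
static int reopen_all(Entry *entries, size_t n)
{
    size_t i;

    assert(entries != NULL || n == 0);

    /* Phase 1: prepare every entry and stop at the first failure. */
    for (i = 0; i < n; i++) {
        if (entry_prepare(&entries[i]) < 0) {
            goto abort;
        }
        entries[i].prepared = true;
    }

    /* Phase 2: commit from last to first, as the function above does. */
    for (i = n; i-- > 0; ) {
        entries[i].live = entries[i].staged;
        entries[i].prepared = false;
    }
    return 0;

abort:
    /* Roll back only the entries that got as far as being prepared. */
    for (i = 0; i < n; i++) {
        if (entries[i].prepared) {
            entries[i].staged = entries[i].live;
            entries[i].prepared = false;
        }
    }
    return -1;
}
```

Tracking a per-entry `prepared` flag is what lets the abort path skip entries that were never prepared, mirroring the `bs_entry->prepared` check in the abort loop above.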
|
|
|
|
|
2021-07-08 14:47:06 +03:00
|
|
|
int bdrv_reopen(BlockDriverState *bs, QDict *opts, bool keep_old_opts,
|
|
|
|
Error **errp)
|
2018-11-12 17:00:33 +03:00
|
|
|
{
|
2021-07-08 14:47:06 +03:00
|
|
|
AioContext *ctx = bdrv_get_aio_context(bs);
|
2018-11-12 17:00:33 +03:00
|
|
|
BlockReopenQueue *queue;
|
2021-07-08 14:47:06 +03:00
|
|
|
int ret;
|
2018-11-12 17:00:33 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2022-11-18 20:41:01 +03:00
|
|
|
queue = bdrv_reopen_queue(NULL, bs, opts, keep_old_opts);
|
|
|
|
|
2021-07-08 14:47:06 +03:00
|
|
|
if (ctx != qemu_get_aio_context()) {
|
|
|
|
aio_context_release(ctx);
|
|
|
|
}
|
2019-03-12 19:48:50 +03:00
|
|
|
ret = bdrv_reopen_multiple(queue, errp);
|
2021-07-08 14:47:06 +03:00
|
|
|
|
|
|
|
if (ctx != qemu_get_aio_context()) {
|
|
|
|
aio_context_acquire(ctx);
|
|
|
|
}
|
2018-11-12 17:00:33 +03:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-07-08 14:47:06 +03:00
|
|
|
int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
QDict *opts = qdict_new();
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-07-08 14:47:06 +03:00
|
|
|
qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only);
|
|
|
|
|
|
|
|
return bdrv_reopen(bs, opts, true, errp);
|
|
|
|
}
|
|
|
|
|
2019-03-12 19:48:47 +03:00
|
|
|
/*
|
|
|
|
* Take a BDRVReopenState and check if the value of 'backing' in the
|
|
|
|
* reopen_state->options QDict is valid or not.
|
|
|
|
*
|
|
|
|
* If 'backing' is missing from the QDict then return 0.
|
|
|
|
*
|
|
|
|
* If 'backing' contains the node name of the backing file of
|
|
|
|
* reopen_state->bs then return 0.
|
|
|
|
*
|
|
|
|
* If 'backing' contains a different node name (or is null) then check
|
|
|
|
* whether the current backing file can be replaced with the new one.
|
|
|
|
* If that's the case then reopen_state->replace_backing_bs is set to
|
|
|
|
* true and reopen_state->new_backing_bs contains a pointer to the new
|
|
|
|
* backing BlockDriverState (or NULL).
|
|
|
|
*
|
|
|
|
* Return 0 on success, otherwise return < 0 and set @errp.
|
|
|
|
*/
|
2021-06-10 15:05:36 +03:00
|
|
|
static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
|
|
|
|
bool is_backing, Transaction *tran,
|
|
|
|
Error **errp)
|
2019-03-12 19:48:47 +03:00
|
|
|
{
|
|
|
|
BlockDriverState *bs = reopen_state->bs;
|
2021-06-10 15:05:36 +03:00
|
|
|
BlockDriverState *new_child_bs;
|
|
|
|
BlockDriverState *old_child_bs = is_backing ? child_bs(bs->backing) :
|
|
|
|
child_bs(bs->file);
|
|
|
|
const char *child_name = is_backing ? "backing" : "file";
|
2019-03-12 19:48:47 +03:00
|
|
|
QObject *value;
|
|
|
|
const char *str;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
value = qdict_get(reopen_state->options, child_name);
|
2019-03-12 19:48:47 +03:00
|
|
|
if (value == NULL) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (qobject_type(value)) {
|
|
|
|
case QTYPE_QNULL:
|
2021-06-10 15:05:36 +03:00
|
|
|
assert(is_backing); /* The 'file' option does not allow a null value */
|
|
|
|
new_child_bs = NULL;
|
2019-03-12 19:48:47 +03:00
|
|
|
break;
|
|
|
|
case QTYPE_QSTRING:
|
2020-12-11 20:11:42 +03:00
|
|
|
str = qstring_get_str(qobject_to(QString, value));
|
2021-06-10 15:05:36 +03:00
|
|
|
new_child_bs = bdrv_lookup_bs(NULL, str, errp);
|
|
|
|
if (new_child_bs == NULL) {
|
2019-03-12 19:48:47 +03:00
|
|
|
return -EINVAL;
|
2021-06-10 15:05:36 +03:00
|
|
|
} else if (bdrv_recurse_has_child(new_child_bs, bs)) {
|
|
|
|
error_setg(errp, "Making '%s' a %s child of '%s' would create a "
|
|
|
|
"cycle", str, child_name, bs->node_name);
|
2019-03-12 19:48:47 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
default:
|
2021-06-10 15:05:36 +03:00
|
|
|
/*
|
|
|
|
* The options QDict has been flattened, so 'backing' and 'file'
|
|
|
|
* do not allow any other data type here.
|
|
|
|
*/
|
2019-03-12 19:48:47 +03:00
|
|
|
g_assert_not_reached();
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
if (old_child_bs == new_child_bs) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (old_child_bs) {
|
|
|
|
if (bdrv_skip_implicit_filters(old_child_bs) == new_child_bs) {
|
2021-06-10 15:05:33 +03:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
if (old_child_bs->implicit) {
|
|
|
|
error_setg(errp, "Cannot replace implicit %s child of %s",
|
|
|
|
child_name, bs->node_name);
|
2021-06-10 15:05:33 +03:00
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
if (bs->drv->is_filter && !old_child_bs) {
|
2021-06-10 15:05:34 +03:00
|
|
|
/*
|
|
|
|
* Filters always have a file or a backing child, so we are trying to
|
|
|
|
* change the wrong child
|
|
|
|
*/
|
|
|
|
error_setg(errp, "'%s' is a %s filter node that does not support a "
|
2021-06-10 15:05:36 +03:00
|
|
|
"%s child", bs->node_name, bs->drv->format_name, child_name);
|
2019-06-12 18:24:39 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
if (is_backing) {
|
|
|
|
reopen_state->old_backing_bs = old_child_bs;
|
|
|
|
} else {
|
|
|
|
reopen_state->old_file_bs = old_child_bs;
|
|
|
|
}
|
|
|
|
|
|
|
|
return bdrv_set_file_or_backing_noperm(bs, new_child_bs, is_backing,
|
|
|
|
tran, errp);
|
2019-03-12 19:48:47 +03:00
|
|
|
}
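The cycle check that the function above delegates to bdrv_recurse_has_child() can be sketched in isolation. This is a minimal illustration, not QEMU code: the Node type, MAX_CHILDREN, node_has_descendant() and can_attach_child() are hypothetical names, and the sketch assumes each node stores its children in a small fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a graph node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    struct Node *children[MAX_CHILDREN];
} Node;

/* Return true if 'target' is 'root' itself or is reachable from 'root'. */
static bool node_has_descendant(const Node *root, const Node *target)
{
    size_t i;

    assert(target != NULL);
    if (root == NULL) {
        return false;
    }
    if (root == target) {
        return true;
    }
    for (i = 0; i < MAX_CHILDREN; i++) {
        if (node_has_descendant(root->children[i], target)) {
            return true;
        }
    }
    return false;
}

/*
 * Attaching 'child' below 'parent' would create a cycle exactly when
 * 'parent' is already reachable from 'child'.
 */
static bool can_attach_child(const Node *parent, const Node *child)
{
    return !node_has_descendant(child, parent);
}
```

With a chain top -> mid -> base, can_attach_child(&base, &top) is false because top already reaches base, while attaching an unrelated node below top is allowed.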
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
/*
|
|
|
|
* Prepares a BlockDriverState for reopen. All changes are staged in the
|
|
|
|
* 'opaque' field of the BDRVReopenState, which is used and allocated by
|
|
|
|
* the block driver layer .bdrv_reopen_prepare()
|
|
|
|
*
|
|
|
|
* reopen_state->bs is the BlockDriverState to reopen
|
|
|
|
* reopen_state->flags are the new open flags
|
|
|
|
* queue is the reopen queue
|
|
|
|
*
|
|
|
|
* Returns 0 on success, non-zero on error. On error errp will be set
|
|
|
|
* as well.
|
|
|
|
*
|
|
|
|
* On failure, bdrv_reopen_abort() will be called to clean up any data.
|
|
|
|
* It is the responsibility of the caller to then call the abort() or
|
|
|
|
* commit() for any other BDS that have been left in a prepare() state
|
|
|
|
*
|
|
|
|
*/
|
2021-04-28 18:17:35 +03:00
|
|
|
static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
|
2021-04-28 18:17:58 +03:00
|
|
|
BlockReopenQueue *queue,
|
2021-06-10 15:05:36 +03:00
|
|
|
Transaction *change_child_tran, Error **errp)
|
2012-09-20 23:13:19 +04:00
|
|
|
{
|
|
|
|
int ret = -1;
|
2018-11-12 17:00:47 +03:00
|
|
|
int old_flags;
|
2012-09-20 23:13:19 +04:00
|
|
|
Error *local_err = NULL;
|
|
|
|
BlockDriver *drv;
|
2015-05-08 18:24:56 +03:00
|
|
|
QemuOpts *opts;
|
2018-06-29 14:37:02 +03:00
|
|
|
QDict *orig_reopen_opts;
|
2018-09-06 12:37:08 +03:00
|
|
|
char *discard = NULL;
|
2017-04-07 23:55:30 +03:00
|
|
|
bool read_only;
|
2018-11-16 19:45:24 +03:00
|
|
|
bool drv_prepared = false;
|
2012-09-20 23:13:19 +04:00
|
|
|
|
|
|
|
assert(reopen_state != NULL);
|
|
|
|
assert(reopen_state->bs->drv != NULL);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2012-09-20 23:13:19 +04:00
|
|
|
drv = reopen_state->bs->drv;
|
|
|
|
|
2018-06-29 14:37:02 +03:00
|
|
|
/* This function and each driver's bdrv_reopen_prepare() remove
|
|
|
|
* entries from reopen_state->options as they are processed, so
|
|
|
|
* we need to make a copy of the original QDict. */
|
|
|
|
orig_reopen_opts = qdict_clone_shallow(reopen_state->options);
|
|
|
|
|
2015-05-08 18:24:56 +03:00
|
|
|
/* Process generic block layer options */
|
|
|
|
opts = qemu_opts_create(&bdrv_runtime_opts, NULL, 0, &error_abort);
|
2020-07-07 19:06:03 +03:00
|
|
|
if (!qemu_opts_absorb_qdict(opts, reopen_state->options, errp)) {
|
2015-05-08 18:24:56 +03:00
|
|
|
ret = -EINVAL;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2018-11-12 17:00:47 +03:00
|
|
|
/* This was already called in bdrv_reopen_queue_child() so the flags
|
|
|
|
* are up-to-date. This time we simply want to remove the options from
|
|
|
|
* QemuOpts in order to indicate that they have been processed. */
|
|
|
|
old_flags = reopen_state->flags;
|
2015-05-08 18:49:53 +03:00
|
|
|
update_flags_from_options(&reopen_state->flags, opts);
|
2018-11-12 17:00:47 +03:00
|
|
|
assert(old_flags == reopen_state->flags);
|
2015-05-08 18:49:53 +03:00
|
|
|
|
2018-10-03 13:23:13 +03:00
|
|
|
discard = qemu_opt_get_del(opts, BDRV_OPT_DISCARD);
|
2018-09-06 12:37:08 +03:00
|
|
|
if (discard != NULL) {
|
|
|
|
if (bdrv_parse_discard_flags(discard, &reopen_state->flags) != 0) {
|
|
|
|
error_setg(errp, "Invalid discard option");
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-06 12:37:09 +03:00
|
|
|
reopen_state->detect_zeroes =
|
|
|
|
bdrv_parse_detect_zeroes(opts, reopen_state->flags, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2018-09-06 12:37:06 +03:00
|
|
|
/* All other options (including node-name and driver) must be unchanged.
|
|
|
|
* Put them back into the QDict, so that they are checked at the end
|
|
|
|
* of this function. */
|
|
|
|
qemu_opts_to_qdict(opts, reopen_state->options);
|
2015-05-08 18:24:56 +03:00
|
|
|
|
2017-04-07 23:55:30 +03:00
|
|
|
/* If we are to stay read-only, do not allow permission change
|
|
|
|
* to r/w. Attempting to set to r/w may fail if either BDRV_O_ALLOW_RDWR is
|
|
|
|
* not set, or if the BDS still has copy_on_read enabled */
|
|
|
|
read_only = !(reopen_state->flags & BDRV_O_RDWR);
|
2017-08-03 18:02:58 +03:00
|
|
|
ret = bdrv_can_set_read_only(reopen_state->bs, read_only, true, &local_err);
|
2017-04-07 23:55:30 +03:00
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
2012-09-20 23:13:19 +04:00
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (drv->bdrv_reopen_prepare) {
|
2019-03-12 19:48:49 +03:00
|
|
|
/*
|
|
|
|
* If a driver-specific option is missing, it means that we
|
|
|
|
* should reset it to its default value.
|
|
|
|
* But not all options allow that, so we need to check it first.
|
|
|
|
*/
|
|
|
|
ret = bdrv_reset_options_allowed(reopen_state->bs,
|
|
|
|
reopen_state->options, errp);
|
|
|
|
if (ret) {
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
ret = drv->bdrv_reopen_prepare(reopen_state, queue, &local_err);
|
|
|
|
if (ret) {
|
|
|
|
if (local_err != NULL) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
} else {
|
block: Use bdrv_refresh_filename() to pull
Before this patch, bdrv_refresh_filename() is used in a pushing manner:
Whenever the BDS graph is modified, the parents of the modified edges
are supposed to be updated (recursively upwards). However, that is
nonviable, considering that we want child changes not to concern
parents.
Also, in the long run we want a pull model anyway: Here, we would have a
bdrv_filename() function which returns a BDS's filename, freshly
constructed.
This patch is an intermediate step. It adds bdrv_refresh_filename()
calls before every place a BDS.filename value is used. The only
exceptions are protocol drivers that use their own filename, which
clearly would not profit from refreshing that filename before.
Also, bdrv_get_encrypted_filename() is removed along the way (as a user
of BDS.filename), since it is completely unused.
In turn, all of the calls to bdrv_refresh_filename() before this patch
are removed, because we no longer have to call this function on graph
changes.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-01 22:29:05 +03:00
|
|
|
bdrv_refresh_filename(reopen_state->bs);
|
2013-06-10 19:29:27 +04:00
|
|
|
error_setg(errp, "failed while preparing to reopen image '%s'",
|
|
|
|
reopen_state->bs->filename);
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* It is currently mandatory to have a bdrv_reopen_prepare()
|
|
|
|
* handler for each supported drv. */
|
2015-04-08 12:29:19 +03:00
|
|
|
error_setg(errp, "Block format '%s' used by node '%s' "
|
|
|
|
"does not support reopening files", drv->format_name,
|
|
|
|
bdrv_get_device_or_node_name(reopen_state->bs));
|
2012-09-20 23:13:19 +04:00
|
|
|
ret = -1;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2018-11-16 19:45:24 +03:00
|
|
|
drv_prepared = true;
|
|
|
|
|
2019-03-12 19:48:46 +03:00
|
|
|
/*
|
|
|
|
* We must provide the 'backing' option if the BDS has a backing
|
|
|
|
* file or if the image file has a backing file name as part of
|
|
|
|
* its metadata. Otherwise the 'backing' option can be omitted.
|
|
|
|
*/
|
|
|
|
if (drv->supports_backing && reopen_state->backing_missing &&
|
2019-06-12 18:24:39 +03:00
|
|
|
(reopen_state->bs->backing || reopen_state->bs->backing_file[0])) {
|
2019-03-12 19:48:45 +03:00
|
|
|
error_setg(errp, "backing is missing for '%s'",
|
|
|
|
reopen_state->bs->node_name);
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2019-03-12 19:48:47 +03:00
|
|
|
/*
|
|
|
|
* Allow changing the 'backing' option. The new value can be
|
|
|
|
* either a reference to an existing node (using its node name)
|
|
|
|
* or NULL to simply detach the current backing file.
|
|
|
|
*/
|
2021-06-10 15:05:36 +03:00
|
|
|
ret = bdrv_reopen_parse_file_or_backing(reopen_state, true,
|
|
|
|
change_child_tran, errp);
|
2019-03-12 19:48:47 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
qdict_del(reopen_state->options, "backing");
|
|
|
|
|
2021-06-10 15:05:36 +03:00
|
|
|
/* Allow changing the 'file' option. In this case NULL is not allowed */
|
|
|
|
ret = bdrv_reopen_parse_file_or_backing(reopen_state, false,
|
|
|
|
change_child_tran, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
qdict_del(reopen_state->options, "file");
|
|
|
|
|
2015-04-10 18:50:50 +03:00
|
|
|
/* Options that are not handled are only okay if they are unchanged
|
|
|
|
* compared to the old state. It is expected that some options are only
|
|
|
|
* used for the initial open, but not reopen (e.g. filename) */
|
|
|
|
if (qdict_size(reopen_state->options)) {
|
|
|
|
const QDictEntry *entry = qdict_first(reopen_state->options);
|
|
|
|
|
|
|
|
do {
|
2017-11-14 21:01:26 +03:00
|
|
|
QObject *new = entry->value;
|
|
|
|
QObject *old = qdict_get(reopen_state->bs->options, entry->key);
|
|
|
|
|
2018-09-06 12:37:05 +03:00
|
|
|
/* Allow child references (child_name=node_name) as long as they
|
|
|
|
* point to the current child (i.e. everything stays the same). */
|
|
|
|
if (qobject_type(new) == QTYPE_QSTRING) {
|
|
|
|
BdrvChild *child;
|
|
|
|
QLIST_FOREACH(child, &reopen_state->bs->children, next) {
|
|
|
|
if (!strcmp(child->name, entry->key)) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (child) {
|
2020-12-11 20:11:42 +03:00
|
|
|
if (!strcmp(child->bs->node_name,
|
|
|
|
qstring_get_str(qobject_to(QString, new)))) {
|
2018-09-06 12:37:05 +03:00
|
|
|
continue; /* Found child with this name, skip option */
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
block: Document -drive problematic code and bugs
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, a few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-03-30 20:43:12 +03:00
|
|
|
/*
|
2017-11-14 21:01:26 +03:00
|
|
|
* TODO: When using -drive to specify blockdev options, all values
|
|
|
|
* will be strings; however, when using -blockdev, blockdev-add or
|
|
|
|
* filenames using the json:{} pseudo-protocol, they will be
|
|
|
|
* correctly typed.
|
|
|
|
* In contrast, reopening options are (currently) always strings
|
|
|
|
* (because you can only specify them through qemu-io; all other
|
|
|
|
* callers do not specify any options).
|
|
|
|
* Therefore, when using anything other than -drive to create a BDS,
|
|
|
|
* this cannot detect non-string options as unchanged, because
|
|
|
|
* qobject_is_equal() always returns false for objects of different
|
|
|
|
* type. In the future, this should be remedied by correctly typing
|
|
|
|
* all options. For now, this is not too big of an issue because
|
|
|
|
* the user can simply omit options which cannot be changed anyway,
|
|
|
|
* so they will stay unchanged.
|
|
|
|
*/
|
2017-11-14 21:01:26 +03:00
|
|
|
if (!qobject_is_equal(new, old)) {
|
2015-04-10 18:50:50 +03:00
|
|
|
error_setg(errp, "Cannot change the option '%s'", entry->key);
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
} while ((entry = qdict_next(reopen_state->options, entry)));
|
|
|
|
}
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
ret = 0;
|
|
|
|
|
2018-06-29 14:37:02 +03:00
|
|
|
/* Restore the original reopen_state->options QDict */
|
|
|
|
qobject_unref(reopen_state->options);
|
|
|
|
reopen_state->options = qobject_ref(orig_reopen_opts);
|
|
|
|
|
2012-09-20 23:13:19 +04:00
|
|
|
error:
|
2018-11-16 19:45:24 +03:00
|
|
|
if (ret < 0 && drv_prepared) {
|
|
|
|
/* drv->bdrv_reopen_prepare() has succeeded, so we need to
|
|
|
|
* call drv->bdrv_reopen_abort() before signaling an error
|
|
|
|
* (bdrv_reopen_multiple() will not call bdrv_reopen_abort()
|
|
|
|
* when the respective bdrv_reopen_prepare() has failed) */
|
|
|
|
if (drv->bdrv_reopen_abort) {
|
|
|
|
drv->bdrv_reopen_abort(reopen_state);
|
|
|
|
}
|
|
|
|
}
|
2015-05-08 18:24:56 +03:00
|
|
|
qemu_opts_del(opts);
|
2018-06-29 14:37:02 +03:00
|
|
|
qobject_unref(orig_reopen_opts);
|
2018-09-06 12:37:08 +03:00
|
|
|
g_free(discard);
|
2012-09-20 23:13:19 +04:00
|
|
|
return ret;
|
|
|
|
}
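The "leftover options must be unchanged" check at the end of the function above, and the typing caveat described in its TODO comment, can be sketched with a toy option table. Everything here is hypothetical (Value, Option, value_is_equal(), first_changed_option()); it only assumes, as the comment states for qobject_is_equal(), that values of different types never compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical typed value; a stand-in for a QObject, not QEMU's type. */
typedef enum { VAL_STRING, VAL_BOOL } ValType;

typedef struct {
    ValType type;
    const char *str;   /* valid when type == VAL_STRING */
    bool b;            /* valid when type == VAL_BOOL */
} Value;

typedef struct {
    const char *key;
    Value value;
} Option;

/* Values of different types are never equal, even if they "mean" the same. */
static bool value_is_equal(const Value *a, const Value *b)
{
    if (a->type != b->type) {
        return false;
    }
    if (a->type == VAL_STRING) {
        return strcmp(a->str, b->str) == 0;
    }
    return a->b == b->b;
}

static const Value *options_get(const Option *opts, size_t n, const char *key)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(opts[i].key, key) == 0) {
            return &opts[i].value;
        }
    }
    return NULL;
}

/*
 * Return the key of the first leftover option that differs from the old
 * state (or is missing from it), or NULL if every option is unchanged.
 */
static const char *first_changed_option(const Option *leftover, size_t n_left,
                                        const Option *old, size_t n_old)
{
    size_t i;

    assert(leftover != NULL || n_left == 0);
    for (i = 0; i < n_left; i++) {
        const Value *prev = options_get(old, n_old, leftover[i].key);

        if (prev == NULL || !value_is_equal(&leftover[i].value, prev)) {
            return leftover[i].key;
        }
    }
    return NULL;
}
```

In this sketch a boolean option stored as `true` and re-specified as the string "on" is reported as changed, which mirrors the limitation the TODO comment describes for nodes created with correctly typed options.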
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Takes the staged changes for the reopen from bdrv_reopen_prepare(), and
|
|
|
|
* makes them final by swapping the staging BlockDriverState contents into
|
|
|
|
* the active BlockDriverState contents.
|
|
|
|
*/
|
2021-04-28 18:17:35 +03:00
|
|
|
static void bdrv_reopen_commit(BDRVReopenState *reopen_state)
|
2012-09-20 23:13:19 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv;
|
2017-06-28 15:05:12 +03:00
|
|
|
BlockDriverState *bs;
|
block: Remove child references from bs->{options,explicit_options}
Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.
What is more important, these values can become wrong if the children
change:
$ qemu-img create -f qcow2 hd0.qcow2 10M
$ qemu-img create -f qcow2 hd1.qcow2 10M
$ qemu-img create -f qcow2 hd2.qcow2 10M
$ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
-drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
-drive file=hd2.qcow2,node-name=hd2,backing=hd1
After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:
(qemu) block_stream hd2 0 hd0.qcow2
Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-09-06 12:37:03 +03:00
|
|
|
BdrvChild *child;
|
2012-09-20 23:13:19 +04:00
|
|
|
|
|
|
|
assert(reopen_state != NULL);
|
2017-06-28 15:05:12 +03:00
|
|
|
bs = reopen_state->bs;
|
|
|
|
drv = bs->drv;
|
2012-09-20 23:13:19 +04:00
|
|
|
assert(drv != NULL);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2012-09-20 23:13:19 +04:00
|
|
|
|
|
|
|
/* If there are any driver level actions to take */
|
|
|
|
if (drv->bdrv_reopen_commit) {
|
|
|
|
drv->bdrv_reopen_commit(reopen_state);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* set BDS specific flags now */
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs->explicit_options);
|
2018-06-29 14:37:02 +03:00
|
|
|
qobject_unref(bs->options);
|
2021-07-08 14:47:05 +03:00
|
|
|
qobject_ref(reopen_state->explicit_options);
|
|
|
|
qobject_ref(reopen_state->options);
|
2015-05-08 17:15:03 +03:00
|
|
|
|
2017-06-28 15:05:12 +03:00
|
|
|
bs->explicit_options = reopen_state->explicit_options;
|
2018-06-29 14:37:02 +03:00
|
|
|
bs->options = reopen_state->options;
|
2017-06-28 15:05:12 +03:00
|
|
|
bs->open_flags = reopen_state->flags;
|
2018-09-06 12:37:09 +03:00
|
|
|
bs->detect_zeroes = reopen_state->detect_zeroes;
|
2013-12-11 23:14:09 +04:00
|
|
|
|
|
|
|
/* Remove child references from bs->options and bs->explicit_options.
|
|
|
|
* Child options were already removed in bdrv_reopen_queue_child() */
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
qdict_del(bs->explicit_options, child->name);
|
|
|
|
qdict_del(bs->options, child->name);
|
|
|
|
}
|
2021-06-10 15:05:35 +03:00
|
|
|
/* 'backing' may have been removed, in which case the previous loop did not handle it */
|
|
|
|
qdict_del(bs->explicit_options, "backing");
|
|
|
|
qdict_del(bs->options, "backing");
|
|
|
|
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdlock_main_loop();
|
2021-04-28 18:17:55 +03:00
|
|
|
bdrv_refresh_limits(bs, NULL, NULL);
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
2023-04-07 18:32:58 +03:00
|
|
|
bdrv_refresh_total_sectors(bs, bs->total_sectors);
|
2012-09-20 23:13:19 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Abort the reopen, and delete and free the staged changes in
|
|
|
|
* reopen_state
|
|
|
|
*/
|
2021-04-28 18:17:35 +03:00
|
|
|
static void bdrv_reopen_abort(BDRVReopenState *reopen_state)
|
2012-09-20 23:13:19 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv;
|
|
|
|
|
|
|
|
assert(reopen_state != NULL);
|
|
|
|
drv = reopen_state->bs->drv;
|
|
|
|
assert(drv != NULL);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2012-09-20 23:13:19 +04:00
|
|
|
|
|
|
|
if (drv->bdrv_reopen_abort) {
|
|
|
|
drv->bdrv_reopen_abort(reopen_state);
|
|
|
|
}
|
|
|
|
}
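The prepare/commit/abort protocol implemented by the three functions above can be sketched with a toy state type. The names ToyState, toy_prepare(), toy_commit(), toy_abort() and toy_reopen_multiple() are hypothetical and do not exist in QEMU; the sketch only assumes the rule stated in the comments above, namely that abort is called for entries whose prepare succeeded and never for the entry whose prepare failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state with one staged value; not QEMU's BDRVReopenState. */
typedef struct ToyState {
    int active;      /* value currently in effect */
    int staged;      /* value staged by toy_prepare() */
    bool prepared;   /* true between prepare and commit/abort */
} ToyState;

/* Stage a new value; negative values are rejected in this sketch. */
static int toy_prepare(ToyState *s, int new_value)
{
    if (new_value < 0) {
        return -1;
    }
    s->staged = new_value;
    s->prepared = true;
    return 0;
}

/* Make the staged value final. Must not fail. */
static void toy_commit(ToyState *s)
{
    assert(s->prepared);
    s->active = s->staged;
    s->prepared = false;
}

/* Drop the staged value and leave the active one untouched. */
static void toy_abort(ToyState *s)
{
    assert(s->prepared);
    s->prepared = false;
}

/* All-or-nothing update: commit everything or roll back everything. */
static int toy_reopen_multiple(ToyState *states, const int *new_values,
                               size_t n)
{
    size_t i, prepared = 0;
    int ret = 0;

    for (i = 0; i < n; i++) {
        ret = toy_prepare(&states[i], new_values[i]);
        if (ret < 0) {
            break;
        }
        prepared++;
    }

    if (ret < 0) {
        /* Only entries whose prepare succeeded are aborted. */
        for (i = 0; i < prepared; i++) {
            toy_abort(&states[i]);
        }
        return ret;
    }

    for (i = 0; i < n; i++) {
        toy_commit(&states[i]);
    }
    return 0;
}
```

Keeping commit infallible is the design choice that makes the scheme atomic: every operation that can fail happens in the prepare phase, before any visible state is modified.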
|
|
|
|
|
|
|
|
|
2016-01-29 18:36:10 +03:00
|
|
|
static void bdrv_close(BlockDriverState *bs)
|
2003-06-30 14:03:06 +04:00
|
|
|
{
|
2014-06-20 23:57:33 +04:00
|
|
|
BdrvAioNotifier *ban, *ban_next;
|
2017-11-06 17:53:45 +03:00
|
|
|
BdrvChild *child, *next;
|
2014-06-20 23:57:33 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-05-17 17:41:32 +03:00
|
|
|
assert(!bs->refcnt);
|
2015-09-25 16:41:44 +03:00
|
|
|
|
2015-12-23 13:48:24 +03:00
|
|
|
bdrv_drained_begin(bs); /* complete I/O */
|
2013-07-02 17:36:25 +04:00
|
|
|
bdrv_flush(bs);
|
2015-05-29 13:53:14 +03:00
|
|
|
bdrv_drain(bs); /* in case flush left pending I/O */
|
2015-12-23 13:48:24 +03:00
|
|
|
|
2012-10-19 13:36:48 +04:00
|
|
|
if (bs->drv) {
|
2018-08-14 15:43:19 +03:00
|
|
|
if (bs->drv->bdrv_close) {
|
2019-06-12 17:07:11 +03:00
|
|
|
/* Must unfreeze all children, so bdrv_unref_child() works */
|
2018-08-14 15:43:19 +03:00
|
|
|
bs->drv->bdrv_close(bs);
|
|
|
|
}
|
2015-06-16 15:19:22 +03:00
|
|
|
bs->drv = NULL;
|
2017-11-06 17:53:45 +03:00
|
|
|
}
|
2015-06-16 11:58:20 +03:00
|
|
|
|
2017-11-06 17:53:45 +03:00
|
|
|
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
|
2019-05-13 16:46:17 +03:00
|
|
|
bdrv_unref_child(bs, child);
|
2004-03-15 00:38:54 +03:00
|
|
|
}
|
2011-11-08 09:00:14 +04:00
|
|
|
|
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But it is a very useful duplication, so let's not drop
them at all :)
We should manage bs->file and bs->backing in the same place where we
manage bs->children, to keep them in sync.
Moreover, generic I/O paths are unprepared for a BdrvChild without a bs,
so it's doubly good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach bs->file or bs->backing child, just
set corresponding field to NULL.
Attach is a bit more complicated. But we can still precisely detect
whether we should set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must also be
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores BdrvChild** into transaction to clear
it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just passes this feature through, and bdrv_root_attach_child() doesn't
need the feature.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-07-26 23:11:32 +03:00
|
|
|
assert(!bs->backing);
|
|
|
|
assert(!bs->file);
|
2017-11-06 17:53:45 +03:00
|
|
|
g_free(bs->opaque);
|
|
|
|
bs->opaque = NULL;
|
2020-09-23 13:56:46 +03:00
|
|
|
qatomic_set(&bs->copy_on_read, 0);
|
2017-11-06 17:53:45 +03:00
|
|
|
bs->backing_file[0] = '\0';
|
|
|
|
bs->backing_format[0] = '\0';
|
|
|
|
bs->total_sectors = 0;
|
|
|
|
bs->encrypted = false;
|
|
|
|
bs->sg = false;
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs->options);
|
|
|
|
qobject_unref(bs->explicit_options);
|
2017-11-06 17:53:45 +03:00
|
|
|
bs->options = NULL;
|
|
|
|
bs->explicit_options = NULL;
|
2018-04-19 18:01:43 +03:00
|
|
|
qobject_unref(bs->full_open_options);
|
2017-11-06 17:53:45 +03:00
|
|
|
bs->full_open_options = NULL;
|
block: block-status cache for data regions
As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform. The
main difference is that this time it is implemented as part of the
general block layer code.
The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
implementation is outside of qemu, there is little we can do about its
performance.
We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts. So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.
To address this, we want to cache data regions. Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.
(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine. While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)
We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-12 11:41:44 +03:00
|
|
|
g_free(bs->block_status_cache);
|
|
|
|
bs->block_status_cache = NULL;
|
2017-11-06 17:53:45 +03:00
|
|
|
|
2017-06-28 15:05:16 +03:00
|
|
|
bdrv_release_named_dirty_bitmaps(bs);
|
|
|
|
assert(QLIST_EMPTY(&bs->dirty_bitmaps));
|
|
|
|
|
2014-06-20 23:57:33 +04:00
|
|
|
QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
|
|
|
|
g_free(ban);
|
|
|
|
}
|
|
|
|
QLIST_INIT(&bs->aio_notifiers);
|
2015-12-23 13:48:24 +03:00
|
|
|
bdrv_drained_end(bs);
|
2020-10-23 18:01:10 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're still inside some bdrv_drain_all_begin()/end() sections, end
|
|
|
|
* them now since this BDS won't exist anymore when bdrv_drain_all_end()
|
|
|
|
* gets called.
|
|
|
|
*/
|
|
|
|
if (bs->quiesce_counter) {
|
|
|
|
bdrv_drain_all_end_quiesce(bs);
|
|
|
|
}
|
2004-03-15 00:38:54 +03:00
|
|
|
}
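The block_status_cache released in the function above is described in the commit message as caching only the current data region. The following is a minimal sketch of that idea, not QEMU's implementation: the type DataRegionCache and the helpers cache_set(), cache_lookup() and cache_invalidate() are hypothetical names introduced purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry cache: remembers one region known to be data. */
typedef struct DataRegionCache {
    bool valid;
    int64_t data_start;
    int64_t data_end;   /* exclusive */
} DataRegionCache;

/* Record [start, end) as known data, replacing any previous entry. */
static void cache_set(DataRegionCache *c, int64_t start, int64_t end)
{
    assert(start <= end);
    c->valid = true;
    c->data_start = start;
    c->data_end = end;
}

/*
 * If @offset lies inside the cached region, store the number of data
 * bytes remaining from @offset in *pnum and return true.
 */
static bool cache_lookup(const DataRegionCache *c, int64_t offset,
                         int64_t *pnum)
{
    if (!c->valid || offset < c->data_start || offset >= c->data_end) {
        return false;
    }
    *pnum = c->data_end - offset;
    return true;
}

/* Drop the entry if a zero write or discard overlaps the cached region. */
static void cache_invalidate(DataRegionCache *c, int64_t offset,
                             int64_t bytes)
{
    if (c->valid && offset < c->data_end && offset + bytes > c->data_start) {
        c->valid = false;
    }
}
```

Because only data regions are cached, a stale entry can at worst report zeroes as data, which the commit message notes is harmless.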
|
|
|
|
|
2010-05-28 06:44:57 +04:00
|
|
|
void bdrv_close_all(void)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2022-09-26 12:32:04 +03:00
|
|
|
assert(job_next(NULL) == NULL);
|
2016-01-29 18:36:14 +03:00
|
|
|
|
|
|
|
/* Drop references from requests still in flight, such as canceled block
|
|
|
|
* jobs whose AIO context has not been polled yet */
|
|
|
|
bdrv_drain_all();
|
2010-05-28 06:44:57 +04:00
|
|
|
|
2016-01-29 18:36:14 +03:00
|
|
|
blk_remove_all_bs();
|
|
|
|
blockdev_close_all_bdrv_states();
|
2014-05-08 18:34:35 +04:00
|
|
|
|
2016-04-08 19:26:37 +03:00
|
|
|
assert(QTAILQ_EMPTY(&all_bdrv_states));
|
2010-05-28 06:44:57 +04:00
|
|
|
}
|
|
|
|
|
2017-03-01 19:30:41 +03:00
|
|
|
static bool should_update_child(BdrvChild *c, BlockDriverState *to)
|
|
|
|
{
|
block: improve should_update_child
As already stated in the comment, we don't want to create loops in
parent->child relations. So, when we try to append @to to @c, we should
check that @c is not in the children subtree of @to, and we should check
this recursively, not only at the first level. The patch provides a
BFS-based search to check the relations.
This is needed for further fleecing-hook filter usage: we need to
append it to the source when the hook is already a parent of the target,
and the source may be in a backing chain of the target (fleecing scheme).
So, on appending, the hook should not become a child (direct or through
the children subtree) of the target.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-23 22:20:39 +03:00
|
|
|
GQueue *queue;
|
|
|
|
GHashTable *found;
|
|
|
|
bool ret;
|
2017-03-01 19:30:41 +03:00
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
if (c->klass->stay_at_node) {
|
2017-03-01 19:30:41 +03:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2018-06-13 21:18:15 +03:00
|
|
|
/* If the child @c belongs to the BDS @to, replacing the current
|
|
|
|
* c->bs by @to would create a loop.
|
|
|
|
*
|
|
|
|
* Such a case occurs when appending a BDS to a backing chain.
|
|
|
|
* For instance, imagine the following chain:
|
|
|
|
*
|
|
|
|
* guest device -> node A -> further backing chain...
|
|
|
|
*
|
|
|
|
* Now we create a new BDS B which we want to put on top of this
|
|
|
|
* chain, so we first attach A as its backing node:
|
|
|
|
*
|
|
|
|
* node B
|
|
|
|
* |
|
|
|
|
* v
|
|
|
|
* guest device -> node A -> further backing chain...
|
|
|
|
*
|
|
|
|
* Finally we want to replace A by B. When doing that, we want to
|
|
|
|
* replace all pointers to A by pointers to B -- except for the
|
|
|
|
* pointer from B because (1) that would create a loop, and (2)
|
|
|
|
* that pointer should simply stay intact:
|
|
|
|
*
|
|
|
|
* guest device -> node B
|
|
|
|
* |
|
|
|
|
* v
|
|
|
|
* node A -> further backing chain...
|
|
|
|
*
|
|
|
|
* In general, when replacing a node A (c->bs) by a node B (@to),
|
|
|
|
* if A is a child of B, that means we cannot replace A by B there
|
|
|
|
* because that would create a loop. Silently detaching A from B
|
|
|
|
* is also not really an option. So overall just leaving A in
|
2019-02-23 22:20:39 +03:00
|
|
|
* place there is the most sensible choice.
|
|
|
|
*
|
|
|
|
* We would also create a loop in any cases where @c is only
|
|
|
|
* indirectly referenced by @to. Prevent this by returning false
|
|
|
|
* if @c is found (by breadth-first search) anywhere in the whole
|
|
|
|
* subtree of @to.
|
|
|
|
*/
|
|
|
|
|
|
|
|
ret = true;
|
|
|
|
found = g_hash_table_new(NULL, NULL);
|
|
|
|
g_hash_table_add(found, to);
|
|
|
|
queue = g_queue_new();
|
|
|
|
g_queue_push_tail(queue, to);
|
|
|
|
|
|
|
|
while (!g_queue_is_empty(queue)) {
|
|
|
|
BlockDriverState *v = g_queue_pop_head(queue);
|
|
|
|
BdrvChild *c2;
|
|
|
|
|
|
|
|
QLIST_FOREACH(c2, &v->children, next) {
|
|
|
|
if (c2 == c) {
|
|
|
|
ret = false;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (g_hash_table_contains(found, c2->bs)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
g_queue_push_tail(queue, c2->bs);
|
|
|
|
g_hash_table_add(found, c2->bs);
|
2017-03-01 19:30:41 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-02-23 22:20:39 +03:00
|
|
|
g_queue_free(queue);
|
|
|
|
g_hash_table_destroy(found);
|
|
|
|
|
|
|
|
return ret;
|
2017-03-01 19:30:41 +03:00
|
|
|
}
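The breadth-first loop check performed by should_update_child() can be sketched on a toy graph. This is an illustration under stated assumptions, not the QEMU code: ToyNode, MAX_NODES, MAX_CHILDREN and toy_should_update() are hypothetical, nodes are array indices, and an edge is identified by the index of its parent node (an edge lies in the subtree of @to exactly when its parent is reachable from @to).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16
#define MAX_CHILDREN 4

/* Hypothetical toy node: children are indices into the node array. */
typedef struct ToyNode {
    int children[MAX_CHILDREN];
    int nb_children;
} ToyNode;

/*
 * Return false if the edge owned by node @edge_parent lies anywhere in
 * the subtree rooted at @to, i.e. if repointing that edge to @to would
 * create a loop. Return true otherwise.
 */
static bool toy_should_update(const ToyNode *nodes, int nb_nodes,
                              int edge_parent, int to)
{
    bool found[MAX_NODES] = { false };
    int queue[MAX_NODES];
    int head = 0, tail = 0;

    assert(nb_nodes <= MAX_NODES);
    assert(to >= 0 && to < nb_nodes);

    found[to] = true;
    queue[tail++] = to;

    while (head < tail) {
        int v = queue[head++];

        if (v == edge_parent) {
            return false;   /* the edge hangs below @to */
        }
        for (int i = 0; i < nodes[v].nb_children; i++) {
            int child = nodes[v].children[i];

            /* Each node is queued at most once, so the queue cannot overflow. */
            if (!found[child]) {
                found[child] = true;
                queue[tail++] = child;
            }
        }
    }
    return true;
}
```

The found array plays the role of the GHashTable in the function above: it keeps nodes that are reachable by several paths from being visited twice.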
|
|
|
|
|
2022-07-26 23:11:34 +03:00
|
|
|
static void bdrv_remove_child_commit(void *opaque)
|
2021-04-28 18:17:50 +03:00
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But it is a very useful duplication, so let's not drop
them at all :)
We should manage bs->file and bs->backing in the same place where we
manage bs->children, to keep them in sync.
Moreover, generic I/O paths are unprepared for a BdrvChild without a bs,
so it's doubly good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach the bs->file or bs->backing child, just
set the corresponding field to NULL.
Attach is a bit more complicated. But we can still determine precisely
whether we should set one of bs->file / bs->backing or not:
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must also be
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores a BdrvChild** in the transaction to
clear it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just passes this feature through, and bdrv_root_attach_child() doesn't
need the feature.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
- bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore
So, we should drop this stuff! Great!
We could probably keep the BdrvChild** argument to keep the int return
value, but it does not seem worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I used:
git grep '\->file ='
git grep '\->backing ='
git grep '&.*\<backing\>'
git grep '&.*\<file\>'
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220726201134.924743-14-vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-07-26 23:11:32 +03:00
|
|
|
bdrv_child_free(opaque);
|
block: Let replace_child_tran keep indirect pointer
As of a future commit, bdrv_replace_child_noperm() will clear the
indirect BdrvChild pointer passed to it if the new child BDS is NULL.
bdrv_replace_child_tran() will want to let it do that, but revert this
change in its abort handler. For that, we need to have it receive a
BdrvChild ** pointer, too, and keep it stored in the
BdrvReplaceChildState object that we attach to the transaction.
Note that we do not need to store it in the BdrvReplaceChildState when
new_bs is not NULL, because then there is nothing to revert. This is
important so that bdrv_replace_node_noperm() can pass a pointer to a
loop-local variable to bdrv_replace_child_tran() without worrying that
this pointer will outlive one loop iteration.
(Of course, for that to work, bdrv_replace_node_noperm() and in turn
bdrv_replace_node() and its relatives may not be called with a NULL @to
node. Luckily, they already are not, but now we should assert this.)
bdrv_remove_file_or_backing_child() on the other hand needs to ensure
that the indirect pointer it passes will stay valid for the duration of
the transaction. Ensure this by keeping a strong reference to the BDS
whose &bs->backing or &bs->file it passes to bdrv_replace_child_tran(),
and giving up that reference only in the transaction .clean() handler.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-9-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-9-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-11-15 17:54:04 +03:00
|
|
|
}
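The attach rules listed in the ".attach/.detach" commit message (COW role first, then filtered, then primary, otherwise neither) can be sketched as a small decision function. This is only an illustration: the TOY_CHILD_* bits, the ToySlot enum and toy_pick_slot() are hypothetical stand-ins, and the bit values are arbitrary rather than those of the real BDRV_CHILD_* roles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role bits; the values are illustrative only. */
enum {
    TOY_CHILD_PRIMARY  = 1 << 0,
    TOY_CHILD_COW      = 1 << 1,
    TOY_CHILD_FILTERED = 1 << 2,
};

/* Which convenience pointer (if any) should follow the attached child. */
typedef enum ToySlot {
    TOY_SLOT_NONE,
    TOY_SLOT_FILE,
    TOY_SLOT_BACKING,
} ToySlot;

static ToySlot toy_pick_slot(unsigned role, bool filtered_child_is_backing)
{
    if (role & TOY_CHILD_COW) {
        return TOY_SLOT_BACKING;            /* definitely the backing link */
    }
    if (role & TOY_CHILD_FILTERED) {
        /* A filtered child must also be the primary child. */
        assert(role & TOY_CHILD_PRIMARY);
        return filtered_child_is_backing ? TOY_SLOT_BACKING : TOY_SLOT_FILE;
    }
    if (role & TOY_CHILD_PRIMARY) {
        return TOY_SLOT_FILE;
    }
    return TOY_SLOT_NONE;                   /* some other child: ignore */
}
```

The order of the checks matters: the COW test comes first so that a backing child is never mistaken for a file child.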
|
|
|
|
|
2022-07-26 23:11:34 +03:00
|
|
|
static TransactionActionDrv bdrv_remove_child_drv = {
|
|
|
|
.commit = bdrv_remove_child_commit,
|
2021-04-28 18:17:50 +03:00
|
|
|
};
|
|
|
|
|
2022-07-26 23:11:34 +03:00
|
|
|
/* Function doesn't update permissions, caller is responsible for this. */
|
|
|
|
static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
|
2021-04-28 18:17:50 +03:00
|
|
|
{
|
|
|
|
if (!child) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (child->bs) {
|
2022-11-18 20:41:09 +03:00
|
|
|
BlockDriverState *bs = child->bs;
|
|
|
|
bdrv_drained_begin(bs);
|
2022-07-26 23:11:30 +03:00
|
|
|
bdrv_replace_child_tran(child, NULL, tran);
|
2022-11-18 20:41:09 +03:00
|
|
|
bdrv_drained_end(bs);
|
2021-04-28 18:17:50 +03:00
|
|
|
}
|
|
|
|
|
2022-07-26 23:11:34 +03:00
|
|
|
tran_add(tran, &bdrv_remove_child_drv, child);
|
2021-04-28 18:17:50 +03:00
|
|
|
}
|
|
|
|
|
2022-11-18 20:41:09 +03:00
|
|
|
static void undrain_on_clean_cb(void *opaque)
|
|
|
|
{
|
|
|
|
bdrv_drained_end(opaque);
|
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv undrain_on_clean = {
|
|
|
|
.clean = undrain_on_clean_cb,
|
|
|
|
};
|
|
|
|
|
2021-04-28 18:17:48 +03:00
|
|
|
static int bdrv_replace_node_noperm(BlockDriverState *from,
|
|
|
|
BlockDriverState *to,
|
|
|
|
bool auto_skip, Transaction *tran,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
BdrvChild *c, *next;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-11-15 17:54:04 +03:00
|
|
|
|
2022-11-18 20:41:09 +03:00
|
|
|
bdrv_drained_begin(from);
|
|
|
|
bdrv_drained_begin(to);
|
|
|
|
tran_add(tran, &undrain_on_clean, from);
|
|
|
|
tran_add(tran, &undrain_on_clean, to);
|
|
|
|
|
2021-04-28 18:17:48 +03:00
|
|
|
QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
|
|
|
|
assert(c->bs == from);
|
|
|
|
if (!should_update_child(c, to)) {
|
|
|
|
if (auto_skip) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
error_setg(errp, "Should not change '%s' link to '%s'",
|
|
|
|
c->name, from->node_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (c->frozen) {
|
|
|
|
error_setg(errp, "Cannot change '%s' link to '%s'",
|
|
|
|
c->name, from->node_name);
|
|
|
|
return -EPERM;
|
|
|
|
}
|
2022-07-26 23:11:29 +03:00
|
|
|
bdrv_replace_child_tran(c, to, tran);
|
2021-04-28 18:17:48 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-11-06 15:42:36 +03:00
|
|
|
/*
|
|
|
|
* With auto_skip=true, bdrv_replace_node_common skips updating a parent of
|
|
|
|
* @from if that would create a parent-child loop or if the parent is a block job.
|
|
|
|
*
|
|
|
|
* With auto_skip=false, an error is returned if @from has a parent that should
|
|
|
|
* not be updated.
|
2021-04-28 18:17:51 +03:00
|
|
|
*
|
|
|
|
* With @detach_subchain=true @to must be in a backing chain of @from. In this
|
|
|
|
* case the backing link of the cow-parent of @to is removed.
|
2020-11-06 15:42:36 +03:00
|
|
|
*/
|
2021-02-02 15:49:43 +03:00
|
|
|
static int bdrv_replace_node_common(BlockDriverState *from,
|
|
|
|
BlockDriverState *to,
|
2021-04-28 18:17:51 +03:00
|
|
|
bool auto_skip, bool detach_subchain,
|
|
|
|
Error **errp)
|
2015-06-18 15:09:57 +03:00
|
|
|
{
|
2021-04-28 18:17:45 +03:00
|
|
|
Transaction *tran = tran_new();
|
|
|
|
g_autoptr(GSList) refresh_list = NULL;
|
2021-05-05 10:59:03 +03:00
|
|
|
BlockDriverState *to_cow_parent = NULL;
|
2017-03-02 20:43:00 +03:00
|
|
|
int ret;
|
|
|
|
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2021-11-15 17:54:04 +03:00
|
|
|
|
2021-04-28 18:17:51 +03:00
|
|
|
if (detach_subchain) {
|
|
|
|
assert(bdrv_chain_contains(from, to));
|
|
|
|
assert(from != to);
|
|
|
|
for (to_cow_parent = from;
|
|
|
|
bdrv_filter_or_cow_bs(to_cow_parent) != to;
|
|
|
|
to_cow_parent = bdrv_filter_or_cow_bs(to_cow_parent))
|
|
|
|
{
|
|
|
|
;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-03-02 20:43:00 +03:00
|
|
|
/* Make sure that @from doesn't go away until we have successfully attached
|
|
|
|
* all of its parents to @to. */
|
|
|
|
bdrv_ref(from);
|
2015-06-18 15:09:57 +03:00
|
|
|
|
2019-05-21 20:00:25 +03:00
|
|
|
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
|
2020-03-10 14:38:29 +03:00
|
|
|
assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
|
2019-05-21 20:00:25 +03:00
|
|
|
bdrv_drained_begin(from);
|
|
|
|
|
2021-04-28 18:17:45 +03:00
|
|
|
/*
|
|
|
|
* Do the replacement without permission update.
|
|
|
|
* Replacement may influence the permissions, we should calculate new
|
|
|
|
* permissions based on new graph. If we fail, we'll roll-back the
|
|
|
|
* replacement.
|
|
|
|
*/
|
2021-04-28 18:17:48 +03:00
|
|
|
ret = bdrv_replace_node_noperm(from, to, auto_skip, tran, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto out;
|
2017-03-02 20:43:00 +03:00
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:51 +03:00
|
|
|
if (detach_subchain) {
|
2022-11-07 19:35:56 +03:00
|
|
|
bdrv_remove_child(bdrv_filter_or_cow_child(to_cow_parent), tran);
|
2021-04-28 18:17:51 +03:00
|
|
|
}
|
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
refresh_list = g_slist_prepend(refresh_list, to);
|
|
|
|
refresh_list = g_slist_prepend(refresh_list, from);
|
2016-06-10 21:57:46 +03:00
|
|
|
|
2021-04-28 18:17:45 +03:00
|
|
|
ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
goto out;
|
2015-06-18 15:09:57 +03:00
|
|
|
}
|
2017-03-02 20:43:00 +03:00
|
|
|
|
2021-02-02 15:49:43 +03:00
|
|
|
ret = 0;
|
|
|
|
|
2017-03-02 20:43:00 +03:00
|
|
|
out:
|
2021-04-28 18:17:45 +03:00
|
|
|
tran_finalize(tran, ret);
|
|
|
|
|
2019-05-21 20:00:25 +03:00
|
|
|
bdrv_drained_end(from);
|
2017-03-02 20:43:00 +03:00
|
|
|
bdrv_unref(from);
|
2021-02-02 15:49:43 +03:00
|
|
|
|
|
|
|
return ret;
|
2015-06-18 15:09:57 +03:00
|
|
|
}
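The function above records graph changes in a transaction and lets tran_finalize() either commit them or roll them back depending on ret. A minimal sketch of that pattern follows. It is not QEMU's transaction code: ToyTran, ToyActionDrv, toy_tran_add(), toy_tran_finalize() and the toy_set() action are hypothetical, the action list has a fixed capacity, and the newest-first ordering is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

/* Hypothetical per-action callbacks; any of them may be NULL. */
typedef struct ToyActionDrv {
    void (*commit)(void *opaque);
    void (*abort)(void *opaque);
    void (*clean)(void *opaque);
} ToyActionDrv;

typedef struct ToyTran {
    const ToyActionDrv *drv[TOY_MAX_ACTIONS];
    void *opaque[TOY_MAX_ACTIONS];
    int nb;
} ToyTran;

static void toy_tran_add(ToyTran *t, const ToyActionDrv *drv, void *opaque)
{
    assert(t->nb < TOY_MAX_ACTIONS);
    t->drv[t->nb] = drv;
    t->opaque[t->nb] = opaque;
    t->nb++;
}

/* Run abort (ret < 0) or commit (ret >= 0) newest first, then clean. */
static void toy_tran_finalize(ToyTran *t, int ret)
{
    for (int i = t->nb - 1; i >= 0; i--) {
        void (*fn)(void *) = ret < 0 ? t->drv[i]->abort : t->drv[i]->commit;

        if (fn) {
            fn(t->opaque[i]);
        }
    }
    for (int i = t->nb - 1; i >= 0; i--) {
        if (t->drv[i]->clean) {
            t->drv[i]->clean(t->opaque[i]);
        }
    }
    t->nb = 0;
}

/* Example action: set an int now and restore the old value on abort. */
typedef struct ToySetState {
    int *target;
    int old_value;
} ToySetState;

static void toy_set_abort(void *opaque)
{
    ToySetState *s = opaque;

    *s->target = s->old_value;
}

static const ToyActionDrv toy_set_drv = { .abort = toy_set_abort };

static void toy_set(int *target, int value, ToySetState *s, ToyTran *t)
{
    s->target = target;
    s->old_value = *target;
    *target = value;
    toy_tran_add(t, &toy_set_drv, s);
}
```

Undoing actions newest first means stacked changes to the same object unwind in the reverse order they were applied, which is what makes a partial failure recoverable.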
|
|
|
|
|
2021-02-02 15:49:43 +03:00
|
|
|
int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
|
|
|
|
Error **errp)
|
2020-11-06 15:42:36 +03:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:17:51 +03:00
|
|
|
return bdrv_replace_node_common(from, to, true, false, errp);
|
|
|
|
}
|
|
|
|
|
|
|
|
int bdrv_drop_filter(BlockDriverState *bs, Error **errp)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:17:51 +03:00
|
|
|
return bdrv_replace_node_common(bs, bdrv_filter_or_cow_bs(bs), true, true,
|
|
|
|
errp);
|
2020-11-06 15:42:36 +03:00
|
|
|
}
|
|
|
|
|
2012-06-14 18:55:02 +04:00
|
|
|
/*
|
|
|
|
* Add new bs contents at the top of an image chain while the chain is
|
|
|
|
* live, while keeping required fields on the top layer.
|
|
|
|
*
|
|
|
|
* This will modify the BlockDriverState fields, and swap contents
|
|
|
|
* between bs_new and bs_top. Both bs_new and bs_top are modified.
|
|
|
|
*
|
2021-04-28 18:17:49 +03:00
|
|
|
* bs_new must not be attached to a BlockBackend and must not have a backing
|
|
|
|
* child.
|
2012-06-14 18:55:02 +04:00
|
|
|
*
|
|
|
|
* This function does not create any image files.
|
2023-02-14 20:16:21 +03:00
|
|
|
*
|
|
|
|
* The caller must hold the AioContext lock for @bs_top.
|
2012-06-14 18:55:02 +04:00
|
|
|
*/
|
2021-02-02 15:49:43 +03:00
|
|
|
int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
|
|
|
|
Error **errp)
|
2012-06-14 18:55:02 +04:00
|
|
|
{
|
2021-04-28 18:17:49 +03:00
|
|
|
int ret;
|
2022-07-26 23:11:32 +03:00
|
|
|
BdrvChild *child;
|
2021-04-28 18:17:49 +03:00
|
|
|
Transaction *tran = tran_new();
|
2023-02-14 20:16:21 +03:00
|
|
|
AioContext *old_context, *new_context = NULL;
|
2021-04-28 18:17:49 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-04-28 18:17:49 +03:00
|
|
|
assert(!bs_new->backing);
|
|
|
|
|
2023-02-14 20:16:21 +03:00
|
|
|
old_context = bdrv_get_aio_context(bs_top);
|
|
|
|
|
2022-07-26 23:11:32 +03:00
|
|
|
child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
|
|
|
|
&child_of_bds, bdrv_backing_role(bs_new),
|
|
|
|
tran, errp);
|
|
|
|
if (!child) {
|
|
|
|
ret = -EINVAL;
|
2021-04-28 18:17:49 +03:00
|
|
|
goto out;
|
2017-02-20 14:46:42 +03:00
|
|
|
}
|
2015-06-18 15:09:57 +03:00
|
|
|
|
2023-02-14 20:16:21 +03:00
|
|
|
/*
|
|
|
|
* bdrv_attach_child_noperm could change the AioContext of bs_top.
|
|
|
|
* bdrv_replace_node_noperm calls bdrv_drained_begin, so let's temporarily
|
|
|
|
* hold the new AioContext, since bdrv_drained_begin calls BDRV_POLL_WHILE
|
|
|
|
* that assumes the new lock is taken.
|
|
|
|
*/
|
|
|
|
new_context = bdrv_get_aio_context(bs_top);
|
|
|
|
|
|
|
|
if (old_context != new_context) {
|
|
|
|
aio_context_release(old_context);
|
|
|
|
aio_context_acquire(new_context);
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:49 +03:00
|
|
|
ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
|
2021-02-02 15:49:43 +03:00
|
|
|
if (ret < 0) {
|
2021-04-28 18:17:49 +03:00
|
|
|
goto out;
|
2017-03-02 20:43:00 +03:00
|
|
|
}
|
2012-06-14 18:55:02 +04:00
|
|
|
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(bs_new, tran, errp);
|
2021-04-28 18:17:49 +03:00
|
|
|
out:
|
|
|
|
tran_finalize(tran, ret);
|
|
|
|
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdlock_main_loop();
|
2021-04-28 18:17:55 +03:00
|
|
|
bdrv_refresh_limits(bs_top, NULL, NULL);
|
2023-05-04 14:57:50 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
2021-04-28 18:17:49 +03:00
|
|
|
|
2023-02-14 20:16:21 +03:00
|
|
|
if (new_context && old_context != new_context) {
|
|
|
|
aio_context_release(new_context);
|
|
|
|
aio_context_acquire(old_context);
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:49 +03:00
|
|
|
return ret;
|
2012-02-29 00:54:06 +04:00
|
|
|
}
|
|
|
|
|
2021-08-24 11:38:23 +03:00
|
|
|
/* Not for empty child */
|
|
|
|
int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
Transaction *tran = tran_new();
|
|
|
|
g_autoptr(GSList) refresh_list = NULL;
|
|
|
|
BlockDriverState *old_bs = child->bs;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-08-24 11:38:23 +03:00
|
|
|
bdrv_ref(old_bs);
|
|
|
|
bdrv_drained_begin(old_bs);
|
|
|
|
bdrv_drained_begin(new_bs);
|
|
|
|
|
2022-07-26 23:11:29 +03:00
|
|
|
bdrv_replace_child_tran(child, new_bs, tran);
|
2021-08-24 11:38:23 +03:00
|
|
|
|
2022-11-07 19:35:58 +03:00
|
|
|
refresh_list = g_slist_prepend(refresh_list, old_bs);
|
|
|
|
refresh_list = g_slist_prepend(refresh_list, new_bs);
|
2021-08-24 11:38:23 +03:00
|
|
|
|
|
|
|
ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
|
|
|
|
|
|
|
|
tran_finalize(tran, ret);
|
|
|
|
|
|
|
|
bdrv_drained_end(old_bs);
|
|
|
|
bdrv_drained_end(new_bs);
|
|
|
|
bdrv_unref(old_bs);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2013-08-23 05:14:47 +04:00
|
|
|
static void bdrv_delete(BlockDriverState *bs)
|
2004-03-15 00:38:54 +03:00
|
|
|
{
|
2014-05-23 17:29:43 +04:00
|
|
|
assert(bdrv_op_blocker_is_empty(bs));
|
2013-08-23 05:14:47 +04:00
|
|
|
assert(!bs->refcnt);
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2010-06-29 18:58:30 +04:00
|
|
|
|
2010-04-10 10:02:42 +04:00
|
|
|
/* remove from list, if necessary */
|
2016-03-18 12:46:57 +03:00
|
|
|
if (bs->node_name[0] != '\0') {
|
|
|
|
QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
|
|
|
|
}
|
2016-01-29 18:36:11 +03:00
|
|
|
QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
|
|
|
|
|
2019-05-07 11:12:56 +03:00
|
|
|
bdrv_close(bs);
|
|
|
|
|
2011-08-21 07:09:37 +04:00
|
|
|
g_free(bs);
|
2003-06-30 14:03:06 +04:00
|
|
|
}
|
|
|
|
|
2021-09-20 14:55:36 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Replace @bs by newly created block node.
|
|
|
|
*
|
|
|
|
* @options is a QDict of options to pass to the block drivers, or NULL for an
|
|
|
|
* empty set of options. The reference to the QDict belongs to the block layer
|
|
|
|
* after the call (even on failure), so if the caller intends to reuse the
|
|
|
|
* dictionary, it needs to use qobject_ref() before calling bdrv_insert_node().
|
|
|
|
*/
|
|
|
|
BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options,
|
2020-12-16 09:16:52 +03:00
|
|
|
int flags, Error **errp)
|
|
|
|
{
|
2021-09-20 14:55:35 +03:00
|
|
|
ERRP_GUARD();
|
|
|
|
int ret;
|
2021-09-20 14:55:37 +03:00
|
|
|
BlockDriverState *new_node_bs = NULL;
|
|
|
|
const char *drvname, *node_name;
|
|
|
|
BlockDriver *drv;
|
|
|
|
|
|
|
|
drvname = qdict_get_try_str(options, "driver");
|
|
|
|
if (!drvname) {
|
|
|
|
error_setg(errp, "driver is not specified");
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
|
|
|
drv = bdrv_find_format(drvname);
|
|
|
|
if (!drv) {
|
|
|
|
error_setg(errp, "Unknown driver: '%s'", drvname);
|
|
|
|
goto fail;
|
|
|
|
}
|
2020-12-16 09:16:52 +03:00
|
|
|
|
2021-09-20 14:55:37 +03:00
|
|
|
node_name = qdict_get_try_str(options, "node-name");
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2021-09-20 14:55:37 +03:00
|
|
|
new_node_bs = bdrv_new_open_driver_opts(drv, node_name, options, flags,
|
|
|
|
errp);
|
|
|
|
options = NULL; /* bdrv_new_open_driver_opts() eats options */
|
|
|
|
if (!new_node_bs) {
|
2020-12-16 09:16:52 +03:00
|
|
|
error_prepend(errp, "Could not create node: ");
|
2021-09-20 14:55:37 +03:00
|
|
|
goto fail;
|
2020-12-16 09:16:52 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
bdrv_drained_begin(bs);
|
2021-09-20 14:55:35 +03:00
|
|
|
ret = bdrv_replace_node(bs, new_node_bs, errp);
|
2020-12-16 09:16:52 +03:00
|
|
|
bdrv_drained_end(bs);
|
|
|
|
|
2021-09-20 14:55:35 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_prepend(errp, "Could not replace node: ");
|
2021-09-20 14:55:37 +03:00
|
|
|
goto fail;
|
2020-12-16 09:16:52 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return new_node_bs;
|
2021-09-20 14:55:37 +03:00
|
|
|
|
|
|
|
fail:
|
|
|
|
qobject_unref(options);
|
|
|
|
bdrv_unref(new_node_bs);
|
|
|
|
return NULL;
|
2020-12-16 09:16:52 +03:00
|
|
|
}
|
|
|
|
|
2009-04-22 03:11:50 +04:00
|
|
|
/*
|
|
|
|
* Run consistency checks on an image
|
|
|
|
*
|
2010-06-29 13:43:13 +04:00
|
|
|
* Returns 0 if the check could be completed (it doesn't mean that the image is
|
2011-04-28 19:20:38 +04:00
|
|
|
* free of errors) or -errno when an internal error occurred. The results of the
|
2010-06-29 13:43:13 +04:00
|
|
|
* check are stored in res.
|
2009-04-22 03:11:50 +04:00
|
|
|
*/
|
2020-09-24 21:54:10 +03:00
|
|
|
int coroutine_fn bdrv_co_check(BlockDriverState *bs,
|
|
|
|
BdrvCheckResult *res, BdrvCheckMode fix)
|
2009-04-22 03:11:50 +04:00
|
|
|
{
|
2022-03-03 18:16:09 +03:00
|
|
|
IO_CODE();
|
2022-12-07 16:18:38 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2014-08-08 00:47:55 +04:00
|
|
|
if (bs->drv == NULL) {
|
|
|
|
return -ENOMEDIUM;
|
|
|
|
}
|
2018-03-01 19:36:19 +03:00
|
|
|
if (bs->drv->bdrv_co_check == NULL) {
|
2009-04-22 03:11:50 +04:00
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
|
2010-06-29 13:43:13 +04:00
|
|
|
memset(res, 0, sizeof(*res));
|
2018-03-01 19:36:19 +03:00
|
|
|
return bs->drv->bdrv_co_check(bs, res, fix);
|
|
|
|
}
|
|
|
|
|
2010-01-12 14:55:17 +03:00
|
|
|
/*
|
|
|
|
* Return values:
|
|
|
|
* 0 - success
|
|
|
|
 * -EINVAL - backing format specified, but no file; or @require is set and
 * a backing file is specified without a format
|
|
|
|
* -ENOSPC - can't update the backing file because no space is left in the
|
|
|
|
* image file header
|
|
|
|
* -ENOTSUP - format driver doesn't support changing the backing file
|
|
|
|
*/
|
2020-07-06 23:39:53 +03:00
|
|
|
int bdrv_change_backing_file(BlockDriverState *bs, const char *backing_file,
|
2021-05-04 00:36:00 +03:00
|
|
|
const char *backing_fmt, bool require)
|
2010-01-12 14:55:17 +03:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2012-04-12 16:01:02 +04:00
|
|
|
int ret;
|
2010-01-12 14:55:17 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2017-11-10 23:31:09 +03:00
|
|
|
if (!drv) {
|
|
|
|
return -ENOMEDIUM;
|
|
|
|
}
|
|
|
|
|
2012-04-12 16:01:01 +04:00
|
|
|
/* Backing file format doesn't make sense without a backing file */
|
|
|
|
if (backing_fmt && !backing_file) {
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-05-04 00:36:00 +03:00
|
|
|
if (require && backing_file && !backing_fmt) {
|
|
|
|
return -EINVAL;
|
2020-07-06 23:39:53 +03:00
|
|
|
}
|
|
|
|
|
2010-01-12 14:55:17 +03:00
|
|
|
if (drv->bdrv_change_backing_file != NULL) {
|
2012-04-12 16:01:02 +04:00
|
|
|
ret = drv->bdrv_change_backing_file(bs, backing_file, backing_fmt);
|
2010-01-12 14:55:17 +03:00
|
|
|
} else {
|
2012-04-12 16:01:02 +04:00
|
|
|
ret = -ENOTSUP;
|
2010-01-12 14:55:17 +03:00
|
|
|
}
|
2012-04-12 16:01:02 +04:00
|
|
|
|
|
|
|
if (ret == 0) {
|
|
|
|
pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_file ?: "");
|
|
|
|
pstrcpy(bs->backing_format, sizeof(bs->backing_format), backing_fmt ?: "");
|
2019-02-01 22:29:08 +03:00
|
|
|
pstrcpy(bs->auto_backing_file, sizeof(bs->auto_backing_file),
|
|
|
|
backing_file ?: "");
|
2012-04-12 16:01:02 +04:00
|
|
|
}
|
|
|
|
return ret;
|
2010-01-12 14:55:17 +03:00
|
|
|
}
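
/*
 * Illustrative sketch, not part of QEMU: the argument checks performed at
 * the top of bdrv_change_backing_file() can be isolated as a small pure
 * function. The name toy_check_backing_args() is hypothetical; only the two
 * -EINVAL conditions are taken from the code above.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return 0 if the (file, format) pair is acceptable, -EINVAL otherwise.
 * A format without a file never makes sense; a file without a format is
 * rejected only when the caller sets @require.
 */
int toy_check_backing_args(const char *backing_file, const char *backing_fmt,
                           bool require)
{
    if (backing_fmt && !backing_file) {
        return -EINVAL;
    }
    if (require && backing_file && !backing_fmt) {
        return -EINVAL;
    }
    return 0;
}
```

/*
 * Passing NULL for both strings is accepted by these checks, which matches
 * the case of clearing the backing file.
 */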
|
|
|
|
|
2012-09-27 21:29:12 +04:00
|
|
|
/*
|
2019-06-12 18:34:45 +03:00
|
|
|
* Finds the first non-filter node above bs in the chain between
|
|
|
|
* active and bs. The returned node is either an immediate parent of
|
|
|
|
* bs, or there are only filter nodes between the two.
|
2012-09-27 21:29:12 +04:00
|
|
|
*
|
|
|
|
* Returns NULL if bs is not found in active's image chain,
|
|
|
|
* or if active == bs.
|
2014-06-25 23:35:26 +04:00
|
|
|
*
|
|
|
|
* Returns the bottommost base image if bs == NULL.
|
2012-09-27 21:29:12 +04:00
|
|
|
*/
|
|
|
|
BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
|
|
|
|
BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
|
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2019-06-12 18:34:45 +03:00
|
|
|
bs = bdrv_skip_filters(bs);
|
|
|
|
active = bdrv_skip_filters(active);
|
|
|
|
|
|
|
|
while (active) {
|
|
|
|
BlockDriverState *next = bdrv_backing_chain_next(active);
|
|
|
|
if (bs == next) {
|
|
|
|
return active;
|
|
|
|
}
|
|
|
|
active = next;
|
2012-09-27 21:29:12 +04:00
|
|
|
}
|
|
|
|
|
2019-06-12 18:34:45 +03:00
|
|
|
return NULL;
|
2014-06-25 23:35:26 +04:00
|
|
|
}
|
2012-09-27 21:29:12 +04:00
|
|
|
|
2014-06-25 23:35:26 +04:00
|
|
|
/* Given a BDS, searches for the base layer. */
|
|
|
|
BlockDriverState *bdrv_find_base(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2014-06-25 23:35:26 +04:00
|
|
|
return bdrv_find_overlay(bs, NULL);
|
2012-09-27 21:29:12 +04:00
|
|
|
}
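
/*
 * Illustrative sketch, not part of QEMU: the overlay search above reduces to
 * a walk down a singly linked chain. ToyNode and the toy_* functions are
 * hypothetical stand-ins; filter skipping (bdrv_skip_filters(),
 * bdrv_backing_chain_next()) is deliberately left out.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one image in a backing chain. */
typedef struct ToyNode {
    const char *name;
    struct ToyNode *backing;    /* next image further down, or NULL */
} ToyNode;

/*
 * Return the node in @active's chain whose backing link points at @bs.
 * Returns NULL if @bs is not below @active, or if active == bs.
 * With bs == NULL this returns the bottommost node of the chain.
 */
ToyNode *toy_find_overlay(ToyNode *active, ToyNode *bs)
{
    while (active) {
        ToyNode *next = active->backing;
        if (next == bs) {
            return active;
        }
        active = next;
    }
    return NULL;
}

/* The base is simply the overlay of "nothing". */
ToyNode *toy_find_base(ToyNode *bs)
{
    return toy_find_overlay(bs, NULL);
}
```

/*
 * Expressing the base lookup as toy_find_overlay(bs, NULL) mirrors how
 * bdrv_find_base() reuses bdrv_find_overlay() above.
 */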
|
|
|
|
|
2019-03-12 19:48:40 +03:00
|
|
|
/*
|
2019-06-12 17:07:11 +03:00
|
|
|
* Return true if at least one of the COW (backing) and filter links
|
|
|
|
* between @bs and @base is frozen. @errp is set if that's the case.
|
2019-03-28 19:25:09 +03:00
|
|
|
* @base must be reachable from @bs, or NULL.
|
2019-03-12 19:48:40 +03:00
|
|
|
*/
|
|
|
|
bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
BlockDriverState *i;
|
2019-06-12 17:07:11 +03:00
|
|
|
BdrvChild *child;
|
2019-03-12 19:48:40 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
for (i = bs; i != base; i = child_bs(child)) {
|
|
|
|
child = bdrv_filter_or_cow_child(i);
|
|
|
|
|
|
|
|
if (child && child->frozen) {
|
2019-03-12 19:48:40 +03:00
|
|
|
error_setg(errp, "Cannot change '%s' link from '%s' to '%s'",
|
2019-06-12 17:07:11 +03:00
|
|
|
child->name, i->node_name, child->bs->node_name);
|
2019-03-12 19:48:40 +03:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2019-06-12 17:07:11 +03:00
|
|
|
* Freeze all COW (backing) and filter links between @bs and @base.
|
2019-03-12 19:48:40 +03:00
|
|
|
* If any of the links is already frozen the operation is aborted and
|
|
|
|
* none of the links are modified.
|
2019-03-28 19:25:09 +03:00
|
|
|
* @base must be reachable from @bs, or NULL.
|
2019-03-12 19:48:40 +03:00
|
|
|
* Returns 0 on success. On failure returns < 0 and sets @errp.
|
|
|
|
*/
|
|
|
|
int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
BlockDriverState *i;
|
2019-06-12 17:07:11 +03:00
|
|
|
BdrvChild *child;
|
2019-03-12 19:48:40 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2019-03-12 19:48:40 +03:00
|
|
|
if (bdrv_is_backing_chain_frozen(bs, base, errp)) {
|
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
for (i = bs; i != base; i = child_bs(child)) {
|
|
|
|
child = bdrv_filter_or_cow_child(i);
|
|
|
|
if (child && child->bs->never_freeze) {
|
2019-07-03 20:28:02 +03:00
|
|
|
error_setg(errp, "Cannot freeze '%s' link to '%s'",
|
2019-06-12 17:07:11 +03:00
|
|
|
child->name, child->bs->node_name);
|
2019-07-03 20:28:02 +03:00
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
for (i = bs; i != base; i = child_bs(child)) {
|
|
|
|
child = bdrv_filter_or_cow_child(i);
|
|
|
|
if (child) {
|
|
|
|
child->frozen = true;
|
2019-03-28 19:25:09 +03:00
|
|
|
}
|
2019-03-12 19:48:40 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2019-06-12 17:07:11 +03:00
|
|
|
* Unfreeze all COW (backing) and filter links between @bs and @base.
|
|
|
|
* The caller must ensure that all links are frozen before using this
|
|
|
|
* function.
|
2019-03-28 19:25:09 +03:00
|
|
|
* @base must be reachable from @bs, or NULL.
|
2019-03-12 19:48:40 +03:00
|
|
|
*/
|
|
|
|
void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base)
|
|
|
|
{
|
|
|
|
BlockDriverState *i;
|
2019-06-12 17:07:11 +03:00
|
|
|
BdrvChild *child;
|
2019-03-12 19:48:40 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2019-06-12 17:07:11 +03:00
|
|
|
for (i = bs; i != base; i = child_bs(child)) {
|
|
|
|
child = bdrv_filter_or_cow_child(i);
|
|
|
|
if (child) {
|
|
|
|
assert(child->frozen);
|
|
|
|
child->frozen = false;
|
2019-03-28 19:25:09 +03:00
|
|
|
}
|
2019-03-12 19:48:40 +03:00
|
|
|
}
|
|
|
|
}
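
/*
 * Illustrative sketch, not part of QEMU: the freeze/unfreeze functions above
 * follow a "validate everything first, then modify" pattern so that a
 * failure leaves no link half-frozen. ToyBds, ToyLink and the toy_*
 * functions are hypothetical; error reporting through Error **errp is
 * replaced by a plain -1 return.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyLink ToyLink;

/* Hypothetical node: at most one link to the next node down the chain. */
typedef struct ToyBds {
    ToyLink *child;         /* NULL at the bottom of the chain */
    bool never_freeze;
} ToyBds;

struct ToyLink {
    ToyBds *bs;             /* node this link points to */
    bool frozen;
};

/* Freeze every link between @bs and @base; 0 on success, -1 on failure. */
int toy_freeze_chain(ToyBds *bs, ToyBds *base)
{
    ToyBds *i;

    /* Pass 1: refuse without touching anything if any link is unsuitable. */
    for (i = bs; i && i != base && i->child; i = i->child->bs) {
        if (i->child->frozen || i->child->bs->never_freeze) {
            return -1;
        }
    }

    /* Pass 2: all checks passed, so the modification cannot fail. */
    for (i = bs; i && i != base && i->child; i = i->child->bs) {
        i->child->frozen = true;
    }
    return 0;
}

/* Undo toy_freeze_chain(); every link must currently be frozen. */
void toy_unfreeze_chain(ToyBds *bs, ToyBds *base)
{
    ToyBds *i;

    for (i = bs; i && i != base && i->child; i = i->child->bs) {
        assert(i->child->frozen);
        i->child->frozen = false;
    }
}
```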
|
|
|
|
|
2012-09-27 21:29:12 +04:00
|
|
|
/*
|
|
|
|
* Drops images above 'base' up to and including 'top', and sets the image
|
|
|
|
* above 'top' to have base as its backing file.
|
|
|
|
*
|
|
|
|
* Requires that the overlay to 'top' is opened r/w, so that the backing file
|
|
|
|
* information in 'bs' can be properly updated.
|
|
|
|
*
|
|
|
|
* E.g., this will convert the following chain:
|
|
|
|
* bottom <- base <- intermediate <- top <- active
|
|
|
|
*
|
|
|
|
* to
|
|
|
|
*
|
|
|
|
* bottom <- base <- active
|
|
|
|
*
|
|
|
|
* It is allowed for bottom==base, in which case it converts:
|
|
|
|
*
|
|
|
|
* base <- intermediate <- top <- active
|
|
|
|
*
|
|
|
|
* to
|
|
|
|
*
|
|
|
|
* base <- active
|
|
|
|
*
|
2014-06-25 23:40:10 +04:00
|
|
|
* If backing_file_str is non-NULL, it will be used when modifying top's
|
|
|
|
* overlay image metadata.
|
|
|
|
*
|
2012-09-27 21:29:12 +04:00
|
|
|
* Error conditions:
|
|
|
|
* if active == top, that is considered an error
|
|
|
|
*
|
|
|
|
*/
|
2017-06-27 21:36:18 +03:00
|
|
|
int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
|
|
|
|
const char *backing_file_str)
|
2012-09-27 21:29:12 +04:00
|
|
|
{
|
2018-10-31 19:16:38 +03:00
|
|
|
BlockDriverState *explicit_top = top;
|
|
|
|
bool update_inherits_from;
|
2020-11-06 15:42:37 +03:00
|
|
|
BdrvChild *c;
|
2017-02-17 22:42:32 +03:00
|
|
|
Error *local_err = NULL;
|
2012-09-27 21:29:12 +04:00
|
|
|
int ret = -EIO;
|
2020-11-06 15:42:37 +03:00
|
|
|
g_autoptr(GSList) updated_children = NULL;
|
|
|
|
GSList *p;
|
2012-09-27 21:29:12 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2017-06-29 20:32:21 +03:00
|
|
|
bdrv_ref(top);
|
2022-11-18 20:41:03 +03:00
|
|
|
bdrv_drained_begin(base);
|
2017-06-29 20:32:21 +03:00
|
|
|
|
2012-09-27 21:29:12 +04:00
|
|
|
if (!top->drv || !base->drv) {
|
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
|
2015-09-14 16:33:33 +03:00
|
|
|
/* Make sure that base is in the backing chain of top */
|
|
|
|
if (!bdrv_chain_contains(top, base)) {
|
2012-09-27 21:29:12 +04:00
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
|
2018-10-31 19:16:38 +03:00
|
|
|
/* If 'base' recursively inherits from 'top' then we should set
|
|
|
|
* base->inherits_from to top->inherits_from after 'top' and all
|
|
|
|
* other intermediate nodes have been dropped.
|
|
|
|
* If 'top' is an implicit node (e.g. "commit_top") we should skip
|
|
|
|
* it because no one inherits from it. We use explicit_top for that. */
|
2019-06-12 18:34:45 +03:00
|
|
|
explicit_top = bdrv_skip_implicit_filters(explicit_top);
|
2018-10-31 19:16:38 +03:00
|
|
|
update_inherits_from = bdrv_inherits_from_recursive(base, explicit_top);
|
|
|
|
|
2012-09-27 21:29:12 +04:00
|
|
|
/* success - we can delete the intermediate states, and link top->base */
|
2019-02-01 22:29:05 +03:00
|
|
|
if (!backing_file_str) {
|
|
|
|
bdrv_refresh_filename(base);
|
|
|
|
backing_file_str = base->filename;
|
|
|
|
}
|
2017-09-19 17:22:54 +03:00
|
|
|
|
2020-11-06 15:42:37 +03:00
|
|
|
QLIST_FOREACH(c, &top->parents, next_parent) {
|
|
|
|
updated_children = g_slist_prepend(updated_children, c);
|
|
|
|
}
|
|
|
|
|
2021-04-28 18:17:51 +03:00
|
|
|
/*
|
|
|
|
 * It seems correct to pass detach_subchain=true here, but doing so
 * triggers another bug that is not fixed yet: because of a nested
 * aio_poll() loop we may switch to another drained section, which
 * modifies the graph (for example, by removing a child that we keep in
 * the updated_children list). So, it's a TODO.
 *
 * Note: the bug is triggered by passing detach_subchain=true here and
 * running test-bdrv-drain; the test_drop_intermediate_poll() test case
 * will crash. That's a FIXME.
|
|
|
|
*/
|
|
|
|
bdrv_replace_node_common(top, base, false, false, &local_err);
|
2020-11-06 15:42:37 +03:00
|
|
|
if (local_err) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (p = updated_children; p; p = p->next) {
|
|
|
|
c = p->data;
|
2017-02-17 22:42:32 +03:00
|
|
|
|
2020-05-13 14:05:13 +03:00
|
|
|
if (c->klass->update_filename) {
|
|
|
|
ret = c->klass->update_filename(c, base, backing_file_str,
|
|
|
|
&local_err);
|
2017-09-19 17:22:54 +03:00
|
|
|
if (ret < 0) {
|
2020-11-06 15:42:37 +03:00
|
|
|
/*
|
|
|
|
 * TODO: Actually, we want to roll back all previous iterations
 * of this loop, and (which is almost impossible) the previous
 * bdrv_replace_node()...
 *
 * Note that c->klass->update_filename may lead to a permission
 * update, so it's a bad idea to call it inside the permission
 * update transaction of bdrv_replace_node.
|
|
|
|
*/
|
2017-09-19 17:22:54 +03:00
|
|
|
error_report_err(local_err);
|
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
}
|
2017-02-17 22:42:32 +03:00
|
|
|
}
|
2012-09-27 21:29:12 +04:00
|
|
|
|
2018-10-31 19:16:38 +03:00
|
|
|
if (update_inherits_from) {
|
|
|
|
base->inherits_from = explicit_top->inherits_from;
|
|
|
|
}
|
|
|
|
|
2012-09-27 21:29:12 +04:00
|
|
|
ret = 0;
|
|
|
|
exit:
|
2022-11-18 20:41:03 +03:00
|
|
|
bdrv_drained_end(base);
|
2017-06-29 20:32:21 +03:00
|
|
|
bdrv_unref(top);
|
2012-09-27 21:29:12 +04:00
|
|
|
return ret;
|
|
|
|
}
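
/*
 * Illustrative sketch, not part of QEMU: the chain diagrams in the comment
 * above bdrv_drop_intermediate() can be reproduced on a toy linked list.
 * ToyImg and toy_drop_intermediate() are hypothetical. Unlike the real
 * function, which repoints every parent of 'top' and updates image
 * metadata, this sketch takes the active layer explicitly and only relinks
 * one in-memory pointer.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one image in a backing chain. */
typedef struct ToyImg {
    const char *name;
    struct ToyImg *backing;     /* next image further down, or NULL */
} ToyImg;

/*
 * Make the overlay of @top point directly at @base.
 * Returns 0 on success, -1 if active == top, if @base is not in @top's
 * backing chain, or if @top is not below @active.
 */
int toy_drop_intermediate(ToyImg *active, ToyImg *top, ToyImg *base)
{
    ToyImg *i;

    if (!active || !top || !base || active == top) {
        return -1;
    }

    /* Make sure that base is in the backing chain of top. */
    for (i = top; i && i != base; i = i->backing) {
        /* nothing to do, just walk down */
    }
    if (!i) {
        return -1;
    }

    /* Find the overlay of top and relink it to base. */
    for (i = active; i; i = i->backing) {
        if (i->backing == top) {
            i->backing = base;
            return 0;
        }
    }
    return -1;
}
```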
|
|
|
|
|
2019-06-12 19:14:13 +03:00
|
|
|
/**
|
2023-01-13 23:42:07 +03:00
|
|
|
* Implementation of BlockDriver.bdrv_co_get_allocated_file_size() that
|
2019-06-12 19:14:13 +03:00
|
|
|
* sums the size of all data-bearing children. (This excludes backing
|
|
|
|
* children.)
|
|
|
|
*/
|
2023-05-04 14:57:43 +03:00
|
|
|
static int64_t coroutine_fn GRAPH_RDLOCK
|
|
|
|
bdrv_sum_allocated_file_size(BlockDriverState *bs)
|
2019-06-12 19:14:13 +03:00
|
|
|
{
|
|
|
|
BdrvChild *child;
|
|
|
|
int64_t child_size, sum = 0;
|
|
|
|
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
|
|
|
|
BDRV_CHILD_FILTERED))
|
|
|
|
{
|
2023-01-13 23:42:07 +03:00
|
|
|
child_size = bdrv_co_get_allocated_file_size(child->bs);
|
2019-06-12 19:14:13 +03:00
|
|
|
if (child_size < 0) {
|
|
|
|
return child_size;
|
|
|
|
}
|
|
|
|
sum += child_size;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return sum;
|
|
|
|
}
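
/*
 * Illustrative sketch, not part of QEMU: summing sizes over children that
 * match a role mask, with the first negative (error) value propagated
 * unchanged. The TOY_ROLE_* bits, ToyChild and toy_sum_allocated() are
 * hypothetical; in the real code the per-child size comes from a recursive
 * bdrv_co_get_allocated_file_size() call rather than a stored field.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical role bits, loosely modelled on the BDRV_CHILD_* flags. */
enum {
    TOY_ROLE_DATA     = 1 << 0,
    TOY_ROLE_METADATA = 1 << 1,
    TOY_ROLE_FILTERED = 1 << 2,
    TOY_ROLE_COW      = 1 << 3,   /* backing child, excluded from the sum */
};

typedef struct ToyChild {
    unsigned role;
    int64_t allocated;            /* bytes, or a negative errno-style value */
} ToyChild;

/* Sum the sizes of all data-bearing children; return < 0 on the first error. */
int64_t toy_sum_allocated(const ToyChild *children, size_t n)
{
    int64_t sum = 0;
    size_t k;

    for (k = 0; k < n; k++) {
        if (children[k].role &
            (TOY_ROLE_DATA | TOY_ROLE_METADATA | TOY_ROLE_FILTERED)) {
            int64_t child_size = children[k].allocated;
            if (child_size < 0) {
                return child_size;
            }
            sum += child_size;
        }
    }
    return sum;
}
```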
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/**
|
|
|
|
 * Length of an allocated file in bytes. Sparse files are counted by actual
|
|
|
|
* allocated space. Return < 0 if error or unknown.
|
|
|
|
*/
|
2023-01-13 23:42:07 +03:00
|
|
|
int64_t coroutine_fn bdrv_co_get_allocated_file_size(BlockDriverState *bs)
|
2009-03-03 20:37:16 +03:00
|
|
|
{
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-05-04 14:57:43 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2022-03-03 18:15:50 +03:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!drv) {
|
|
|
|
return -ENOMEDIUM;
|
2014-03-26 16:06:02 +04:00
|
|
|
}
|
2023-01-13 23:42:07 +03:00
|
|
|
if (drv->bdrv_co_get_allocated_file_size) {
|
|
|
|
return drv->bdrv_co_get_allocated_file_size(bs);
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
2019-06-12 19:14:13 +03:00
|
|
|
|
|
|
|
if (drv->bdrv_file_open) {
|
|
|
|
/*
|
|
|
|
* Protocol drivers default to -ENOTSUP (most of their data is
|
|
|
|
* not stored in any of their children (if they even have any),
|
|
|
|
* so there is no generic way to figure it out).
|
|
|
|
*/
|
|
|
|
return -ENOTSUP;
|
|
|
|
} else if (drv->is_filter) {
|
|
|
|
/* Filter drivers default to the size of their filtered child */
|
2023-01-13 23:42:07 +03:00
|
|
|
return bdrv_co_get_allocated_file_size(bdrv_filter_bs(bs));
|
2019-06-12 19:14:13 +03:00
|
|
|
} else {
|
|
|
|
/* Other drivers default to summing their children's sizes */
|
|
|
|
return bdrv_sum_allocated_file_size(bs);
|
2011-10-13 16:08:22 +04:00
|
|
|
}
|
|
|
|
}
|
2011-07-15 18:05:00 +04:00
|
|
|
|
2017-07-05 15:57:30 +03:00
|
|
|
/*
|
|
|
|
* bdrv_measure:
|
|
|
|
* @drv: Format driver
|
|
|
|
* @opts: Creation options for new image
|
|
|
|
* @in_bs: Existing image containing data for new image (may be NULL)
|
|
|
|
* @errp: Error object
|
|
|
|
* Returns: A #BlockMeasureInfo (free using qapi_free_BlockMeasureInfo())
|
|
|
|
* or NULL on error
|
|
|
|
*
|
|
|
|
* Calculate file size required to create a new image.
|
|
|
|
*
|
|
|
|
* If @in_bs is given then space for allocated clusters and zero clusters
|
|
|
|
* from that image are included in the calculation. If @opts contains a
|
|
|
|
* backing file that is shared by @in_bs then backing clusters may be omitted
|
|
|
|
* from the calculation.
|
|
|
|
*
|
|
|
|
* If @in_bs is NULL then the calculation includes no allocated clusters
|
|
|
|
* unless a preallocation option is given in @opts.
|
|
|
|
*
|
|
|
|
* Note that @in_bs may use a different BlockDriver from @drv.
|
|
|
|
*
|
|
|
|
* If an error occurs the @errp pointer is set.
|
|
|
|
*/
|
|
|
|
BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts,
|
|
|
|
BlockDriverState *in_bs, Error **errp)
|
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2017-07-05 15:57:30 +03:00
|
|
|
if (!drv->bdrv_measure) {
|
|
|
|
error_setg(errp, "Block driver '%s' does not support size measurement",
|
|
|
|
drv->format_name);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return drv->bdrv_measure(opts, in_bs, errp);
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/**
|
|
|
|
* Return number of sectors on success, -errno on error.
|
2011-10-13 16:08:22 +04:00
|
|
|
*/
|
2023-01-13 23:42:04 +03:00
|
|
|
int64_t coroutine_fn bdrv_co_nb_sectors(BlockDriverState *bs)
|
2011-10-13 16:08:22 +04:00
|
|
|
{
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-02-03 18:22:02 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2012-04-02 14:59:34 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!drv)
|
|
|
|
return -ENOMEDIUM;
|
2014-05-08 18:34:34 +04:00
|
|
|
|
2023-04-07 18:32:56 +03:00
|
|
|
if (bs->bl.has_variable_length) {
|
2023-01-13 23:42:04 +03:00
|
|
|
int ret = bdrv_co_refresh_total_sectors(bs, bs->total_sectors);
|
2015-04-28 16:27:52 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
2011-10-13 16:08:22 +04:00
|
|
|
}
|
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->total_sectors;
|
2011-10-13 16:08:22 +04:00
|
|
|
}
|
2004-03-15 00:38:54 +03:00
|
|
|
|
2023-04-07 18:33:03 +03:00
|
|
|
/*
|
|
|
|
* This wrapper is written by hand because this function is in the hot I/O path,
|
|
|
|
* via blk_get_geometry.
|
|
|
|
*/
|
|
|
|
int64_t coroutine_mixed_fn bdrv_nb_sectors(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
|
|
|
IO_CODE();
|
|
|
|
|
|
|
|
if (!drv)
|
|
|
|
return -ENOMEDIUM;
|
|
|
|
|
|
|
|
if (bs->bl.has_variable_length) {
|
|
|
|
int ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return bs->total_sectors;
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/**
|
|
|
|
* Return length in bytes on success, -errno on error.
|
|
|
|
* The length is always a multiple of BDRV_SECTOR_SIZE.
|
2013-04-05 23:27:55 +04:00
|
|
|
*/
|
2023-01-13 23:42:04 +03:00
|
|
|
int64_t coroutine_fn bdrv_co_getlength(BlockDriverState *bs)
|
2013-04-05 23:27:55 +04:00
|
|
|
{
|
2023-01-13 23:42:04 +03:00
|
|
|
int64_t ret;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-02-03 18:22:02 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2013-04-05 23:27:55 +04:00
|
|
|
|
2023-01-13 23:42:04 +03:00
|
|
|
ret = bdrv_co_nb_sectors(bs);
|
2020-11-05 18:51:22 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
if (ret > INT64_MAX / BDRV_SECTOR_SIZE) {
|
|
|
|
return -EFBIG;
|
|
|
|
}
|
|
|
|
return ret * BDRV_SECTOR_SIZE;
|
2003-06-30 14:03:06 +04:00
|
|
|
}
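
/*
 * Illustrative sketch, not part of QEMU: the sector-to-byte conversion in
 * bdrv_co_getlength() guards the multiplication against int64_t overflow by
 * comparing against INT64_MAX divided by the sector size first.
 * TOY_SECTOR_SIZE and toy_sectors_to_bytes() are hypothetical names; the
 * value 512 is an assumption made for this example.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512     /* assumed sector size for this sketch */

/*
 * Convert a sector count to bytes.
 * Negative inputs are treated as errors and passed through unchanged;
 * counts whose byte length would not fit in int64_t yield -EFBIG.
 */
int64_t toy_sectors_to_bytes(int64_t nb_sectors)
{
    if (nb_sectors < 0) {
        return nb_sectors;
    }
    if (nb_sectors > INT64_MAX / TOY_SECTOR_SIZE) {
        return -EFBIG;
    }
    return nb_sectors * TOY_SECTOR_SIZE;
}
```

/*
 * Checking before multiplying matters because signed overflow is undefined
 * behaviour in C, so the result cannot be validated after the fact.
 */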
|
|
|
|
|
2016-06-24 01:37:26 +03:00
|
|
|
bool bdrv_is_sg(BlockDriverState *bs)
|
2010-06-16 18:38:15 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->sg;
|
2010-06-16 18:38:15 +04:00
|
|
|
}
|
|
|
|
|
2019-06-12 23:57:15 +03:00
|
|
|
/**
|
|
|
|
* Return whether the given node supports compressed writes.
|
|
|
|
*/
|
|
|
|
bool bdrv_supports_compressed_writes(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BlockDriverState *filtered;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2019-06-12 23:57:15 +03:00
|
|
|
|
|
|
|
if (!bs->drv || !block_driver_can_compress(bs->drv)) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
filtered = bdrv_filter_bs(bs);
|
|
|
|
if (filtered) {
|
|
|
|
/*
|
|
|
|
* Filters can only forward compressed writes, so we have to
|
|
|
|
* check the child.
|
|
|
|
*/
|
|
|
|
return bdrv_supports_compressed_writes(filtered);
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
const char *bdrv_get_format_name(BlockDriverState *bs)
|
2009-09-09 19:53:37 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->drv ? bs->drv->format_name : NULL;
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
static int qsort_strcmp(const void *a, const void *b)
|
2009-09-09 19:53:37 +04:00
|
|
|
{
|
2016-10-12 23:49:05 +03:00
|
|
|
return strcmp(*(char *const *)a, *(char *const *)b);
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
void bdrv_iterate_format(void (*it)(void *opaque, const char *name),
|
2019-03-07 16:33:58 +03:00
|
|
|
void *opaque, bool read_only)
|
2009-09-09 19:53:37 +04:00
|
|
|
{
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriver *drv;
|
|
|
|
int count = 0;
|
|
|
|
int i;
|
|
|
|
const char **formats = NULL;
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
QLIST_FOREACH(drv, &bdrv_drivers, list) {
|
|
|
|
if (drv->format_name) {
|
|
|
|
bool found = false;
|
|
|
|
int i = count;
|
2019-03-07 16:33:58 +03:00
|
|
|
|
|
|
|
if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, read_only)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
while (formats && i && !found) {
|
|
|
|
found = !strcmp(formats[--i], drv->format_name);
|
|
|
|
}
|
2010-01-26 16:49:08 +03:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!found) {
|
|
|
|
formats = g_renew(const char *, formats, count + 1);
|
|
|
|
formats[count++] = drv->format_name;
|
|
|
|
}
|
2014-10-27 12:18:46 +03:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
2014-10-27 12:18:46 +03:00
|
|
|
|
2016-10-12 23:49:06 +03:00
|
|
|
for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); i++) {
|
|
|
|
const char *format_name = block_driver_modules[i].format_name;
|
|
|
|
|
|
|
|
if (format_name) {
|
|
|
|
bool found = false;
|
|
|
|
int j = count;
|
|
|
|
|
2019-03-07 16:33:58 +03:00
|
|
|
if (use_bdrv_whitelist &&
|
|
|
|
!bdrv_format_is_whitelisted(format_name, read_only)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2016-10-12 23:49:06 +03:00
|
|
|
while (formats && j && !found) {
|
|
|
|
found = !strcmp(formats[--j], format_name);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!found) {
|
|
|
|
formats = g_renew(const char *, formats, count + 1);
|
|
|
|
formats[count++] = format_name;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
qsort(formats, count, sizeof(formats[0]), qsort_strcmp);
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
it(opaque, formats[i]);
|
|
|
|
}
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
g_free(formats);
|
|
|
|
}
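
/*
 * Illustrative sketch, not part of QEMU: bdrv_iterate_format() collects
 * names, skips ones already seen with a backwards linear scan, and sorts
 * the result with qsort() before invoking the callback. The function
 * toy_unique_sorted() is hypothetical and uses a caller-provided output
 * array instead of g_renew(); whitelist filtering is omitted.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int toy_cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Copy the distinct strings of @names into @out (capacity at least @n),
 * sort them, and return how many were kept.
 */
size_t toy_unique_sorted(const char *const *names, size_t n, const char **out)
{
    size_t count = 0;
    size_t k;

    for (k = 0; k < n; k++) {
        int found = 0;
        size_t i = count;

        /* Scan the names collected so far, newest first. */
        while (i && !found) {
            found = !strcmp(out[--i], names[k]);
        }
        if (!found) {
            out[count++] = names[k];
        }
    }

    qsort(out, count, sizeof(out[0]), toy_cmp_str);
    return count;
}
```

/*
 * The linear duplicate scan is quadratic in the number of names, which is
 * acceptable for a short list such as a set of driver names.
 */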
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* Find a node in the bs graph by its node name; returns NULL if not found */
|
|
|
|
BlockDriverState *bdrv_find_node(const char *node_name)
|
|
|
|
{
|
|
|
|
BlockDriverState *bs;
|
2014-07-30 12:53:30 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
assert(node_name);
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
|
|
|
|
if (!strcmp(node_name, bs->node_name)) {
|
|
|
|
return bs;
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
return NULL;
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* Put this QMP function here so it can access the static graph_bdrv_states. */
|
2020-01-20 11:50:49 +03:00
|
|
|
BlockDeviceInfoList *bdrv_named_nodes_list(bool flat,
|
|
|
|
Error **errp)
|
2009-09-09 19:53:37 +04:00
|
|
|
{
|
2020-10-27 08:05:47 +03:00
|
|
|
BlockDeviceInfoList *list;
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriverState *bs;
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
list = NULL;
|
|
|
|
QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
|
2020-01-20 11:50:49 +03:00
|
|
|
BlockDeviceInfo *info = bdrv_block_device_info(NULL, bs, flat, errp);
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!info) {
|
|
|
|
qapi_free_BlockDeviceInfoList(list);
|
|
|
|
return NULL;
|
2011-03-07 19:01:04 +03:00
|
|
|
}
|
2020-10-27 08:05:47 +03:00
|
|
|
QAPI_LIST_PREPEND(list, info);
|
2011-03-07 19:01:04 +03:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
return list;
|
|
|
|
}
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2018-12-21 20:09:07 +03:00
|
|
|
typedef struct XDbgBlockGraphConstructor {
|
|
|
|
XDbgBlockGraph *graph;
|
|
|
|
GHashTable *graph_nodes;
|
|
|
|
} XDbgBlockGraphConstructor;
|
|
|
|
|
|
|
|
static XDbgBlockGraphConstructor *xdbg_graph_new(void)
|
|
|
|
{
|
|
|
|
XDbgBlockGraphConstructor *gr = g_new(XDbgBlockGraphConstructor, 1);
|
|
|
|
|
|
|
|
gr->graph = g_new0(XDbgBlockGraph, 1);
|
|
|
|
gr->graph_nodes = g_hash_table_new(NULL, NULL);
|
|
|
|
|
|
|
|
return gr;
|
|
|
|
}
|
|
|
|
|
|
|
|
static XDbgBlockGraph *xdbg_graph_finalize(XDbgBlockGraphConstructor *gr)
|
|
|
|
{
|
|
|
|
XDbgBlockGraph *graph = gr->graph;
|
|
|
|
|
|
|
|
g_hash_table_destroy(gr->graph_nodes);
|
|
|
|
g_free(gr);
|
|
|
|
|
|
|
|
return graph;
|
|
|
|
}
|
|
|
|
|
|
|
|
static uintptr_t xdbg_graph_node_num(XDbgBlockGraphConstructor *gr, void *node)
|
|
|
|
{
|
|
|
|
uintptr_t ret = (uintptr_t)g_hash_table_lookup(gr->graph_nodes, node);
|
|
|
|
|
|
|
|
if (ret != 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Start counting from 1, not 0, because 0 would be indistinguishable from
|
|
|
|
* the not-found (NULL) answer of g_hash_table_lookup.
|
|
|
|
*/
|
|
|
|
ret = g_hash_table_size(gr->graph_nodes) + 1;
|
|
|
|
g_hash_table_insert(gr->graph_nodes, node, (void *)ret);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
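The ID scheme used by xdbg_graph_node_num() above can be sketched without GLib. This is only an illustration under stated assumptions: DemoNodeMap, demo_lookup() and demo_node_num() are hypothetical names that do not exist in QEMU, and a small fixed array stands in for the GHashTable. The point it shows is why IDs start at 1: a lookup that reports "absent" as 0 (NULL) could not otherwise tell the first node from a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES 16

/* Hypothetical stand-in for the GHashTable kept by the graph constructor. */
typedef struct DemoNodeMap {
    const void *keys[DEMO_MAX_NODES];
    size_t size;
} DemoNodeMap;

/* Return the ID stored for key, or 0 when absent (mirrors a NULL lookup). */
static uintptr_t demo_lookup(const DemoNodeMap *map, const void *key)
{
    for (size_t i = 0; i < map->size; i++) {
        if (map->keys[i] == key) {
            return i + 1; /* IDs are 1-based, so 0 always means "not found" */
        }
    }
    return 0;
}

/* Return the existing ID of key, or assign it the next 1-based ID. */
static uintptr_t demo_node_num(DemoNodeMap *map, const void *key)
{
    uintptr_t ret = demo_lookup(map, key);

    if (ret != 0) {
        return ret;
    }

    assert(map->size < DEMO_MAX_NODES);
    map->keys[map->size++] = key;
    return map->size; /* new size == new ID, matching "table size + 1" */
}
```

Asking for the same pointer twice yields the same ID, and the first pointer ever registered gets ID 1 rather than 0.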
|
|
|
|
|
|
|
|
static void xdbg_graph_add_node(XDbgBlockGraphConstructor *gr, void *node,
|
|
|
|
XDbgBlockGraphNodeType type, const char *name)
|
|
|
|
{
|
|
|
|
XDbgBlockGraphNode *n;
|
|
|
|
|
|
|
|
n = g_new0(XDbgBlockGraphNode, 1);
|
|
|
|
|
|
|
|
n->id = xdbg_graph_node_num(gr, node);
|
|
|
|
n->type = type;
|
|
|
|
n->name = g_strdup(name);
|
|
|
|
|
2020-10-27 08:05:47 +03:00
|
|
|
QAPI_LIST_PREPEND(gr->graph->nodes, n);
|
2018-12-21 20:09:07 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void xdbg_graph_add_edge(XDbgBlockGraphConstructor *gr, void *parent,
|
|
|
|
const BdrvChild *child)
|
|
|
|
{
|
2019-11-08 15:34:52 +03:00
|
|
|
BlockPermission qapi_perm;
|
2018-12-21 20:09:07 +03:00
|
|
|
XDbgBlockGraphEdge *edge;
|
2022-03-03 18:15:55 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2018-12-21 20:09:07 +03:00
|
|
|
|
|
|
|
edge = g_new0(XDbgBlockGraphEdge, 1);
|
|
|
|
|
|
|
|
edge->parent = xdbg_graph_node_num(gr, parent);
|
|
|
|
edge->child = xdbg_graph_node_num(gr, child->bs);
|
|
|
|
edge->name = g_strdup(child->name);
|
|
|
|
|
2019-11-08 15:34:52 +03:00
|
|
|
for (qapi_perm = 0; qapi_perm < BLOCK_PERMISSION__MAX; qapi_perm++) {
|
|
|
|
uint64_t flag = bdrv_qapi_perm_to_blk_perm(qapi_perm);
|
|
|
|
|
|
|
|
if (flag & child->perm) {
|
2020-10-27 08:05:47 +03:00
|
|
|
QAPI_LIST_PREPEND(edge->perm, qapi_perm);
|
2018-12-21 20:09:07 +03:00
|
|
|
}
|
2019-11-08 15:34:52 +03:00
|
|
|
if (flag & child->shared_perm) {
|
2020-10-27 08:05:47 +03:00
|
|
|
QAPI_LIST_PREPEND(edge->shared_perm, qapi_perm);
|
2018-12-21 20:09:07 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:47 +03:00
|
|
|
QAPI_LIST_PREPEND(gr->graph->edges, edge);
|
2018-12-21 20:09:07 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
|
|
|
|
{
|
|
|
|
BlockBackend *blk;
|
|
|
|
BlockJob *job;
|
|
|
|
BlockDriverState *bs;
|
|
|
|
BdrvChild *child;
|
|
|
|
XDbgBlockGraphConstructor *gr = xdbg_graph_new();
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2018-12-21 20:09:07 +03:00
|
|
|
for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
|
|
|
|
char *allocated_name = NULL;
|
|
|
|
const char *name = blk_name(blk);
|
|
|
|
|
|
|
|
if (!*name) {
|
|
|
|
name = allocated_name = blk_get_attached_dev_id(blk);
|
|
|
|
}
|
|
|
|
xdbg_graph_add_node(gr, blk, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_BACKEND,
|
|
|
|
name);
|
|
|
|
g_free(allocated_name);
|
|
|
|
if (blk_root(blk)) {
|
|
|
|
xdbg_graph_add_edge(gr, blk, blk_root(blk));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-09-26 12:32:04 +03:00
|
|
|
WITH_JOB_LOCK_GUARD() {
|
|
|
|
for (job = block_job_next_locked(NULL); job;
|
|
|
|
job = block_job_next_locked(job)) {
|
|
|
|
GSList *el;
|
2018-12-21 20:09:07 +03:00
|
|
|
|
2022-09-26 12:32:04 +03:00
|
|
|
xdbg_graph_add_node(gr, job, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_JOB,
|
|
|
|
job->job.id);
|
|
|
|
for (el = job->nodes; el; el = el->next) {
|
|
|
|
xdbg_graph_add_edge(gr, job, (BdrvChild *)el->data);
|
|
|
|
}
|
2018-12-21 20:09:07 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
|
|
|
|
xdbg_graph_add_node(gr, bs, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_DRIVER,
|
|
|
|
bs->node_name);
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
xdbg_graph_add_edge(gr, bs, child);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return xdbg_graph_finalize(gr);
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriverState *bdrv_lookup_bs(const char *device,
|
|
|
|
const char *node_name,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
BlockBackend *blk;
|
|
|
|
BlockDriverState *bs;
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (device) {
|
|
|
|
blk = blk_by_name(device);
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (blk) {
|
2015-10-26 17:46:49 +03:00
|
|
|
bs = blk_bs(blk);
|
|
|
|
if (!bs) {
|
2015-10-19 18:53:29 +03:00
|
|
|
error_setg(errp, "Device '%s' has no medium", device);
|
|
|
|
}
|
|
|
|
|
2015-10-26 17:46:49 +03:00
|
|
|
return bs;
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
|
|
|
}
|
2009-09-09 19:53:37 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (node_name) {
|
|
|
|
bs = bdrv_find_node(node_name);
|
2010-05-22 21:15:08 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (bs) {
|
|
|
|
return bs;
|
|
|
|
}
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
|
2021-03-05 18:19:28 +03:00
|
|
|
error_setg(errp, "Cannot find device=\'%s\' nor node-name=\'%s\'",
|
2015-04-28 16:27:52 +03:00
|
|
|
device ? device : "",
|
|
|
|
node_name ? node_name : "");
|
|
|
|
return NULL;
|
2009-09-09 19:53:37 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* Return true if 'base' is 'top' itself or is reachable from 'top' through
|
|
|
|
* filter or COW children; otherwise, or if either is NULL, return false. */
|
|
|
|
bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
|
2006-08-01 20:21:11 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
|
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
while (top && top != base) {
|
2019-06-12 18:34:45 +03:00
|
|
|
top = bdrv_filter_or_cow_bs(top);
|
2014-09-11 09:41:09 +04:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
|
|
|
|
return top != NULL;
|
2014-09-11 09:41:09 +04:00
|
|
|
}
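The loop in bdrv_chain_contains() above can be tried in isolation with a simplified model. This sketch assumes a hypothetical DemoNode type whose single `backing` pointer replaces bdrv_filter_or_cow_bs(); neither DemoNode nor demo_chain_contains() exists in QEMU. It shows how the NULL cases fall out of the loop condition without any explicit checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: one pointer to the next node down the chain. */
typedef struct DemoNode {
    struct DemoNode *backing;
} DemoNode;

/* Walk down from top until base is found or the chain ends. */
static bool demo_chain_contains(DemoNode *top, DemoNode *base)
{
    while (top && top != base) {
        top = top->backing;
    }

    /*
     * top is NULL if the chain ended without meeting base. This also
     * covers a NULL base: the walk then runs off the end of the chain.
     */
    return top != NULL;
}
```

With a chain a -> b -> c, the walk from a finds c, while the walk from c never finds a because the chain is only followed downwards.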
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriverState *bdrv_next_node(BlockDriverState *bs)
|
2014-09-11 09:41:09 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!bs) {
|
|
|
|
return QTAILQ_FIRST(&graph_bdrv_states);
|
2014-09-11 09:41:09 +04:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
return QTAILQ_NEXT(bs, node_list);
|
2006-08-01 20:21:11 +04:00
|
|
|
}
|
|
|
|
|
2018-03-28 19:29:18 +03:00
|
|
|
BlockDriverState *bdrv_next_all_states(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2018-03-28 19:29:18 +03:00
|
|
|
if (!bs) {
|
|
|
|
return QTAILQ_FIRST(&all_bdrv_states);
|
|
|
|
}
|
|
|
|
return QTAILQ_NEXT(bs, bs_list);
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
const char *bdrv_get_node_name(const BlockDriverState *bs)
|
2006-08-01 20:21:11 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->node_name;
|
2006-06-27 00:08:57 +04:00
|
|
|
}
|
|
|
|
|
2016-03-22 20:38:44 +03:00
|
|
|
const char *bdrv_get_parent_name(const BlockDriverState *bs)
|
2016-02-26 12:22:16 +03:00
|
|
|
{
|
|
|
|
BdrvChild *c;
|
|
|
|
const char *name;
|
2022-03-03 18:15:58 +03:00
|
|
|
IO_CODE();
|
2016-02-26 12:22:16 +03:00
|
|
|
|
|
|
|
/* If multiple parents have a name, just pick the first one. */
|
|
|
|
QLIST_FOREACH(c, &bs->parents, next_parent) {
|
2020-05-13 14:05:13 +03:00
|
|
|
if (c->klass->get_name) {
|
|
|
|
name = c->klass->get_name(c);
|
2016-02-26 12:22:16 +03:00
|
|
|
if (name && *name) {
|
|
|
|
return name;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* TODO check what callers really want: bs->node_name or blk_name() */
|
|
|
|
const char *bdrv_get_device_name(const BlockDriverState *bs)
|
2006-06-27 00:08:57 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2016-02-26 12:22:16 +03:00
|
|
|
return bdrv_get_parent_name(bs) ?: "";
|
2009-04-07 22:43:24 +04:00
|
|
|
}
|
2006-08-01 20:21:11 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* This can be used to identify nodes that might not have a device
|
|
|
|
* name associated. Since node and device names live in the same
|
|
|
|
* namespace, the result is unambiguous. The exception is if both are
|
|
|
|
* absent, then this returns an empty (non-null) string. */
|
|
|
|
const char *bdrv_get_device_or_node_name(const BlockDriverState *bs)
|
2009-04-07 22:43:24 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2016-02-26 12:22:16 +03:00
|
|
|
return bdrv_get_parent_name(bs) ?: bs->node_name;
|
2006-06-27 00:08:57 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
int bdrv_get_flags(BlockDriverState *bs)
|
2015-03-28 09:37:18 +03:00
|
|
|
{
|
2022-04-27 14:40:54 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->open_flags;
|
2015-03-28 09:37:18 +03:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
int bdrv_has_zero_init_1(BlockDriverState *bs)
|
2011-06-30 12:05:46 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
return 1;
|
2015-03-28 09:37:18 +03:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
int bdrv_has_zero_init(BlockDriverState *bs)
|
2015-03-28 09:37:18 +03:00
|
|
|
{
|
2019-06-12 18:03:38 +03:00
|
|
|
BlockDriverState *filtered;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-06-12 18:03:38 +03:00
|
|
|
|
2017-11-10 23:31:09 +03:00
|
|
|
if (!bs->drv) {
|
|
|
|
return 0;
|
|
|
|
}
|
2015-03-28 09:37:18 +03:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* If BS is a copy on write image, it is initialized to
|
|
|
|
the contents of the base image, which may not be zeroes. */
|
2019-06-12 18:10:46 +03:00
|
|
|
if (bdrv_cow_child(bs)) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
if (bs->drv->bdrv_has_zero_init) {
|
|
|
|
return bs->drv->bdrv_has_zero_init(bs);
|
2015-03-28 09:37:18 +03:00
|
|
|
}
|
2019-06-12 18:03:38 +03:00
|
|
|
|
|
|
|
filtered = bdrv_filter_bs(bs);
|
|
|
|
if (filtered) {
|
|
|
|
return bdrv_has_zero_init(filtered);
|
2017-07-13 18:30:25 +03:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
|
|
|
|
/* safe default */
|
|
|
|
return 0;
|
2011-06-30 12:05:46 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs)
|
2011-06-30 12:05:46 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2016-07-14 16:33:26 +03:00
|
|
|
if (!(bs->open_flags & BDRV_O_UNMAP)) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return false;
|
|
|
|
}
|
2011-06-30 12:05:46 +04:00
|
|
|
|
block: Simplify bdrv_can_write_zeroes_with_unmap()
We don't need the can_write_zeroes_with_unmap field in
BlockDriverInfo, because it is redundant information with
supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that
BlockDriverInfo and supported_zero_flags are both per-device
settings, rather than global state about the driver as a
whole, which means one or both of these bits of information
can already be conditional. Let's audit how they were set:
crypto: always setting can_write_ to false is pointless (the
struct starts life zero-initialized), no use of supported_
nbd: just recently fixed to set can_write_ if supported_
includes MAY_UNMAP (thus this commit effectively reverts
bca80059e and solves the problem mentioned there in a more
global way)
file-posix, iscsi, qcow2: can_write_ is conditional, while
supported_ was unconditional; but passing MAY_UNMAP would
fail with ENOTSUP if the condition wasn't met
qed: can_write_ is unconditional, but pwrite_zeroes lacks
support for MAY_UNMAP and supported_ is not set. Perhaps
support can be added later (since it would be similar to
qcow2), but for now claiming false is no real loss
all other drivers: can_write_ is not set, and supported_ is
either unset or a passthrough
Simplify the code by moving the conditional into
supported_zero_flags for all drivers, then dropping the
now-unused BDI field. For callers that relied on
bdrv_can_write_zeroes_with_unmap(), we return the same
per-device settings for drivers that had conditions (no
observable change in behavior there); and can now return
true (instead of false) for drivers that support passthrough
(for example, the commit driver) which gives those drivers
the same fix as nbd just got in bca80059e. For callers that
relied on supported_zero_flags, we now have a few more places
that can avoid a wasted call to pwrite_zeroes() that will
just fail with ENOTSUP.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180126193439.20219-1-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-01-26 22:34:39 +03:00
|
|
|
return bs->supported_zero_flags & BDRV_REQ_MAY_UNMAP;
|
2011-06-30 12:05:46 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
void bdrv_get_backing_filename(BlockDriverState *bs,
|
|
|
|
char *filename, int filename_size)
|
2010-05-26 19:51:49 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
pstrcpy(filename, filename_size, bs->backing_file);
|
|
|
|
}
|
2012-11-13 19:35:08 +04:00
|
|
|
|
2023-01-13 23:42:08 +03:00
|
|
|
int coroutine_fn bdrv_co_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
|
2015-04-28 16:27:52 +03:00
|
|
|
{
|
block: introduce BDRV_MAX_LENGTH
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.
It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.
So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.
And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.
Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.
Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.
So, alignment can't be greater than 2^30.
Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.
OK, then, let's apply it to block/io.
Let's consider all block/io entry points of offset/bytes:
4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().
We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.
And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:
file-posix.c: adopted by this patch
write-threshold.c: only read fields
test-write-threshold.c: sets obviously small constant values
Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.
After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies an offset/bytes request, expanding
it more than aligning up to request_alignment, it's a bug too.
For all io requests except for discard we keep for now old restriction
of 32bit request length.
iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-04 01:27:13 +03:00
|
|
|
int ret;
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-05-04 14:57:44 +03:00
|
|
|
assert_bdrv_graph_readable();
|
|
|
|
|
2017-07-13 18:30:25 +03:00
|
|
|
/* if bs->drv == NULL, bs is closed, so there's nothing to do here */
|
|
|
|
if (!drv) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return -ENOMEDIUM;
|
2017-07-13 18:30:25 +03:00
|
|
|
}
|
2023-01-13 23:42:08 +03:00
|
|
|
if (!drv->bdrv_co_get_info) {
|
2019-06-12 18:03:38 +03:00
|
|
|
BlockDriverState *filtered = bdrv_filter_bs(bs);
|
|
|
|
if (filtered) {
|
2023-01-13 23:42:08 +03:00
|
|
|
return bdrv_co_get_info(filtered, bdi);
|
2017-07-13 18:30:25 +03:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
return -ENOTSUP;
|
2017-07-13 18:30:25 +03:00
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
memset(bdi, 0, sizeof(*bdi));
|
2023-01-13 23:42:08 +03:00
|
|
|
ret = drv->bdrv_co_get_info(bs, bdi);
|
2020-12-04 01:27:13 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bdi->cluster_size > BDRV_MAX_ALIGNMENT) {
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2015-04-28 16:27:52 +03:00
|
|
|
}
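The limit described in the "block: introduce BDRV_MAX_LENGTH" commit message quoted in bdrv_co_get_info() above, QEMU_ALIGN_DOWN(INT64_MAX, 2^30), can be worked through as a small sketch. All DEMO_* names and demo_request_ok() are hypothetical and are not the QEMU definitions; the check only mirrors the rules the message states (signed 64-bit values, and offset + bytes must stay within the aligned-down maximum), not the actual bdrv_check_request() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed largest alignment, 2^30, as stated in the commit message. */
#define DEMO_MAX_ALIGNMENT (INT64_C(1) << 30)

/* Round n down to a multiple of m (both positive). */
#define DEMO_ALIGN_DOWN(n, m) ((n) / (m) * (m))

/* INT64_MAX rounded down to a multiple of 2^30, i.e. 2^63 - 2^30. */
#define DEMO_MAX_LENGTH DEMO_ALIGN_DOWN(INT64_MAX, DEMO_MAX_ALIGNMENT)

/* Accept a request only if it is non-negative and ends within the limit. */
static bool demo_request_ok(int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return false;
    }
    if (bytes > DEMO_MAX_LENGTH) {
        return false;
    }
    /* Written as a subtraction so that offset + bytes cannot overflow. */
    if (offset > DEMO_MAX_LENGTH - bytes) {
        return false;
    }
    return true;
}
```

Because the maximum is itself a multiple of the largest alignment, aligning the end of any accepted request up to at most 2^30 still fits in int64_t, which is the property the commit message is after.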
|
2010-05-26 19:51:49 +04:00
|
|
|
|
2019-02-08 18:06:06 +03:00
|
|
|
ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs,
|
|
|
|
Error **errp)
|
2015-04-28 16:27:52 +03:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
if (drv && drv->bdrv_get_specific_info) {
|
2019-02-08 18:06:06 +03:00
|
|
|
return drv->bdrv_get_specific_info(bs, errp);
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
|
|
|
return NULL;
|
2010-05-26 19:51:49 +04:00
|
|
|
}
|
|
|
|
|
2019-09-23 15:17:37 +03:00
|
|
|
BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2019-09-23 15:17:37 +03:00
|
|
|
if (!drv || !drv->bdrv_get_specific_stats) {
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
return drv->bdrv_get_specific_stats(bs);
|
|
|
|
}
|
|
|
|
|
2023-01-13 23:42:11 +03:00
|
|
|
void coroutine_fn bdrv_co_debug_event(BlockDriverState *bs, BlkdebugEvent event)
|
2011-10-17 14:32:14 +04:00
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-05-04 14:57:45 +03:00
|
|
|
assert_bdrv_graph_readable();
|
|
|
|
|
2023-01-13 23:42:11 +03:00
|
|
|
if (!bs || !bs->drv || !bs->drv->bdrv_co_debug_event) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return;
|
|
|
|
}
|
2011-10-17 14:32:14 +04:00
|
|
|
|
2023-01-13 23:42:11 +03:00
|
|
|
bs->drv->bdrv_co_debug_event(bs, event);
|
2011-10-17 14:32:14 +04:00
|
|
|
}
|
|
|
|
|
2019-09-20 17:20:49 +03:00
|
|
|
static BlockDriverState *bdrv_find_debug_node(BlockDriverState *bs)
|
2011-10-17 14:32:14 +04:00
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
while (bs && bs->drv && !bs->drv->bdrv_debug_breakpoint) {
|
2019-06-12 18:42:13 +03:00
|
|
|
bs = bdrv_primary_bs(bs);
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
2011-10-17 14:32:14 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (bs && bs->drv && bs->drv->bdrv_debug_breakpoint) {
|
2019-09-20 17:20:49 +03:00
|
|
|
assert(bs->drv->bdrv_debug_remove_breakpoint);
|
|
|
|
return bs;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
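The lookup in bdrv_find_debug_node() above walks down the primary children until it reaches a node whose driver implements the breakpoint callback. A minimal sketch of that traversal, using a hypothetical Node type with a boolean in place of the driver callback pointer (neither is a QEMU type):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not a QEMU structure. */
typedef struct Node {
    struct Node *primary_child;   /* next node down the chain, or NULL */
    bool has_hook;                /* stands in for "driver implements the callback" */
} Node;

/* Returns the first node in the chain that has the hook, or NULL if none does. */
static Node *find_node_with_hook(Node *n)
{
    while (n && !n->has_hook) {
        n = n->primary_child;
    }
    return n;
}
```

Callers can then act on the returned node or report "not supported" when the result is NULL, which mirrors the -ENOTSUP paths in the functions that follow.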
|
|
|
|
|
|
|
|
int bdrv_debug_breakpoint(BlockDriverState *bs, const char *event,
|
|
|
|
const char *tag)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-09-20 17:20:49 +03:00
|
|
|
bs = bdrv_find_debug_node(bs);
|
|
|
|
if (bs) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->drv->bdrv_debug_breakpoint(bs, event, tag);
|
|
|
|
}
|
2011-10-17 14:32:14 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
return -ENOTSUP;
|
2011-10-17 14:32:14 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag)
|
2004-08-02 01:59:26 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-09-20 17:20:49 +03:00
|
|
|
bs = bdrv_find_debug_node(bs);
|
|
|
|
if (bs) {
|
2015-04-28 16:27:52 +03:00
|
|
|
return bs->drv->bdrv_debug_remove_breakpoint(bs, tag);
|
|
|
|
}
|
|
|
|
|
|
|
|
return -ENOTSUP;
|
2009-10-27 20:41:44 +03:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
int bdrv_debug_resume(BlockDriverState *bs, const char *tag)
|
2006-08-07 06:38:06 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
while (bs && (!bs->drv || !bs->drv->bdrv_debug_resume)) {
|
2019-06-12 18:42:13 +03:00
|
|
|
bs = bdrv_primary_bs(bs);
|
2015-04-28 16:27:52 +03:00
|
|
|
}
|
2006-08-07 06:38:06 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (bs && bs->drv && bs->drv->bdrv_debug_resume) {
|
|
|
|
return bs->drv->bdrv_debug_resume(bs, tag);
|
|
|
|
}
|
2006-08-07 06:38:06 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
return -ENOTSUP;
|
2014-09-11 09:41:08 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag)
|
2006-08-07 06:38:06 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2015-04-28 16:27:52 +03:00
|
|
|
while (bs && bs->drv && !bs->drv->bdrv_debug_is_suspended) {
|
2019-06-12 18:42:13 +03:00
|
|
|
bs = bdrv_primary_bs(bs);
|
2014-09-11 09:41:08 +04:00
|
|
|
}
|
2006-08-19 15:45:59 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (bs && bs->drv && bs->drv->bdrv_debug_is_suspended) {
|
|
|
|
return bs->drv->bdrv_debug_is_suspended(bs, tag);
|
|
|
|
}
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
return false;
|
|
|
|
}
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* backing_file can either be relative, or absolute, or a protocol. If it is
|
|
|
|
* relative, it must be relative to the chain. So, passing in bs->filename
|
|
|
|
* from a BDS as backing_file should not be done, as that may be relative to
|
|
|
|
* the CWD rather than the chain. */
|
|
|
|
BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
|
|
|
|
const char *backing_file)
|
2011-07-15 15:50:26 +04:00
|
|
|
{
|
2015-04-28 16:27:52 +03:00
|
|
|
char *filename_full = NULL;
|
|
|
|
char *backing_file_full = NULL;
|
|
|
|
char *filename_tmp = NULL;
|
|
|
|
int is_protocol = 0;
|
block: Leave BDS.backing_{file,format} constant
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.
Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).
Before this patch:
$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
After this patch:
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file. If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
Note that this changes the QAPI output for a node's backing_file. We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image). This
inconsistent output was effectively useless, so we have to decide one
way or the other. Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on. If you want to receive the image
header information, you have to refer to full-backing-filename.
This necessitates a change to iotest 228. The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file. Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.
Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well. This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.
iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.
273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot. However, the image header
never says "file" anywhere, so it now reports $IMGFMT.
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-01 21:34:11 +03:00
|
|
|
bool filenames_refreshed = false;
|
2015-04-28 16:27:52 +03:00
|
|
|
BlockDriverState *curr_bs = NULL;
|
|
|
|
BlockDriverState *retval = NULL;
|
2019-06-12 18:34:45 +03:00
|
|
|
BlockDriverState *bs_below;
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (!bs || !bs->drv || !backing_file) {
|
|
|
|
return NULL;
|
2011-07-15 15:50:26 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
filename_full = g_malloc(PATH_MAX);
|
|
|
|
backing_file_full = g_malloc(PATH_MAX);
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
is_protocol = path_has_protocol(backing_file);
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2019-06-12 18:34:45 +03:00
|
|
|
/*
|
|
|
|
* Being largely a legacy function, skip any filters here
|
|
|
|
* (because filters do not have normal filenames, so they cannot
|
|
|
|
* match anyway; and allowing json:{} filenames is a bit out of
|
|
|
|
* scope).
|
|
|
|
*/
|
|
|
|
for (curr_bs = bdrv_skip_filters(bs);
|
|
|
|
bdrv_cow_child(curr_bs) != NULL;
|
|
|
|
curr_bs = bs_below)
|
|
|
|
{
|
|
|
|
bs_below = bdrv_backing_chain_next(curr_bs);
|
2011-07-15 15:50:26 +04:00
|
|
|
|
2018-08-01 21:34:11 +03:00
|
|
|
if (bdrv_backing_overridden(curr_bs)) {
|
|
|
|
/*
|
|
|
|
* If the backing file was overridden, we can only compare
|
|
|
|
* directly against the backing node's filename.
|
|
|
|
*/
|
|
|
|
|
|
|
|
if (!filenames_refreshed) {
|
|
|
|
/*
|
|
|
|
* This will automatically refresh all of the
|
|
|
|
* filenames in the rest of the backing chain, so we
|
|
|
|
* only need to do this once.
|
|
|
|
*/
|
|
|
|
bdrv_refresh_filename(bs_below);
|
|
|
|
filenames_refreshed = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (strcmp(backing_file, bs_below->filename) == 0) {
|
|
|
|
retval = bs_below;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
} else if (is_protocol || path_has_protocol(curr_bs->backing_file)) {
|
|
|
|
/*
|
|
|
|
* If either of the filename paths is actually a protocol, then
|
|
|
|
* compare unmodified paths; otherwise make paths relative.
|
|
|
|
*/
|
2019-02-01 22:29:15 +03:00
|
|
|
char *backing_file_full_ret;
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (strcmp(backing_file, curr_bs->backing_file) == 0) {
|
2019-06-12 18:34:45 +03:00
|
|
|
retval = bs_below;
|
2015-04-28 16:27:52 +03:00
|
|
|
break;
|
|
|
|
}
|
2017-01-26 04:08:20 +03:00
|
|
|
/* Also check against the full backing filename for the image */
|
2019-02-01 22:29:15 +03:00
|
|
|
backing_file_full_ret = bdrv_get_full_backing_filename(curr_bs,
|
|
|
|
NULL);
|
|
|
|
if (backing_file_full_ret) {
|
|
|
|
bool equal = strcmp(backing_file, backing_file_full_ret) == 0;
|
|
|
|
g_free(backing_file_full_ret);
|
|
|
|
if (equal) {
|
2019-06-12 18:34:45 +03:00
|
|
|
retval = bs_below;
|
2017-01-26 04:08:20 +03:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2015-04-28 16:27:52 +03:00
|
|
|
} else {
|
|
|
|
/* If not an absolute filename path, make it relative to the current
|
|
|
|
* image's filename path */
|
2019-02-01 22:29:17 +03:00
|
|
|
filename_tmp = bdrv_make_absolute_filename(curr_bs, backing_file,
|
|
|
|
NULL);
|
|
|
|
/* We are going to compare canonicalized absolute pathnames */
|
|
|
|
if (!filename_tmp || !realpath(filename_tmp, filename_full)) {
|
|
|
|
g_free(filename_tmp);
|
2015-04-28 16:27:52 +03:00
|
|
|
continue;
|
|
|
|
}
|
2019-02-01 22:29:17 +03:00
|
|
|
g_free(filename_tmp);
|
2011-10-17 14:32:12 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
/* We need to make sure the backing filename we are comparing against
|
|
|
|
* is relative to the current image filename (or absolute) */
|
2019-02-01 22:29:17 +03:00
|
|
|
filename_tmp = bdrv_get_full_backing_filename(curr_bs, NULL);
|
|
|
|
if (!filename_tmp || !realpath(filename_tmp, backing_file_full)) {
|
|
|
|
g_free(filename_tmp);
|
2015-04-28 16:27:52 +03:00
|
|
|
continue;
|
|
|
|
}
|
2019-02-01 22:29:17 +03:00
|
|
|
g_free(filename_tmp);
|
2011-11-10 21:10:11 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
if (strcmp(backing_file_full, filename_full) == 0) {
|
2019-06-12 18:34:45 +03:00
|
|
|
retval = bs_below;
|
2015-04-28 16:27:52 +03:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2011-11-10 21:10:11 +04:00
|
|
|
}
|
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
g_free(filename_full);
|
|
|
|
g_free(backing_file_full);
|
|
|
|
return retval;
|
|
|
|
}
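The final branch of bdrv_find_backing_image() above compares two file names only after both have been canonicalized with realpath(). A minimal sketch of that comparison, assuming a POSIX.1-2008 system; the helper name same_canonical_path is hypothetical, and unlike the function above it lets realpath() allocate the result rather than using PATH_MAX buffers:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): returns true only if both
 * paths resolve and their canonical absolute forms are identical.
 */
static bool same_canonical_path(const char *a, const char *b)
{
    assert(a && b);

    /* With a NULL buffer, POSIX.1-2008 realpath() allocates the result. */
    char *ra = realpath(a, NULL);
    char *rb = realpath(b, NULL);
    bool equal = ra && rb && strcmp(ra, rb) == 0;

    free(ra);   /* free(NULL) is a no-op */
    free(rb);
    return equal;
}
```

A path that cannot be resolved never matches, which corresponds to the `continue` statements in the loop above.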
|
|
|
|
|
|
|
|
void bdrv_init(void)
|
|
|
|
{
|
2021-07-09 19:41:41 +03:00
|
|
|
#ifdef CONFIG_BDRV_WHITELIST_TOOLS
|
|
|
|
use_bdrv_whitelist = 1;
|
|
|
|
#endif
|
2015-04-28 16:27:52 +03:00
|
|
|
module_call_init(MODULE_INIT_BLOCK);
|
|
|
|
}
|
2012-03-12 21:26:01 +04:00
|
|
|
|
2015-04-28 16:27:52 +03:00
|
|
|
void bdrv_init_with_whitelist(void)
|
|
|
|
{
|
|
|
|
use_bdrv_whitelist = 1;
|
|
|
|
bdrv_init();
|
2011-10-17 14:32:12 +04:00
|
|
|
}
|
|
|
|
|
2022-02-09 13:54:50 +03:00
|
|
|
int bdrv_activate(BlockDriverState *bs, Error **errp)
|
2011-11-15 01:09:45 +04:00
|
|
|
{
|
2017-05-04 19:52:37 +03:00
|
|
|
BdrvChild *child, *parent;
|
2014-03-12 18:59:16 +04:00
|
|
|
Error *local_err = NULL;
|
|
|
|
int ret;
|
dirty-bitmaps: clean-up bitmaps loading and migration logic
This patch aims to bring the following behavior:
1. We don't load bitmaps, when started in inactive mode. It's the case
of incoming migration. In this case we wait for bitmaps migration
through migration channel (if 'dirty-bitmaps' capability is enabled) or
for invalidation (to load bitmaps from the image).
2. We don't remove persistent bitmaps on inactivation. Instead, we only
remove bitmaps after storing. This is the only way to restore bitmaps,
if we decided to resume source after [failed] migration with
'dirty-bitmaps' capability enabled (which means, that bitmaps were not
stored).
3. We load bitmaps on open and any invalidation, it's ok for all cases:
- normal open
- migration target invalidation with dirty-bitmaps capability
(bitmaps are migrating through migration channel, they are not
stored, so they should have IN_USE flag set and will be skipped
when loading. However, it would fail if bitmaps are read-only[1])
- migration target invalidation without dirty-bitmaps capability
(normal load of the bitmaps, if migrated with shared storage)
- source invalidation with dirty-bitmaps capability
(skip because IN_USE)
- source invalidation without dirty-bitmaps capability
(bitmaps were dropped, reload them)
[1]: to accurately handle this, migration of read-only bitmaps is
explicitly forbidden in this patch.
A new mechanism for not storing bitmaps when migrating with the dirty-bitmaps
capability is introduced: the migration field in BdrvDirtyBitmap.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-29 23:23:17 +03:00
|
|
|
BdrvDirtyBitmap *bm;
|
2014-03-12 18:59:16 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2014-03-11 13:58:39 +04:00
|
|
|
if (!bs->drv) {
|
2020-09-24 21:54:08 +03:00
|
|
|
return -ENOMEDIUM;
|
2014-03-11 13:58:39 +04:00
|
|
|
}
|
|
|
|
|
2017-01-31 14:23:08 +03:00
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
2022-02-09 13:54:52 +03:00
|
|
|
bdrv_activate(child->bs, &local_err);
|
2016-05-11 05:45:33 +03:00
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
2020-09-24 21:54:08 +03:00
|
|
|
return -EINVAL;
|
2016-05-11 05:45:33 +03:00
|
|
|
}
|
2014-03-12 18:59:16 +04:00
|
|
|
}
|
2016-05-11 05:45:33 +03:00
|
|
|
|
2017-11-16 15:00:01 +03:00
|
|
|
/*
|
|
|
|
* Update permissions, they may differ for inactive nodes.
|
|
|
|
*
|
|
|
|
* Note that the required permissions of inactive images are always a
|
|
|
|
* subset of the permissions required after activating the image. This
|
|
|
|
* allows us to just get the permissions upfront without restricting
|
2022-02-09 13:54:52 +03:00
|
|
|
* bdrv_co_invalidate_cache().
|
2017-11-16 15:00:01 +03:00
|
|
|
*
|
|
|
|
* It also means that in error cases, we don't have to try and revert to
|
|
|
|
* the old permissions (which is an operation that could fail, too). We can
|
|
|
|
* just keep the extended permissions for the next time that an activation
|
|
|
|
* of the image is tried.
|
|
|
|
*/
|
2019-12-17 17:06:38 +03:00
|
|
|
if (bs->open_flags & BDRV_O_INACTIVE) {
|
|
|
|
bs->open_flags &= ~BDRV_O_INACTIVE;
|
2022-11-07 19:35:57 +03:00
|
|
|
ret = bdrv_refresh_perms(bs, NULL, errp);
|
2019-12-17 17:06:38 +03:00
|
|
|
if (ret < 0) {
|
2016-05-11 05:45:33 +03:00
|
|
|
bs->open_flags |= BDRV_O_INACTIVE;
|
2020-09-24 21:54:08 +03:00
|
|
|
return ret;
|
2016-05-11 05:45:33 +03:00
|
|
|
}
|
2014-03-11 13:58:39 +04:00
|
|
|
|
2022-02-09 13:54:52 +03:00
|
|
|
ret = bdrv_invalidate_cache(bs, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
bs->open_flags |= BDRV_O_INACTIVE;
|
|
|
|
return ret;
|
2019-12-17 17:06:38 +03:00
|
|
|
}
|
2018-10-29 23:23:17 +03:00
|
|
|
|
2019-12-17 17:06:38 +03:00
|
|
|
FOR_EACH_DIRTY_BITMAP(bs, bm) {
|
|
|
|
bdrv_dirty_bitmap_skip_store(bm, false);
|
|
|
|
}
|
|
|
|
|
2023-01-13 23:42:03 +03:00
|
|
|
ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
|
2019-12-17 17:06:38 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
bs->open_flags |= BDRV_O_INACTIVE;
|
|
|
|
error_setg_errno(errp, -ret, "Could not refresh total sector count");
|
2020-09-24 21:54:08 +03:00
|
|
|
return ret;
|
2019-12-17 17:06:38 +03:00
|
|
|
}
|
2014-03-12 18:59:16 +04:00
|
|
|
}
|
2017-05-04 19:52:37 +03:00
|
|
|
|
|
|
|
QLIST_FOREACH(parent, &bs->parents, next_parent) {
|
2020-05-13 14:05:13 +03:00
|
|
|
if (parent->klass->activate) {
|
|
|
|
parent->klass->activate(parent, &local_err);
|
2017-05-04 19:52:37 +03:00
|
|
|
if (local_err) {
|
2019-01-31 17:16:10 +03:00
|
|
|
bs->open_flags |= BDRV_O_INACTIVE;
|
2017-05-04 19:52:37 +03:00
|
|
|
error_propagate(errp, local_err);
|
2020-09-24 21:54:08 +03:00
|
|
|
return -EINVAL;
|
2017-05-04 19:52:37 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2020-09-24 21:54:08 +03:00
|
|
|
|
|
|
|
return 0;
|
2011-11-15 01:09:45 +04:00
|
|
|
}
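bdrv_activate() above clears BDRV_O_INACTIVE before running each activation step and sets it again if a step fails, so a failed activation leaves the node marked inactive. A minimal sketch of that rollback pattern; Node, NODE_INACTIVE and activation_step are hypothetical stand-ins, not QEMU definitions:

```c
#include <assert.h>
#include <errno.h>

#define NODE_INACTIVE 0x1u   /* hypothetical flag, stands in for BDRV_O_INACTIVE */

typedef struct Node {
    unsigned flags;
    int step_result;         /* value the simulated activation step returns */
} Node;

/* Simulated activation step: returns 0 on success or a negative errno. */
static int activation_step(Node *n)
{
    return n->step_result;
}

static int activate_sketch(Node *n)
{
    int ret;

    if (!(n->flags & NODE_INACTIVE)) {
        return 0;                       /* already active, nothing to do */
    }

    n->flags &= ~NODE_INACTIVE;         /* optimistically mark as active */
    ret = activation_step(n);
    if (ret < 0) {
        n->flags |= NODE_INACTIVE;      /* roll back so a retry is possible */
        return ret;
    }
    return 0;
}
```

Restoring the flag on every error path keeps the node eligible for a later activation attempt.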
|
|
|
|
|
2022-02-09 13:54:52 +03:00
|
|
|
int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
|
|
|
|
{
|
|
|
|
Error *local_err = NULL;
|
2022-03-03 18:16:09 +03:00
|
|
|
IO_CODE();
|
2022-02-09 13:54:52 +03:00
|
|
|
|
|
|
|
assert(!(bs->open_flags & BDRV_O_INACTIVE));
|
2022-12-07 16:18:38 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2022-02-09 13:54:52 +03:00
|
|
|
|
|
|
|
if (bs->drv->bdrv_co_invalidate_cache) {
|
|
|
|
bs->drv->bdrv_co_invalidate_cache(bs, &local_err);
|
|
|
|
if (local_err) {
|
|
|
|
error_propagate(errp, local_err);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-02-09 13:54:51 +03:00
|
|
|
void bdrv_activate_all(Error **errp)
|
2011-11-15 01:09:45 +04:00
|
|
|
{
|
2016-03-22 20:58:50 +03:00
|
|
|
BlockDriverState *bs;
|
2016-05-20 19:49:07 +03:00
|
|
|
BdrvNextIterator it;
|
2011-11-15 01:09:45 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2016-05-20 19:49:07 +03:00
|
|
|
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
|
2014-05-08 18:34:35 +04:00
|
|
|
AioContext *aio_context = bdrv_get_aio_context(bs);
|
2020-09-24 21:54:08 +03:00
|
|
|
int ret;
|
2014-05-08 18:34:35 +04:00
|
|
|
|
|
|
|
aio_context_acquire(aio_context);
|
2022-02-09 13:54:50 +03:00
|
|
|
ret = bdrv_activate(bs, errp);
|
2014-05-08 18:34:35 +04:00
|
|
|
aio_context_release(aio_context);
|
2020-09-24 21:54:08 +03:00
|
|
|
if (ret < 0) {
|
2017-11-10 20:25:45 +03:00
|
|
|
bdrv_next_cleanup(&it);
|
2014-03-12 18:59:16 +04:00
|
|
|
return;
|
|
|
|
}
|
2011-11-15 01:09:45 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-11-23 17:11:14 +03:00
|
|
|
static bool bdrv_has_bds_parent(BlockDriverState *bs, bool only_active)
|
|
|
|
{
|
|
|
|
BdrvChild *parent;
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2018-11-23 17:11:14 +03:00
|
|
|
|
|
|
|
QLIST_FOREACH(parent, &bs->parents, next_parent) {
|
2020-05-13 14:05:13 +03:00
|
|
|
if (parent->klass->parent_is_bds) {
|
2018-11-23 17:11:14 +03:00
|
|
|
BlockDriverState *parent_bs = parent->opaque;
|
|
|
|
if (!only_active || !(parent_bs->open_flags & BDRV_O_INACTIVE)) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bdrv_inactivate_recurse(BlockDriverState *bs)
|
2015-12-22 16:07:08 +03:00
|
|
|
{
|
2017-05-04 19:52:38 +03:00
|
|
|
BdrvChild *child, *parent;
|
2015-12-22 16:07:08 +03:00
|
|
|
int ret;
|
2021-09-11 15:00:27 +03:00
|
|
|
uint64_t cumulative_perms, cumulative_shared_perms;
|
2015-12-22 16:07:08 +03:00
|
|
|
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2017-11-10 23:31:09 +03:00
|
|
|
if (!bs->drv) {
|
|
|
|
return -ENOMEDIUM;
|
|
|
|
}
|
|
|
|
|
2018-11-23 17:11:14 +03:00
|
|
|
/* Make sure that we don't inactivate a child before its parent.
|
|
|
|
* It will be covered by recursion from the yet active parent. */
|
|
|
|
if (bdrv_has_bds_parent(bs, true)) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(!(bs->open_flags & BDRV_O_INACTIVE));
|
|
|
|
|
|
|
|
/* Inactivate this node */
|
|
|
|
if (bs->drv->bdrv_inactivate) {
|
2015-12-22 16:07:08 +03:00
|
|
|
ret = bs->drv->bdrv_inactivate(bs);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-11-23 17:11:14 +03:00
|
|
|
QLIST_FOREACH(parent, &bs->parents, next_parent) {
|
2020-05-13 14:05:13 +03:00
|
|
|
if (parent->klass->inactivate) {
|
|
|
|
ret = parent->klass->inactivate(parent);
|
2018-11-23 17:11:14 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
2017-05-04 19:52:38 +03:00
|
|
|
}
|
|
|
|
}
|
2018-11-23 17:11:14 +03:00
|
|
|
}
|
2017-05-04 19:52:40 +03:00
|
|
|
|
2021-09-11 15:00:27 +03:00
|
|
|
bdrv_get_cumulative_perm(bs, &cumulative_perms,
|
|
|
|
&cumulative_shared_perms);
|
|
|
|
if (cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
|
|
|
|
/* Our inactive parents still need write access. Inactivation failed. */
|
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
|
2018-11-23 17:11:14 +03:00
|
|
|
bs->open_flags |= BDRV_O_INACTIVE;
|
2017-08-23 16:42:42 +03:00
|
|
|
|
2020-11-06 15:42:38 +03:00
|
|
|
/*
|
|
|
|
* Update permissions, they may differ for inactive nodes.
|
|
|
|
* We only tried to loosen restrictions, so errors are not fatal, ignore
|
|
|
|
* them.
|
|
|
|
*/
|
2022-11-07 19:35:57 +03:00
|
|
|
bdrv_refresh_perms(bs, NULL, NULL);
|
2018-11-23 17:11:14 +03:00
|
|
|
|
|
|
|
/* Recursively inactivate children */
|
2017-05-04 19:52:39 +03:00
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
2018-11-23 17:11:14 +03:00
|
|
|
ret = bdrv_inactivate_recurse(child->bs);
|
2017-05-04 19:52:39 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-12-22 16:07:08 +03:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bdrv_inactivate_all(void)
|
|
|
|
{
|
2016-03-16 21:54:44 +03:00
|
|
|
BlockDriverState *bs = NULL;
|
2016-05-20 19:49:07 +03:00
|
|
|
BdrvNextIterator it;
|
2016-05-11 05:45:35 +03:00
|
|
|
int ret = 0;
|
block: avoid recursive AioContext acquire in bdrv_inactivate_all()
BDRV_POLL_WHILE() does not support recursive AioContext locking. It
only releases the AioContext lock once regardless of how many times the
caller has acquired it. This results in a hang since the IOThread does
not make progress while the AioContext is still locked.
The following steps trigger the hang:
$ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,iothread=iothread0 \
-drive if=none,id=drive0,file=test.img,format=raw \
-device scsi-hd,drive=drive0 \
-drive if=none,id=drive1,file=test.img,format=raw \
-device scsi-hd,drive=drive1
$ qemu-system-x86_64 ...same options... \
-incoming tcp::1234
(qemu) migrate tcp:127.0.0.1:1234
...hang...
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171207201320.19284-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-07 23:13:15 +03:00
|
|
|
GSList *aio_ctxs = NULL, *ctx;
|
2015-12-22 16:07:08 +03:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2016-05-20 19:49:07 +03:00
|
|
|
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
|
2017-12-07 23:13:15 +03:00
|
|
|
AioContext *aio_context = bdrv_get_aio_context(bs);
|
|
|
|
|
|
|
|
if (!g_slist_find(aio_ctxs, aio_context)) {
|
|
|
|
aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
|
|
|
|
aio_context_acquire(aio_context);
|
|
|
|
}
|
2016-05-11 05:45:35 +03:00
|
|
|
}
|
2015-12-22 16:07:08 +03:00
|
|
|
|
2018-11-23 17:11:14 +03:00
|
|
|
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
|
|
|
|
/* Nodes with BDS parents are covered by recursion from the last
|
|
|
|
* parent that gets inactivated. Don't inactivate them a second
|
|
|
|
* time if that has already happened. */
|
|
|
|
if (bdrv_has_bds_parent(bs, false)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
ret = bdrv_inactivate_recurse(bs);
|
|
|
|
if (ret < 0) {
|
|
|
|
bdrv_next_cleanup(&it);
|
|
|
|
goto out;
|
2015-12-22 16:07:08 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-11 05:45:35 +03:00
|
|
|
out:
|
2017-12-07 23:13:15 +03:00
|
|
|
for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
|
|
|
|
AioContext *aio_context = ctx->data;
|
|
|
|
aio_context_release(aio_context);
|
2016-05-11 05:45:35 +03:00
|
|
|
}
|
2017-12-07 23:13:15 +03:00
|
|
|
g_slist_free(aio_ctxs);
|
2016-05-11 05:45:35 +03:00
|
|
|
|
|
|
|
return ret;
|
2015-12-22 16:07:08 +03:00
|
|
|
}
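The acquire loop above takes each AioContext only once, even when several nodes share it, because BDRV_POLL_WHILE() releases the lock only once. A minimal sketch of that de-duplication in plain C follows. `DemoCtx`, `lock_each_once` and `unlock_all` are hypothetical names, and a counter stands in for the real lock; none of this is QEMU API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: lock_depth counts acquisitions. */
typedef struct DemoCtx {
    int lock_depth;
} DemoCtx;

/* Return 1 if ctx is already among the first n entries of seen[]. */
static int ctx_seen(DemoCtx *const *seen, size_t n, const DemoCtx *ctx)
{
    for (size_t i = 0; i < n; i++) {
        if (seen[i] == ctx) {
            return 1;
        }
    }
    return 0;
}

/*
 * Acquire every distinct context referenced by owners[0..n_owners) exactly
 * once. Distinct contexts are recorded in seen[], which must have room for
 * n_owners entries. Returns the number of distinct contexts.
 */
static size_t lock_each_once(DemoCtx *const *owners, size_t n_owners,
                             DemoCtx **seen)
{
    size_t n_seen = 0;

    for (size_t i = 0; i < n_owners; i++) {
        if (!ctx_seen(seen, n_seen, owners[i])) {
            seen[n_seen++] = owners[i];
            owners[i]->lock_depth++;    /* models aio_context_acquire() */
        }
    }
    return n_seen;
}

/* Release each recorded context once, mirroring the loop after "out:". */
static void unlock_all(DemoCtx **seen, size_t n_seen)
{
    for (size_t i = 0; i < n_seen; i++) {
        seen[i]->lock_depth--;          /* models aio_context_release() */
    }
}
```

With two nodes sharing one context, the shared context ends up with a lock depth of 1 rather than 2, which is the property the commit message says BDRV_POLL_WHILE() depends on.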
|
|
|
|
|
2006-08-19 15:45:59 +04:00
|
|
|
/**************************************************************/
|
|
|
|
/* removable device support */
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Return TRUE if the media is present
|
|
|
|
*/
|
2023-01-13 23:42:02 +03:00
|
|
|
bool coroutine_fn bdrv_co_is_inserted(BlockDriverState *bs)
|
2006-08-19 15:45:59 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2015-10-19 18:53:13 +03:00
|
|
|
BdrvChild *child;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-02-03 18:21:57 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2011-09-06 20:58:41 +04:00
|
|
|
|
2015-10-19 18:53:11 +03:00
|
|
|
if (!drv) {
|
|
|
|
return false;
|
|
|
|
}
|
2023-01-13 23:42:02 +03:00
|
|
|
if (drv->bdrv_co_is_inserted) {
|
|
|
|
return drv->bdrv_co_is_inserted(bs);
|
2015-10-19 18:53:13 +03:00
|
|
|
}
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
2023-01-13 23:42:02 +03:00
|
|
|
if (!bdrv_co_is_inserted(child->bs)) {
|
2015-10-19 18:53:13 +03:00
|
|
|
return false;
|
|
|
|
}
|
2015-10-19 18:53:11 +03:00
|
|
|
}
|
2015-10-19 18:53:13 +03:00
|
|
|
return true;
|
2006-08-19 15:45:59 +04:00
|
|
|
}
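The lookup above has three cases: no driver means no medium, a driver callback decides on its own, and otherwise every child must report a medium. A minimal standalone sketch of that recursion is shown here. `MediaNode` and its fields are hypothetical and only mirror the shape of the logic, not the real BlockDriverState layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MediaNode MediaNode;
struct MediaNode {
    bool has_driver;                         /* models bs->drv != NULL */
    bool (*is_inserted)(const MediaNode *);  /* optional driver callback */
    MediaNode *const *children;
    size_t n_children;
};

static bool media_is_inserted(const MediaNode *n)
{
    if (!n->has_driver) {
        return false;
    }
    if (n->is_inserted) {
        return n->is_inserted(n);
    }
    /* No callback: the node has a medium only if all children have one. */
    for (size_t i = 0; i < n->n_children; i++) {
        if (!media_is_inserted(n->children[i])) {
            return false;
        }
    }
    return true;
}

/* Two trivial callbacks used to build example trees. */
static bool always_present(const MediaNode *n)
{
    (void)n;
    return true;
}

static bool always_empty(const MediaNode *n)
{
    (void)n;
    return false;
}
```

A node with a driver, no callback and no children reports true, matching the final `return true` in the function above.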
|
|
|
|
|
|
|
|
/**
|
|
|
|
* If eject_flag is TRUE, eject the media. Otherwise, close the tray
|
|
|
|
*/
|
2023-01-13 23:42:09 +03:00
|
|
|
void coroutine_fn bdrv_co_eject(BlockDriverState *bs, bool eject_flag)
|
2006-08-19 15:45:59 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-02-03 18:21:58 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2006-08-19 15:45:59 +04:00
|
|
|
|
2023-01-13 23:42:09 +03:00
|
|
|
if (drv && drv->bdrv_co_eject) {
|
|
|
|
drv->bdrv_co_eject(bs, eject_flag);
|
2006-08-19 15:45:59 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Lock or unlock the media (if it is locked, the user won't be able
|
|
|
|
* to eject it manually).
|
|
|
|
*/
|
2023-01-13 23:42:10 +03:00
|
|
|
void coroutine_fn bdrv_co_lock_medium(BlockDriverState *bs, bool locked)
|
2006-08-19 15:45:59 +04:00
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2023-02-03 18:21:58 +03:00
|
|
|
assert_bdrv_graph_readable();
|
2011-09-06 20:58:47 +04:00
|
|
|
trace_bdrv_lock_medium(bs, locked);
|
2011-03-29 23:04:40 +04:00
|
|
|
|
2023-01-13 23:42:10 +03:00
|
|
|
if (drv && drv->bdrv_co_lock_medium) {
|
|
|
|
drv->bdrv_co_lock_medium(bs, locked);
|
2006-08-19 15:45:59 +04:00
|
|
|
}
|
|
|
|
}
|
2007-12-24 19:10:43 +03:00
|
|
|
|
2013-08-23 05:14:46 +04:00
|
|
|
/* Get a reference to bs */
|
|
|
|
void bdrv_ref(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2013-08-23 05:14:46 +04:00
|
|
|
bs->refcnt++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Release a previously grabbed reference to bs.
|
|
|
|
* If after releasing, reference count is zero, the BlockDriverState is
|
|
|
|
* deleted. */
|
|
|
|
void bdrv_unref(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-07-24 01:22:57 +04:00
|
|
|
if (!bs) {
|
|
|
|
return;
|
|
|
|
}
|
2013-08-23 05:14:46 +04:00
|
|
|
assert(bs->refcnt > 0);
|
|
|
|
if (--bs->refcnt == 0) {
|
|
|
|
bdrv_delete(bs);
|
|
|
|
}
|
|
|
|
}
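The ref/unref pair above can be exercised in isolation. The sketch below assumes a new object starts with one reference and uses a hypothetical `RefNode` type with a `deleted_flag` hook so that the deletion is observable; it is an illustration, not the QEMU implementation.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct RefNode {
    int refcnt;
    int *deleted_flag;  /* set to 1 when the node is freed (demo only) */
} RefNode;

/* Assumption: the creator holds the first reference. */
static RefNode *node_new(int *deleted_flag)
{
    RefNode *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    n->refcnt = 1;
    n->deleted_flag = deleted_flag;
    return n;
}

static void node_ref(RefNode *n)
{
    n->refcnt++;
}

/* NULL is accepted and ignored, as in bdrv_unref(). */
static void node_unref(RefNode *n)
{
    if (!n) {
        return;
    }
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        *n->deleted_flag = 1;
        free(n);
    }
}
```

Accepting NULL in the unref path lets cleanup code call it unconditionally on pointers that may never have been set.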
|
|
|
|
|
2014-05-23 17:29:42 +04:00
|
|
|
struct BdrvOpBlocker {
|
|
|
|
Error *reason;
|
|
|
|
QLIST_ENTRY(BdrvOpBlocker) list;
|
|
|
|
};
|
|
|
|
|
|
|
|
bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp)
|
|
|
|
{
|
|
|
|
BdrvOpBlocker *blocker;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
|
|
|
|
if (!QLIST_EMPTY(&bs->op_blockers[op])) {
|
|
|
|
blocker = QLIST_FIRST(&bs->op_blockers[op]);
|
2018-10-17 11:26:25 +03:00
|
|
|
error_propagate_prepend(errp, error_copy(blocker->reason),
|
|
|
|
"Node '%s' is busy: ",
|
|
|
|
bdrv_get_device_or_node_name(bs));
|
2014-05-23 17:29:42 +04:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason)
|
|
|
|
{
|
|
|
|
BdrvOpBlocker *blocker;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
|
|
|
|
|
block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-19 12:31:08 +04:00
|
|
|
blocker = g_new0(BdrvOpBlocker, 1);
|
2014-05-23 17:29:42 +04:00
|
|
|
blocker->reason = reason;
|
|
|
|
QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list);
|
|
|
|
}
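The commit message for the allocation above names two benefits of g_new(T, n): the multiplication is checked for size_t overflow, and the result is typed as T *. A rough standalone sketch of both ideas follows. `try_alloc_array0` and `TRY_NEW0` are hypothetical names; unlike g_new0(), which does not return on failure, this sketch returns NULL, which is closer in spirit to g_try_new0().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n zeroed elements of elem_size bytes, or NULL on overflow. */
static void *try_alloc_array0(size_t elem_size, size_t n)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        return NULL;    /* elem_size * n would overflow size_t */
    }
    return calloc(n, elem_size);
}

/* The cast gives the result type T *, so mismatched assignments are flagged. */
#define TRY_NEW0(T, n) ((T *)try_alloc_array0(sizeof(T), (n)))
```

Assigning `TRY_NEW0(int, 4)` to a `char *` draws a compiler diagnostic, whereas a bare `void *` from malloc would convert silently.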
|
|
|
|
|
|
|
|
void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason)
|
|
|
|
{
|
|
|
|
BdrvOpBlocker *blocker, *next;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
|
|
|
|
QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
|
|
|
|
if (blocker->reason == reason) {
|
|
|
|
QLIST_REMOVE(blocker, list);
|
|
|
|
g_free(blocker);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void bdrv_op_block_all(BlockDriverState *bs, Error *reason)
|
|
|
|
{
|
|
|
|
int i;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
|
|
|
|
bdrv_op_block(bs, i, reason);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason)
|
|
|
|
{
|
|
|
|
int i;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
|
|
|
|
bdrv_op_unblock(bs, i, reason);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
int i;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-05-23 17:29:42 +04:00
|
|
|
for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
|
|
|
|
if (!QLIST_EMPTY(&bs->op_blockers[i])) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return true;
|
|
|
|
}
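The op blocker functions above keep one list of reasons per operation type. An operation is blocked while its list is non-empty, and unblocking removes every entry that carries the same reason pointer. A compact sketch with a hand-rolled singly linked list in place of QLIST is shown here. `OpNode`, `DemoBlocker` and `DEMO_OP_TYPE_MAX` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { DEMO_OP_TYPE_MAX = 3 };  /* stands in for BLOCK_OP_TYPE_MAX */

typedef struct DemoBlocker {
    const void *reason;         /* identity of whoever blocked the op */
    struct DemoBlocker *next;
} DemoBlocker;

typedef struct OpNode {
    DemoBlocker *blockers[DEMO_OP_TYPE_MAX];
} OpNode;

static void op_block(OpNode *n, int op, const void *reason)
{
    DemoBlocker *b;

    assert(op >= 0 && op < DEMO_OP_TYPE_MAX);
    b = calloc(1, sizeof(*b));
    assert(b);
    b->reason = reason;
    b->next = n->blockers[op];  /* insert at head */
    n->blockers[op] = b;
}

/* Remove every blocker on op that was registered with this reason. */
static void op_unblock(OpNode *n, int op, const void *reason)
{
    DemoBlocker **p;

    assert(op >= 0 && op < DEMO_OP_TYPE_MAX);
    p = &n->blockers[op];
    while (*p) {
        DemoBlocker *b = *p;

        if (b->reason == reason) {
            *p = b->next;
            free(b);
        } else {
            p = &b->next;
        }
    }
}

static bool op_is_blocked(const OpNode *n, int op)
{
    assert(op >= 0 && op < DEMO_OP_TYPE_MAX);
    return n->blockers[op] != NULL;
}
```

Matching on the reason pointer rather than on its contents means two users can block the same operation independently, and each removes only its own entry.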
|
|
|
|
|
2022-12-07 16:18:30 +03:00
|
|
|
/*
|
|
|
|
* Must not be called while holding the lock of an AioContext other than the
|
|
|
|
* current one.
|
|
|
|
*/
|
2012-11-30 16:52:09 +04:00
|
|
|
void bdrv_img_create(const char *filename, const char *fmt,
|
|
|
|
const char *base_filename, const char *base_fmt,
|
2017-04-21 15:27:01 +03:00
|
|
|
char *options, uint64_t img_size, int flags, bool quiet,
|
|
|
|
Error **errp)
|
2010-12-16 15:52:15 +03:00
|
|
|
{
|
2014-06-05 13:20:51 +04:00
|
|
|
QemuOptsList *create_opts = NULL;
|
|
|
|
QemuOpts *opts = NULL;
|
|
|
|
const char *backing_fmt, *backing_file;
|
|
|
|
int64_t size;
|
2010-12-16 15:52:15 +03:00
|
|
|
BlockDriver *drv, *proto_drv;
|
2013-09-06 19:14:26 +04:00
|
|
|
Error *local_err = NULL;
|
2010-12-16 15:52:15 +03:00
|
|
|
int ret = 0;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2010-12-16 15:52:15 +03:00
|
|
|
/* Find driver and parse its options */
|
|
|
|
drv = bdrv_find_format(fmt);
|
|
|
|
if (!drv) {
|
2012-11-30 16:52:04 +04:00
|
|
|
error_setg(errp, "Unknown file format '%s'", fmt);
|
2012-11-30 16:52:09 +04:00
|
|
|
return;
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
|
|
|
|
2015-02-05 21:58:12 +03:00
|
|
|
proto_drv = bdrv_find_protocol(filename, true, errp);
|
2010-12-16 15:52:15 +03:00
|
|
|
if (!proto_drv) {
|
2012-11-30 16:52:09 +04:00
|
|
|
return;
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
|
|
|
|
2014-12-02 20:32:45 +03:00
|
|
|
if (!drv->create_opts) {
|
|
|
|
error_setg(errp, "Format driver '%s' does not support image creation",
|
|
|
|
drv->format_name);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2020-03-26 04:12:18 +03:00
|
|
|
if (!proto_drv->create_opts) {
|
|
|
|
error_setg(errp, "Protocol driver '%s' does not support image creation",
|
|
|
|
proto_drv->format_name);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-11-26 18:45:49 +03:00
|
|
|
/* Create parameter list */
|
2014-06-05 13:21:11 +04:00
|
|
|
create_opts = qemu_opts_append(create_opts, drv->create_opts);
|
2020-03-26 04:12:18 +03:00
|
|
|
create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
|
2010-12-16 15:52:15 +03:00
|
|
|
|
2014-06-05 13:20:51 +04:00
|
|
|
opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
|
2010-12-16 15:52:15 +03:00
|
|
|
|
|
|
|
/* Parse -o options */
|
|
|
|
if (options) {
|
2020-07-07 19:06:05 +03:00
|
|
|
if (!qemu_opts_do_parse(opts, options, NULL, errp)) {
|
2010-12-16 15:52:15 +03:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-11-26 18:45:49 +03:00
|
|
|
if (!qemu_opt_get(opts, BLOCK_OPT_SIZE)) {
|
|
|
|
qemu_opt_set_number(opts, BLOCK_OPT_SIZE, img_size, &error_abort);
|
|
|
|
} else if (img_size != UINT64_C(-1)) {
|
|
|
|
error_setg(errp, "The image size must be specified only once");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2010-12-16 15:52:15 +03:00
|
|
|
if (base_filename) {
|
qemu-option: Use returned bool to check for failure
The previous commit enables conversion of
foo(..., &err);
if (err) {
...
}
to
if (!foo(..., &err)) {
...
}
for QemuOpts functions that now return true / false on success /
error. Coccinelle script:
@@
identifier fun = {
opts_do_parse, parse_option_bool, parse_option_number,
parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set,
qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict,
qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set,
qemu_opts_validate
};
expression list args, args2;
typedef Error;
Error *err;
@@
- fun(args, &err, args2);
- if (err)
+ if (!fun(args, &err, args2))
{
...
}
A few line breaks tidied up manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-15-armbru@redhat.com>
[Conflict with commit 0b6786a9c1 "block/amend: refactor qcow2 amend
options" resolved by rerunning Coccinelle on master's version]
2020-07-07 19:05:42 +03:00
|
|
|
if (!qemu_opt_set(opts, BLOCK_OPT_BACKING_FILE, base_filename,
|
2020-07-07 19:05:43 +03:00
|
|
|
NULL)) {
|
2012-11-30 16:52:04 +04:00
|
|
|
error_setg(errp, "Backing file not supported for file format '%s'",
|
|
|
|
fmt);
|
2010-12-16 15:52:15 +03:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (base_fmt) {
|
2020-07-07 19:05:43 +03:00
|
|
|
if (!qemu_opt_set(opts, BLOCK_OPT_BACKING_FMT, base_fmt, NULL)) {
|
2012-11-30 16:52:04 +04:00
|
|
|
error_setg(errp, "Backing file format not supported for file "
|
|
|
|
"format '%s'", fmt);
|
2010-12-16 15:52:15 +03:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-06-05 13:20:51 +04:00
|
|
|
backing_file = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
|
|
|
|
if (backing_file) {
|
|
|
|
if (!strcmp(filename, backing_file)) {
|
2012-11-30 16:52:04 +04:00
|
|
|
error_setg(errp, "Error: Trying to create an image with the "
|
|
|
|
"same filename as the backing file");
|
2010-12-16 15:52:17 +03:00
|
|
|
goto out;
|
|
|
|
}
|
2020-08-13 16:47:22 +03:00
|
|
|
if (backing_file[0] == '\0') {
|
|
|
|
error_setg(errp, "Expected backing file name, got empty string");
|
|
|
|
goto out;
|
|
|
|
}
|
2010-12-16 15:52:17 +03:00
|
|
|
}
|
|
|
|
|
2014-06-05 13:20:51 +04:00
|
|
|
backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
|
2010-12-16 15:52:15 +03:00
|
|
|
|
2017-07-18 03:34:22 +03:00
|
|
|
/* The size for the image must always be specified, unless we have a backing
|
|
|
|
* file and we have not been forbidden from opening it. */
|
2017-09-25 17:55:07 +03:00
|
|
|
size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, img_size);
|
2017-07-18 03:34:22 +03:00
|
|
|
if (backing_file && !(flags & BDRV_O_NO_BACKING)) {
|
|
|
|
BlockDriverState *bs;
|
2019-02-01 22:29:14 +03:00
|
|
|
char *full_backing;
|
2017-07-18 03:34:22 +03:00
|
|
|
int back_flags;
|
|
|
|
QDict *backing_options = NULL;
|
|
|
|
|
2019-02-01 22:29:14 +03:00
|
|
|
full_backing =
|
|
|
|
bdrv_get_full_backing_filename_from_filename(filename, backing_file,
|
|
|
|
&local_err);
|
2017-07-18 03:34:22 +03:00
|
|
|
if (local_err) {
|
|
|
|
goto out;
|
|
|
|
}
|
2019-02-01 22:29:14 +03:00
|
|
|
assert(full_backing);
|
2014-11-26 19:20:27 +03:00
|
|
|
|
2021-06-22 17:00:30 +03:00
|
|
|
/*
|
|
|
|
* No need to do I/O here, which allows us to open encrypted
|
|
|
|
* backing images without needing the secret
|
|
|
|
*/
|
2017-07-18 03:34:22 +03:00
|
|
|
back_flags = flags;
|
|
|
|
back_flags &= ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
|
2021-06-22 17:00:30 +03:00
|
|
|
back_flags |= BDRV_O_NO_IO;
|
2010-12-16 15:52:15 +03:00
|
|
|
|
2017-12-15 11:04:45 +03:00
|
|
|
backing_options = qdict_new();
|
2017-07-18 03:34:22 +03:00
|
|
|
if (backing_fmt) {
|
|
|
|
qdict_put_str(backing_options, "driver", backing_fmt);
|
|
|
|
}
|
2017-12-15 11:04:45 +03:00
|
|
|
qdict_put_bool(backing_options, BDRV_OPT_FORCE_SHARE, true);
|
2015-08-26 20:47:48 +03:00
|
|
|
|
2017-07-18 03:34:22 +03:00
|
|
|
bs = bdrv_open(full_backing, NULL, backing_options, back_flags,
|
|
|
|
&local_err);
|
|
|
|
g_free(full_backing);
|
2020-07-06 23:39:50 +03:00
|
|
|
if (!bs) {
|
|
|
|
error_append_hint(&local_err, "Could not open backing image.\n");
|
2017-07-18 03:34:22 +03:00
|
|
|
goto out;
|
|
|
|
} else {
|
qemu-img: Deprecate use of -b without -F
Creating an image that requires format probing of the backing image is
potentially unsafe (we've had several CVEs over the years based on
probes leaking information to the guest on a subsequent boot, although
these days tools like libvirt are aware of the issue enough to prevent
the worst effects). For example, if our probing algorithm ever
changes, or if other tools like libvirt determine a different probe
result than we do, then subsequent use of that backing file under a
different format will present corrupted data to the guest.
Fortunately, the worst effects occur only when the backing image is
originally raw, and we at least prevent commit into a probed raw
backing file that would change its probed type.
Still, it is worth starting a deprecation clock so that future
qemu-img can refuse to create backing chains that would rely on
probing, to encourage clients to avoid unsafe practices. Most
warnings are intentionally emitted from bdrv_img_create() in the block
layer, but qemu-img convert uses bdrv_create() which cannot emit its
own warning without causing spurious warnings on other code paths. In
the end, all command-line image creation or backing file rewriting now
performs a check.
Furthermore, if we probe a backing file as non-raw, then it is safe to
explicitly record that result (rather than relying on future probes);
only where we probe a raw image do we care about further warnings to
the user when using such an image (for example, commits into a
probed-raw backing file are prevented), to help them improve their
tooling. But whether or not we make the probe results explicit, we
still warn the user to remind them to upgrade their workflow to supply
-F always.
iotest 114 specifically wants to create an unsafe image for later
amendment rather than defaulting to our new default of recording a
probed format, so it needs an update. While touching it, expand it to
cover all of the various warnings enabled by this patch. iotest 301
also shows a change to qcow messages.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-11-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-07-06 23:39:54 +03:00
|
|
|
if (!backing_fmt) {
|
2021-05-04 00:36:00 +03:00
|
|
|
error_setg(&local_err,
|
|
|
|
"Backing file specified without backing format");
|
|
|
|
error_append_hint(&local_err, "Detected format of %s.",
|
|
|
|
bs->drv->format_name);
|
|
|
|
goto out;
|
2020-07-06 23:39:54 +03:00
|
|
|
}
|
2017-07-18 03:34:22 +03:00
|
|
|
if (size == -1) {
|
|
|
|
/* Opened BS, have no size */
|
|
|
|
size = bdrv_getlength(bs);
|
|
|
|
if (size < 0) {
|
|
|
|
error_setg_errno(errp, -size, "Could not get size of '%s'",
|
|
|
|
backing_file);
|
|
|
|
bdrv_unref(bs);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
|
2014-06-26 15:23:25 +04:00
|
|
|
}
|
2013-12-03 17:57:52 +04:00
|
|
|
bdrv_unref(bs);
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
2020-07-06 23:39:54 +03:00
|
|
|
/* (backing_file && !(flags & BDRV_O_NO_BACKING)) */
|
|
|
|
} else if (backing_file && !backing_fmt) {
|
2021-05-04 00:36:00 +03:00
|
|
|
error_setg(&local_err,
|
|
|
|
"Backing file specified without backing format");
|
|
|
|
goto out;
|
2020-07-06 23:39:54 +03:00
|
|
|
}
|
2017-07-18 03:34:22 +03:00
|
|
|
|
|
|
|
if (size == -1) {
|
|
|
|
error_setg(errp, "Image creation needs a size parameter");
|
|
|
|
goto out;
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
|
|
|
|
2013-02-13 12:09:40 +04:00
|
|
|
if (!quiet) {
|
2015-07-07 17:42:10 +03:00
|
|
|
printf("Formatting '%s', fmt=%s ", filename, fmt);
|
2014-12-09 10:38:04 +03:00
|
|
|
qemu_opts_print(opts, " ");
|
2013-02-13 12:09:40 +04:00
|
|
|
puts("");
|
2020-07-06 23:39:45 +03:00
|
|
|
fflush(stdout);
|
2013-02-13 12:09:40 +04:00
|
|
|
}
|
2014-06-05 13:20:51 +04:00
|
|
|
|
2014-06-05 13:21:11 +04:00
|
|
|
ret = bdrv_create(drv, filename, opts, &local_err);
|
2014-06-05 13:20:51 +04:00
|
|
|
|
2013-09-06 19:14:26 +04:00
|
|
|
if (ret == -EFBIG) {
|
|
|
|
/* This is generally a better message than whatever the driver would
|
|
|
|
* deliver (especially because of the cluster_size_hint), since that
|
|
|
|
* is most probably not much different from "image too large". */
|
|
|
|
const char *cluster_size_hint = "";
|
2014-06-05 13:20:51 +04:00
|
|
|
if (qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE, 0)) {
|
2013-09-06 19:14:26 +04:00
|
|
|
cluster_size_hint = " (try using a larger cluster size)";
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
2013-09-06 19:14:26 +04:00
|
|
|
error_setg(errp, "The image size is too large for file format '%s'"
|
|
|
|
"%s", fmt, cluster_size_hint);
|
|
|
|
error_free(local_err);
|
|
|
|
local_err = NULL;
|
2010-12-16 15:52:15 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
2014-06-05 13:20:51 +04:00
|
|
|
qemu_opts_del(opts);
|
|
|
|
qemu_opts_free(create_opts);
|
2016-06-14 00:57:56 +03:00
|
|
|
error_propagate(errp, local_err);
|
2010-12-16 15:52:15 +03:00
|
|
|
}
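When bdrv_img_create() opens the backing image, it derives the open flags from the caller's flags: the read-write, snapshot and no-backing bits are cleared, and the no-I/O bit is set so that no data needs to be read. The bit manipulation can be sketched on its own. The `DEMO_O_*` constants and their values are hypothetical and do not match the real BDRV_O_* definitions.

```c
#include <assert.h>

/* Hypothetical flag bits; the real BDRV_O_* values differ. */
enum {
    DEMO_O_RDWR       = 1 << 0,
    DEMO_O_SNAPSHOT   = 1 << 1,
    DEMO_O_NO_BACKING = 1 << 2,
    DEMO_O_NO_IO      = 1 << 3,
    DEMO_O_CACHE      = 1 << 4,   /* an unrelated bit that must survive */
};

/* Derive flags for opening a backing image from the caller's flags. */
static int backing_open_flags(int flags)
{
    int back_flags = flags;

    /* The backing image is opened read-only, without snapshot handling. */
    back_flags &= ~(DEMO_O_RDWR | DEMO_O_SNAPSHOT | DEMO_O_NO_BACKING);
    /* No I/O is needed to learn the format and size. */
    back_flags |= DEMO_O_NO_IO;
    return back_flags;
}
```

Unrelated bits pass through unchanged, so caller-supplied settings such as the cache bit in this sketch still apply to the backing image.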
|
2013-03-07 16:41:48 +04:00
|
|
|
|
|
|
|
AioContext *bdrv_get_aio_context(BlockDriverState *bs)
|
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2018-02-16 19:50:13 +03:00
|
|
|
return bs ? bs->aio_context : qemu_get_aio_context();
|
2014-05-08 18:34:37 +04:00
|
|
|
}
|
|
|
|
|
2020-10-05 18:58:53 +03:00
|
|
|
AioContext *coroutine_fn bdrv_co_enter(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
Coroutine *self = qemu_coroutine_self();
|
|
|
|
AioContext *old_ctx = qemu_coroutine_get_aio_context(self);
|
|
|
|
AioContext *new_ctx;
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2020-10-05 18:58:53 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Increase bs->in_flight to ensure that this operation is completed before
|
|
|
|
* moving the node to a different AioContext. Read new_ctx only afterwards.
|
|
|
|
*/
|
|
|
|
bdrv_inc_in_flight(bs);
|
|
|
|
|
|
|
|
new_ctx = bdrv_get_aio_context(bs);
|
|
|
|
aio_co_reschedule_self(new_ctx);
|
|
|
|
return old_ctx;
|
|
|
|
}
|
|
|
|
|
|
|
|
void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
|
|
|
|
{
|
2022-03-03 18:15:50 +03:00
|
|
|
IO_CODE();
|
2020-10-05 18:58:53 +03:00
|
|
|
aio_co_reschedule_self(old_ctx);
|
|
|
|
bdrv_dec_in_flight(bs);
|
|
|
|
}
|
|
|
|
|
2020-10-05 18:58:54 +03:00
|
|
|
void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
AioContext *ctx = bdrv_get_aio_context(bs);
|
|
|
|
|
|
|
|
/* In the main thread, bs->aio_context won't change concurrently */
|
|
|
|
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We're in coroutine context, so we already hold the lock of the main
|
|
|
|
* loop AioContext. Don't lock it twice to avoid deadlocks.
|
|
|
|
*/
|
|
|
|
assert(qemu_in_coroutine());
|
|
|
|
if (ctx != qemu_get_aio_context()) {
|
|
|
|
aio_context_acquire(ctx);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
AioContext *ctx = bdrv_get_aio_context(bs);
|
|
|
|
|
|
|
|
assert(qemu_in_coroutine());
|
|
|
|
if (ctx != qemu_get_aio_context()) {
|
|
|
|
aio_context_release(ctx);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-06-16 19:56:26 +03:00
|
|
|
static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
|
|
|
|
{
|
2022-03-03 18:16:02 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-06-16 19:56:26 +03:00
|
|
|
QLIST_REMOVE(ban, list);
|
|
|
|
g_free(ban);
|
|
|
|
}
|
|
|
|
|
2019-05-06 20:17:57 +03:00
|
|
|
static void bdrv_detach_aio_context(BlockDriverState *bs)
|
2014-05-08 18:34:37 +04:00
|
|
|
{
|
2016-06-16 19:56:26 +03:00
|
|
|
BdrvAioNotifier *baf, *baf_tmp;
|
2014-06-20 23:57:33 +04:00
|
|
|
|
2016-06-16 19:56:26 +03:00
|
|
|
assert(!bs->walking_aio_notifiers);
|
2022-03-03 18:16:11 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2016-06-16 19:56:26 +03:00
|
|
|
bs->walking_aio_notifiers = true;
|
|
|
|
QLIST_FOREACH_SAFE(baf, &bs->aio_notifiers, list, baf_tmp) {
|
|
|
|
if (baf->deleted) {
|
|
|
|
bdrv_do_remove_aio_context_notifier(baf);
|
|
|
|
} else {
|
|
|
|
baf->detach_aio_context(baf->opaque);
|
|
|
|
}
|
2014-06-20 23:57:33 +04:00
|
|
|
}
|
2016-06-16 19:56:26 +03:00
|
|
|
/* Never mind iterating again to check for ->deleted. bdrv_close() will
|
|
|
|
* remove remaining aio notifiers if we aren't called again.
|
|
|
|
*/
|
|
|
|
bs->walking_aio_notifiers = false;
|
2014-06-20 23:57:33 +04:00
|
|
|
|
block: Fix AioContext switch for bs->drv == NULL
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.
This is an example stack trace of a related crash:
#0 0x00007ffff59da57f in raise () at /lib64/libc.so.6
#1 0x00007ffff59c4895 in abort () at /lib64/libc.so.6
#2 0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3 0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
#4 0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
#5 0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
at block/io.c:833
#6 0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
#7 0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
#8 0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
#9 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
#10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
#12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
#13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
#14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
#16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
#17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
#18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
#19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
#20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
#21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
#22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
#23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2019-04-17 18:15:25 +03:00

    if (bs->drv && bs->drv->bdrv_detach_aio_context) {
        bs->drv->bdrv_detach_aio_context(bs);
    }

    if (bs->quiesce_counter) {
        aio_enable_external(bs->aio_context);
    }
    bs->aio_context = NULL;
}

static void bdrv_attach_aio_context(BlockDriverState *bs,
                                    AioContext *new_context)
{
    BdrvAioNotifier *ban, *ban_tmp;
    GLOBAL_STATE_CODE();

    if (bs->quiesce_counter) {
        aio_disable_external(new_context);
    }

    bs->aio_context = new_context;

    if (bs->drv && bs->drv->bdrv_attach_aio_context) {
        bs->drv->bdrv_attach_aio_context(bs, new_context);
    }

    assert(!bs->walking_aio_notifiers);
    bs->walking_aio_notifiers = true;
    QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_tmp) {
        if (ban->deleted) {
            bdrv_do_remove_aio_context_notifier(ban);
        } else {
            ban->attached_aio_context(new_context, ban->opaque);
        }
    }
    bs->walking_aio_notifiers = false;
}

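Both bdrv_detach_aio_context() and bdrv_attach_aio_context() set walking_aio_notifiers while they iterate, and free any notifier whose deleted flag is set instead of calling it. The following is a minimal standalone sketch of that deferred-removal idea. It assumes a hypothetical singly linked NotifierList in place of QEMU's QLIST macros, and none of the names below (Notifier, notifier_add, notifier_remove, notifier_walk, bump) exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical notifier; not QEMU's BdrvAioNotifier. */
typedef struct Notifier {
    void (*cb)(void *opaque);
    void *opaque;
    bool deleted;               /* set when removal is deferred */
    struct Notifier *next;
} Notifier;

typedef struct NotifierList {
    Notifier *head;
    bool walking;               /* true while notifier_walk() iterates */
} NotifierList;

/* Example callback: increments the int that opaque points to. */
void bump(void *opaque)
{
    ++*(int *)opaque;
}

/* Insert a new notifier at the head of the list. */
Notifier *notifier_add(NotifierList *l, void (*cb)(void *), void *opaque)
{
    Notifier *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    n->cb = cb;
    n->opaque = opaque;
    n->next = l->head;
    l->head = n;
    return n;
}

/* Free right away, or only mark the entry if a walk is in progress. */
void notifier_remove(NotifierList *l, Notifier *n)
{
    Notifier **p;

    if (l->walking) {
        n->deleted = true;
        return;
    }
    for (p = &l->head; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            free(n);
            return;
        }
    }
}

/* Call every live notifier; reap entries marked as deleted on the way. */
void notifier_walk(NotifierList *l)
{
    Notifier **p = &l->head;

    assert(!l->walking);
    l->walking = true;
    while (*p) {
        Notifier *n = *p;

        if (n->deleted) {
            *p = n->next;
            free(n);
        } else {
            n->cb(n->opaque);
            p = &n->next;
        }
    }
    l->walking = false;
}
```

As in the functions above, an entry that is marked during a walk after the walk has already passed it is only reaped by a later walk.
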
block: use transactions as a replacement of ->{can_}set_aio_context()
Simplify the way the aiocontext can be changed in a BDS graph.
There are currently two problems in bdrv_try_set_aio_context:
- There is a confusion of AioContext locks taken and released, because
we assume that old aiocontext is always taken and new one is
taken inside.
- It doesn't look very safe to call bdrv_drained_begin while some
nodes have already switched to the new aiocontext and others haven't.
This could be especially dangerous because bdrv_drained_begin polls, so
something else could be executed while graph is in an inconsistent
state.
Additional minor nitpick: can_set and set_ callbacks both traverse the
graph, both using the ignored list of visited nodes in a different way.
Therefore, get rid of all of this and introduce a new callback,
change_aio_context, that uses transactions to efficiently, cleanly
and most importantly safely change the aiocontext of a graph.
This new callback is a "merge" of the two previous ones:
- Just like can_set_aio_context, recursively traverses the graph.
Marks all nodes that are visited using a GList, and checks if
they *could* change the aio_context.
- For each node that passes the above check, drain it and add a new transaction
that implements a callback that effectively changes the aiocontext.
- Once done, the recursive function returns if *all* nodes can change
the AioContext. If so, commit the above transactions.
Regardless of the outcome, call transaction.clean() to undo all drains
done in the recursion.
- The transaction list is scanned only after all nodes are being drained, so
we are sure that they all are in the same context, and then
we switch their AioContext, concluding the drain only after all nodes
switched to the new AioContext. In this way we make sure that
bdrv_drained_begin() is always called under the old AioContext, and
bdrv_drained_end() under the new one.
- Because of the above, we don't need to release and re-acquire the
old AioContext every time, as everything is done once (and not
per-node drain and aiocontext change).
Note that the "change" API is not yet invoked anywhere.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221025084952.2139888-3-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-10-25 11:49:44 +03:00
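
The flow described in this commit message (check and drain each node, queue a deferred change, commit only if every node passed, always undo the drains) can be sketched in isolation. This is a simplified illustration under stated assumptions, not QEMU's Transaction API: Node, Action, Txn, txn_prepare, txn_finalize and change_all are hypothetical names, and "draining" is modeled as a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not QEMU types. */
typedef struct Node {
    int ctx;            /* current context id */
    int drained;        /* drain nesting counter */
    bool can_change;    /* whether the node accepts a context change */
} Node;

typedef struct Action {
    Node *node;
    int new_ctx;
    struct Action *next;
} Action;

typedef struct Txn {
    Action *head;
} Txn;

/* Check one node; on success drain it and queue the deferred change. */
static bool txn_prepare(Txn *t, Node *n, int new_ctx)
{
    Action *a;

    if (!n->can_change) {
        return false;
    }
    a = malloc(sizeof(*a));
    if (!a) {
        return false;
    }
    n->drained++;
    a->node = n;
    a->new_ctx = new_ctx;
    a->next = t->head;
    t->head = a;
    return true;
}

/* Apply queued changes only if ok; always undo drains and free actions. */
static void txn_finalize(Txn *t, bool ok)
{
    Action *a, *next;

    if (ok) {
        for (a = t->head; a; a = a->next) {
            a->node->ctx = a->new_ctx;      /* commit step */
        }
    }
    for (a = t->head; a; a = next) {
        next = a->next;
        a->node->drained--;                 /* clean step */
        free(a);
    }
    t->head = NULL;
}

/* Try to move every node to new_ctx; the result is all-or-nothing. */
bool change_all(Node *nodes, size_t count, int new_ctx)
{
    Txn t = { NULL };
    bool ok = true;
    size_t i;

    for (i = 0; i < count && ok; i++) {
        ok = txn_prepare(&t, &nodes[i], new_ctx);
    }
    txn_finalize(&t, ok);
    return ok;
}
```

Because the commit step runs before the clean step, every node is still drained while its context changes, which mirrors the ordering the message asks for.
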

typedef struct BdrvStateSetAioContext {
    AioContext *new_ctx;
    BlockDriverState *bs;
} BdrvStateSetAioContext;

static bool bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
                                           GHashTable *visited,
                                           Transaction *tran,
                                           Error **errp)
{
    GLOBAL_STATE_CODE();
    if (g_hash_table_contains(visited, c)) {
        return true;
    }
    g_hash_table_add(visited, c);

    /*
     * A BdrvChildClass that doesn't handle AioContext changes cannot
     * tolerate any AioContext changes
     */
    if (!c->klass->change_aio_ctx) {
        char *user = bdrv_child_user_desc(c);
        error_setg(errp, "Changing iothreads is not supported by %s", user);
        g_free(user);
        return false;
    }
    if (!c->klass->change_aio_ctx(c, ctx, visited, tran, errp)) {
        assert(!errp || *errp);
        return false;
    }
    return true;
}

bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
                                   GHashTable *visited, Transaction *tran,
                                   Error **errp)
{
    GLOBAL_STATE_CODE();
    if (g_hash_table_contains(visited, c)) {
        return true;
    }
    g_hash_table_add(visited, c);
    return bdrv_change_aio_context(c->bs, ctx, visited, tran, errp);
}

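bdrv_parent_change_aio_context() and bdrv_child_change_aio_context() both return early when their BdrvChild is already in the visited set, which keeps the recursion from looping on graph cycles and from checking an edge twice. A minimal sketch of that guard follows. As a simplification it assumes a hypothetical GNode type with a per-node visited flag, whereas the code above records BdrvChild edges in a GHashTable; graph_can_change and MAX_EDGES are likewise illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 4

/* Hypothetical graph node; not a QEMU type. */
typedef struct GNode {
    bool visited;
    bool allows_change;
    size_t n_edges;
    struct GNode *edges[MAX_EDGES];
} GNode;

/*
 * Returns true if every node reachable from n allows the change.
 * *count accumulates the number of distinct nodes that were checked.
 */
bool graph_can_change(GNode *n, size_t *count)
{
    size_t i;

    if (n->visited) {
        return true;    /* already handled; this also breaks cycles */
    }
    n->visited = true;
    (*count)++;

    if (!n->allows_change) {
        return false;
    }
    for (i = 0; i < n->n_edges; i++) {
        if (!graph_can_change(n->edges[i], count)) {
            return false;
        }
    }
    return true;
}
```

Returning true for an already visited node is safe here because a failure on that node would already have aborted the whole traversal.
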
static void bdrv_set_aio_context_clean(void *opaque)
{
    BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
    BlockDriverState *bs = (BlockDriverState *) state->bs;

    /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
    bdrv_drained_end(bs);

    g_free(state);
}

static void bdrv_set_aio_context_commit(void *opaque)
|
|
|
|
{
|
|
|
|
BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
|
|
|
|
BlockDriverState *bs = (BlockDriverState *) state->bs;
|
|
|
|
AioContext *new_context = state->new_ctx;
|
|
|
|
AioContext *old_context = bdrv_get_aio_context(bs);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Take the old AioContex when detaching it from bs.
|
|
|
|
* At this point, new_context lock is already acquired, and we are now
|
|
|
|
* also taking old_context. This is safe as long as bdrv_detach_aio_context
|
|
|
|
* does not call AIO_POLL_WHILE().
|
|
|
|
*/
|
|
|
|
if (old_context != qemu_get_aio_context()) {
|
|
|
|
aio_context_acquire(old_context);
|
|
|
|
}
|
|
|
|
bdrv_detach_aio_context(bs);
|
|
|
|
if (old_context != qemu_get_aio_context()) {
|
|
|
|
aio_context_release(old_context);
|
|
|
|
}
|
|
|
|
bdrv_attach_aio_context(bs, new_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
static TransactionActionDrv set_aio_context = {
|
|
|
|
.commit = bdrv_set_aio_context_commit,
|
|
|
|
.clean = bdrv_set_aio_context_clean,
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Changes the AioContext used for fd handlers, timers, and BHs by this
|
|
|
|
* BlockDriverState and all its children and parents.
|
|
|
|
*
|
|
|
|
* Must be called from the main AioContext.
|
|
|
|
*
|
|
|
|
* The caller must own the AioContext lock for the old AioContext of bs, but it
|
|
|
|
* must not own the AioContext lock for new_context (unless new_context is the
|
|
|
|
* same as the current context of bs).
|
|
|
|
*
|
|
|
|
* @visited will accumulate all visited BdrvChild objects. The caller is
|
|
|
|
* responsible for freeing the list afterwards.
|
|
|
|
*/
|
|
|
|
static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
|
2022-10-25 11:49:45 +03:00
                                    GHashTable *visited, Transaction *tran,
2022-10-25 11:49:44 +03:00
                                    Error **errp)
2019-05-06 20:17:56 +03:00
{
    BdrvChild *c;
2022-10-25 11:49:44 +03:00
    BdrvStateSetAioContext *state;

    GLOBAL_STATE_CODE();
2019-05-06 20:17:56 +03:00
    if (bdrv_get_aio_context(bs) == ctx) {
        return true;
    }

    QLIST_FOREACH(c, &bs->parents, next_parent) {
2022-10-25 11:49:44 +03:00
        if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
2019-05-06 20:17:56 +03:00
            return false;
        }
    }
2022-10-25 11:49:44 +03:00
2019-05-06 20:17:56 +03:00
    QLIST_FOREACH(c, &bs->children, next) {
2022-10-25 11:49:44 +03:00
        if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
2019-05-06 20:17:56 +03:00
            return false;
        }
    }
2022-10-25 11:49:44 +03:00
    state = g_new(BdrvStateSetAioContext, 1);
    *state = (BdrvStateSetAioContext) {
        .new_ctx = ctx,
        .bs = bs,
    };

    /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
    bdrv_drained_begin(bs);

    tran_add(tran, &set_aio_context, state);
2019-05-06 20:17:56 +03:00
    return true;
}
2022-10-25 11:49:44 +03:00
/*
 * Change bs's and recursively all of its parents' and children's AioContext
 * to the given new context, returning an error if that isn't possible.
 *
 * If ignore_child is not NULL, that child (and its subgraph) will not
 * be touched.
 *
 * This function still requires the caller to take the bs current
 * AioContext lock, otherwise draining will fail since AIO_WAIT_WHILE
 * assumes the lock is always held if bs is in another AioContext.
 * For the same reason, it temporarily also holds the new AioContext, since
 * bdrv_drained_end calls BDRV_POLL_WHILE that assumes the lock is taken too.
 * Therefore the new AioContext lock must not be taken by the caller.
 */
2022-10-25 11:49:51 +03:00
int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
                                BdrvChild *ignore_child, Error **errp)
2019-05-06 20:17:56 +03:00
{
2022-10-25 11:49:44 +03:00
    Transaction *tran;
2022-10-25 11:49:45 +03:00
    GHashTable *visited;
2022-10-25 11:49:44 +03:00
    int ret;
    AioContext *old_context = bdrv_get_aio_context(bs);
2022-03-03 18:15:49 +03:00
    GLOBAL_STATE_CODE();
2022-10-25 11:49:44 +03:00
    /*
     * Recursion phase: go through all nodes of the graph.
     * Take care of checking that all nodes support changing AioContext
     * and drain them, building a linear list of callbacks to run if everything
     * is successful (the transaction itself).
     */
    tran = tran_new();
2022-10-25 11:49:45 +03:00
    visited = g_hash_table_new(NULL, NULL);
    if (ignore_child) {
        g_hash_table_add(visited, ignore_child);
    }
    ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
    g_hash_table_destroy(visited);
2022-10-25 11:49:44 +03:00
    /*
     * Linear phase: go through all callbacks collected in the transaction.
     * Run all callbacks collected in the recursion to switch all nodes'
     * AioContext lock (transaction commit), or undo all changes done in the
     * recursion (transaction abort).
     */
2019-05-06 20:17:56 +03:00
    if (!ret) {
2022-10-25 11:49:44 +03:00
|
|
|
/* Just run clean() callbacks. No AioContext changed. */
|
|
|
|
tran_abort(tran);
|
2019-05-06 20:17:56 +03:00
|
|
|
return -EPERM;
|
|
|
|
}
|
|
|
|
|
2022-10-25 11:49:44 +03:00
|
|
|
/*
|
|
|
|
* Release old AioContext, it won't be needed anymore, as all
|
|
|
|
* bdrv_drained_begin() have been called already.
|
|
|
|
*/
|
|
|
|
if (qemu_get_aio_context() != old_context) {
|
|
|
|
aio_context_release(old_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Acquire new AioContext since bdrv_drained_end() is going to be called
|
|
|
|
* after we have switched all nodes to the new AioContext, and the function
|
|
|
|
* assumes that the lock of the bs is always taken.
|
|
|
|
*/
|
|
|
|
if (qemu_get_aio_context() != ctx) {
|
|
|
|
aio_context_acquire(ctx);
|
|
|
|
}
|
2019-05-06 20:17:59 +03:00
|
|
|
|
2022-10-25 11:49:44 +03:00
|
|
|
tran_commit(tran);
|
2019-05-06 20:17:56 +03:00
|
|
|
|
2022-10-25 11:49:44 +03:00
|
|
|
if (qemu_get_aio_context() != ctx) {
|
|
|
|
aio_context_release(ctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Re-acquire the old AioContext, since the caller takes and releases it. */
|
|
|
|
if (qemu_get_aio_context() != old_context) {
|
|
|
|
aio_context_acquire(old_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2019-05-06 20:17:56 +03:00
|
|
|
}
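The two-phase flow described in the commit message above (collect one action per node while draining, then either commit every switch or abort, and always undo the drains) can be sketched in isolation. This is a minimal illustration only, not QEMU's transaction API: `Node`, `MiniTran`, `mini_tran_add`, `mini_tran_finish` and `change_all` are hypothetical names, and the drain is reduced to a counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a graph node; only the fields the sketch needs. */
typedef struct Node {
    int ctx;      /* id of the context the node currently belongs to */
    int drained;  /* drain nesting counter */
} Node;

/* One recorded action: applied on commit, cleaned up in every case. */
typedef struct Action {
    Node *node;
    int new_ctx;
    struct Action *next;
} Action;

typedef struct MiniTran {
    Action *head;
} MiniTran;

/* Prepare phase: drain the node and remember the pending switch. */
static int mini_tran_add(MiniTran *t, Node *n, int new_ctx)
{
    Action *a = malloc(sizeof(*a));
    if (!a) {
        return -1;
    }
    n->drained++;
    a->node = n;
    a->new_ctx = new_ctx;
    a->next = t->head;
    t->head = a;
    return 0;
}

/* Linear phase: apply every switch if committing, then undo every drain. */
static void mini_tran_finish(MiniTran *t, int commit)
{
    Action *a, *next;

    if (commit) {
        for (a = t->head; a; a = a->next) {
            a->node->ctx = a->new_ctx;
        }
    }
    for (a = t->head; a; a = next) {
        next = a->next;
        a->node->drained--;   /* the "clean" step runs on commit and abort */
        free(a);
    }
    t->head = NULL;
}

/* Returns 0 if all nodes switched, -1 if any node refused (none switched). */
static int change_all(Node *nodes, const int *allowed, int count, int new_ctx)
{
    MiniTran t = { NULL };
    int ok = 1;

    for (int i = 0; i < count; i++) {
        if (!allowed[i] || mini_tran_add(&t, &nodes[i], new_ctx) < 0) {
            ok = 0;
            break;
        }
    }
    mini_tran_finish(&t, ok);
    return ok ? 0 : -1;
}
```

Because no node is switched until every node has been checked and drained, a refusal halfway through leaves all nodes in their original context.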
|
|
|
|
|
2014-06-20 23:57:33 +04:00
|
|
|
void bdrv_add_aio_context_notifier(BlockDriverState *bs,
|
|
|
|
void (*attached_aio_context)(AioContext *new_context, void *opaque),
|
|
|
|
void (*detach_aio_context)(void *opaque), void *opaque)
|
|
|
|
{
|
|
|
|
BdrvAioNotifier *ban = g_new(BdrvAioNotifier, 1);
|
|
|
|
*ban = (BdrvAioNotifier){
|
|
|
|
.attached_aio_context = attached_aio_context,
|
|
|
|
.detach_aio_context = detach_aio_context,
|
|
|
|
.opaque = opaque
|
|
|
|
};
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-06-20 23:57:33 +04:00
|
|
|
|
|
|
|
QLIST_INSERT_HEAD(&bs->aio_notifiers, ban, list);
|
|
|
|
}
|
|
|
|
|
|
|
|
void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
|
|
|
|
void (*attached_aio_context)(AioContext *,
|
|
|
|
void *),
|
|
|
|
void (*detach_aio_context)(void *),
|
|
|
|
void *opaque)
|
|
|
|
{
|
|
|
|
BdrvAioNotifier *ban, *ban_next;
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2014-06-20 23:57:33 +04:00
|
|
|
|
|
|
|
QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
|
|
|
|
if (ban->attached_aio_context == attached_aio_context &&
|
|
|
|
ban->detach_aio_context == detach_aio_context &&
|
2016-06-16 19:56:26 +03:00
|
|
|
ban->opaque == opaque &&
|
|
|
|
ban->deleted == false)
|
2014-06-20 23:57:33 +04:00
|
|
|
{
|
2016-06-16 19:56:26 +03:00
|
|
|
if (bs->walking_aio_notifiers) {
|
|
|
|
ban->deleted = true;
|
|
|
|
} else {
|
|
|
|
bdrv_do_remove_aio_context_notifier(ban);
|
|
|
|
}
|
2014-06-20 23:57:33 +04:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
abort();
|
|
|
|
}
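The removal function above marks an entry as deleted instead of freeing it while the list is being walked. A self-contained sketch of that deferred-removal idea follows. The walker that reaps marked entries is not part of this section, so `owner_walk` below is only one possible way to do it, and all names (`Owner`, `Notifier`, `owner_add`, `owner_remove`, `remove_prev_cb`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Notifier {
    int id;
    bool deleted;            /* set when removal is requested mid-walk */
    struct Notifier *next;
} Notifier;

typedef struct Owner {
    Notifier *head;
    bool walking;            /* true while owner_walk() iterates the list */
} Owner;

/* Insert a new notifier at the head of the list. */
static Notifier *owner_add(Owner *o, int id)
{
    Notifier *n = malloc(sizeof(*n));
    if (!n) {
        return NULL;
    }
    n->id = id;
    n->deleted = false;
    n->next = o->head;
    o->head = n;
    return n;
}

static void unlink_and_free(Owner *o, Notifier *victim)
{
    Notifier **pp = &o->head;
    while (*pp && *pp != victim) {
        pp = &(*pp)->next;
    }
    if (*pp) {
        *pp = victim->next;
        free(victim);
    }
}

/* Remove now if safe; otherwise only mark, so the walker's cursor stays valid. */
static bool owner_remove(Owner *o, int id)
{
    for (Notifier *n = o->head; n; n = n->next) {
        if (n->id == id && !n->deleted) {
            if (o->walking) {
                n->deleted = true;
            } else {
                unlink_and_free(o, n);
            }
            return true;
        }
    }
    return false;
}

/* Call cb for each live entry, then reap entries marked during the walk. */
static int owner_walk(Owner *o, void (*cb)(Owner *, Notifier *))
{
    int called = 0;
    Notifier *n, *next;

    o->walking = true;
    for (n = o->head; n; n = n->next) {
        if (!n->deleted) {
            cb(o, n);
            called++;
        }
    }
    o->walking = false;

    for (n = o->head; n; n = next) {
        next = n->next;
        if (n->deleted) {
            unlink_and_free(o, n);
        }
    }
    return called;
}

/* Demo callback: each notifier asks to remove the one with the previous id. */
static void remove_prev_cb(Owner *o, Notifier *n)
{
    owner_remove(o, n->id - 1);
}
```

The mark-then-reap split keeps the walker's `next` pointers valid even when a callback removes a notifier that has not been visited yet.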
|
|
|
|
|
2014-10-27 13:12:50 +03:00
|
|
|
int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
|
2018-05-10 00:00:18 +03:00
|
|
|
BlockDriverAmendStatusCB *status_cb, void *cb_opaque,
|
2020-06-25 15:55:38 +03:00
|
|
|
bool force,
|
2018-05-10 00:00:18 +03:00
|
|
|
Error **errp)
|
2013-09-03 12:09:50 +04:00
|
|
|
{
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2017-11-10 23:31:09 +03:00
|
|
|
if (!bs->drv) {
|
2018-05-10 00:00:18 +03:00
|
|
|
error_setg(errp, "Node is ejected");
|
2017-11-10 23:31:09 +03:00
|
|
|
return -ENOMEDIUM;
|
|
|
|
}
|
2014-06-05 13:21:11 +04:00
|
|
|
if (!bs->drv->bdrv_amend_options) {
|
2018-05-10 00:00:18 +03:00
|
|
|
error_setg(errp, "Block driver '%s' does not support option amendment",
|
|
|
|
bs->drv->format_name);
|
2013-09-03 12:09:50 +04:00
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
2020-06-25 15:55:38 +03:00
|
|
|
return bs->drv->bdrv_amend_options(bs, opts, status_cb,
|
|
|
|
cb_opaque, force, errp);
|
2013-09-03 12:09:50 +04:00
|
|
|
}
|
2013-10-02 16:33:48 +04:00
|
|
|
|
2020-02-18 13:34:41 +03:00
|
|
|
/*
|
|
|
|
* This function checks whether the given @to_replace is allowed to be
|
|
|
|
* replaced by a node that always shows the same data as @bs. This is
|
|
|
|
* used for example to verify whether the mirror job can replace
|
|
|
|
* @to_replace by the target mirrored from @bs.
|
|
|
|
* To be replaceable, @bs and @to_replace may either be guaranteed to
|
|
|
|
* always show the same data (because they are only connected through
|
|
|
|
* filters), or some driver may allow replacing one of its children
|
|
|
|
* because it can guarantee that this child's data is not visible at
|
|
|
|
* all (for example, for dissenting quorum children that have no other
|
|
|
|
* parents).
|
|
|
|
*/
|
|
|
|
bool bdrv_recurse_can_replace(BlockDriverState *bs,
|
|
|
|
BlockDriverState *to_replace)
|
|
|
|
{
|
2019-06-12 18:03:38 +03:00
|
|
|
BlockDriverState *filtered;
|
|
|
|
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2020-02-18 13:34:41 +03:00
|
|
|
if (!bs || !bs->drv) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bs == to_replace) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* See what the driver can do */
|
|
|
|
if (bs->drv->bdrv_recurse_can_replace) {
|
|
|
|
return bs->drv->bdrv_recurse_can_replace(bs, to_replace);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* For filters without their own implementation, we can recurse on our own */
|
2019-06-12 18:03:38 +03:00
|
|
|
filtered = bdrv_filter_bs(bs);
|
|
|
|
if (filtered) {
|
|
|
|
return bdrv_recurse_can_replace(filtered, to_replace);
|
2020-02-18 13:34:41 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Safe default */
|
|
|
|
return false;
|
|
|
|
}
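The decision order used above (identity check, then driver hook, then descent through a filter's single child, then a conservative default) can be sketched on a toy node type. `Node`, its `can_replace` hook and its `filtered` pointer are hypothetical simplifications and do not reflect the real BlockDriverState layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;
struct Node {
    /* Optional driver-specific answer; NULL when the driver has none. */
    bool (*can_replace)(Node *self, Node *target);
    /* Non-NULL only for filter nodes: the single node being filtered. */
    Node *filtered;
};

static bool recurse_can_replace(Node *n, Node *target)
{
    if (!n) {
        return false;
    }
    if (n == target) {
        return true;                      /* a node can replace itself */
    }
    if (n->can_replace) {
        return n->can_replace(n, target); /* the driver knows best */
    }
    if (n->filtered) {
        /* A filter shows the same data as its child, so descend. */
        return recurse_can_replace(n->filtered, target);
    }
    return false;                         /* safe default */
}
```

Returning `false` whenever nothing proves equivalence means an unknown driver can never cause a replacement that changes visible data.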
|
|
|
|
|
2020-02-18 13:34:44 +03:00
|
|
|
/*
|
|
|
|
* Check whether the given @node_name can be replaced by a node that
|
|
|
|
* has the same data as @parent_bs. If so, return @node_name's BDS;
|
|
|
|
* NULL otherwise.
|
|
|
|
*
|
|
|
|
* @node_name must be a (recursive) *child of @parent_bs (or this
|
|
|
|
* function will return NULL).
|
|
|
|
*
|
|
|
|
* The result (whether the node can be replaced or not) is only valid
|
|
|
|
* for as long as no graph or permission changes occur.
|
|
|
|
*/
|
2015-07-17 05:12:22 +03:00
|
|
|
BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
|
|
|
|
const char *node_name, Error **errp)
|
2014-06-27 20:25:25 +04:00
|
|
|
{
|
|
|
|
BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
|
2014-10-21 15:03:58 +04:00
|
|
|
AioContext *aio_context;
|
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2014-06-27 20:25:25 +04:00
|
|
|
if (!to_replace_bs) {
|
2021-03-05 18:19:28 +03:00
|
|
|
error_setg(errp, "Failed to find node with node-name='%s'", node_name);
|
2014-06-27 20:25:25 +04:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2014-10-21 15:03:58 +04:00
|
|
|
aio_context = bdrv_get_aio_context(to_replace_bs);
|
|
|
|
aio_context_acquire(aio_context);
|
|
|
|
|
2014-06-27 20:25:25 +04:00
|
|
|
if (bdrv_op_is_blocked(to_replace_bs, BLOCK_OP_TYPE_REPLACE, errp)) {
|
2014-10-21 15:03:58 +04:00
|
|
|
to_replace_bs = NULL;
|
|
|
|
goto out;
|
2014-06-27 20:25:25 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* We don't want an arbitrary node of the BDS chain to be replaced, only the
|
|
|
|
* topmost non-filter node, in order to prevent data corruption.
|
|
|
|
* Another benefit is that this test excludes backing files which are
|
|
|
|
* blocked by the backing blockers.
|
|
|
|
*/
|
2020-02-18 13:34:44 +03:00
|
|
|
if (!bdrv_recurse_can_replace(parent_bs, to_replace_bs)) {
|
|
|
|
error_setg(errp, "Cannot replace '%s' by a node mirrored from '%s', "
|
|
|
|
"because it cannot be guaranteed that doing so would not "
|
|
|
|
"lead to an abrupt change of visible data",
|
|
|
|
node_name, parent_bs->node_name);
|
2014-10-21 15:03:58 +04:00
|
|
|
to_replace_bs = NULL;
|
|
|
|
goto out;
|
2014-06-27 20:25:25 +04:00
|
|
|
}
|
|
|
|
|
2014-10-21 15:03:58 +04:00
|
|
|
out:
|
|
|
|
aio_context_release(aio_context);
|
2014-06-27 20:25:25 +04:00
|
|
|
return to_replace_bs;
|
|
|
|
}
|
2014-07-04 14:04:33 +04:00
|
|
|
|
2019-02-01 22:29:27 +03:00
|
|
|
/**
|
|
|
|
* Iterates through the list of runtime option keys that are said to
|
|
|
|
* be "strong" for a BDS. An option is called "strong" if it changes
|
|
|
|
* a BDS's data. For example, the null block driver's "size" and
|
|
|
|
* "read-zeroes" options are strong, but its "latency-ns" option is
|
|
|
|
* not.
|
|
|
|
*
|
|
|
|
* If a key returned by this function ends with a dot, all options
|
|
|
|
* starting with that prefix are strong.
|
|
|
|
*/
|
|
|
|
static const char *const *strong_options(BlockDriverState *bs,
|
|
|
|
const char *const *curopt)
|
|
|
|
{
|
|
|
|
static const char *const global_options[] = {
|
|
|
|
"driver", "filename", NULL
|
|
|
|
};
|
|
|
|
|
|
|
|
if (!curopt) {
|
|
|
|
return &global_options[0];
|
|
|
|
}
|
|
|
|
|
|
|
|
curopt++;
|
|
|
|
if (curopt == &global_options[ARRAY_SIZE(global_options) - 1] && bs->drv) {
|
|
|
|
curopt = bs->drv->strong_runtime_opts;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (curopt && *curopt) ? curopt : NULL;
|
|
|
|
}
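strong_options() acts as a cursor-style iterator: passing NULL starts the walk, and each call returns the next key, moving from the global list to the driver's list once the global terminator is reached. The standalone sketch below shows the same pattern with the driver list passed in directly. `next_option` and `count_options` are hypothetical names, and the driver list is assumed to be NULL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the first option when cur is NULL, otherwise the one after cur.
 * After the global list is exhausted, continue with driver_opts (which may
 * be NULL). Returns NULL when there is nothing left.
 */
static const char *const *next_option(const char *const *driver_opts,
                                      const char *const *cur)
{
    static const char *const global_options[] = {
        "driver", "filename", NULL
    };
    const size_t last = sizeof(global_options) / sizeof(global_options[0]) - 1;

    if (!cur) {
        return &global_options[0];
    }

    cur++;
    if (cur == &global_options[last] && driver_opts) {
        cur = driver_opts;   /* hop from the global list to the driver list */
    }

    return *cur ? cur : NULL;
}

/* Usage: count every key the iterator yields. */
static int count_options(const char *const *driver_opts)
{
    int n = 0;
    const char *const *cur = NULL;

    while ((cur = next_option(driver_opts, cur))) {
        n++;
    }
    return n;
}
```

The caller only keeps one pointer as iteration state, so the same loop shape as in append_strong_runtime_options() works without a separate iterator struct.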
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Copies all strong runtime options from bs->options to the given
|
|
|
|
* QDict. The set of strong option keys is determined by invoking
|
|
|
|
* strong_options().
|
|
|
|
*
|
|
|
|
* Returns true iff any strong option was present in bs->options (and
|
|
|
|
* thus copied to the target QDict) with the exception of "filename"
|
|
|
|
* and "driver". The caller is expected to use this value to decide
|
|
|
|
* whether the existence of strong options prevents the generation of
|
|
|
|
* a plain filename.
|
|
|
|
*/
|
|
|
|
static bool append_strong_runtime_options(QDict *d, BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
bool found_any = false;
|
|
|
|
const char *const *option_name = NULL;
|
|
|
|
|
|
|
|
if (!bs->drv) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
while ((option_name = strong_options(bs, option_name))) {
|
|
|
|
bool option_given = false;
|
|
|
|
|
|
|
|
assert(strlen(*option_name) > 0);
|
|
|
|
if ((*option_name)[strlen(*option_name) - 1] != '.') {
|
|
|
|
QObject *entry = qdict_get(bs->options, *option_name);
|
|
|
|
if (!entry) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
qdict_put_obj(d, *option_name, qobject_ref(entry));
|
|
|
|
option_given = true;
|
|
|
|
} else {
|
|
|
|
const QDictEntry *entry;
|
|
|
|
for (entry = qdict_first(bs->options); entry;
|
|
|
|
entry = qdict_next(bs->options, entry))
|
|
|
|
{
|
|
|
|
if (strstart(qdict_entry_key(entry), *option_name, NULL)) {
|
|
|
|
qdict_put_obj(d, qdict_entry_key(entry),
|
|
|
|
qobject_ref(qdict_entry_value(entry)));
|
|
|
|
option_given = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* While "driver" and "filename" need to be included in a JSON filename,
|
|
|
|
* their existence does not prohibit generation of a plain filename. */
|
|
|
|
if (!found_any && option_given &&
|
|
|
|
strcmp(*option_name, "driver") && strcmp(*option_name, "filename"))
|
|
|
|
{
|
|
|
|
found_any = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:34 +03:00
|
|
|
if (!qdict_haskey(d, "driver")) {
|
|
|
|
/* Drivers created with bdrv_new_open_driver() may not have a
|
|
|
|
* @driver option. Add it here. */
|
|
|
|
qdict_put_str(d, "driver", bs->drv->format_name);
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:27 +03:00
|
|
|
return found_any;
|
|
|
|
}
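The loop above treats a strong option name ending in a dot as a prefix and any other name as an exact key. A minimal sketch of just that matching rule, using only the standard library, could look as follows. `key_matches` is a hypothetical helper, and it uses `strncmp` where the code above uses QEMU's `strstart`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if key is selected by opt:
 *  - opt ending in '.' selects every key that starts with opt,
 *  - any other opt selects only the identical key.
 */
static bool key_matches(const char *key, const char *opt)
{
    size_t len = strlen(opt);

    if (len > 0 && opt[len - 1] == '.') {
        return strncmp(key, opt, len) == 0;
    }
    return strcmp(key, opt) == 0;
}
```

Keeping the dot as part of the prefix means `"server."` selects `"server.host"` but not a sibling key such as `"serverx"`.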
|
|
|
|
|
block: Respect backing bs in bdrv_refresh_filename
Basically, bdrv_refresh_filename() should respect all children of a
BlockDriverState. However, generally those children are driver-specific,
so this function cannot handle the general case. On the other hand,
there are only few drivers which use other children than @file and
@backing (that being vmdk, quorum, and blkverify).
Most block drivers only use @file and/or @backing (if they use any
children at all). Both can be implemented directly in
bdrv_refresh_filename.
The user overriding the file's filename is already handled, however, the
user overriding the backing file is not. If this is done, opening the
BDS with the plain filename of its file will not be correct, so we may
not set bs->exact_filename in that case.
iotest 051 contains test cases for overriding the backing file, and so
its output changes with this patch applied.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-01 22:29:09 +03:00
|
|
|
/* Note: This function may return false positives; it may return true
|
|
|
|
* even if opening the backing file specified by bs's image header
|
|
|
|
* would result in exactly bs->backing. */
|
2021-12-15 15:11:38 +03:00
|
|
|
static bool bdrv_backing_overridden(BlockDriverState *bs)
|
2019-02-01 22:29:09 +03:00
|
|
|
{
|
2022-03-03 18:15:57 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2019-02-01 22:29:09 +03:00
|
|
|
if (bs->backing) {
|
|
|
|
return strcmp(bs->auto_backing_file,
|
|
|
|
bs->backing->bs->filename);
|
|
|
|
} else {
|
|
|
|
/* No backing BDS, so if the image header reports any backing
|
|
|
|
* file, it must have been suppressed */
|
|
|
|
return bs->auto_backing_file[0] != '\0';
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-07-18 22:24:56 +04:00
|
|
|
/* Updates the following BDS fields:
|
|
|
|
* - exact_filename: A filename which may be used for opening a block device
|
|
|
|
* which (mostly) equals the given BDS (even without any
|
|
|
|
* other options; so reading and writing must return the same
|
|
|
|
* results, but caching etc. may be different)
|
|
|
|
* - full_open_options: Options which, when given when opening a block device
|
|
|
|
* (without a filename), result in a BDS (mostly)
|
|
|
|
* equalling the given one
|
|
|
|
* - filename: If exact_filename is set, it is copied here. Otherwise,
|
|
|
|
* full_open_options is converted to a JSON object, prefixed with
|
|
|
|
* "json:" (for use through the JSON pseudo protocol) and put here.
|
|
|
|
*/
|
|
|
|
void bdrv_refresh_filename(BlockDriverState *bs)
|
|
|
|
{
|
|
|
|
BlockDriver *drv = bs->drv;
|
2019-02-01 22:29:06 +03:00
|
|
|
BdrvChild *child;
|
2019-06-12 18:43:03 +03:00
|
|
|
BlockDriverState *primary_child_bs;
|
2014-07-18 22:24:56 +04:00
|
|
|
QDict *opts;
|
2019-02-01 22:29:09 +03:00
|
|
|
bool backing_overridden;
|
2019-02-01 22:29:28 +03:00
|
|
|
bool generate_json_filename; /* Whether our default implementation should
|
|
|
|
fill exact_filename (false) or not (true) */
|
2014-07-18 22:24:56 +04:00
|
|
|
|
2022-03-03 18:15:49 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
|
|
|
|
2014-07-18 22:24:56 +04:00
|
|
|
if (!drv) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:06 +03:00
|
|
|
/* This BDS's file name may depend on any of its children's file names, so
|
|
|
|
* refresh those first */
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
|
|
|
bdrv_refresh_filename(child->bs);
|
2014-07-18 22:24:56 +04:00
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:07 +03:00
|
|
|
if (bs->implicit) {
|
|
|
|
/* For implicit nodes, just copy everything from the single child */
|
|
|
|
child = QLIST_FIRST(&bs->children);
|
|
|
|
assert(QLIST_NEXT(child, next) == NULL);
|
|
|
|
|
|
|
|
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
|
|
|
|
child->bs->exact_filename);
|
|
|
|
pstrcpy(bs->filename, sizeof(bs->filename), child->bs->filename);
|
|
|
|
|
2020-01-16 11:56:00 +03:00
|
|
|
qobject_unref(bs->full_open_options);
|
2019-02-01 22:29:07 +03:00
|
|
|
bs->full_open_options = qobject_ref(child->bs->full_open_options);
|
|
|
|
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:09 +03:00
|
|
|
backing_overridden = bdrv_backing_overridden(bs);
|
|
|
|
|
|
|
|
if (bs->open_flags & BDRV_O_NO_IO) {
|
|
|
|
/* Without I/O, the backing file does not change anything.
|
|
|
|
* Therefore, in such a case (primarily qemu-img), we can
|
|
|
|
* pretend the backing file has not been overridden even if
|
|
|
|
* it technically has been. */
|
|
|
|
backing_overridden = false;
|
|
|
|
}
|
|
|
|
|
2019-02-01 22:29:27 +03:00
|
|
|
/* Gather the options QDict */
|
|
|
|
opts = qdict_new();
|
2019-02-01 22:29:28 +03:00
|
|
|
generate_json_filename = append_strong_runtime_options(opts, bs);
|
|
|
|
generate_json_filename |= backing_overridden;
|
2019-02-01 22:29:27 +03:00
|
|
|
|
|
|
|
if (drv->bdrv_gather_child_options) {
|
|
|
|
/* Some block drivers may not want to present all of their children's
|
|
|
|
* options, or name them differently from BdrvChild.name */
|
|
|
|
drv->bdrv_gather_child_options(bs, opts, backing_overridden);
|
|
|
|
} else {
|
|
|
|
QLIST_FOREACH(child, &bs->children, next) {
|
2020-05-13 14:05:33 +03:00
|
|
|
if (child == bs->backing && !backing_overridden) {
|
2019-02-01 22:29:27 +03:00
|
|
|
/* We can skip the backing BDS if it has not been overridden */
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
qdict_put(opts, child->name,
|
|
|
|
qobject_ref(child->bs->full_open_options));
|
|
|
|
}
|
|
|
|
|
|
|
|
if (backing_overridden && !bs->backing) {
|
|
|
|
/* Force no backing file */
|
|
|
|
qdict_put_null(opts, "backing");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
qobject_unref(bs->full_open_options);
|
|
|
|
bs->full_open_options = opts;

    primary_child_bs = bdrv_primary_bs(bs);

    if (drv->bdrv_refresh_filename) {
        /* Obsolete information is of no use here, so drop the old file name
         * information before refreshing it */
        bs->exact_filename[0] = '\0';

        drv->bdrv_refresh_filename(bs);
    } else if (primary_child_bs) {
        /*
         * Try to reconstruct valid information from the underlying
         * file -- this only works for format nodes (filter nodes
         * cannot be probed and as such must be selected by the user
         * either through an options dict, or through a special
         * filename which the filter driver must construct in its
         * .bdrv_refresh_filename() implementation).
         */

        bs->exact_filename[0] = '\0';

        /*
         * We can use the underlying file's filename if:
         * - it has a filename,
         * - the current BDS is not a filter,
         * - the file is a protocol BDS, and
         * - opening that file (as this BDS's format) will automatically create
         *   the BDS tree we have right now, that is:
         *   - the user did not significantly change this BDS's behavior with
         *     some explicit (strong) options
         *   - no non-file child of this BDS has been overridden by the user
         *   Both of these conditions are represented by generate_json_filename.
         */
        if (primary_child_bs->exact_filename[0] &&
            primary_child_bs->drv->bdrv_file_open &&
            !drv->is_filter && !generate_json_filename)
        {
            strcpy(bs->exact_filename, primary_child_bs->exact_filename);
        }
    }

    if (bs->exact_filename[0]) {
        pstrcpy(bs->filename, sizeof(bs->filename), bs->exact_filename);
    } else {
        GString *json = qobject_to_json(QOBJECT(bs->full_open_options));
        if (snprintf(bs->filename, sizeof(bs->filename), "json:%s",
                     json->str) >= sizeof(bs->filename)) {
            /* Give user a hint if we truncated things. */
            strcpy(bs->filename + sizeof(bs->filename) - 4, "...");
        }
        g_string_free(json, true);
    }
}
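
/*
 * Illustration only, not part of QEMU: a minimal, self-contained sketch of
 * the truncation hint used above for "json:" filenames. The helper name
 * format_json_filename() and its return convention are assumptions made
 * for this example. It relies only on snprintf() returning the length the
 * full string would have needed, which is how truncation is detected.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "json:<json>" into buf. If the result did not
 * fit, overwrite the last three visible characters with "..." so the user
 * can tell that the name was cut short.
 * Returns 1 if the output was truncated, 0 otherwise.
 */
static int format_json_filename(char *buf, size_t buf_size, const char *json)
{
    int len;

    /* Room for "..." plus the terminating NUL is required. */
    assert(buf && buf_size >= 4);

    len = snprintf(buf, buf_size, "json:%s", json);
    if (len >= 0 && (size_t)len >= buf_size) {
        /* The final 4 bytes of buf become '.', '.', '.', '\0'. */
        strcpy(buf + buf_size - 4, "...");
        return 1;
    }
    return 0;
}
```

/*
 * With a 12-byte buffer and the JSON text {"a": 1}, the sketch stores
 * json:{"a... and reports truncation; a large enough buffer keeps the
 * whole string.
 */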

char *bdrv_dirname(BlockDriverState *bs, Error **errp)
{
    BlockDriver *drv = bs->drv;
    BlockDriverState *child_bs;

    GLOBAL_STATE_CODE();

    if (!drv) {
        error_setg(errp, "Node '%s' is ejected", bs->node_name);
        return NULL;
    }

    if (drv->bdrv_dirname) {
        return drv->bdrv_dirname(bs, errp);
    }

    child_bs = bdrv_primary_bs(bs);
    if (child_bs) {
        return bdrv_dirname(child_bs, errp);
    }

    bdrv_refresh_filename(bs);
    if (bs->exact_filename[0] != '\0') {
        return path_combine(bs->exact_filename, "");
    }

    error_setg(errp, "Cannot generate a base directory for %s nodes",
               drv->format_name);
    return NULL;
}

/*
 * Hot add/remove a BDS's child. So the user can take a child offline when
 * it is broken and take a new child online
 */
void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
                    Error **errp)
{
    GLOBAL_STATE_CODE();
    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
        error_setg(errp, "The node %s does not support adding a child",
                   bdrv_get_device_or_node_name(parent_bs));
        return;
    }

    /*
     * Non-zoned block drivers do not follow zoned storage constraints
     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
     * drivers in a graph.
     */
    if (!parent_bs->drv->supports_zoned_children &&
        child_bs->bl.zoned == BLK_Z_HM) {
        /*
         * The host-aware model allows zoned storage constraints and random
         * write. Allow mixing host-aware and non-zoned drivers. Using
         * host-aware device as a regular device.
         */
        error_setg(errp, "Cannot add a %s child to a %s parent",
                   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
                   parent_bs->drv->supports_zoned_children ?
                   "support zoned children" : "not support zoned children");
        return;
    }

    if (!QLIST_EMPTY(&child_bs->parents)) {
        error_setg(errp, "The node %s already has a parent",
                   child_bs->node_name);
        return;
    }

    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
}

void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error **errp)
{
    BdrvChild *tmp;

    GLOBAL_STATE_CODE();
    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
        error_setg(errp, "The node %s does not support removing a child",
                   bdrv_get_device_or_node_name(parent_bs));
        return;
    }

    QLIST_FOREACH(tmp, &parent_bs->children, next) {
        if (tmp == child) {
            break;
        }
    }

    if (!tmp) {
        error_setg(errp, "The node %s does not have a child named %s",
                   bdrv_get_device_or_node_name(parent_bs),
                   bdrv_get_device_or_node_name(child->bs));
        return;
    }

    parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
}

int bdrv_make_empty(BdrvChild *c, Error **errp)
{
    BlockDriver *drv = c->bs->drv;
    int ret;

    GLOBAL_STATE_CODE();
    assert(c->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED));

    if (!drv->bdrv_make_empty) {
        error_setg(errp, "%s does not support emptying nodes",
                   drv->format_name);
        return -ENOTSUP;
    }

    ret = drv->bdrv_make_empty(c->bs);
    if (ret < 0) {
        error_setg_errno(errp, -ret, "Failed to empty %s",
                         c->bs->filename);
        return ret;
    }

    return 0;
}

/*
 * Return the child that @bs acts as an overlay for, and from which data may be
 * copied in COW or COR operations. Usually this is the backing file.
 */
BdrvChild *bdrv_cow_child(BlockDriverState *bs)
{
    IO_CODE();

    if (!bs || !bs->drv) {
        return NULL;
    }

    if (bs->drv->is_filter) {
        return NULL;
    }

    if (!bs->backing) {
        return NULL;
    }

    assert(bs->backing->role & BDRV_CHILD_COW);
    return bs->backing;
}

/*
 * If @bs acts as a filter for exactly one of its children, return
 * that child.
 */
BdrvChild *bdrv_filter_child(BlockDriverState *bs)
{
    BdrvChild *c;
    IO_CODE();

    if (!bs || !bs->drv) {
        return NULL;
    }

    if (!bs->drv->is_filter) {
        return NULL;
    }

    /* Only one of @backing or @file may be used */
    assert(!(bs->backing && bs->file));

    c = bs->backing ?: bs->file;
    if (!c) {
        return NULL;
    }

    assert(c->role & BDRV_CHILD_FILTERED);
    return c;
}

/*
 * Return either the result of bdrv_cow_child() or bdrv_filter_child(),
 * whichever is non-NULL.
 *
 * Return NULL if both are NULL.
 */
BdrvChild *bdrv_filter_or_cow_child(BlockDriverState *bs)
{
    BdrvChild *cow_child = bdrv_cow_child(bs);
    BdrvChild *filter_child = bdrv_filter_child(bs);
    IO_CODE();

    /* Filter nodes cannot have COW backing files */
    assert(!(cow_child && filter_child));

    return cow_child ?: filter_child;
}

/*
 * Return the primary child of this node: For filters, that is the
 * filtered child. For other nodes, that is usually the child storing
 * metadata.
 * (A generally more helpful description is that this is (usually) the
 * child that has the same filename as @bs.)
 *
 * Drivers do not necessarily have a primary child; for example quorum
 * does not.
 */
BdrvChild *bdrv_primary_child(BlockDriverState *bs)
{
    BdrvChild *c, *found = NULL;
    IO_CODE();

    QLIST_FOREACH(c, &bs->children, next) {
        if (c->role & BDRV_CHILD_PRIMARY) {
            assert(!found);
            found = c;
        }
    }

    return found;
}
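
/*
 * Illustration only, not part of QEMU: a small standalone sketch of the
 * "at most one primary child" scan above. struct child, ROLE_PRIMARY and
 * find_primary() are hypothetical stand-ins for BdrvChild,
 * BDRV_CHILD_PRIMARY and bdrv_primary_child(); a plain singly linked list
 * replaces the QLIST macros.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical role bit standing in for BDRV_CHILD_PRIMARY. */
#define ROLE_PRIMARY 0x1u

struct child {
    unsigned role;        /* bitmask of roles */
    struct child *next;   /* next sibling, or NULL */
};

/*
 * Walk the whole list rather than returning on the first match, so that a
 * second child carrying the primary role trips the assertion.
 */
static struct child *find_primary(struct child *head)
{
    struct child *c, *found = NULL;

    for (c = head; c; c = c->next) {
        if (c->role & ROLE_PRIMARY) {
            assert(!found);
            found = c;
        }
    }
    return found;
}
```

/*
 * Scanning to the end costs a few extra iterations but turns a violated
 * invariant into an immediate failure instead of a silently chosen child.
 */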

static BlockDriverState *bdrv_do_skip_filters(BlockDriverState *bs,
                                              bool stop_on_explicit_filter)
{
    BdrvChild *c;

    if (!bs) {
        return NULL;
    }

    while (!(stop_on_explicit_filter && !bs->implicit)) {
        c = bdrv_filter_child(bs);
        if (!c) {
            /*
             * A filter that is embedded in a working block graph must
             * have a child. Assert this here so this function does
             * not return a filter node that is not expected by the
             * caller.
             */
            assert(!bs->drv || !bs->drv->is_filter);
            break;
        }
        bs = c->bs;
    }
    /*
     * Note that this treats nodes with bs->drv == NULL as not being
     * filters (bs->drv == NULL should be replaced by something else
     * anyway).
     * The advantage of this behavior is that this function will thus
     * always return a non-NULL value (given a non-NULL @bs).
     */

    return bs;
}
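
/*
 * Illustration only, not part of QEMU: a standalone sketch of the loop
 * above, showing how the stop_on_explicit_filter flag changes where the
 * walk ends. struct node and skip_filters() are hypothetical; the real
 * code works on BlockDriverState and obtains the child through
 * bdrv_filter_child().
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node: only the fields the walk needs. */
struct node {
    bool is_filter;         /* node is a filter driver */
    bool implicit;          /* node was inserted implicitly */
    struct node *filtered;  /* filtered child, if any */
};

static struct node *skip_filters(struct node *n, bool stop_on_explicit)
{
    if (!n) {
        return NULL;
    }

    /* With stop_on_explicit set, any non-implicit node ends the walk. */
    while (!(stop_on_explicit && !n->implicit)) {
        struct node *c = n->is_filter ? n->filtered : NULL;

        if (!c) {
            /* A filter without a child is treated as a broken graph. */
            assert(!n->is_filter);
            break;
        }
        n = c;
    }
    return n;
}
```

/*
 * For a chain "implicit filter -> explicit filter -> format node", the
 * sketch returns the explicit filter when stop_on_explicit is true and the
 * format node when it is false.
 */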

/*
 * Return the first BDS that has not been added implicitly or that
 * does not have a filtered child down the chain starting from @bs
 * (including @bs itself).
 */
BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
{
    GLOBAL_STATE_CODE();
    return bdrv_do_skip_filters(bs, true);
}

/*
 * Return the first BDS that does not have a filtered child down the
 * chain starting from @bs (including @bs itself).
 */
BlockDriverState *bdrv_skip_filters(BlockDriverState *bs)
{
    IO_CODE();
    return bdrv_do_skip_filters(bs, false);
}

/*
 * For a backing chain, return the first non-filter backing image of
 * the first non-filter image.
 */
BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
{
    IO_CODE();
    return bdrv_skip_filters(bdrv_cow_bs(bdrv_skip_filters(bs)));
}
block: block-status cache for data regions
As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform. The
main difference is that this time it is implemented as part of the
general block layer code.
The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
implementation is outside of qemu, there is little we can do about its
performance.
We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts. So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.
To address this, we want to cache data regions. Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.
(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine. While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)
We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-12 11:41:44 +03:00

/**
 * Check whether [offset, offset + bytes) overlaps with the cached
 * block-status data region.
 *
 * If so, and @pnum is not NULL, set *pnum to `bsc.data_end - offset`,
 * which is what bdrv_bsc_is_data()'s interface needs.
 * Otherwise, *pnum is not touched.
 */
static bool bdrv_bsc_range_overlaps_locked(BlockDriverState *bs,
                                           int64_t offset, int64_t bytes,
                                           int64_t *pnum)
{
    BdrvBlockStatusCache *bsc = qatomic_rcu_read(&bs->block_status_cache);
    bool overlaps;

    overlaps =
        qatomic_read(&bsc->valid) &&
        ranges_overlap(offset, bytes, bsc->data_start,
                       bsc->data_end - bsc->data_start);

    if (overlaps && pnum) {
        *pnum = bsc->data_end - offset;
    }

    return overlaps;
}
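
/*
 * Illustration only, not part of QEMU: a standalone sketch of the overlap
 * check above without RCU or atomics. struct data_cache and
 * cache_overlaps() are hypothetical names, and the inline comparison is a
 * half-open interval test written as a stand-in for ranges_overlap(), not
 * a copy of it.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one data region [data_start, data_end). */
struct data_cache {
    bool valid;
    int64_t data_start;
    int64_t data_end;   /* exclusive */
};

/*
 * Return true if [offset, offset + bytes) intersects the cached region.
 * On overlap, and if pnum is not NULL, store the distance from offset to
 * the end of the cached region. Otherwise leave *pnum untouched.
 */
static bool cache_overlaps(const struct data_cache *c,
                           int64_t offset, int64_t bytes, int64_t *pnum)
{
    bool overlaps;

    assert(bytes > 0);

    overlaps = c->valid &&
               offset < c->data_end &&
               c->data_start < offset + bytes;

    if (overlaps && pnum) {
        *pnum = c->data_end - offset;
    }
    return overlaps;
}
```

/*
 * When the query is a single byte, as in bdrv_bsc_is_data(), an overlap
 * means data_start <= offset < data_end, so the stored value is exactly
 * the number of cached data bytes starting at offset.
 */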

/**
 * See block_int.h for this function's documentation.
 */
bool bdrv_bsc_is_data(BlockDriverState *bs, int64_t offset, int64_t *pnum)
{
    IO_CODE();
    RCU_READ_LOCK_GUARD();
    return bdrv_bsc_range_overlaps_locked(bs, offset, 1, pnum);
}

/**
 * See block_int.h for this function's documentation.
 */
void bdrv_bsc_invalidate_range(BlockDriverState *bs,
                               int64_t offset, int64_t bytes)
{
    IO_CODE();
    RCU_READ_LOCK_GUARD();

    if (bdrv_bsc_range_overlaps_locked(bs, offset, bytes, NULL)) {
        qatomic_set(&bs->block_status_cache->valid, false);
    }
}

/**
 * See block_int.h for this function's documentation.
 */
void bdrv_bsc_fill(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
    BdrvBlockStatusCache *new_bsc = g_new(BdrvBlockStatusCache, 1);
    BdrvBlockStatusCache *old_bsc;
    IO_CODE();

    *new_bsc = (BdrvBlockStatusCache) {
        .valid = true,
        .data_start = offset,
        .data_end = offset + bytes,
    };

    QEMU_LOCK_GUARD(&bs->bsc_modify_lock);

    old_bsc = qatomic_rcu_read(&bs->block_status_cache);
    qatomic_rcu_set(&bs->block_status_cache, new_bsc);
    if (old_bsc) {
        g_free_rcu(old_bsc, rcu);
    }
}