qemu_rbd_open() takes option parameters as a flattened QDict, with
keys of the form server.%d.host, server.%d.port, where %d counts up
from zero.
qemu_rbd_array_opts() extracts these values as follows. First, it
calls qdict_array_entries() to find the list's length. For each list
element, it formats the list's key prefix (e.g. "server.0."), then
creates a new QDict holding the options with that key prefix, then
converts that to a QemuOpts, so it can finally get the member values
from there.
If there's one surefire way to make code using QDict more awkward,
it's creating more of them and mixing in QemuOpts for good measure.
The extraction of keys starting with server.%d into another QDict
makes us ignore parameters like server.0.neither-host-nor-port
silently.
The conversion to QemuOpts abuses runtime_opts, as described a few
commits ago.
Rewrite to simply get the values straight from the options QDict.
Fixes -drive not to crash when server.*.* are present, but
server.*.host is absent.
Fixes -drive to reject invalid server.*.*.
Permits cleaning up runtime_opts. Do that, and fix -drive to reject
bogus parameters host and port instead of silently ignoring them.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-11-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This reverts half of commit 0a55679. We're having second thoughts on
the QAPI schema (and thus the external interface), and haven't reached
consensus, yet. Issues include:
* The implementation uses deprecated rados_conf_set() key
"auth_supported". No biggie.
* The implementation makes -drive silently ignore invalid parameters
"auth" and "auth-supported.*.X" where X isn't "auth". Fixable (in
fact I'm going to fix similar bugs around parameter server), so
again no biggie.
* BlockdevOptionsRbd member @password-secret applies only to
authentication method cephx. Should it be a variant member of
RbdAuthMethod?
* BlockdevOptionsRbd member @user could apply to both methods cephx
and none, but I'm not sure it's actually used with none. If it
isn't, should it be a variant member of RbdAuthMethod?
* The client offers a *set* of authentication methods, not a list.
Should the methods be optional members of BlockdevOptionsRbd instead
of members of list @auth-supported? The latter begs the question
what multiple entries for the same method mean. Trivial question
now that RbdAuthMethod contains nothing but @type, but less so when
RbdAuthMethod acquires other members, such the ones discussed above.
* How BlockdevOptionsRbd member @auth-supported interacts with
settings from a configuration file specified with @conf is
undocumented. I suspect it's untested, too.
Let's avoid painting ourselves into a corner now, and revert the
feature for 2.9.
Note that users can still configure authentication methods with a
configuration file. They probably do that anyway if they use Ceph
outside QEMU as well.
Further note that this doesn't affect use of key "auth-supported" in
-drive file=rbd:...:key=value.
qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST,
which is silly. This will be cleaned up shortly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-9-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The conversion from QDict to QemuOpts is pointless. Simply get the
stuff straight from the QDict.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-8-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
runtime_opts is used for three different purposes:
* qemu_rbd_open() uses it to accept options it recognizes, such as
"pool" and "image". Other .bdrv_open() methods do it similarly.
* qemu_rbd_open() accepts additional list-valued options
auth-supported and server, with the help of qemu_rbd_array_opts().
The list elements are again dictionaries. qemu_rbd_array_opts()
uses runtime_opts to accept their members. Thus, runtime_opts
contains recognized sub-sub-options "auth", "host", "port" in
addition to recognized options. No other block driver does that.
* qemu_rbd_create() uses it to convert the QDict produced by
qemu_rbd_parse_filename() to QemuOpts. No other block driver does
that. The keys produced by qemu_rbd_parse_filename() are "pool",
"image", "snapshot", "conf", "user" and "keyvalue-pairs".
qemu_rbd_open() accepts these, so no additional ones here.
This is a confusing mess. Dates back to commit 0f9d252. First step
to clean it up is documenting runtime_opts.desc[]:
* Reorder entries to match the QAPI schema, like we do in other block
drivers.
* Document why the schema's "server" and "auth-supported" aren't in
.desc[].
* Document why "keyvalue-pairs", "host", "port" and "auth" are in
.desc[], but not the schema.
* Delete "filename", because none of the three users actually uses it.
This fixes -drive to reject parameter filename instead of silently
ignoring it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-7-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The way we communicate extra key-value pairs from
qemu_rbd_parse_filename() to qemu_rbd_open() exposes option parameter
"keyvalue-pairs" on the command line. It's not wanted there. Hack:
rename the parameter to "=keyvalue-pairs" to make it inaccessible.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-6-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This code in qemu_rbd_parse_filename()
found_str = qemu_rbd_next_tok(p, '\0', &p);
p = found_str;
has no effect. Drop it, and simplify qemu_rbd_next_tok().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-5-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
We laboriously enforce that parameter values are between one and some
arbitrary limit in length. Only RBD_MAX_IMAGE_NAME_SIZE comes from
librbd.h, and I'm not sure it applies. Where the other limits come
from is unclear.
Drop the length checking. The limits librbd actually imposes must be
checked by librbd anyway.
There's one minor complication: BDRVRBDState member name is a
fixed-size array. Depends on the length limit. Make it a pointer to
a dynamically allocated string.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-4-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
qemu_rbd_open() neglects to check pool and image are present. Missing
image is caught by rbd_open(), but missing pool crashes. Reproducer:
$ qemu-system-x86_64 -nodefaults -drive driver=rbd,id=rbd,image=i,...
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
Aborted (core dumped)
where ... is a working server.0.{host,port} configuration.
Doesn't affect -drive with file=..., because qemu_rbd_parse_filename()
always sets both pool and image.
Doesn't affect -blockdev, because pool and image are mandatory in the
QAPI schema.
Fix by adding the missing checks.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-3-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This adds support for three additional options that may be specified
by QAPI in blockdev-add:
server: host, port
auth method: either 'cephx' or 'none'
The "server" and "auth-supported" QAPI parameters are arrays. To conform
with the rados API, the array items are join as a single string with a ';'
character as a delimiter when setting the configuration values.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Get rid of qemu_rbd_parsename in favor of bdrv_parse_filename.
This simplifies a lot of the parsing as well, as we can treat everything
a bit simpler since nonexistent options are simply NULL pointers instead
of empty strings.
An important item to note:
Ceph has many extra option values that can be specified as key/value
pairs. This was handled previously in the driver by extracting the
values that the QEMU driver cared about, and then blindly passing all
extra options to rbd after splitting them into key/value pairs, and
cleaning up any special character escaping.
The practice is continued in this patch; there is an option
"keyvalue-pairs" that is populated with all the key/value pairs that the
QEMU driver does not care about. These key/value pairs will override
any settings in the 'conf' configuration file, just as they did before.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
This adds all the currently supported runtime opts, which
are the options as parsed from the filename. All of these
options are explicitly checked for during during runtime,
with an exception to the "keyvalue-pairs" option.
This option contains all the key/value pairs that the QEMU rbd
driver merely unescapes, and passes along blindly to rados. This
option is a "legacy" option, and will not be exposed in the QAPI
or available for introspection.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
This patch is prep work for parsing options for .bdrv_parse_filename,
and using QDict options.
The function qemu_rbd_next_tok() searched for various key/value pairs,
and copied them into buffers. This will soon be an unnecessary extra
step, so we will now return found strings by reference only, and
offload the responsibility for safely handling/coping these strings to
the caller.
This also cleans up error handling some, as the callers now rely on
the Error object to determine if there is a parse error.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Rbd can do readv and writev directly, so wo do not need to transform
iov to buf or vice versa any more.
Signed-off-by: tianqing <tianqing@unitedstack.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Make it a bit clearer and more readable.
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1476519973-6436-1-git-send-email-lixiubo@cmss.chinamobile.com
CC: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Otherwise, reads of more than 2GB fail. Until commit
7bbca9e290, reads of 2^41
bytes succeeded at least theoretically.
In fact, pdiscard ought to receive a 64-bit integer as the
count for the same reason.
Reported by Coverity.
Fixes: 7bbca9e290
Cc: qemu-stable@nongnu.org
Cc: kwolf@redhat.com
Cc: eblake@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This simplifies bottom half handlers by removing calls to qemu_bh_delete and
thus removing the need to stash the bottom half pointer in the opaque
datum.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Another step towards byte-based interfaces everywhere. Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard(). Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The internal function converts to byte-based before calling into
RBD code; hoist the conversion to the callers so that callers
can then be switched to byte-based themselves.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-8-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use Coccinelle script to replace 'ret = E; return ret' with
'return E'. The script will do the substitution only when the
function return type and variable type are the same.
Manual fixups:
* audio/audio.c: coding style of "read (...)" and "write (...)"
* block/qcow2-cluster.c: wrap line to make it shorter
* block/qcow2-refcount.c: change indentation of wrapped line
* target-tricore/op_helper.c: fix coding style of
"remainder|quotient"
* target-mips/dsp_helper.c: reverted changes because I don't
want to argue about checkpatch.pl
* ui/qemu-pixman.c: fix line indentation
* block/rbd.c: restore blank line between declarations and
statements
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-4-git-send-email-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Unused Coccinelle rule name dropped along with a redundant comment;
whitespace touched up in block/qcow2-cluster.c; stale commit message
paragraph deleted]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Ceph RBD block driver does not use error_setg_errno() where
it is possible to use. This patch replaces error_setg()
from error_setg_errno().
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently RBD passwords must be provided on the command line
via
$QEMU -drive file=rbd:pool/image:id=myname:\
key=QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=:\
auth_supported=cephx
This is insecure because the key is visible in the OS process
listing.
This adds support for an 'password-secret' parameter in the RBD
parameters that can be used with the QCryptoSecret object to
provide the password via a file:
echo "QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=" > poolkey.b64
$QEMU -object secret,id=secret0,file=poolkey.b64,format=base64 \
-drive driver=rbd,filename=rbd:pool/image:id=myname:\
auth_supported=cephx,password-secret=secret0
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-2-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Apply the ceph settings from a config file before any ceph settings
from the command line. Since the ceph config file location may be
specified on the command line, parse it once to read the config file,
and do a second pass to apply the rest of the command line ceph
options.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be safe, when cache=none is used ceph settings should not be able
to override it to turn on caching. This was previously possible with
rbd_cache=true in the rbd device configuration or a ceph configuration
file. Similarly, rbd settings could have turned off caching when qemu
requested it, although this would just be a performance problem.
Fix this by changing rbd's cache setting to match qemu after all other
ceph settings have been applied.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RBDAIOCB.status was only used for cancel, which was removed in
7691e24dbe.
RBDAIOCB.sector_num was never used.
RADOSCB.done and rcbid were never used.
RBD_FD* are obsolete since the pipe was removed in
e04fb07fd1.
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit was generated mechanically by coccinelle from the following
semantic patch:
@@
expression val;
@@
- (ffs(val) - 1)
+ ctz32(val)
The call sites have been audited to ensure the ffs(0) - 1 == -1 case
never occurs (due to input validation, asserts, etc). Therefore we
don't need to worry about the fact that ctz32(0) == 32.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Variable local_err going out of scope
leaks the storage it points to.
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Message-id: 1417674851-6248-1-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes Ceph issue 2467: ttp://tracker.ceph.com/issues/2467
[Dropped return r in void function as suggested by Josh Durgin
<josh.durgin@inktank.com>.
--Stefan]
Signed-off-by: Adam Crume <adamcrume@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1412880272-3154-1-git-send-email-adamcrume@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
I'll use it with block backends shortly, and the name is going to fit
badly there. It's a block layer thing anyway, not just a block driver
thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there. It's a block layer thing anyway, not just a
block driver thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently the file size requested by user is rounded down to nearest
sector, causing the actual file size could be a bit less than the size
user requested. Since some formats (like qcow2) record virtual disk
size in bytes, this can make the last few bytes cannot be accessed.
This patch fixes it by rounding up file size to nearest sector so that
the actual file size is no less than the requested file size.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
One, it catches multiplication overflowing size_t. Two, it returns
T * rather than void *, which lets the compiler catch more type
errors.
Perhaps a conversion to g_malloc_n() would be neater in places, but
that's merely four years old, and we can't use such newfangled stuff.
This commit only touches allocations with size arguments of the form
sizeof(T), plus two that use 4 instead of sizeof(uint32_t). We can
make the others safe by converting to g_malloc_n() when it becomes
available to us in a couple of years.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the rbd block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drop the assumption that we're using the main AioContext. Convert
qemu_bh_new() to aio_bh_new() and qemu_aio_wait() to aio_poll().
The .bdrv_detach_aio_context() and .bdrv_attach_aio_context() interfaces
are not needed since no fd handlers, timers, or BHs stay registered when
requests have been drained.
Cc: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Completes the conversion to Error started in commit 015a103^..d5124c0.
Cc: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Dumb it down to
obvious.
Gets rid of several dozen Coverity false positives.
Note that the obvious form is already used in many places.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
rbd callbacks are called from non-QEMU threads. Up until now a pipe was
used to signal completion back to the QEMU iothread.
The pipe writer code handles EAGAIN using select(2). The select(2) API
is not scalable since fd_set size is static. FD_SET() can write beyond
the end of fd_set if the file descriptor number is too high. (QEMU's
main loop uses poll(2) to avoid this issue with select(2).)
Since the pipe itself is quite clumsy to use and QEMUBH is now
thread-safe, just schedule a BH from the rbd callback function. This
way we can simplify I/O completion in addition to eliminating the
potential FD_SET() crash when file descriptor numbers become too high.
Crash scenario: QEMU already has 1024 file descriptors open. Hotplug an
rbd drive and get the pipe writer to take the select(2) code path.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Tested-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a boiler-plate _nofail variant of qemu_opts_create. Remove and
use error_abort in call sites.
null/0 arguments needs to be added for the id and fail_if_exists fields
in affected callsites due to argument inconsistency between the normal and
no_fail variants.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
When there are no snapshots qemu_rbd_snap_list() returns 0 and the
snapshot table pointer is NULL. Don't forget to free the snaps buffer
we allocated for librbd rbd_snap_list().
When the function succeeds don't forget to free the snaps buffer after
calling rbd_snap_list_end().
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.
The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.
The first exception to this rule will be the quorum driver.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Snapshot creation actually already distinguish id and name since it take
a structured parameter *sn, but delete can't. Later an accurate delete
is needed in qmp_transaction abort and blockdev-snapshot-delete-sync,
so change its prototype. Also *errp is added to tip error, but return
value is kepted to let caller check what kind of error happens. Existing
caller for it are savevm, delvm and qemu-img, they are not impacted by
introducing a new function bdrv_snapshot_delete_by_id_or_name(), which
check the return value and do the operation again.
Before this patch:
For qcow2, it search id first then name to find the one to delete.
For rbd, it search name.
For sheepdog, it does nothing.
After this patch:
For qcow2, logic is the same by call it twice in caller.
For rbd, it always fails in delete with id, but still search for name
in second try, no change to user.
Some code for *errp is based on Pavel's patch.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The .io_flush() handler no longer exists and has no users. Drop the
io_flush argument to aio_set_fd_handler() and related functions.
The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
.io_flush() is no longer called so drop qemu_rbd_aio_flush_cb().
qemu_aio_count is unused now so drop it too.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
.has_zero_init defaults to 1 for all formats and protocols.
this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().
if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.
during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.
vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.
a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit a9ccedc3 frees the QemuOpts for the driver-specific options
immediately, even though it still needs the filename string that is
contained there. This doesn't work. Move the deletion of the QemuOpts to
the end of the function where its content isn't needed any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is only to convert the internal interface that is used for passing
the "filename" to be parsed, but converting to actual fine grained
options is left for another day, as it doesn't look trivial.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
which is sychronous and causes the main qemu thread to block until it
is complete. This results in unresponsiveness and extra latency for
the guest.
Fix this by using an asynchronous version of flush. This was added to
librbd with a special #define to indicate its presence, since it will
be backported to stable versions. Thus, there is no need to check the
version of librbd.
Implement this as bdrv_aio_flush, since it matches other aio functions
in the rbd block driver, and leave out bdrv_co_flush_to_disk when the
asynchronous version is available.
Reported-by: Oliver Francke <oliver@filoo.de>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit 787e4a85 [block: Add options QDict to bdrv_file_open() prototypes] didn't
update rbd.c accordingly.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.
qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.
To archieve this it introduces a new status flag which uses
-EINPROGRESS.
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rbd / rados tends to return pretty often length of writes
or discarded blocks. These values might be bigger than int.
The steps to reproduce are:
mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends
a discard. Important is that you use scsi-hd and set
discard_granularity=512. Otherwise rbd disabled discard support.
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that AIOPool no longer keeps a freelist, it isn't really a "pool"
anymore. Rename it to AIOCBInfo and make it const since it no longer
needs to be modified.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers should ignore BDRV_O_CACHE_WB in .bdrv_open flags,
and in the bs->open_flags.
This patch removes the code, leaving the behaviour behind as if
BDRV_O_CACHE_WB was set.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* mjt/mjt-iov2:
rewrite iov_send_recv() and move it to iov.c
cleanup qemu_co_sendv(), qemu_co_recvv() and friends
export iov_send_recv() and use it in iov_send() and iov_recv()
rename qemu_sendv to iov_send, change proto and move declarations to iov.h
change qemu_iovec_to_buf() to match other to,from_buf functions
consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
allow qemu_iovec_from_buffer() to specify offset from which to start copying
consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
rewrite iov_* functions
change iov_* function prototypes to be more appropriate
virtio-serial-bus: use correct lengths in control_out() message
Conflicts:
tests/Makefile
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Writeback caching was added in Ceph 0.46, and writethrough will be in
0.47. These are controlled by general config options, so there's no
need to check for librbd version.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It now allows specifying offset within qiov to start from and
amount of bytes to copy. Actual implementation is just a call
to iov_to_buf().
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Similar to
qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
int c, size_t bytes);
the new prototype is:
qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
const void *buf, size_t bytes);
The processing starts at offset bytes within qiov.
This way, we may copy a bounce buffer directly to
a middle of qiov.
This is exactly the same function as iov_from_buf() from
iov.c, so use the existing implementation and rename it
to qemu_iovec_from_buf() to be shorter and to match the
utility function.
As with utility implementation, we now assert that the
offset is inside actual iovec. Nothing changed for
current callers, because `offset' parameter is new.
While at it, stop using "bounce-qiov" in block/qcow2.c
and copy decrypted data directly from cluster_data
instead of recreating a temp qiov for doing that.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Change the write flag to an operation type in RBDAIOCB, and make the
buffer optional since discard doesn't use it.
Discard is first included in librbd 0.1.2 (which is in Ceph 0.46).
If librbd is too old, leave out qemu_rbd_aio_discard entirely,
so the old behavior is preserved.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The caller expects psn_tab to be NULL when there are no snapshots or
an error occurs. This results in calling g_free on an invalid address.
Reported-by: Oliver Francke <Oliver@filoo.de>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Initially done with the following semantic patch:
@ rule1 @
expression E;
statement S;
@@
E = qemu_aio_get (...);
(
- if (E == NULL) { ... }
|
- if (E)
{ <... S ...> }
)
which however missed occurrences in linux-aio.c and posix-aio-compat.c.
Those were done by hand.
The change in vdi_aio_setup's caller was also done by hand.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines. For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.
The bdrv_flush callback is then unused and can be eliminated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The config string is variously delimited by =, @, and /, depending on the
field. Allow these characters to be escaped by preceeding them with \.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
librbd recently added async writeback and flush support. If the new
rbd_flush() call is available, call it.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Properly document the configuration string syntax and semantics. Remove
(out of date) details about the librbd implementation.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we are reading from the default config location, ignore any failures.
It is perfectly legal for the user to specify exactly the options they need
and to not rely on any config file.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix leak of s->snap in failure path. Simplify error paths for the whole
function.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No assignment in condition. Remove duplicate ret > 0 check.
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow the client id to be specified in the config string via 'id=' so that
users can control who they authenticate as. Currently they are stuck with
the default ('admin'). This is necessary for anyone using authentication
in their environment.
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Variable 'snap' is assigned a value that is never used.
Remove snap and the related code.
Cc: Christian Brunner <chb@muc.de>
Cc: Josh Durgin <josh.durgin@dreamhost.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If scheduling fails, the number of outstanding I/Os must be correct,
or there will be a hang when waiting for everything to be flushed.
Reviewed-by: Christian Brunner <chb@muc.de>
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The new format is rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]
Each option is used to configure rados, and may be any Ceph option, or "conf".
The "conf" option specifies a Ceph configuration file to read.
This allows rbd volumes from more than one Ceph cluster to be used by
specifying different monitor addresses, as well as having different
logging levels or locations for different volumes.
Reviewed-by: Christian Brunner <chb@muc.de>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
librbd stacks on top of librados to provide access
to rbd images.
Using librbd simplifies the qemu code, and allows
qemu to use new versions of the rbd format
with few (if any) changes.
Reviewed-by: Christian Brunner <chb@muc.de>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RBD is an block driver for the distributed file system Ceph
(http://ceph.newdream.net/). This driver uses librados (which is part
of the Ceph server) for direct access to the Ceph object store and is
running entirely in userspace (Yehuda also wrote a driver for the
linux kernel, that can be used to access rbd volumes as a block
device).
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Christian Brunner <chb@muc.de>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>