2008-07-03 17:41:03 +04:00
|
|
|
/*
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
* Copyright Red Hat
|
2008-05-28 01:13:40 +04:00
|
|
|
* Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
|
|
|
|
*
|
2016-01-14 11:41:02 +03:00
|
|
|
* Network Block Device Server Side
|
2008-05-28 01:13:40 +04:00
|
|
|
*
|
|
|
|
* This program is free software; you can redistribute it and/or modify
|
|
|
|
* it under the terms of the GNU General Public License as published by
|
|
|
|
* the Free Software Foundation; under version 2 of the License.
|
|
|
|
*
|
|
|
|
* This program is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU General Public License for more details.
|
|
|
|
*
|
|
|
|
* You should have received a copy of the GNU General Public License
|
2009-07-17 00:47:01 +04:00
|
|
|
* along with this program; if not, see <http://www.gnu.org/licenses/>.
|
2008-07-03 17:41:03 +04:00
|
|
|
*/
|
2008-05-28 01:13:40 +04:00
|
|
|
|
2016-01-29 20:50:05 +03:00
|
|
|
#include "qemu/osdep.h"
|
2020-09-24 18:26:50 +03:00
|
|
|
|
2022-12-21 16:35:49 +03:00
|
|
|
#include "block/block_int.h"
|
2020-09-24 18:26:50 +03:00
|
|
|
#include "block/export.h"
|
2022-12-21 16:35:49 +03:00
|
|
|
#include "block/dirty-bitmap.h"
|
include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-14 11:01:28 +03:00
|
|
|
#include "qapi/error.h"
|
2019-08-12 08:23:49 +03:00
|
|
|
#include "qemu/queue.h"
|
2017-07-07 18:29:18 +03:00
|
|
|
#include "trace.h"
|
2016-01-14 11:41:02 +03:00
|
|
|
#include "nbd-internal.h"
|
2019-05-10 18:17:35 +03:00
|
|
|
#include "qemu/units.h"
|
2022-02-26 21:07:23 +03:00
|
|
|
#include "qemu/memalign.h"
|
qemu-nbd: only send a limited number of errno codes on the wire
Right now, NBD includes potentially platform-specific error values in
the wire protocol.
Luckily, most common error values are more or less universal: in
particular, of all errno values <= 34 (up to ERANGE), they are all the
same on supported platforms except for 11 (which is EAGAIN on Windows and
Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee
some portability, only keep a handful of possible error codes and squash
everything else to EINVAL.
This patch defines a limited set of errno values that are valid for the
NBD protocol, and specifies recommendations for what error to return
in specific corner cases. The set of errno values is roughly based on
the errors listed in the read(2) and write(2) man pages, with some
exceptions:
- ENOMEM is added for servers that implement copy-on-write or other
formats that require dynamic allocation.
- EDQUOT is not part of the universal set of errors; it can be changed
to ENOSPC on the wire format.
- EFBIG is part of the universal set of errors, but it is also changed
to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.
Incoming values will in general match system errno values, but not
on the Hurd which has different errno values (they have a "subsystem
code" equal to 0x10 in bits 24-31). The Hurd is probably not something
to which QEMU has been ported, but still do the right thing and
reverse-map the NBD errno values to the system errno values.
The corresponding patch to the NBD protocol description can be found at
http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 18:25:10 +03:00
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
#define NBD_META_ID_BASE_ALLOCATION 0
|
2020-10-27 08:05:54 +03:00
|
|
|
#define NBD_META_ID_ALLOCATION_DEPTH 1
|
2020-10-27 08:05:52 +03:00
|
|
|
/* Dirty bitmaps use 'NBD_META_ID_DIRTY_BITMAP + i', so keep this id last. */
|
2020-10-27 08:05:54 +03:00
|
|
|
#define NBD_META_ID_DIRTY_BITMAP 2
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2019-05-10 18:17:35 +03:00
|
|
|
/*
|
|
|
|
* NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
|
2018-06-09 18:17:56 +03:00
|
|
|
* constant. If an increase is needed, note that the NBD protocol
|
|
|
|
* recommends no larger than 32 mb, so that the client won't consider
|
2019-05-10 18:17:35 +03:00
|
|
|
* the reply as a denial of service attack.
|
|
|
|
*/
|
|
|
|
#define NBD_MAX_BLOCK_STATUS_EXTENTS (1 * MiB / 8)
|
2018-03-12 18:21:21 +03:00
|
|
|
|
qemu-nbd: only send a limited number of errno codes on the wire
Right now, NBD includes potentially platform-specific error values in
the wire protocol.
Luckily, most common error values are more or less universal: in
particular, of all errno values <= 34 (up to ERANGE), they are all the
same on supported platforms except for 11 (which is EAGAIN on Windows and
Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee
some portability, only keep a handful of possible error codes and squash
everything else to EINVAL.
This patch defines a limited set of errno values that are valid for the
NBD protocol, and specifies recommendations for what error to return
in specific corner cases. The set of errno values is roughly based on
the errors listed in the read(2) and write(2) man pages, with some
exceptions:
- ENOMEM is added for servers that implement copy-on-write or other
formats that require dynamic allocation.
- EDQUOT is not part of the universal set of errors; it can be changed
to ENOSPC on the wire format.
- EFBIG is part of the universal set of errors, but it is also changed
to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.
Incoming values will in general match system errno values, but not
on the Hurd which has different errno values (they have a "subsystem
code" equal to 0x10 in bits 24-31). The Hurd is probably not something
to which QEMU has been ported, but still do the right thing and
reverse-map the NBD errno values to the system errno values.
The corresponding patch to the NBD protocol description can be found at
http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 18:25:10 +03:00
|
|
|
static int system_errno_to_nbd_errno(int err)
|
|
|
|
{
|
|
|
|
switch (err) {
|
|
|
|
case 0:
|
|
|
|
return NBD_SUCCESS;
|
|
|
|
case EPERM:
|
2016-04-06 06:35:02 +03:00
|
|
|
case EROFS:
|
qemu-nbd: only send a limited number of errno codes on the wire
Right now, NBD includes potentially platform-specific error values in
the wire protocol.
Luckily, most common error values are more or less universal: in
particular, of all errno values <= 34 (up to ERANGE), they are all the
same on supported platforms except for 11 (which is EAGAIN on Windows and
Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee
some portability, only keep a handful of possible error codes and squash
everything else to EINVAL.
This patch defines a limited set of errno values that are valid for the
NBD protocol, and specifies recommendations for what error to return
in specific corner cases. The set of errno values is roughly based on
the errors listed in the read(2) and write(2) man pages, with some
exceptions:
- ENOMEM is added for servers that implement copy-on-write or other
formats that require dynamic allocation.
- EDQUOT is not part of the universal set of errors; it can be changed
to ENOSPC on the wire format.
- EFBIG is part of the universal set of errors, but it is also changed
to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.
Incoming values will in general match system errno values, but not
on the Hurd which has different errno values (they have a "subsystem
code" equal to 0x10 in bits 24-31). The Hurd is probably not something
to which QEMU has been ported, but still do the right thing and
reverse-map the NBD errno values to the system errno values.
The corresponding patch to the NBD protocol description can be found at
http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 18:25:10 +03:00
|
|
|
return NBD_EPERM;
|
|
|
|
case EIO:
|
|
|
|
return NBD_EIO;
|
|
|
|
case ENOMEM:
|
|
|
|
return NBD_ENOMEM;
|
|
|
|
#ifdef EDQUOT
|
|
|
|
case EDQUOT:
|
|
|
|
#endif
|
|
|
|
case EFBIG:
|
|
|
|
case ENOSPC:
|
|
|
|
return NBD_ENOSPC;
|
2017-10-27 13:40:28 +03:00
|
|
|
case EOVERFLOW:
|
|
|
|
return NBD_EOVERFLOW;
|
2019-08-23 17:37:23 +03:00
|
|
|
case ENOTSUP:
|
|
|
|
#if ENOTSUP != EOPNOTSUPP
|
|
|
|
case EOPNOTSUPP:
|
|
|
|
#endif
|
|
|
|
return NBD_ENOTSUP;
|
2016-10-14 21:33:16 +03:00
|
|
|
case ESHUTDOWN:
|
|
|
|
return NBD_ESHUTDOWN;
|
qemu-nbd: only send a limited number of errno codes on the wire
Right now, NBD includes potentially platform-specific error values in
the wire protocol.
Luckily, most common error values are more or less universal: in
particular, of all errno values <= 34 (up to ERANGE), they are all the
same on supported platforms except for 11 (which is EAGAIN on Windows and
Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee
some portability, only keep a handful of possible error codes and squash
everything else to EINVAL.
This patch defines a limited set of errno values that are valid for the
NBD protocol, and specifies recommendations for what error to return
in specific corner cases. The set of errno values is roughly based on
the errors listed in the read(2) and write(2) man pages, with some
exceptions:
- ENOMEM is added for servers that implement copy-on-write or other
formats that require dynamic allocation.
- EDQUOT is not part of the universal set of errors; it can be changed
to ENOSPC on the wire format.
- EFBIG is part of the universal set of errors, but it is also changed
to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.
Incoming values will in general match system errno values, but not
on the Hurd which has different errno values (they have a "subsystem
code" equal to 0x10 in bits 24-31). The Hurd is probably not something
to which QEMU has been ported, but still do the right thing and
reverse-map the NBD errno values to the system errno values.
The corresponding patch to the NBD protocol description can be found at
http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 18:25:10 +03:00
|
|
|
case EINVAL:
|
|
|
|
default:
|
|
|
|
return NBD_EINVAL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-08-22 17:30:31 +04:00
|
|
|
/* Definitions for opaque data types */
|
|
|
|
|
2016-10-14 21:33:05 +03:00
|
|
|
typedef struct NBDRequestData NBDRequestData;
|
2012-08-22 17:30:31 +04:00
|
|
|
|
2016-10-14 21:33:05 +03:00
|
|
|
struct NBDRequestData {
|
2012-08-22 17:30:31 +04:00
|
|
|
NBDClient *client;
|
|
|
|
uint8_t *data;
|
2016-05-12 01:39:37 +03:00
|
|
|
bool complete;
|
2012-08-22 17:30:31 +04:00
|
|
|
};
|
|
|
|
|
|
|
|
struct NBDExport {
|
2020-09-24 18:26:50 +03:00
|
|
|
BlockExport common;
|
2012-09-18 15:59:03 +04:00
|
|
|
|
2012-08-22 17:59:23 +04:00
|
|
|
char *name;
|
2016-10-14 21:33:03 +03:00
|
|
|
char *description;
|
2019-01-17 22:36:43 +03:00
|
|
|
uint64_t size;
|
2016-07-21 22:34:46 +03:00
|
|
|
uint16_t nbdflags;
|
2012-09-18 15:58:25 +04:00
|
|
|
QTAILQ_HEAD(, NBDClient) clients;
|
2012-08-22 17:59:23 +04:00
|
|
|
QTAILQ_ENTRY(NBDExport) next;
|
2014-06-20 23:57:32 +04:00
|
|
|
|
2016-07-06 12:22:39 +03:00
|
|
|
BlockBackend *eject_notifier_blk;
|
2016-01-29 18:36:06 +03:00
|
|
|
Notifier eject_notifier;
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2020-10-27 08:05:54 +03:00
|
|
|
bool allocation_depth;
|
2020-10-27 08:05:52 +03:00
|
|
|
BdrvDirtyBitmap **export_bitmaps;
|
|
|
|
size_t nr_export_bitmaps;
|
2012-08-22 17:30:31 +04:00
|
|
|
};
|
|
|
|
|
2012-08-22 17:59:23 +04:00
|
|
|
static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
|
|
|
|
|
2023-09-25 22:22:40 +03:00
|
|
|
/*
|
|
|
|
* NBDMetaContexts represents a list of meta contexts in use,
|
2018-03-12 18:21:21 +03:00
|
|
|
* as selected by NBD_OPT_SET_META_CONTEXT. Also used for
|
2023-09-25 22:22:40 +03:00
|
|
|
* NBD_OPT_LIST_META_CONTEXT.
|
|
|
|
*/
|
|
|
|
struct NBDMetaContexts {
|
|
|
|
const NBDExport *exp; /* associated export */
|
2020-10-27 08:05:51 +03:00
|
|
|
size_t count; /* number of negotiated contexts */
|
2018-03-12 18:21:21 +03:00
|
|
|
bool base_allocation; /* export base:allocation context (block status) */
|
2020-10-27 08:05:54 +03:00
|
|
|
bool allocation_depth; /* export qemu:allocation-depth */
|
2020-10-27 08:05:52 +03:00
|
|
|
bool *bitmaps; /*
|
|
|
|
* export qemu:dirty-bitmap:<export bitmap name>,
|
|
|
|
* sized by exp->nr_export_bitmaps
|
|
|
|
*/
|
2023-09-25 22:22:40 +03:00
|
|
|
};
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2012-08-22 17:30:31 +04:00
|
|
|
struct NBDClient {
|
2023-12-21 22:24:51 +03:00
|
|
|
int refcount; /* atomic */
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
void (*close_fn)(NBDClient *client, bool negotiated);
|
2012-08-22 17:30:31 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
QemuMutex lock;
|
|
|
|
|
2012-08-22 17:30:31 +04:00
|
|
|
NBDExport *exp;
|
2016-02-10 21:41:11 +03:00
|
|
|
QCryptoTLSCreds *tlscreds;
|
qemu-nbd: add support for authorization of TLS clients
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-27 19:20:33 +03:00
|
|
|
char *tlsauthz;
|
2016-02-10 21:41:04 +03:00
|
|
|
QIOChannelSocket *sioc; /* The underlying data channel */
|
|
|
|
QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
|
2012-08-22 17:30:31 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
Coroutine *recv_coroutine; /* protected by lock */
|
2012-08-22 17:30:31 +04:00
|
|
|
|
|
|
|
CoMutex send_lock;
|
|
|
|
Coroutine *send_coroutine;
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
bool read_yielding; /* protected by lock */
|
|
|
|
bool quiescing; /* protected by lock */
|
2020-12-14 20:05:18 +03:00
|
|
|
|
2012-09-18 15:58:25 +04:00
|
|
|
QTAILQ_ENTRY(NBDClient) next;
|
2023-12-21 22:24:52 +03:00
|
|
|
int nb_requests; /* protected by lock */
|
|
|
|
bool closing; /* protected by lock */
|
2017-10-27 13:40:32 +03:00
|
|
|
|
nbd/server: Trace client noncompliance on unaligned requests
We've recently added traces for clients to flag server non-compliance;
let's do the same for servers to flag client non-compliance. According
to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is
promising to send all requests aligned to those boundaries. Of
course, if the client does not request NBD_INFO_BLOCK_SIZE, then it
made no promises so we shouldn't flag anything; and because we are
willing to handle clients that made no promises (the spec allows us to
use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already
have to handle unaligned requests (which the block layer already does
on our behalf). So even though the spec allows us to return EINVAL
for clients that promised to behave, it's easier to always answer
unaligned requests. Still, flagging non-compliance can be useful in
debugging a client that is trying to be maximally portable.
Qemu as client used to have one spot where it sent non-compliant
requests: if the server sends an unaligned reply to
NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire
disk, the next request would start at that unaligned point; this was
fixed in commit a39286dd when the client was taught to work around
server non-compliance; but is equally fixed if the server is patched
to not send unaligned replies in the first place (yes, qemu 4.0 as
server still has few such bugs, although they will be patched in
4.1). Fortunately, I did not find any more spots where qemu as client
was non-compliant. I was able to test the patch by using the following
hack to convince qemu-io to run various unaligned commands, coupled
with serving 512-byte alignment by intentionally omitting '-f raw' on
the server while viewing server traces.
| diff --git i/nbd/client.c w/nbd/client.c
| index 427980bdd22..1858b2aac35 100644
| --- i/nbd/client.c
| +++ w/nbd/client.c
| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
| nbd_send_opt_abort(ioc);
| return -1;
| }
| + info->min_block = 1;//hack
| if (!is_power_of_2(info->min_block)) {
| error_setg(errp, "server minimum block size %" PRIu32
| " is not a power of two", info->min_block);
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-3-eblake@redhat.com>
[eblake: address minor review nits]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-03 06:05:21 +03:00
|
|
|
uint32_t check_align; /* If non-zero, check for aligned client requests */
|
|
|
|
|
2023-08-29 20:58:28 +03:00
|
|
|
NBDMode mode;
|
2023-09-25 22:22:40 +03:00
|
|
|
NBDMetaContexts contexts; /* Negotiated meta contexts */
|
2012-08-22 17:30:31 +04:00
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
uint32_t opt; /* Current option being negotiated */
|
|
|
|
uint32_t optlen; /* remaining length of data in ioc for the option being
|
|
|
|
negotiated now */
|
|
|
|
};
|
2008-05-28 01:13:40 +04:00
|
|
|
|
2017-02-13 16:52:24 +03:00
|
|
|
static void nbd_client_receive_next_request(NBDClient *client);
|
2014-06-20 23:57:32 +04:00
|
|
|
|
2012-08-23 16:57:11 +04:00
|
|
|
/* Basic flow for negotiation
|
2008-05-28 01:13:40 +04:00
|
|
|
|
|
|
|
Server Client
|
|
|
|
Negotiate
|
2012-08-23 16:57:11 +04:00
|
|
|
|
|
|
|
or
|
|
|
|
|
|
|
|
Server Client
|
|
|
|
Negotiate #1
|
|
|
|
Option
|
|
|
|
Negotiate #2
|
|
|
|
|
|
|
|
----
|
|
|
|
|
|
|
|
followed by
|
|
|
|
|
|
|
|
Server Client
|
2008-05-28 01:13:40 +04:00
|
|
|
Request
|
|
|
|
Response
|
|
|
|
Request
|
|
|
|
Response
|
|
|
|
...
|
|
|
|
...
|
|
|
|
Request (type == 2)
|
2012-08-23 16:57:11 +04:00
|
|
|
|
2008-05-28 01:13:40 +04:00
|
|
|
*/
|
|
|
|
|
2018-01-11 02:08:25 +03:00
|
|
|
static inline void set_be_option_rep(NBDOptionReply *rep, uint32_t option,
|
|
|
|
uint32_t type, uint32_t length)
|
|
|
|
{
|
|
|
|
stq_be_p(&rep->magic, NBD_REP_MAGIC);
|
|
|
|
stl_be_p(&rep->option, option);
|
|
|
|
stl_be_p(&rep->type, type);
|
|
|
|
stl_be_p(&rep->length, length);
|
|
|
|
}
|
|
|
|
|
2016-10-14 21:33:08 +03:00
|
|
|
/* Send a reply header, including length, but no payload.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2018-01-11 02:08:21 +03:00
|
|
|
static int nbd_negotiate_send_rep_len(NBDClient *client, uint32_t type,
|
|
|
|
uint32_t len, Error **errp)
|
2012-08-23 16:57:11 +04:00
|
|
|
{
|
2018-01-11 02:08:25 +03:00
|
|
|
NBDOptionReply rep;
|
2012-08-23 16:57:11 +04:00
|
|
|
|
2018-01-11 02:08:25 +03:00
|
|
|
trace_nbd_negotiate_send_rep_len(client->opt, nbd_opt_lookup(client->opt),
|
2017-07-07 23:30:43 +03:00
|
|
|
type, nbd_rep_lookup(type), len);
|
2016-02-10 21:41:11 +03:00
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
assert(len < NBD_MAX_BUFFER_SIZE);
|
2017-07-07 18:29:11 +03:00
|
|
|
|
2018-01-11 02:08:25 +03:00
|
|
|
set_be_option_rep(&rep, client->opt, type, len);
|
|
|
|
return nbd_write(client->ioc, &rep, sizeof(rep), errp);
|
2014-06-07 04:32:31 +04:00
|
|
|
}
|
2012-08-23 16:57:11 +04:00
|
|
|
|
2016-10-14 21:33:08 +03:00
|
|
|
/* Send a reply header with default 0 length.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2018-01-11 02:08:21 +03:00
|
|
|
static int nbd_negotiate_send_rep(NBDClient *client, uint32_t type,
|
2017-07-07 18:29:11 +03:00
|
|
|
Error **errp)
|
2016-10-14 21:33:08 +03:00
|
|
|
{
|
2018-01-11 02:08:21 +03:00
|
|
|
return nbd_negotiate_send_rep_len(client, type, 0, errp);
|
2016-10-14 21:33:08 +03:00
|
|
|
}
|
|
|
|
|
2016-10-14 21:33:09 +03:00
|
|
|
/* Send an error reply.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2022-02-20 19:39:25 +03:00
|
|
|
static int G_GNUC_PRINTF(4, 0)
|
2018-01-11 02:08:23 +03:00
|
|
|
nbd_negotiate_send_rep_verr(NBDClient *client, uint32_t type,
|
|
|
|
Error **errp, const char *fmt, va_list va)
|
2016-10-14 21:33:09 +03:00
|
|
|
{
|
nbd: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
2020-07-07 19:50:36 +03:00
|
|
|
ERRP_GUARD();
|
2019-08-24 20:28:12 +03:00
|
|
|
g_autofree char *msg = NULL;
|
2016-10-14 21:33:09 +03:00
|
|
|
int ret;
|
|
|
|
size_t len;
|
|
|
|
|
|
|
|
msg = g_strdup_vprintf(fmt, va);
|
|
|
|
len = strlen(msg);
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
assert(len < NBD_MAX_STRING_SIZE);
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_send_rep_err(msg);
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep_len(client, type, len, errp);
|
2016-10-14 21:33:09 +03:00
|
|
|
if (ret < 0) {
|
2019-08-24 20:28:12 +03:00
|
|
|
return ret;
|
2016-10-14 21:33:09 +03:00
|
|
|
}
|
2018-01-11 02:08:21 +03:00
|
|
|
if (nbd_write(client->ioc, msg, len, errp) < 0) {
|
2017-07-07 18:29:11 +03:00
|
|
|
error_prepend(errp, "write failed (error message): ");
|
2019-08-24 20:28:12 +03:00
|
|
|
return -EIO;
|
2016-10-14 21:33:09 +03:00
|
|
|
}
|
2017-07-07 18:29:11 +03:00
|
|
|
|
2019-08-24 20:28:12 +03:00
|
|
|
return 0;
|
2016-10-14 21:33:09 +03:00
|
|
|
}
|
|
|
|
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
/*
|
|
|
|
* Return a malloc'd copy of @name suitable for use in an error reply.
|
|
|
|
*/
|
|
|
|
static char *
|
|
|
|
nbd_sanitize_name(const char *name)
|
|
|
|
{
|
|
|
|
if (strnlen(name, 80) < 80) {
|
|
|
|
return g_strdup(name);
|
|
|
|
}
|
|
|
|
/* XXX Should we also try to sanitize any control characters? */
|
|
|
|
return g_strdup_printf("%.80s...", name);
|
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:23 +03:00
|
|
|
/* Send an error reply.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2022-02-20 19:39:25 +03:00
|
|
|
static int G_GNUC_PRINTF(4, 5)
|
2018-01-11 02:08:23 +03:00
|
|
|
nbd_negotiate_send_rep_err(NBDClient *client, uint32_t type,
|
|
|
|
Error **errp, const char *fmt, ...)
|
|
|
|
{
|
|
|
|
va_list va;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
va_start(va, fmt);
|
|
|
|
ret = nbd_negotiate_send_rep_verr(client, type, errp, fmt, va);
|
|
|
|
va_end(va);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:24 +03:00
|
|
|
/* Drop remainder of the current option, and send a reply with the
|
|
|
|
* given error type and message. Return -errno on read or write
|
|
|
|
* failure; or 0 if connection is still live. */
|
2022-02-20 19:39:25 +03:00
|
|
|
static int G_GNUC_PRINTF(4, 0)
|
2018-03-12 18:21:19 +03:00
|
|
|
nbd_opt_vdrop(NBDClient *client, uint32_t type, Error **errp,
|
|
|
|
const char *fmt, va_list va)
|
2018-01-11 02:08:24 +03:00
|
|
|
{
|
|
|
|
int ret = nbd_drop(client->ioc, client->optlen, errp);
|
|
|
|
|
|
|
|
client->optlen = 0;
|
|
|
|
if (!ret) {
|
|
|
|
ret = nbd_negotiate_send_rep_verr(client, type, errp, fmt, va);
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-02-20 19:39:25 +03:00
|
|
|
static int G_GNUC_PRINTF(4, 5)
|
2018-03-12 18:21:19 +03:00
|
|
|
nbd_opt_drop(NBDClient *client, uint32_t type, Error **errp,
|
|
|
|
const char *fmt, ...)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
va_list va;
|
|
|
|
|
|
|
|
va_start(va, fmt);
|
|
|
|
ret = nbd_opt_vdrop(client, type, errp, fmt, va);
|
|
|
|
va_end(va);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-02-20 19:39:25 +03:00
|
|
|
static int G_GNUC_PRINTF(3, 4)
|
2018-03-12 18:21:19 +03:00
|
|
|
nbd_opt_invalid(NBDClient *client, Error **errp, const char *fmt, ...)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
va_list va;
|
|
|
|
|
|
|
|
va_start(va, fmt);
|
|
|
|
ret = nbd_opt_vdrop(client, NBD_REP_ERR_INVALID, errp, fmt, va);
|
|
|
|
va_end(va);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:24 +03:00
|
|
|
/* Read size bytes from the unparsed payload of the current option.
|
2020-09-30 15:11:02 +03:00
|
|
|
* If @check_nul, require that no NUL bytes appear in buffer.
|
2018-01-11 02:08:24 +03:00
|
|
|
* Return -errno on I/O error, 0 if option was completely handled by
|
|
|
|
* sending a reply about inconsistent lengths, or 1 on success. */
|
|
|
|
static int nbd_opt_read(NBDClient *client, void *buffer, size_t size,
|
2020-09-30 15:11:02 +03:00
|
|
|
bool check_nul, Error **errp)
|
2018-01-11 02:08:24 +03:00
|
|
|
{
|
|
|
|
if (size > client->optlen) {
|
2018-03-12 18:21:19 +03:00
|
|
|
return nbd_opt_invalid(client, errp,
|
|
|
|
"Inconsistent lengths in option %s",
|
|
|
|
nbd_opt_lookup(client->opt));
|
2018-01-11 02:08:24 +03:00
|
|
|
}
|
|
|
|
client->optlen -= size;
|
2020-09-30 15:11:02 +03:00
|
|
|
if (qio_channel_read_all(client->ioc, buffer, size, errp) < 0) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (check_nul && strnlen(buffer, size) != size) {
|
|
|
|
return nbd_opt_invalid(client, errp,
|
|
|
|
"Unexpected embedded NUL in option %s",
|
|
|
|
nbd_opt_lookup(client->opt));
|
|
|
|
}
|
|
|
|
return 1;
|
2018-01-11 02:08:24 +03:00
|
|
|
}
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
/* Drop size bytes from the unparsed payload of the current option.
|
|
|
|
* Return -errno on I/O error, 0 if option was completely handled by
|
|
|
|
* sending a reply about inconsistent lengths, or 1 on success. */
|
|
|
|
static int nbd_opt_skip(NBDClient *client, size_t size, Error **errp)
|
|
|
|
{
|
|
|
|
if (size > client->optlen) {
|
|
|
|
return nbd_opt_invalid(client, errp,
|
|
|
|
"Inconsistent lengths in option %s",
|
|
|
|
nbd_opt_lookup(client->opt));
|
|
|
|
}
|
|
|
|
client->optlen -= size;
|
|
|
|
return nbd_drop(client->ioc, size, errp) < 0 ? -EIO : 1;
|
|
|
|
}
|
|
|
|
|
2018-03-12 18:21:20 +03:00
|
|
|
/* nbd_opt_read_name
|
|
|
|
*
|
|
|
|
* Read a string with the format:
|
2019-11-14 05:46:34 +03:00
|
|
|
* uint32_t len (<= NBD_MAX_STRING_SIZE)
|
2018-03-12 18:21:20 +03:00
|
|
|
* len bytes string (not 0-terminated)
|
|
|
|
*
|
2019-11-14 05:46:32 +03:00
|
|
|
* On success, @name will be allocated.
|
2018-03-12 18:21:20 +03:00
|
|
|
* If @length is non-null, it will be set to the actual string length.
|
|
|
|
*
|
|
|
|
* Return -errno on I/O error, 0 if option was completely handled by
|
|
|
|
* sending a reply about inconsistent lengths, or 1 on success.
|
|
|
|
*/
|
2019-11-14 05:46:32 +03:00
|
|
|
static int nbd_opt_read_name(NBDClient *client, char **name, uint32_t *length,
|
2018-03-12 18:21:20 +03:00
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
uint32_t len;
|
2019-11-14 05:46:32 +03:00
|
|
|
g_autofree char *local_name = NULL;
|
2018-03-12 18:21:20 +03:00
|
|
|
|
2019-11-14 05:46:32 +03:00
|
|
|
*name = NULL;
|
2020-09-30 15:11:02 +03:00
|
|
|
ret = nbd_opt_read(client, &len, sizeof(len), false, errp);
|
2018-03-12 18:21:20 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
len = cpu_to_be32(len);
|
2018-03-12 18:21:20 +03:00
|
|
|
|
2019-11-14 05:46:34 +03:00
|
|
|
if (len > NBD_MAX_STRING_SIZE) {
|
2018-03-12 18:21:20 +03:00
|
|
|
return nbd_opt_invalid(client, errp,
|
|
|
|
"Invalid name length: %" PRIu32, len);
|
|
|
|
}
|
|
|
|
|
2019-11-14 05:46:32 +03:00
|
|
|
local_name = g_malloc(len + 1);
|
2020-09-30 15:11:02 +03:00
|
|
|
ret = nbd_opt_read(client, local_name, len, true, errp);
|
2018-03-12 18:21:20 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2019-11-14 05:46:32 +03:00
|
|
|
local_name[len] = '\0';
|
2018-03-12 18:21:20 +03:00
|
|
|
|
|
|
|
if (length) {
|
|
|
|
*length = len;
|
|
|
|
}
|
2019-11-14 05:46:32 +03:00
|
|
|
*name = g_steal_pointer(&local_name);
|
2018-03-12 18:21:20 +03:00
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2016-10-14 21:33:08 +03:00
|
|
|
/* Send a single NBD_REP_SERVER reply to NBD_OPT_LIST, including payload.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2018-01-11 02:08:21 +03:00
|
|
|
static int nbd_negotiate_send_rep_list(NBDClient *client, NBDExport *exp,
|
2017-07-07 18:29:11 +03:00
|
|
|
Error **errp)
|
2014-06-07 04:32:32 +04:00
|
|
|
{
|
nbd: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
2020-07-07 19:50:36 +03:00
|
|
|
ERRP_GUARD();
|
2016-10-14 21:33:03 +03:00
|
|
|
size_t name_len, desc_len;
|
2016-10-14 21:33:08 +03:00
|
|
|
uint32_t len;
|
2016-10-14 21:33:03 +03:00
|
|
|
const char *name = exp->name ? exp->name : "";
|
|
|
|
const char *desc = exp->description ? exp->description : "";
|
2018-01-11 02:08:21 +03:00
|
|
|
QIOChannel *ioc = client->ioc;
|
2017-06-02 18:01:49 +03:00
|
|
|
int ret;
|
2014-06-07 04:32:32 +04:00
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_send_rep_list(name, desc);
|
2016-10-14 21:33:03 +03:00
|
|
|
name_len = strlen(name);
|
|
|
|
desc_len = strlen(desc);
|
2019-11-14 05:46:34 +03:00
|
|
|
assert(name_len <= NBD_MAX_STRING_SIZE && desc_len <= NBD_MAX_STRING_SIZE);
|
2016-10-14 21:33:08 +03:00
|
|
|
len = name_len + desc_len + sizeof(len);
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep_len(client, NBD_REP_SERVER, len, errp);
|
2017-06-02 18:01:49 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
2014-06-07 04:32:32 +04:00
|
|
|
}
|
2016-10-14 21:33:08 +03:00
|
|
|
|
2014-06-07 04:32:32 +04:00
|
|
|
len = cpu_to_be32(name_len);
|
2017-07-07 18:29:11 +03:00
|
|
|
if (nbd_write(ioc, &len, sizeof(len), errp) < 0) {
|
|
|
|
error_prepend(errp, "write failed (name length): ");
|
2016-10-14 21:33:03 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-07-07 18:29:11 +03:00
|
|
|
|
|
|
|
if (nbd_write(ioc, name, name_len, errp) < 0) {
|
|
|
|
error_prepend(errp, "write failed (name buffer): ");
|
2014-06-07 04:32:32 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-07-07 18:29:11 +03:00
|
|
|
|
|
|
|
if (nbd_write(ioc, desc, desc_len, errp) < 0) {
|
|
|
|
error_prepend(errp, "write failed (description buffer): ");
|
2014-06-07 04:32:32 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-07-07 18:29:11 +03:00
|
|
|
|
2014-06-07 04:32:32 +04:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-14 21:33:08 +03:00
|
|
|
/* Process the NBD_OPT_LIST command, with a potential series of replies.
|
|
|
|
* Return -errno on error, 0 on success. */
|
2017-10-27 13:40:31 +03:00
|
|
|
static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
|
2014-06-07 04:32:32 +04:00
|
|
|
{
|
|
|
|
NBDExport *exp;
|
2018-01-11 02:08:21 +03:00
|
|
|
assert(client->opt == NBD_OPT_LIST);
|
2014-06-07 04:32:32 +04:00
|
|
|
|
|
|
|
/* For each export, send a NBD_REP_SERVER reply. */
|
|
|
|
QTAILQ_FOREACH(exp, &exports, next) {
|
2018-01-11 02:08:21 +03:00
|
|
|
if (nbd_negotiate_send_rep_list(client, exp, errp)) {
|
2014-06-07 04:32:32 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* Finish with a NBD_REP_ACK. */
|
2018-01-11 02:08:21 +03:00
|
|
|
return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
|
2014-06-07 04:32:32 +04:00
|
|
|
}
|
|
|
|
|
2023-09-25 22:22:40 +03:00
|
|
|
static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
2023-09-25 22:22:40 +03:00
|
|
|
if (exp != client->contexts.exp) {
|
|
|
|
client->contexts.count = 0;
|
2020-10-27 08:05:51 +03:00
|
|
|
}
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
/* Send a reply to NBD_OPT_EXPORT_NAME.
|
|
|
|
* Return -errno on error, 0 on success. */
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
|
2017-07-07 18:29:11 +03:00
|
|
|
Error **errp)
|
2014-06-07 04:32:31 +04:00
|
|
|
{
|
nbd: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
2020-07-07 19:50:36 +03:00
|
|
|
ERRP_GUARD();
|
2019-11-14 05:46:32 +03:00
|
|
|
g_autofree char *name = NULL;
|
2017-07-17 22:26:35 +03:00
|
|
|
char buf[NBD_REPLY_EXPORT_NAME_SIZE] = "";
|
2017-07-07 23:30:45 +03:00
|
|
|
size_t len;
|
|
|
|
int ret;
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
uint16_t myflags;
|
2012-08-23 16:57:11 +04:00
|
|
|
|
2014-06-07 04:32:31 +04:00
|
|
|
/* Client sends:
|
|
|
|
[20 .. xx] export name (length bytes)
|
2017-07-17 22:26:35 +03:00
|
|
|
Server replies:
|
|
|
|
[ 0 .. 7] size
|
|
|
|
[ 8 .. 9] export flags
|
|
|
|
[10 .. 133] reserved (0) [unless no_zeroes]
|
2014-06-07 04:32:31 +04:00
|
|
|
*/
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_handle_export_name();
|
2023-09-25 22:22:35 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
error_setg(errp, "Extended headers already negotiated");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2019-11-14 05:46:34 +03:00
|
|
|
if (client->optlen > NBD_MAX_STRING_SIZE) {
|
2017-07-07 18:29:11 +03:00
|
|
|
error_setg(errp, "Bad length received");
|
2017-06-02 18:01:48 +03:00
|
|
|
return -EINVAL;
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
2019-11-14 05:46:32 +03:00
|
|
|
name = g_malloc(client->optlen + 1);
|
2019-01-28 19:58:30 +03:00
|
|
|
if (nbd_read(client->ioc, name, client->optlen, "export name", errp) < 0) {
|
2018-01-11 02:08:22 +03:00
|
|
|
return -EIO;
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
2018-01-11 02:08:21 +03:00
|
|
|
name[client->optlen] = '\0';
|
|
|
|
client->optlen = 0;
|
2012-08-23 16:57:11 +04:00
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_handle_export_name_request(name);
|
2016-02-10 21:41:09 +03:00
|
|
|
|
2012-08-23 16:57:11 +04:00
|
|
|
client->exp = nbd_export_find(name);
|
|
|
|
if (!client->exp) {
|
2017-07-07 18:29:11 +03:00
|
|
|
error_setg(errp, "export not found");
|
2017-06-02 18:01:48 +03:00
|
|
|
return -EINVAL;
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
2023-09-25 22:22:40 +03:00
|
|
|
nbd_check_meta_export(client, client->exp);
|
2012-08-23 16:57:11 +04:00
|
|
|
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
myflags = client->exp->nbdflags;
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED) {
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
myflags |= NBD_FLAG_SEND_DF;
|
|
|
|
}
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
|
|
|
|
myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
|
|
|
|
}
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
|
2017-07-07 23:30:45 +03:00
|
|
|
stq_be_p(buf, client->exp->size);
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
stw_be_p(buf + 8, myflags);
|
2017-07-07 23:30:45 +03:00
|
|
|
len = no_zeroes ? 10 : sizeof(buf);
|
|
|
|
ret = nbd_write(client->ioc, buf, len, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
error_prepend(errp, "write failed: ");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2012-08-23 16:57:11 +04:00
|
|
|
QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
|
2020-09-24 18:26:59 +03:00
|
|
|
blk_exp_ref(&client->exp->common);
|
2017-06-02 18:01:48 +03:00
|
|
|
|
|
|
|
return 0;
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
/* Send a single NBD_REP_INFO, with a buffer @buf of @length bytes.
|
|
|
|
* The buffer does NOT include the info type prefix.
|
|
|
|
* Return -errno on error, 0 if ready to send more. */
|
2018-01-11 02:08:21 +03:00
|
|
|
static int nbd_negotiate_send_info(NBDClient *client,
|
2017-07-07 23:30:46 +03:00
|
|
|
uint16_t info, uint32_t length, void *buf,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
trace_nbd_negotiate_send_info(info, nbd_info_lookup(info), length);
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_rep_len(client, NBD_REP_INFO,
|
2017-07-07 23:30:46 +03:00
|
|
|
sizeof(info) + length, errp);
|
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
info = cpu_to_be16(info);
|
2017-07-07 23:30:46 +03:00
|
|
|
if (nbd_write(client->ioc, &info, sizeof(info), errp) < 0) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
if (nbd_write(client->ioc, buf, length, errp) < 0) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:20 +03:00
|
|
|
/* nbd_reject_length: Handle any unexpected payload.
|
|
|
|
* @fatal requests that we quit talking to the client, even if we are able
|
|
|
|
* to successfully send an error reply.
|
|
|
|
* Return:
|
|
|
|
* -errno transmission error occurred or @fatal was requested, errp is set
|
|
|
|
* 0 error message successfully sent to client, errp is not set
|
|
|
|
*/
|
2018-01-11 02:08:21 +03:00
|
|
|
static int nbd_reject_length(NBDClient *client, bool fatal, Error **errp)
|
2018-01-11 02:08:20 +03:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
assert(client->optlen);
|
2018-03-12 18:21:19 +03:00
|
|
|
ret = nbd_opt_invalid(client, errp, "option '%s' has unexpected length",
|
|
|
|
nbd_opt_lookup(client->opt));
|
2018-01-11 02:08:20 +03:00
|
|
|
if (fatal && !ret) {
|
2018-01-11 02:08:24 +03:00
|
|
|
error_setg(errp, "option '%s' has unexpected length",
|
2018-01-11 02:08:21 +03:00
|
|
|
nbd_opt_lookup(client->opt));
|
2018-01-11 02:08:20 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
/* Handle NBD_OPT_INFO and NBD_OPT_GO.
|
|
|
|
* Return -errno on error, 0 if ready for next option, and 1 to move
|
|
|
|
* into transmission phase. */
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
|
2017-07-07 23:30:46 +03:00
|
|
|
{
|
|
|
|
int rc;
|
2019-11-14 05:46:32 +03:00
|
|
|
g_autofree char *name = NULL;
|
2017-07-07 23:30:46 +03:00
|
|
|
NBDExport *exp;
|
|
|
|
uint16_t requests;
|
|
|
|
uint16_t request;
|
2020-09-30 18:58:57 +03:00
|
|
|
uint32_t namelen = 0;
|
2017-07-07 23:30:46 +03:00
|
|
|
bool sendname = false;
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
bool blocksize = false;
|
|
|
|
uint32_t sizes[3];
|
2017-07-07 23:30:46 +03:00
|
|
|
char buf[sizeof(uint64_t) + sizeof(uint16_t)];
|
nbd/server: Trace client noncompliance on unaligned requests
We've recently added traces for clients to flag server non-compliance;
let's do the same for servers to flag client non-compliance. According
to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is
promising to send all requests aligned to those boundaries. Of
course, if the client does not request NBD_INFO_BLOCK_SIZE, then it
made no promises so we shouldn't flag anything; and because we are
willing to handle clients that made no promises (the spec allows us to
use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already
have to handle unaligned requests (which the block layer already does
on our behalf). So even though the spec allows us to return EINVAL
for clients that promised to behave, it's easier to always answer
unaligned requests. Still, flagging non-compliance can be useful in
debugging a client that is trying to be maximally portable.
Qemu as client used to have one spot where it sent non-compliant
requests: if the server sends an unaligned reply to
NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire
disk, the next request would start at that unaligned point; this was
fixed in commit a39286dd when the client was taught to work around
server non-compliance; but is equally fixed if the server is patched
to not send unaligned replies in the first place (yes, qemu 4.0 as
server still has few such bugs, although they will be patched in
4.1). Fortunately, I did not find any more spots where qemu as client
was non-compliant. I was able to test the patch by using the following
hack to convince qemu-io to run various unaligned commands, coupled
with serving 512-byte alignment by intentionally omitting '-f raw' on
the server while viewing server traces.
| diff --git i/nbd/client.c w/nbd/client.c
| index 427980bdd22..1858b2aac35 100644
| --- i/nbd/client.c
| +++ w/nbd/client.c
| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
| nbd_send_opt_abort(ioc);
| return -1;
| }
| + info->min_block = 1;//hack
| if (!is_power_of_2(info->min_block)) {
| error_setg(errp, "server minimum block size %" PRIu32
| " is not a power of two", info->min_block);
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-3-eblake@redhat.com>
[eblake: address minor review nits]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-03 06:05:21 +03:00
|
|
|
uint32_t check_align = 0;
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
uint16_t myflags;
|
2017-07-07 23:30:46 +03:00
|
|
|
|
|
|
|
/* Client sends:
|
|
|
|
4 bytes: L, name length (can be 0)
|
|
|
|
L bytes: export name
|
|
|
|
2 bytes: N, number of requests (can be 0)
|
|
|
|
N * 2 bytes: N requests
|
|
|
|
*/
|
2019-11-14 05:46:32 +03:00
|
|
|
rc = nbd_opt_read_name(client, &name, &namelen, errp);
|
2018-01-11 02:08:24 +03:00
|
|
|
if (rc <= 0) {
|
|
|
|
return rc;
|
2017-07-07 23:30:46 +03:00
|
|
|
}
|
|
|
|
trace_nbd_negotiate_handle_export_name_request(name);
|
|
|
|
|
2020-09-30 15:11:02 +03:00
|
|
|
rc = nbd_opt_read(client, &requests, sizeof(requests), false, errp);
|
2018-01-11 02:08:24 +03:00
|
|
|
if (rc <= 0) {
|
|
|
|
return rc;
|
2017-07-07 23:30:46 +03:00
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
requests = be16_to_cpu(requests);
|
2017-07-07 23:30:46 +03:00
|
|
|
trace_nbd_negotiate_handle_info_requests(requests);
|
|
|
|
while (requests--) {
|
2020-09-30 15:11:02 +03:00
|
|
|
rc = nbd_opt_read(client, &request, sizeof(request), false, errp);
|
2018-01-11 02:08:24 +03:00
|
|
|
if (rc <= 0) {
|
|
|
|
return rc;
|
2017-07-07 23:30:46 +03:00
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
request = be16_to_cpu(request);
|
2017-07-07 23:30:46 +03:00
|
|
|
trace_nbd_negotiate_handle_info_request(request,
|
|
|
|
nbd_info_lookup(request));
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
/* We care about NBD_INFO_NAME and NBD_INFO_BLOCK_SIZE;
|
|
|
|
* everything else is either a request we don't know or
|
|
|
|
* something we send regardless of request */
|
|
|
|
switch (request) {
|
|
|
|
case NBD_INFO_NAME:
|
2017-07-07 23:30:46 +03:00
|
|
|
sendname = true;
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
break;
|
|
|
|
case NBD_INFO_BLOCK_SIZE:
|
|
|
|
blocksize = true;
|
|
|
|
break;
|
2017-07-07 23:30:46 +03:00
|
|
|
}
|
|
|
|
}
|
2018-01-11 02:08:24 +03:00
|
|
|
if (client->optlen) {
|
|
|
|
return nbd_reject_length(client, false, errp);
|
|
|
|
}
|
2017-07-07 23:30:46 +03:00
|
|
|
|
|
|
|
exp = nbd_export_find(name);
|
|
|
|
if (!exp) {
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
g_autofree char *sane_name = nbd_sanitize_name(name);
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
return nbd_negotiate_send_rep_err(client, NBD_REP_ERR_UNKNOWN,
|
|
|
|
errp, "export '%s' not present",
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
sane_name);
|
2017-07-07 23:30:46 +03:00
|
|
|
}
|
2023-09-25 22:22:40 +03:00
|
|
|
if (client->opt == NBD_OPT_GO) {
|
|
|
|
nbd_check_meta_export(client, exp);
|
|
|
|
}
|
2017-07-07 23:30:46 +03:00
|
|
|
|
|
|
|
/* Don't bother sending NBD_INFO_NAME unless client requested it */
|
|
|
|
if (sendname) {
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_info(client, NBD_INFO_NAME, namelen, name,
|
2017-07-07 23:30:46 +03:00
|
|
|
errp);
|
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Send NBD_INFO_DESCRIPTION only if available, regardless of
|
|
|
|
* client request */
|
|
|
|
if (exp->description) {
|
|
|
|
size_t len = strlen(exp->description);
|
|
|
|
|
2019-11-14 05:46:34 +03:00
|
|
|
assert(len <= NBD_MAX_STRING_SIZE);
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_info(client, NBD_INFO_DESCRIPTION,
|
2017-07-07 23:30:46 +03:00
|
|
|
len, exp->description, errp);
|
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
/* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
|
|
|
|
* according to whether the client requested it, and according to
|
|
|
|
* whether this is OPT_INFO or OPT_GO. */
|
nbd/server: Advertise actual minimum block size
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their
reply according to bdrv_block_status() boundaries. If the block device
has a request_alignment smaller than 512, but we advertise a block
alignment of 512 to the client, then this can result in the server
reply violating client expectations by reporting a smaller region of
the export than what the client is permitted to address (although this
is less of an issue for qemu 4.0 clients, given recent client patches
to overlook our non-compliance at EOF). Since it's always better to
be strict in what we send, it is worth advertising the actual minimum
block limit rather than blindly rounding it up to 512.
Note that this patch is not foolproof - it is still possible to
provoke non-compliant server behavior using:
$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file
That is arguably a bug in the blkdebug driver (it should never pass
back block status smaller than its alignment, even if it has to make
multiple bdrv_get_status calls and determine the
least-common-denominator status among the group to return). It may
also be possible to observe issues with a backing layer with smaller
alignment than the active layer, although so far I have been unable to
write a reliable iotest for that scenario (but again, an issue like
that could be argued to be a bug in the block layer, or something
where we need a flag to bdrv_block_status() to state whether the
result must be aligned to the current layer's limits or can be
subdivided for accuracy when chasing backing files).
Anyways, as blkdebug is not normally used, and as this patch makes our
server more interoperable with qemu 3.1 clients, it is worth applying
now, even while we still work on a larger patch series for the 4.1
timeframe to have byte-accurate file lengths.
Note that the iotests output changes - for 223 and 233, we can see the
server's better granularity advertisement; and for 241, the three test
cases have the following effects:
- natural alignment: the server's smaller alignment is now advertised,
and the hole reported at EOF is now the right result; we've gotten rid
of the server's non-compliance
- forced server alignment: the server still advertises 512 bytes, but
still sends a mid-sector hole. This is still a server compliance bug,
which needs to be fixed in the block layer in a later patch; output
does not change because the client is already being tolerant of the
non-compliance
- forced client alignment: the server's smaller alignment means that
the client now sees the server's status change mid-sector without any
protocol violations, but the fact that the map shows an unaligned
mid-sector hole is evidence of the block layer problems with aligned
block status, to be fixed in a later patch
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to enhanced iotest 241 coverage]
2019-03-31 04:36:36 +03:00
|
|
|
/* minimum - 1 for back-compat, or actual if client will obey it. */
|
|
|
|
if (client->opt == NBD_OPT_INFO || blocksize) {
|
2020-09-24 18:27:08 +03:00
|
|
|
check_align = sizes[0] = blk_get_request_alignment(exp->common.blk);
|
nbd/server: Advertise actual minimum block size
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their
reply according to bdrv_block_status() boundaries. If the block device
has a request_alignment smaller than 512, but we advertise a block
alignment of 512 to the client, then this can result in the server
reply violating client expectations by reporting a smaller region of
the export than what the client is permitted to address (although this
is less of an issue for qemu 4.0 clients, given recent client patches
to overlook our non-compliance at EOF). Since it's always better to
be strict in what we send, it is worth advertising the actual minimum
block limit rather than blindly rounding it up to 512.
Note that this patch is not foolproof - it is still possible to
provoke non-compliant server behavior using:
$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file
That is arguably a bug in the blkdebug driver (it should never pass
back block status smaller than its alignment, even if it has to make
multiple bdrv_get_status calls and determine the
least-common-denominator status among the group to return). It may
also be possible to observe issues with a backing layer with smaller
alignment than the active layer, although so far I have been unable to
write a reliable iotest for that scenario (but again, an issue like
that could be argued to be a bug in the block layer, or something
where we need a flag to bdrv_block_status() to state whether the
result must be aligned to the current layer's limits or can be
subdivided for accuracy when chasing backing files).
Anyways, as blkdebug is not normally used, and as this patch makes our
server more interoperable with qemu 3.1 clients, it is worth applying
now, even while we still work on a larger patch series for the 4.1
timeframe to have byte-accurate file lengths.
Note that the iotests output changes - for 223 and 233, we can see the
server's better granularity advertisement; and for 241, the three test
cases have the following effects:
- natural alignment: the server's smaller alignment is now advertised,
and the hole reported at EOF is now the right result; we've gotten rid
of the server's non-compliance
- forced server alignment: the server still advertises 512 bytes, but
still sends a mid-sector hole. This is still a server compliance bug,
which needs to be fixed in the block layer in a later patch; output
does not change because the client is already being tolerant of the
non-compliance
- forced client alignment: the server's smaller alignment means that
the client now sees the server's status change mid-sector without any
protocol violations, but the fact that the map shows an unaligned
mid-sector hole is evidence of the block layer problems with aligned
block status, to be fixed in a later patch
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to enhanced iotest 241 coverage]
2019-03-31 04:36:36 +03:00
|
|
|
} else {
|
|
|
|
sizes[0] = 1;
|
|
|
|
}
|
|
|
|
assert(sizes[0] <= NBD_MAX_BUFFER_SIZE);
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
/* preferred - Hard-code to 4096 for now.
|
|
|
|
* TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */
|
nbd/server: Advertise actual minimum block size
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their
reply according to bdrv_block_status() boundaries. If the block device
has a request_alignment smaller than 512, but we advertise a block
alignment of 512 to the client, then this can result in the server
reply violating client expectations by reporting a smaller region of
the export than what the client is permitted to address (although this
is less of an issue for qemu 4.0 clients, given recent client patches
to overlook our non-compliance at EOF). Since it's always better to
be strict in what we send, it is worth advertising the actual minimum
block limit rather than blindly rounding it up to 512.
Note that this patch is not foolproof - it is still possible to
provoke non-compliant server behavior using:
$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file
That is arguably a bug in the blkdebug driver (it should never pass
back block status smaller than its alignment, even if it has to make
multiple bdrv_get_status calls and determine the
least-common-denominator status among the group to return). It may
also be possible to observe issues with a backing layer with smaller
alignment than the active layer, although so far I have been unable to
write a reliable iotest for that scenario (but again, an issue like
that could be argued to be a bug in the block layer, or something
where we need a flag to bdrv_block_status() to state whether the
result must be aligned to the current layer's limits or can be
subdivided for accuracy when chasing backing files).
Anyways, as blkdebug is not normally used, and as this patch makes our
server more interoperable with qemu 3.1 clients, it is worth applying
now, even while we still work on a larger patch series for the 4.1
timeframe to have byte-accurate file lengths.
Note that the iotests output changes - for 223 and 233, we can see the
server's better granularity advertisement; and for 241, the three test
cases have the following effects:
- natural alignment: the server's smaller alignment is now advertised,
and the hole reported at EOF is now the right result; we've gotten rid
of the server's non-compliance
- forced server alignment: the server still advertises 512 bytes, but
still sends a mid-sector hole. This is still a server compliance bug,
which needs to be fixed in the block layer in a later patch; output
does not change because the client is already being tolerant of the
non-compliance
- forced client alignment: the server's smaller alignment means that
the client now sees the server's status change mid-sector without any
protocol violations, but the fact that the map shows an unaligned
mid-sector hole is evidence of the block layer problems with aligned
block status, to be fixed in a later patch
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to enhanced iotest 241 coverage]
2019-03-31 04:36:36 +03:00
|
|
|
sizes[1] = MAX(4096, sizes[0]);
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
/* maximum - At most 32M, but smaller as appropriate. */
|
2020-09-24 18:27:08 +03:00
|
|
|
sizes[2] = MIN(blk_get_max_transfer(exp->common.blk), NBD_MAX_BUFFER_SIZE);
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
trace_nbd_negotiate_handle_info_block_size(sizes[0], sizes[1], sizes[2]);
|
2018-09-27 19:42:00 +03:00
|
|
|
sizes[0] = cpu_to_be32(sizes[0]);
|
|
|
|
sizes[1] = cpu_to_be32(sizes[1]);
|
|
|
|
sizes[2] = cpu_to_be32(sizes[2]);
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_info(client, NBD_INFO_BLOCK_SIZE,
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
sizeof(sizes), sizes, errp);
|
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
/* Send NBD_INFO_EXPORT always */
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
myflags = exp->nbdflags;
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED) {
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
myflags |= NBD_FLAG_SEND_DF;
|
|
|
|
}
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED &&
|
|
|
|
(client->contexts.count || client->opt == NBD_OPT_INFO)) {
|
|
|
|
myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
|
|
|
|
}
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
|
2017-07-07 23:30:46 +03:00
|
|
|
stq_be_p(buf, exp->size);
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
stw_be_p(buf + 8, myflags);
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_info(client, NBD_INFO_EXPORT,
|
2017-07-07 23:30:46 +03:00
|
|
|
sizeof(buf), buf, errp);
|
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2019-04-03 06:05:22 +03:00
|
|
|
/*
|
|
|
|
* If the client is just asking for NBD_OPT_INFO, but forgot to
|
|
|
|
* request block sizes in a situation that would impact
|
|
|
|
* performance, then return an error. But for NBD_OPT_GO, we
|
|
|
|
* tolerate all clients, regardless of alignments.
|
|
|
|
*/
|
|
|
|
if (client->opt == NBD_OPT_INFO && !blocksize &&
|
2020-09-24 18:27:08 +03:00
|
|
|
blk_get_request_alignment(exp->common.blk) > 1) {
|
2018-01-11 02:08:21 +03:00
|
|
|
return nbd_negotiate_send_rep_err(client,
|
|
|
|
NBD_REP_ERR_BLOCK_SIZE_REQD,
|
nbd: Implement NBD_INFO_BLOCK_SIZE on server
The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server that it intends to
obey block sizes.
Thanks to a recent fix (commit df7b97ff), our real minimum
transfer size is always 1 (the block layer takes care of
read-modify-write on our behalf), but we're still more efficient
if we advertise 512 when the client supports it, as follows:
- OPT_INFO, but no NBD_INFO_BLOCK_SIZE: advertise 512, then
fail with NBD_REP_ERR_BLOCK_SIZE_REQD; client is free to try
something else since we don't disconnect
- OPT_INFO with NBD_INFO_BLOCK_SIZE: advertise 512
- OPT_GO, but no NBD_INFO_BLOCK_SIZE: advertise 1
- OPT_GO with NBD_INFO_BLOCK_SIZE: advertise 512
We can also advertise the optimum block size (presumably the
cluster size, when exporting a qcow2 file), and our absolute
maximum transfer size of 32M, to help newer clients avoid
EINVAL failures or abrupt disconnects on oversize requests.
We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
we are no worse off for those clients than we used to be.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-9-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07 23:30:48 +03:00
|
|
|
errp,
|
|
|
|
"request NBD_INFO_BLOCK_SIZE to "
|
|
|
|
"use this export");
|
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
/* Final reply */
|
2018-01-11 02:08:21 +03:00
|
|
|
rc = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
|
2017-07-07 23:30:46 +03:00
|
|
|
if (rc < 0) {
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
if (client->opt == NBD_OPT_GO) {
|
2017-07-07 23:30:46 +03:00
|
|
|
client->exp = exp;
|
nbd/server: Trace client noncompliance on unaligned requests
We've recently added traces for clients to flag server non-compliance;
let's do the same for servers to flag client non-compliance. According
to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is
promising to send all requests aligned to those boundaries. Of
course, if the client does not request NBD_INFO_BLOCK_SIZE, then it
made no promises so we shouldn't flag anything; and because we are
willing to handle clients that made no promises (the spec allows us to
use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already
have to handle unaligned requests (which the block layer already does
on our behalf). So even though the spec allows us to return EINVAL
for clients that promised to behave, it's easier to always answer
unaligned requests. Still, flagging non-compliance can be useful in
debugging a client that is trying to be maximally portable.
Qemu as client used to have one spot where it sent non-compliant
requests: if the server sends an unaligned reply to
NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire
disk, the next request would start at that unaligned point; this was
fixed in commit a39286dd when the client was taught to work around
server non-compliance; but is equally fixed if the server is patched
to not send unaligned replies in the first place (yes, qemu 4.0 as
server still has few such bugs, although they will be patched in
4.1). Fortunately, I did not find any more spots where qemu as client
was non-compliant. I was able to test the patch by using the following
hack to convince qemu-io to run various unaligned commands, coupled
with serving 512-byte alignment by intentionally omitting '-f raw' on
the server while viewing server traces.
| diff --git i/nbd/client.c w/nbd/client.c
| index 427980bdd22..1858b2aac35 100644
| --- i/nbd/client.c
| +++ w/nbd/client.c
| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
| nbd_send_opt_abort(ioc);
| return -1;
| }
| + info->min_block = 1;//hack
| if (!is_power_of_2(info->min_block)) {
| error_setg(errp, "server minimum block size %" PRIu32
| " is not a power of two", info->min_block);
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-3-eblake@redhat.com>
[eblake: address minor review nits]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-03 06:05:21 +03:00
|
|
|
client->check_align = check_align;
|
2017-07-07 23:30:46 +03:00
|
|
|
QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
|
2020-09-24 18:26:59 +03:00
|
|
|
blk_exp_ref(&client->exp->common);
|
2017-07-07 23:30:46 +03:00
|
|
|
rc = 1;
|
|
|
|
}
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2016-10-14 21:33:09 +03:00
|
|
|
/* Handle NBD_OPT_STARTTLS. Return NULL to drop connection, or else the
|
|
|
|
* new channel for all further (now-encrypted) communication. */
|
2016-02-10 21:41:11 +03:00
|
|
|
static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
|
2017-07-07 18:29:11 +03:00
|
|
|
Error **errp)
|
2016-02-10 21:41:11 +03:00
|
|
|
{
|
|
|
|
QIOChannel *ioc;
|
|
|
|
QIOChannelTLS *tioc;
|
|
|
|
struct NBDTLSHandshakeData data = { 0 };
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
assert(client->opt == NBD_OPT_STARTTLS);
|
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_handle_starttls();
|
2016-02-10 21:41:11 +03:00
|
|
|
ioc = client->ioc;
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
if (nbd_negotiate_send_rep(client, NBD_REP_ACK, errp) < 0) {
|
2016-05-12 01:39:36 +03:00
|
|
|
return NULL;
|
|
|
|
}
|
2016-02-10 21:41:11 +03:00
|
|
|
|
|
|
|
tioc = qio_channel_tls_new_server(ioc,
|
|
|
|
client->tlscreds,
|
qemu-nbd: add support for authorization of TLS clients
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-27 19:20:33 +03:00
|
|
|
client->tlsauthz,
|
2017-07-07 18:29:11 +03:00
|
|
|
errp);
|
2016-02-10 21:41:11 +03:00
|
|
|
if (!tioc) {
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2016-09-30 13:57:14 +03:00
|
|
|
qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-server-tls");
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_handle_starttls_handshake();
|
2016-02-10 21:41:11 +03:00
|
|
|
data.loop = g_main_loop_new(g_main_context_default(), FALSE);
|
|
|
|
qio_channel_tls_handshake(tioc,
|
|
|
|
nbd_tls_handshake,
|
|
|
|
&data,
|
2018-03-05 09:43:24 +03:00
|
|
|
NULL,
|
2016-02-10 21:41:11 +03:00
|
|
|
NULL);
|
|
|
|
|
|
|
|
if (!data.complete) {
|
|
|
|
g_main_loop_run(data.loop);
|
|
|
|
}
|
|
|
|
g_main_loop_unref(data.loop);
|
|
|
|
if (data.error) {
|
|
|
|
object_unref(OBJECT(tioc));
|
2017-07-07 18:29:11 +03:00
|
|
|
error_propagate(errp, data.error);
|
2016-02-10 21:41:11 +03:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return QIO_CHANNEL(tioc);
|
|
|
|
}
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
/* nbd_negotiate_send_meta_context
|
|
|
|
*
|
|
|
|
* Send one chunk of reply to NBD_OPT_{LIST,SET}_META_CONTEXT
|
|
|
|
*
|
|
|
|
* For NBD_OPT_LIST_META_CONTEXT @context_id is ignored, 0 is used instead.
|
|
|
|
*/
|
|
|
|
static int nbd_negotiate_send_meta_context(NBDClient *client,
|
|
|
|
const char *context,
|
|
|
|
uint32_t context_id,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
NBDOptionReplyMetaContext opt;
|
|
|
|
struct iovec iov[] = {
|
|
|
|
{.iov_base = &opt, .iov_len = sizeof(opt)},
|
|
|
|
{.iov_base = (void *)context, .iov_len = strlen(context)}
|
|
|
|
};
|
|
|
|
|
2019-11-14 05:46:34 +03:00
|
|
|
assert(iov[1].iov_len <= NBD_MAX_STRING_SIZE);
|
2018-03-12 18:21:21 +03:00
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
|
|
|
|
context_id = 0;
|
|
|
|
}
|
|
|
|
|
2018-03-30 16:09:50 +03:00
|
|
|
trace_nbd_negotiate_meta_query_reply(context, context_id);
|
2018-03-12 18:21:21 +03:00
|
|
|
set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
|
|
|
|
sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
|
|
|
|
stl_be_p(&opt.context_id, context_id);
|
|
|
|
|
|
|
|
return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
|
|
|
|
}
|
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
/*
|
|
|
|
* Return true if @query matches @pattern, or if @query is empty when
|
|
|
|
* the @client is performing _LIST_.
|
2018-06-09 18:17:53 +03:00
|
|
|
*/
|
2020-09-30 15:11:03 +03:00
|
|
|
static bool nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
|
|
|
|
const char *query)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
2020-09-30 15:11:03 +03:00
|
|
|
if (!*query) {
|
|
|
|
trace_nbd_negotiate_meta_query_parse("empty");
|
|
|
|
return client->opt == NBD_OPT_LIST_META_CONTEXT;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
if (strcmp(query, pattern) == 0) {
|
2018-06-20 00:55:09 +03:00
|
|
|
trace_nbd_negotiate_meta_query_parse(pattern);
|
2020-09-30 15:11:03 +03:00
|
|
|
return true;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
trace_nbd_negotiate_meta_query_skip("pattern not matched");
|
|
|
|
return false;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2018-06-20 00:55:09 +03:00
|
|
|
/*
|
2020-09-30 15:11:03 +03:00
|
|
|
* Return true and adjust @str in place if it begins with @prefix.
|
2018-06-20 00:55:09 +03:00
|
|
|
*/
|
2020-09-30 15:11:03 +03:00
|
|
|
static bool nbd_strshift(const char **str, const char *prefix)
|
2018-06-20 00:55:09 +03:00
|
|
|
{
|
2020-09-30 15:11:03 +03:00
|
|
|
size_t len = strlen(prefix);
|
2018-06-20 00:55:09 +03:00
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
if (strncmp(*str, prefix, len) == 0) {
|
|
|
|
*str += len;
|
|
|
|
return true;
|
2018-06-20 00:55:09 +03:00
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
return false;
|
2018-06-20 00:55:09 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* nbd_meta_base_query
|
|
|
|
*
|
|
|
|
* Handle queries to 'base' namespace. For now, only the base:allocation
|
2020-09-30 15:11:03 +03:00
|
|
|
* context is available. Return true if @query has been handled.
|
2018-06-20 00:55:09 +03:00
|
|
|
*/
|
2023-09-25 22:22:40 +03:00
|
|
|
static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
|
2020-09-30 15:11:03 +03:00
|
|
|
const char *query)
|
2018-06-20 00:55:09 +03:00
|
|
|
{
|
2020-09-30 15:11:03 +03:00
|
|
|
if (!nbd_strshift(&query, "base:")) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
trace_nbd_negotiate_meta_query_parse("base:");
|
|
|
|
|
|
|
|
if (nbd_meta_empty_or_pattern(client, "allocation", query)) {
|
|
|
|
meta->base_allocation = true;
|
|
|
|
}
|
|
|
|
return true;
|
2018-06-20 00:55:09 +03:00
|
|
|
}
|
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
/* nbd_meta_qemu_query
|
2018-06-09 18:17:56 +03:00
|
|
|
*
|
2020-09-30 15:11:03 +03:00
|
|
|
* Handle queries to 'qemu' namespace. For now, only the qemu:dirty-bitmap:
|
2020-10-27 08:05:54 +03:00
|
|
|
* and qemu:allocation-depth contexts are available. Return true if @query
|
|
|
|
* has been handled.
|
2020-09-30 15:11:03 +03:00
|
|
|
*/
|
2023-09-25 22:22:40 +03:00
|
|
|
static bool nbd_meta_qemu_query(NBDClient *client, NBDMetaContexts *meta,
|
2020-09-30 15:11:03 +03:00
|
|
|
const char *query)
|
2018-06-09 18:17:56 +03:00
|
|
|
{
|
2020-10-27 08:05:52 +03:00
|
|
|
size_t i;
|
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
if (!nbd_strshift(&query, "qemu:")) {
|
|
|
|
return false;
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
trace_nbd_negotiate_meta_query_parse("qemu:");
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
if (!*query) {
|
2018-06-09 18:17:56 +03:00
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
|
2020-10-27 08:05:54 +03:00
|
|
|
meta->allocation_depth = meta->exp->allocation_depth;
|
2021-11-16 01:39:43 +03:00
|
|
|
if (meta->exp->nr_export_bitmaps) {
|
|
|
|
memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
|
|
|
|
}
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
trace_nbd_negotiate_meta_query_parse("empty");
|
2020-09-30 15:11:03 +03:00
|
|
|
return true;
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:54 +03:00
|
|
|
if (strcmp(query, "allocation-depth") == 0) {
|
|
|
|
trace_nbd_negotiate_meta_query_parse("allocation-depth");
|
|
|
|
meta->allocation_depth = meta->exp->allocation_depth;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
if (nbd_strshift(&query, "dirty-bitmap:")) {
|
|
|
|
trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
|
2020-10-27 08:05:52 +03:00
|
|
|
if (!*query) {
|
2021-11-16 01:39:43 +03:00
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT &&
|
|
|
|
meta->exp->nr_export_bitmaps) {
|
2020-10-27 08:05:52 +03:00
|
|
|
memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
|
|
|
|
}
|
|
|
|
trace_nbd_negotiate_meta_query_parse("empty");
|
2020-09-30 15:11:03 +03:00
|
|
|
return true;
|
|
|
|
}
|
2020-10-27 08:05:52 +03:00
|
|
|
|
|
|
|
for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
|
|
|
|
const char *bm_name;
|
|
|
|
|
|
|
|
bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
|
|
|
|
if (strcmp(bm_name, query) == 0) {
|
|
|
|
meta->bitmaps[i] = true;
|
|
|
|
trace_nbd_negotiate_meta_query_parse(query);
|
|
|
|
return true;
|
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
}
|
2020-10-27 08:05:52 +03:00
|
|
|
trace_nbd_negotiate_meta_query_skip("no dirty-bitmap match");
|
2020-09-30 15:11:03 +03:00
|
|
|
return true;
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:54 +03:00
|
|
|
trace_nbd_negotiate_meta_query_skip("unknown qemu context");
|
2020-09-30 15:11:03 +03:00
|
|
|
return true;
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
/* nbd_negotiate_meta_query
|
|
|
|
*
|
|
|
|
* Parse namespace name and call corresponding function to parse body of the
|
|
|
|
* query.
|
|
|
|
*
|
2019-11-14 05:46:34 +03:00
|
|
|
* The only supported namespaces are 'base' and 'qemu'.
|
2018-03-12 18:21:21 +03:00
|
|
|
*
|
|
|
|
* Return -errno on I/O error, 0 if option was completely handled by
|
|
|
|
* sending a reply about inconsistent lengths, or 1 on success. */
|
|
|
|
static int nbd_negotiate_meta_query(NBDClient *client,
|
2023-09-25 22:22:40 +03:00
|
|
|
NBDMetaContexts *meta, Error **errp)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
|
|
|
int ret;
|
2020-09-30 15:11:03 +03:00
|
|
|
g_autofree char *query = NULL;
|
2018-03-12 18:21:21 +03:00
|
|
|
uint32_t len;
|
|
|
|
|
2020-09-30 15:11:02 +03:00
|
|
|
ret = nbd_opt_read(client, &len, sizeof(len), false, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
len = cpu_to_be32(len);
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2019-11-14 05:46:34 +03:00
|
|
|
if (len > NBD_MAX_STRING_SIZE) {
|
|
|
|
trace_nbd_negotiate_meta_query_skip("length too long");
|
|
|
|
return nbd_opt_skip(client, len, errp);
|
|
|
|
}
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
query = g_malloc(len + 1);
|
|
|
|
ret = nbd_opt_read(client, query, len, true, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2020-09-30 15:11:03 +03:00
|
|
|
query[len] = '\0';
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2020-09-30 15:11:03 +03:00
|
|
|
if (nbd_meta_base_query(client, meta, query)) {
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
if (nbd_meta_qemu_query(client, meta, query)) {
|
|
|
|
return 1;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2018-06-09 18:17:56 +03:00
|
|
|
trace_nbd_negotiate_meta_query_skip("unknown namespace");
|
2020-09-30 15:11:03 +03:00
|
|
|
return 1;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* nbd_negotiate_meta_queries
|
|
|
|
* Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
|
|
|
|
*
|
|
|
|
* Return -errno on I/O error, or 0 if option was completely handled. */
|
2023-09-25 22:22:40 +03:00
|
|
|
static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
|
|
|
int ret;
|
2019-11-14 05:46:32 +03:00
|
|
|
g_autofree char *export_name = NULL;
|
2021-07-13 16:58:41 +03:00
|
|
|
/* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
|
|
|
|
g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
|
2023-09-25 22:22:40 +03:00
|
|
|
NBDMetaContexts local_meta = {0};
|
|
|
|
NBDMetaContexts *meta;
|
2018-03-12 18:21:21 +03:00
|
|
|
uint32_t nb_queries;
|
2020-10-27 08:05:52 +03:00
|
|
|
size_t i;
|
2020-10-27 08:05:51 +03:00
|
|
|
size_t count = 0;
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->opt == NBD_OPT_SET_META_CONTEXT &&
|
|
|
|
client->mode < NBD_MODE_STRUCTURED) {
|
2018-03-12 18:21:21 +03:00
|
|
|
return nbd_opt_invalid(client, errp,
|
|
|
|
"request option '%s' when structured reply "
|
|
|
|
"is not negotiated",
|
|
|
|
nbd_opt_lookup(client->opt));
|
|
|
|
}
|
|
|
|
|
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
|
|
|
|
/* Only change the caller's meta on SET. */
|
|
|
|
meta = &local_meta;
|
2023-09-25 22:22:40 +03:00
|
|
|
} else {
|
|
|
|
meta = &client->contexts;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:52 +03:00
|
|
|
g_free(meta->bitmaps);
|
2018-03-12 18:21:21 +03:00
|
|
|
memset(meta, 0, sizeof(*meta));
|
|
|
|
|
2019-11-14 05:46:32 +03:00
|
|
|
ret = nbd_opt_read_name(client, &export_name, NULL, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-06-09 18:17:54 +03:00
|
|
|
meta->exp = nbd_export_find(export_name);
|
|
|
|
if (meta->exp == NULL) {
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
g_autofree char *sane_name = nbd_sanitize_name(export_name);
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp,
|
nbd/server: Avoid long error message assertions CVE-2020-10761
Ever since commit 36683283 (v2.8), the server code asserts that error
strings sent to the client are well-formed per the protocol by not
exceeding the maximum string length of 4096. At the time the server
first started sending error messages, the assertion could not be
triggered, because messages were completely under our control.
However, over the years, we have added latent scenarios where a client
could trigger the server to attempt an error message that would
include the client's information if it passed other checks first:
- requesting NBD_OPT_INFO/GO on an export name that is not present
(commit 0cfae925 in v2.12 echoes the name)
- requesting NBD_OPT_LIST/SET_META_CONTEXT on an export name that is
not present (commit e7b1948d in v2.12 echoes the name)
At the time, those were still safe because we flagged names larger
than 256 bytes with a different message; but that changed in commit
93676c88 (v4.2) when we raised the name limit to 4096 to match the NBD
string limit. (That commit also failed to change the magic number
4096 in nbd_negotiate_send_rep_err to the just-introduced named
constant.) So with that commit, long client names appended to server
text can now trigger the assertion, and thus be used as a denial of
service attack against a server. As a mitigating factor, if the
server requires TLS, the client cannot trigger the problematic paths
unless it first supplies TLS credentials, and such trusted clients are
less likely to try to intentionally crash the server.
We may later want to further sanitize the user-supplied strings we
place into our error messages, such as scrubbing out control
characters, but that is less important to the CVE fix, so it can be a
later patch to the new nbd_sanitize_name.
Consideration was given to changing the assertion in
nbd_negotiate_send_rep_verr to instead merely log a server error and
truncate the message, to avoid leaving a latent path that could
trigger a future CVE DoS on any new error message. However, this
merely complicates the code for something that is already (correctly)
flagging coding errors, and now that we are aware of the long message
pitfall, we are less likely to introduce such errors in the future,
which would make such error handling dead code.
Reported-by: Xueqiang Wei <xuwei@redhat.com>
CC: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com/1843684 CVE-2020-10761
Fixes: 93676c88d7
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200610163741.3745251-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-06-08 21:26:37 +03:00
|
|
|
"export '%s' not present", sane_name);
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
2020-10-27 08:05:52 +03:00
|
|
|
meta->bitmaps = g_new0(bool, meta->exp->nr_export_bitmaps);
|
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
|
|
|
|
bitmaps = meta->bitmaps;
|
|
|
|
}
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2020-09-30 15:11:02 +03:00
|
|
|
ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), false, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2018-09-27 19:42:00 +03:00
|
|
|
nb_queries = cpu_to_be32(nb_queries);
|
2018-03-30 16:09:50 +03:00
|
|
|
trace_nbd_negotiate_meta_context(nbd_opt_lookup(client->opt),
|
2018-06-09 18:17:54 +03:00
|
|
|
export_name, nb_queries);
|
2018-03-12 18:21:21 +03:00
|
|
|
|
|
|
|
if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
|
|
|
|
/* enable all known contexts */
|
|
|
|
meta->base_allocation = true;
|
2020-10-27 08:05:54 +03:00
|
|
|
meta->allocation_depth = meta->exp->allocation_depth;
|
2021-11-16 01:39:43 +03:00
|
|
|
if (meta->exp->nr_export_bitmaps) {
|
|
|
|
memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
|
|
|
|
}
|
2018-03-12 18:21:21 +03:00
|
|
|
} else {
|
|
|
|
for (i = 0; i < nb_queries; ++i) {
|
|
|
|
ret = nbd_negotiate_meta_query(client, meta, errp);
|
|
|
|
if (ret <= 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (meta->base_allocation) {
|
|
|
|
ret = nbd_negotiate_send_meta_context(client, "base:allocation",
|
|
|
|
NBD_META_ID_BASE_ALLOCATION,
|
|
|
|
errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2020-10-27 08:05:51 +03:00
|
|
|
count++;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:54 +03:00
|
|
|
if (meta->allocation_depth) {
|
|
|
|
ret = nbd_negotiate_send_meta_context(client, "qemu:allocation-depth",
|
|
|
|
NBD_META_ID_ALLOCATION_DEPTH,
|
|
|
|
errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
count++;
|
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:52 +03:00
|
|
|
for (i = 0; i < meta->exp->nr_export_bitmaps; i++) {
|
|
|
|
const char *bm_name;
|
|
|
|
g_autofree char *context = NULL;
|
|
|
|
|
|
|
|
if (!meta->bitmaps[i]) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
bm_name = bdrv_dirty_bitmap_name(meta->exp->export_bitmaps[i]);
|
|
|
|
context = g_strdup_printf("qemu:dirty-bitmap:%s", bm_name);
|
2020-10-27 08:05:50 +03:00
|
|
|
|
|
|
|
ret = nbd_negotiate_send_meta_context(client, context,
|
2020-10-27 08:05:52 +03:00
|
|
|
NBD_META_ID_DIRTY_BITMAP + i,
|
2018-06-09 18:17:56 +03:00
|
|
|
errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2020-10-27 08:05:51 +03:00
|
|
|
count++;
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
|
|
|
|
if (ret == 0) {
|
2020-10-27 08:05:51 +03:00
|
|
|
meta->count = count;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-07-07 18:29:09 +03:00
|
|
|
/* nbd_negotiate_options
|
2017-07-07 23:30:46 +03:00
|
|
|
* Process all NBD_OPT_* client option commands, during fixed newstyle
|
|
|
|
* negotiation.
|
2017-07-07 18:29:09 +03:00
|
|
|
* Return:
|
2017-07-07 18:29:11 +03:00
|
|
|
* -errno on error, errp is set
|
|
|
|
* 0 on successful negotiation, errp is not set
|
|
|
|
* 1 if client sent NBD_OPT_ABORT, i.e. on valid disconnect,
|
|
|
|
* errp is not set
|
2017-07-07 18:29:09 +03:00
|
|
|
*/
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
static int nbd_negotiate_options(NBDClient *client, Error **errp)
|
2014-06-07 04:32:31 +04:00
|
|
|
{
|
2015-02-25 21:08:31 +03:00
|
|
|
uint32_t flags;
|
2016-02-10 21:41:06 +03:00
|
|
|
bool fixedNewstyle = false;
|
2017-07-07 23:30:45 +03:00
|
|
|
bool no_zeroes = false;
|
2015-02-25 21:08:31 +03:00
|
|
|
|
|
|
|
/* Client sends:
|
|
|
|
[ 0 .. 3] client flags
|
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
Then we loop until NBD_OPT_EXPORT_NAME or NBD_OPT_GO:
|
2015-02-25 21:08:31 +03:00
|
|
|
[ 0 .. 7] NBD_OPTS_MAGIC
|
|
|
|
[ 8 .. 11] NBD option
|
|
|
|
[12 .. 15] Data length
|
|
|
|
... Rest of request
|
|
|
|
|
|
|
|
[ 0 .. 7] NBD_OPTS_MAGIC
|
|
|
|
[ 8 .. 11] Second NBD option
|
|
|
|
[12 .. 15] Data length
|
|
|
|
... Rest of request
|
|
|
|
*/
|
|
|
|
|
2019-01-28 19:58:30 +03:00
|
|
|
if (nbd_read32(client->ioc, &flags, "flags", errp) < 0) {
|
2015-02-25 21:08:31 +03:00
|
|
|
return -EIO;
|
|
|
|
}
|
2023-08-29 20:58:28 +03:00
|
|
|
client->mode = NBD_MODE_EXPORT_NAME;
|
2017-07-07 23:30:44 +03:00
|
|
|
trace_nbd_negotiate_options_flags(flags);
|
2016-02-10 21:41:06 +03:00
|
|
|
if (flags & NBD_FLAG_C_FIXED_NEWSTYLE) {
|
|
|
|
fixedNewstyle = true;
|
|
|
|
flags &= ~NBD_FLAG_C_FIXED_NEWSTYLE;
|
2023-08-29 20:58:28 +03:00
|
|
|
client->mode = NBD_MODE_SIMPLE;
|
2016-02-10 21:41:06 +03:00
|
|
|
}
|
2016-10-14 21:33:14 +03:00
|
|
|
if (flags & NBD_FLAG_C_NO_ZEROES) {
|
2017-07-07 23:30:45 +03:00
|
|
|
no_zeroes = true;
|
2016-10-14 21:33:14 +03:00
|
|
|
flags &= ~NBD_FLAG_C_NO_ZEROES;
|
|
|
|
}
|
2016-02-10 21:41:06 +03:00
|
|
|
if (flags != 0) {
|
2017-07-07 18:29:11 +03:00
|
|
|
error_setg(errp, "Unknown client flags 0x%" PRIx32 " received", flags);
|
2017-07-07 23:30:44 +03:00
|
|
|
return -EINVAL;
|
2015-02-25 21:08:31 +03:00
|
|
|
}
|
|
|
|
|
2014-06-07 04:32:31 +04:00
|
|
|
while (1) {
|
2015-02-25 21:08:31 +03:00
|
|
|
int ret;
|
2017-07-07 18:29:16 +03:00
|
|
|
uint32_t option, length;
|
2014-06-07 04:32:31 +04:00
|
|
|
uint64_t magic;
|
|
|
|
|
2019-01-28 19:58:30 +03:00
|
|
|
if (nbd_read64(client->ioc, &magic, "opts magic", errp) < 0) {
|
2014-06-07 04:32:31 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_options_check_magic(magic);
|
|
|
|
if (magic != NBD_OPTS_MAGIC) {
|
2017-07-07 18:29:11 +03:00
|
|
|
error_setg(errp, "Bad magic received");
|
2014-06-07 04:32:31 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2019-01-28 19:58:30 +03:00
|
|
|
if (nbd_read32(client->ioc, &option, "option", errp) < 0) {
|
2014-06-07 04:32:31 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2018-01-11 02:08:21 +03:00
|
|
|
client->opt = option;
|
2014-06-07 04:32:31 +04:00
|
|
|
|
2019-01-28 19:58:30 +03:00
|
|
|
if (nbd_read32(client->ioc, &length, "option length", errp) < 0) {
|
2014-06-07 04:32:31 +04:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2018-01-11 02:08:24 +03:00
|
|
|
assert(!client->optlen);
|
2018-01-11 02:08:21 +03:00
|
|
|
client->optlen = length;
|
2014-06-07 04:32:31 +04:00
|
|
|
|
nbd/server: CVE-2017-15119 Reject options larger than 32M
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option. No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.
For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.
It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service. In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake. Hence, this warranted a CVE.
Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-11-23 01:25:16 +03:00
|
|
|
if (length > NBD_MAX_BUFFER_SIZE) {
|
2023-08-29 20:58:31 +03:00
|
|
|
error_setg(errp, "len (%" PRIu32 ") is larger than max len (%u)",
|
nbd/server: CVE-2017-15119 Reject options larger than 32M
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option. No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.
For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.
It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service. In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake. Hence, this warranted a CVE.
Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-11-23 01:25:16 +03:00
|
|
|
length, NBD_MAX_BUFFER_SIZE);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2017-07-07 23:30:43 +03:00
|
|
|
trace_nbd_negotiate_options_check_option(option,
|
|
|
|
nbd_opt_lookup(option));
|
2016-02-10 21:41:11 +03:00
|
|
|
if (client->tlscreds &&
|
|
|
|
client->ioc == (QIOChannel *)client->sioc) {
|
|
|
|
QIOChannel *tioc;
|
|
|
|
if (!fixedNewstyle) {
|
2017-07-07 18:29:16 +03:00
|
|
|
error_setg(errp, "Unsupported option 0x%" PRIx32, option);
|
2016-02-10 21:41:11 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-07-07 18:29:16 +03:00
|
|
|
switch (option) {
|
2016-02-10 21:41:11 +03:00
|
|
|
case NBD_OPT_STARTTLS:
|
2017-10-27 13:40:31 +03:00
|
|
|
if (length) {
|
|
|
|
/* Unconditionally drop the connection if the client
|
|
|
|
* can't start a TLS negotiation correctly */
|
2018-01-11 02:08:21 +03:00
|
|
|
return nbd_reject_length(client, true, errp);
|
2017-10-27 13:40:31 +03:00
|
|
|
}
|
|
|
|
tioc = nbd_negotiate_handle_starttls(client, errp);
|
2016-02-10 21:41:11 +03:00
|
|
|
if (!tioc) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
2017-10-27 13:40:30 +03:00
|
|
|
ret = 0;
|
2016-02-10 21:41:11 +03:00
|
|
|
object_unref(OBJECT(client->ioc));
|
2023-06-01 12:34:52 +03:00
|
|
|
client->ioc = tioc;
|
2016-02-10 21:41:11 +03:00
|
|
|
break;
|
|
|
|
|
nbd: Don't kill server on client that doesn't request TLS
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS. For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option. But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.
The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.
Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 01:02:23 +03:00
|
|
|
case NBD_OPT_EXPORT_NAME:
|
|
|
|
/* No way to return an error to client, so drop connection */
|
2017-07-07 18:29:11 +03:00
|
|
|
error_setg(errp, "Option 0x%x not permitted before TLS",
|
2017-07-07 18:29:16 +03:00
|
|
|
option);
|
nbd: Don't kill server on client that doesn't request TLS
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS. For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option. But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.
The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.
Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 01:02:23 +03:00
|
|
|
return -EINVAL;
|
|
|
|
|
2016-02-10 21:41:11 +03:00
|
|
|
default:
|
2018-11-18 01:32:21 +03:00
|
|
|
/* Let the client keep trying, unless they asked to
|
|
|
|
* quit. Always try to give an error back to the
|
|
|
|
* client; but when replying to OPT_ABORT, be aware
|
|
|
|
* that the client may hang up before receiving the
|
|
|
|
* error, in which case we are fine ignoring the
|
|
|
|
* resulting EPIPE. */
|
|
|
|
ret = nbd_opt_drop(client, NBD_REP_ERR_TLS_REQD,
|
|
|
|
option == NBD_OPT_ABORT ? NULL : errp,
|
2018-01-11 02:08:24 +03:00
|
|
|
"Option 0x%" PRIx32
|
2018-11-16 18:53:20 +03:00
|
|
|
" not permitted before TLS", option);
|
2017-07-07 18:29:16 +03:00
|
|
|
if (option == NBD_OPT_ABORT) {
|
2017-07-07 18:29:09 +03:00
|
|
|
return 1;
|
2016-10-14 21:33:16 +03:00
|
|
|
}
|
nbd: Don't kill server on client that doesn't request TLS
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS. For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option. But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.
The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.
Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 01:02:23 +03:00
|
|
|
break;
|
2016-02-10 21:41:11 +03:00
|
|
|
}
|
|
|
|
} else if (fixedNewstyle) {
|
2017-07-07 18:29:16 +03:00
|
|
|
switch (option) {
|
2016-02-10 21:41:06 +03:00
|
|
|
case NBD_OPT_LIST:
|
2017-10-27 13:40:31 +03:00
|
|
|
if (length) {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_reject_length(client, false, errp);
|
2017-10-27 13:40:31 +03:00
|
|
|
} else {
|
|
|
|
ret = nbd_negotiate_handle_list(client, errp);
|
|
|
|
}
|
2016-02-10 21:41:06 +03:00
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_OPT_ABORT:
|
2016-10-14 21:33:16 +03:00
|
|
|
/* NBD spec says we must try to reply before
|
|
|
|
* disconnecting, but that we must also tolerate
|
|
|
|
* guests that don't wait for our reply. */
|
2018-01-11 02:08:21 +03:00
|
|
|
nbd_negotiate_send_rep(client, NBD_REP_ACK, NULL);
|
2017-07-07 18:29:09 +03:00
|
|
|
return 1;
|
2016-02-10 21:41:06 +03:00
|
|
|
|
|
|
|
case NBD_OPT_EXPORT_NAME:
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
return nbd_negotiate_handle_export_name(client, no_zeroes,
|
2017-07-07 23:30:45 +03:00
|
|
|
errp);
|
2016-02-10 21:41:06 +03:00
|
|
|
|
2017-07-07 23:30:46 +03:00
|
|
|
case NBD_OPT_INFO:
|
|
|
|
case NBD_OPT_GO:
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
ret = nbd_negotiate_handle_info(client, errp);
|
2017-07-07 23:30:46 +03:00
|
|
|
if (ret == 1) {
|
|
|
|
assert(option == NBD_OPT_GO);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2016-02-10 21:41:11 +03:00
|
|
|
case NBD_OPT_STARTTLS:
|
2017-10-27 13:40:31 +03:00
|
|
|
if (length) {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_reject_length(client, false, errp);
|
2017-10-27 13:40:31 +03:00
|
|
|
} else if (client->tlscreds) {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep_err(client,
|
|
|
|
NBD_REP_ERR_INVALID, errp,
|
2016-10-14 21:33:09 +03:00
|
|
|
"TLS already enabled");
|
2016-02-10 21:41:11 +03:00
|
|
|
} else {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep_err(client,
|
|
|
|
NBD_REP_ERR_POLICY, errp,
|
2016-10-14 21:33:09 +03:00
|
|
|
"TLS not configured");
|
2016-05-12 01:39:36 +03:00
|
|
|
}
|
nbd: Don't kill server on client that doesn't request TLS
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS. For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option. But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.
The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.
Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 01:02:23 +03:00
|
|
|
break;
|
2017-10-27 13:40:32 +03:00
|
|
|
|
|
|
|
case NBD_OPT_STRUCTURED_REPLY:
|
|
|
|
if (length) {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_reject_length(client, false, errp);
|
2023-09-25 22:22:35 +03:00
|
|
|
} else if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
ret = nbd_negotiate_send_rep_err(
|
|
|
|
client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
|
|
|
|
"extended headers already negotiated");
|
2023-08-29 20:58:28 +03:00
|
|
|
} else if (client->mode >= NBD_MODE_STRUCTURED) {
|
2017-10-27 13:40:32 +03:00
|
|
|
ret = nbd_negotiate_send_rep_err(
|
2018-01-11 02:08:21 +03:00
|
|
|
client, NBD_REP_ERR_INVALID, errp,
|
2017-10-27 13:40:32 +03:00
|
|
|
"structured reply already negotiated");
|
|
|
|
} else {
|
2018-01-11 02:08:21 +03:00
|
|
|
ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
|
2023-08-29 20:58:28 +03:00
|
|
|
client->mode = NBD_MODE_STRUCTURED;
|
2017-10-27 13:40:32 +03:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
case NBD_OPT_LIST_META_CONTEXT:
|
|
|
|
case NBD_OPT_SET_META_CONTEXT:
|
2023-09-25 22:22:40 +03:00
|
|
|
ret = nbd_negotiate_meta_queries(client, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
break;
|
|
|
|
|
2023-09-25 22:22:35 +03:00
|
|
|
case NBD_OPT_EXTENDED_HEADERS:
|
|
|
|
if (length) {
|
|
|
|
ret = nbd_reject_length(client, false, errp);
|
|
|
|
} else if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
ret = nbd_negotiate_send_rep_err(
|
|
|
|
client, NBD_REP_ERR_INVALID, errp,
|
|
|
|
"extended headers already negotiated");
|
|
|
|
} else {
|
|
|
|
ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
|
|
|
|
client->mode = NBD_MODE_EXTENDED;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2016-02-10 21:41:06 +03:00
|
|
|
default:
|
2018-01-11 02:08:24 +03:00
|
|
|
ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
|
2018-02-15 16:51:43 +03:00
|
|
|
"Unsupported option %" PRIu32 " (%s)",
|
2018-01-11 02:08:24 +03:00
|
|
|
option, nbd_opt_lookup(option));
|
2016-04-07 01:48:38 +03:00
|
|
|
break;
|
2016-02-10 21:41:06 +03:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* If broken new-style we should drop the connection
|
|
|
|
* for anything except NBD_OPT_EXPORT_NAME
|
|
|
|
*/
|
2017-07-07 18:29:16 +03:00
|
|
|
switch (option) {
|
2016-02-10 21:41:06 +03:00
|
|
|
case NBD_OPT_EXPORT_NAME:
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
return nbd_negotiate_handle_export_name(client, no_zeroes,
|
2017-07-07 23:30:45 +03:00
|
|
|
errp);
|
2016-02-10 21:41:06 +03:00
|
|
|
|
|
|
|
default:
|
2018-02-15 16:51:43 +03:00
|
|
|
error_setg(errp, "Unsupported option %" PRIu32 " (%s)",
|
2017-07-07 23:30:43 +03:00
|
|
|
option, nbd_opt_lookup(option));
|
2016-02-10 21:41:06 +03:00
|
|
|
return -EINVAL;
|
2014-06-07 04:32:32 +04:00
|
|
|
}
|
2014-06-07 04:32:31 +04:00
|
|
|
}
|
2017-10-27 13:40:30 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2014-06-07 04:32:31 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-07-07 18:29:09 +03:00
|
|
|
/* nbd_negotiate
|
|
|
|
* Return:
|
2017-07-07 18:29:11 +03:00
|
|
|
* -errno on error, errp is set
|
|
|
|
* 0 on successful negotiation, errp is not set
|
|
|
|
* 1 if client sent NBD_OPT_ABORT, i.e. on valid disconnect,
|
|
|
|
* errp is not set
|
2017-07-07 18:29:09 +03:00
|
|
|
*/
|
2017-07-07 18:29:11 +03:00
|
|
|
static coroutine_fn int nbd_negotiate(NBDClient *client, Error **errp)
|
2008-05-28 01:13:40 +04:00
|
|
|
{
|
nbd: Use ERRP_GUARD()
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). Fix several such cases, e.g. in nbd_read().
This commit is generated by command
sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-8-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
2020-07-07 19:50:36 +03:00
|
|
|
ERRP_GUARD();
|
2017-07-17 22:26:35 +03:00
|
|
|
char buf[NBD_OLDSTYLE_NEGOTIATE_SIZE] = "";
|
2017-06-02 18:01:49 +03:00
|
|
|
int ret;
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2017-07-17 22:26:35 +03:00
|
|
|
/* Old style negotiation header, no room for options
|
2012-08-23 16:57:11 +04:00
|
|
|
[ 0 .. 7] passwd ("NBDMAGIC")
|
|
|
|
[ 8 .. 15] magic (NBD_CLIENT_MAGIC)
|
2011-02-22 18:44:51 +03:00
|
|
|
[16 .. 23] size
|
2017-07-17 22:26:35 +03:00
|
|
|
[24 .. 27] export flags (zero-extended)
|
2012-08-23 16:57:11 +04:00
|
|
|
[28 .. 151] reserved (0)
|
|
|
|
|
2017-07-17 22:26:35 +03:00
|
|
|
New style negotiation header, client can send options
|
2012-08-23 16:57:11 +04:00
|
|
|
[ 0 .. 7] passwd ("NBDMAGIC")
|
|
|
|
[ 8 .. 15] magic (NBD_OPTS_MAGIC)
|
|
|
|
[16 .. 17] server flags (0)
|
2017-07-07 23:30:46 +03:00
|
|
|
....options sent, ending in NBD_OPT_EXPORT_NAME or NBD_OPT_GO....
|
2011-02-22 18:44:51 +03:00
|
|
|
*/
|
|
|
|
|
2016-02-10 21:41:04 +03:00
|
|
|
qio_channel_set_blocking(client->ioc, false, NULL);
|
2023-08-31 01:48:02 +03:00
|
|
|
qio_channel_set_follow_coroutine_ctx(client->ioc, true);
|
2012-03-05 11:56:10 +04:00
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_begin();
|
2011-02-22 18:44:51 +03:00
|
|
|
memcpy(buf, "NBDMAGIC", 8);
|
2016-02-10 21:41:11 +03:00
|
|
|
|
2018-10-03 20:02:28 +03:00
|
|
|
stq_be_p(buf + 8, NBD_OPTS_MAGIC);
|
|
|
|
stw_be_p(buf + 16, NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES);
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2018-10-03 20:02:28 +03:00
|
|
|
if (nbd_write(client->ioc, buf, 18, errp) < 0) {
|
|
|
|
error_prepend(errp, "write failed: ");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
ret = nbd_negotiate_options(client, errp);
|
2018-10-03 20:02:28 +03:00
|
|
|
if (ret != 0) {
|
|
|
|
if (ret < 0) {
|
|
|
|
error_prepend(errp, "option negotiation failed: ");
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
2018-10-03 20:02:28 +03:00
|
|
|
return ret;
|
2011-02-22 18:44:51 +03:00
|
|
|
}
|
|
|
|
|
2018-01-11 02:08:21 +03:00
|
|
|
assert(!client->optlen);
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_negotiate_success();
|
2017-06-02 18:01:48 +03:00
|
|
|
|
|
|
|
return 0;
|
2008-05-28 01:13:40 +04:00
|
|
|
}
|
|
|
|
|
2020-12-14 20:05:18 +03:00
|
|
|
/* nbd_read_eof
|
|
|
|
* Tries to read @size bytes from @ioc. This is a local implementation of
|
|
|
|
* qio_channel_readv_all_eof. We have it here because we need it to be
|
|
|
|
* interruptible and to know when the coroutine is yielding.
|
|
|
|
* Returns 1 on success
|
|
|
|
* 0 on eof, when no data was read (errp is not set)
|
|
|
|
* negative errno on failure (errp is set)
|
|
|
|
*/
|
|
|
|
static inline int coroutine_fn
|
|
|
|
nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
|
|
|
|
{
|
|
|
|
bool partial = false;
|
|
|
|
|
|
|
|
assert(size);
|
|
|
|
while (size > 0) {
|
|
|
|
struct iovec iov = { .iov_base = buffer, .iov_len = size };
|
|
|
|
ssize_t len;
|
|
|
|
|
|
|
|
len = qio_channel_readv(client->ioc, &iov, 1, errp);
|
|
|
|
if (len == QIO_CHANNEL_ERR_BLOCK) {
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
client->read_yielding = true;
|
|
|
|
|
|
|
|
/* Prompt main loop thread to re-run nbd_drained_poll() */
|
|
|
|
aio_wait_kick();
|
|
|
|
}
|
2020-12-14 20:05:18 +03:00
|
|
|
qio_channel_yield(client->ioc, G_IO_IN);
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
client->read_yielding = false;
|
|
|
|
if (client->quiescing) {
|
|
|
|
return -EAGAIN;
|
|
|
|
}
|
2020-12-14 20:05:18 +03:00
|
|
|
}
|
|
|
|
continue;
|
|
|
|
} else if (len < 0) {
|
|
|
|
return -EIO;
|
|
|
|
} else if (len == 0) {
|
|
|
|
if (partial) {
|
|
|
|
error_setg(errp,
|
|
|
|
"Unexpected end-of-file before all bytes were read");
|
|
|
|
return -EIO;
|
|
|
|
} else {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
partial = true;
|
|
|
|
size -= len;
|
|
|
|
buffer = (uint8_t *) buffer + len;
|
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2023-03-09 11:44:51 +03:00
|
|
|
static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *request,
|
|
|
|
Error **errp)
|
2008-07-03 17:41:03 +04:00
|
|
|
{
|
2023-09-25 22:22:32 +03:00
|
|
|
uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
|
|
|
|
uint32_t magic, expect;
|
2017-06-02 18:01:42 +03:00
|
|
|
int ret;
|
2023-09-25 22:22:32 +03:00
|
|
|
size_t size = client->mode >= NBD_MODE_EXTENDED ?
|
|
|
|
NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2023-09-25 22:22:32 +03:00
|
|
|
ret = nbd_read_eof(client, buf, size, errp);
|
2012-03-05 11:56:10 +04:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
nbd/server: Don't complain on certain client disconnects
When a client disconnects abruptly, but did not have any pending
requests (for example, when using nbdsh without calling h.shutdown),
we used to output the following message:
$ qemu-nbd -f raw file
$ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)'
qemu-nbd: Disconnect client, due to: Failed to read request: Unexpected end-of-file before all bytes were read
Then in commit f148ae7, we refactored nbd_receive_request() to use
nbd_read_eof(); when this returns 0, we regressed into tracing
uninitialized memory (if tracing is enabled) and reporting a
less-specific:
qemu-nbd: Disconnect client, due to: Request handling failed in intermediate state
Note that with Unix sockets, we have yet another error message,
unchanged by the 6.0 regression:
$ qemu-nbd -k /tmp/sock -f raw file
$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.trim(1,0)'
qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to socket: Broken pipe
But in all cases, the error message goes away if the client performs a
soft shutdown by using NBD_CMD_DISC, rather than a hard shutdown by
abrupt disconnect:
$ nbdsh -u 'nbd://localhost:10809' -c 'h.trim(1,0)' -c 'h.shutdown()'
This patch fixes things to avoid uninitialized memory, and in general
avoids warning about a client that does a hard shutdown when not in
the middle of a packet. A client that aborts mid-request, or which
does not read the full server's reply, can still result in warnings,
but those are indeed much more unusual situations.
CC: qemu-stable@nongnu.org
Fixes: f148ae7d36 ("nbd/server: Quiesce coroutines on context switch", v6.0.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: defer unrelated typo fixes to later patch]
Message-Id: <20211117170230.1128262-2-eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-11-17 20:02:29 +03:00
|
|
|
if (ret == 0) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
2012-03-05 11:56:10 +04:00
|
|
|
|
2023-09-25 22:22:32 +03:00
|
|
|
/*
|
|
|
|
* Compact request
|
|
|
|
* [ 0 .. 3] magic (NBD_REQUEST_MAGIC)
|
|
|
|
* [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, ...)
|
|
|
|
* [ 6 .. 7] type (NBD_CMD_READ, ...)
|
|
|
|
* [ 8 .. 15] cookie
|
|
|
|
* [16 .. 23] from
|
|
|
|
* [24 .. 27] len
|
|
|
|
* Extended request
|
|
|
|
* [ 0 .. 3] magic (NBD_EXTENDED_REQUEST_MAGIC)
|
|
|
|
* [ 4 .. 5] flags (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
|
|
|
|
* [ 6 .. 7] type (NBD_CMD_READ, ...)
|
|
|
|
* [ 8 .. 15] cookie
|
|
|
|
* [16 .. 23] from
|
|
|
|
* [24 .. 31] len
|
2011-02-22 18:44:51 +03:00
|
|
|
*/
|
|
|
|
|
2016-06-10 18:00:36 +03:00
|
|
|
magic = ldl_be_p(buf);
|
2016-10-14 21:33:04 +03:00
|
|
|
request->flags = lduw_be_p(buf + 4);
|
|
|
|
request->type = lduw_be_p(buf + 6);
|
2023-06-08 16:56:34 +03:00
|
|
|
request->cookie = ldq_be_p(buf + 8);
|
2016-06-10 18:00:36 +03:00
|
|
|
request->from = ldq_be_p(buf + 16);
|
2023-09-25 22:22:32 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
request->len = ldq_be_p(buf + 24);
|
|
|
|
expect = NBD_EXTENDED_REQUEST_MAGIC;
|
|
|
|
} else {
|
|
|
|
request->len = (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */
|
|
|
|
expect = NBD_REQUEST_MAGIC;
|
|
|
|
}
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_receive_request(magic, request->flags, request->type,
|
|
|
|
request->from, request->len);
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2023-09-25 22:22:32 +03:00
|
|
|
if (magic != expect) {
|
|
|
|
error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%"
|
|
|
|
PRIx32 ")", magic, expect);
|
2012-03-05 11:56:10 +04:00
|
|
|
return -EINVAL;
|
2011-02-22 18:44:51 +03:00
|
|
|
}
|
|
|
|
return 0;
|
2008-07-03 17:41:03 +04:00
|
|
|
}
|
|
|
|
|
2011-09-19 17:25:40 +04:00
|
|
|
#define MAX_NBD_REQUESTS 16
|
|
|
|
|
2023-12-21 22:24:51 +03:00
|
|
|
/* Runs in export AioContext and main loop thread */
|
2012-09-18 15:17:52 +04:00
|
|
|
void nbd_client_get(NBDClient *client)
|
2011-09-19 16:33:23 +04:00
|
|
|
{
|
2023-12-21 22:24:51 +03:00
|
|
|
qatomic_inc(&client->refcount);
|
2011-09-19 16:33:23 +04:00
|
|
|
}
|
|
|
|
|
2012-09-18 15:17:52 +04:00
|
|
|
void nbd_client_put(NBDClient *client)
|
2011-09-19 16:33:23 +04:00
|
|
|
{
|
2023-12-21 22:24:51 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
|
|
|
if (qatomic_fetch_dec(&client->refcount) == 1) {
|
2012-08-22 20:45:12 +04:00
|
|
|
/* The last reference should be dropped by client->close,
|
2015-02-07 00:06:16 +03:00
|
|
|
* which is called by client_close.
|
2012-08-22 20:45:12 +04:00
|
|
|
*/
|
|
|
|
assert(client->closing);
|
|
|
|
|
2016-02-10 21:41:04 +03:00
|
|
|
object_unref(OBJECT(client->sioc));
|
|
|
|
object_unref(OBJECT(client->ioc));
|
2016-02-10 21:41:11 +03:00
|
|
|
if (client->tlscreds) {
|
|
|
|
object_unref(OBJECT(client->tlscreds));
|
|
|
|
}
|
qemu-nbd: add support for authorization of TLS clients
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-27 19:20:33 +03:00
|
|
|
g_free(client->tlsauthz);
|
2012-08-23 16:57:11 +04:00
|
|
|
if (client->exp) {
|
|
|
|
QTAILQ_REMOVE(&client->exp->clients, client, next);
|
2020-09-24 18:26:59 +03:00
|
|
|
blk_exp_unref(&client->exp->common);
|
2012-08-23 16:57:11 +04:00
|
|
|
}
|
2023-09-25 22:22:40 +03:00
|
|
|
g_free(client->contexts.bitmaps);
|
2023-12-21 22:24:52 +03:00
|
|
|
qemu_mutex_destroy(&client->lock);
|
2011-09-19 16:33:23 +04:00
|
|
|
g_free(client);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:51 +03:00
|
|
|
/*
|
|
|
|
* Tries to release the reference to @client, but only if other references
|
|
|
|
* remain. This is an optimization for the common case where we want to avoid
|
|
|
|
* the expense of scheduling nbd_client_put() in the main loop thread.
|
|
|
|
*
|
|
|
|
* Returns true upon success or false if the reference was not released because
|
|
|
|
* it is the last reference.
|
|
|
|
*/
|
|
|
|
static bool nbd_client_put_nonzero(NBDClient *client)
|
|
|
|
{
|
|
|
|
int old = qatomic_read(&client->refcount);
|
|
|
|
int expected;
|
|
|
|
|
|
|
|
do {
|
|
|
|
if (old == 1) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
expected = old;
|
|
|
|
old = qatomic_cmpxchg(&client->refcount, expected, expected - 1);
|
|
|
|
} while (old != expected);
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
static void client_close(NBDClient *client, bool negotiated)
|
2011-09-19 16:33:23 +04:00
|
|
|
{
|
2023-12-21 22:24:51 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
if (client->closing) {
|
|
|
|
return;
|
|
|
|
}
|
2012-08-22 20:45:12 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
client->closing = true;
|
|
|
|
}
|
2012-08-22 20:45:12 +04:00
|
|
|
|
|
|
|
/* Force requests to finish. They will drop their own references,
|
|
|
|
* then we'll close the socket and free the NBDClient.
|
|
|
|
*/
|
2016-02-10 21:41:04 +03:00
|
|
|
qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
|
|
|
|
NULL);
|
2012-08-22 20:45:12 +04:00
|
|
|
|
|
|
|
/* Also tell the client, so that they release their reference. */
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
if (client->close_fn) {
|
|
|
|
client->close_fn(client, negotiated);
|
2011-09-19 16:33:23 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
/* Runs in export AioContext with client->lock held */
|
2016-10-14 21:33:05 +03:00
|
|
|
static NBDRequestData *nbd_request_get(NBDClient *client)
|
2011-09-19 16:18:33 +04:00
|
|
|
{
|
2016-10-14 21:33:05 +03:00
|
|
|
NBDRequestData *req;
|
2011-10-07 18:47:56 +04:00
|
|
|
|
2011-09-19 17:25:40 +04:00
|
|
|
assert(client->nb_requests <= MAX_NBD_REQUESTS - 1);
|
|
|
|
client->nb_requests++;
|
|
|
|
|
2016-10-14 21:33:05 +03:00
|
|
|
req = g_new0(NBDRequestData, 1);
|
2011-10-07 18:47:56 +04:00
|
|
|
req->client = client;
|
2011-09-19 16:18:33 +04:00
|
|
|
return req;
|
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
/* Runs in export AioContext with client->lock held */
|
2016-10-14 21:33:05 +03:00
|
|
|
static void nbd_request_put(NBDRequestData *req)
|
2011-09-19 16:18:33 +04:00
|
|
|
{
|
2011-10-07 18:47:56 +04:00
|
|
|
NBDClient *client = req->client;
|
2013-05-02 16:23:07 +04:00
|
|
|
|
2013-05-02 16:23:08 +04:00
|
|
|
if (req->data) {
|
|
|
|
qemu_vfree(req->data);
|
|
|
|
}
|
2015-10-01 13:59:08 +03:00
|
|
|
g_free(req);
|
2013-05-02 16:23:07 +04:00
|
|
|
|
2014-06-20 23:57:32 +04:00
|
|
|
client->nb_requests--;
|
2021-06-02 09:05:52 +03:00
|
|
|
|
|
|
|
if (client->quiescing && client->nb_requests == 0) {
|
|
|
|
aio_wait_kick();
|
|
|
|
}
|
|
|
|
|
2017-02-13 16:52:24 +03:00
|
|
|
nbd_client_receive_next_request(client);
|
2011-09-19 16:18:33 +04:00
|
|
|
}
|
|
|
|
|
2014-11-18 14:21:18 +03:00
|
|
|
static void blk_aio_attached(AioContext *ctx, void *opaque)
|
2014-06-20 23:57:34 +04:00
|
|
|
{
|
|
|
|
NBDExport *exp = opaque;
|
|
|
|
NBDClient *client;
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_blk_aio_attached(exp->name, ctx);
|
2014-06-20 23:57:34 +04:00
|
|
|
|
2020-09-24 18:27:00 +03:00
|
|
|
exp->common.ctx = ctx;
|
2014-06-20 23:57:34 +04:00
|
|
|
|
|
|
|
QTAILQ_FOREACH(client, &exp->clients, next) {
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
assert(client->nb_requests == 0);
|
|
|
|
assert(client->recv_coroutine == NULL);
|
|
|
|
assert(client->send_coroutine == NULL);
|
|
|
|
}
|
2020-12-14 20:05:18 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
static void blk_aio_detach(void *opaque)
|
2020-12-14 20:05:18 +03:00
|
|
|
{
|
|
|
|
NBDExport *exp = opaque;
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
|
|
|
|
|
|
|
|
exp->common.ctx = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void nbd_drained_begin(void *opaque)
|
|
|
|
{
|
|
|
|
NBDExport *exp = opaque;
|
|
|
|
NBDClient *client;
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
QTAILQ_FOREACH(client, &exp->clients, next) {
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
client->quiescing = true;
|
|
|
|
}
|
2021-06-02 09:05:52 +03:00
|
|
|
}
|
|
|
|
}
|
2020-12-14 20:05:18 +03:00
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
static void nbd_drained_end(void *opaque)
|
|
|
|
{
|
|
|
|
NBDExport *exp = opaque;
|
|
|
|
NBDClient *client;
|
2020-12-14 20:05:18 +03:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
QTAILQ_FOREACH(client, &exp->clients, next) {
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
client->quiescing = false;
|
|
|
|
nbd_client_receive_next_request(client);
|
|
|
|
}
|
2014-06-20 23:57:34 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
/* Runs in export AioContext */
|
|
|
|
static void nbd_wake_read_bh(void *opaque)
|
|
|
|
{
|
|
|
|
NBDClient *client = opaque;
|
|
|
|
qio_channel_wake_read(client->ioc);
|
|
|
|
}
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
static bool nbd_drained_poll(void *opaque)
|
2014-06-20 23:57:34 +04:00
|
|
|
{
|
|
|
|
NBDExport *exp = opaque;
|
2021-06-02 09:05:52 +03:00
|
|
|
NBDClient *client;
|
2014-06-20 23:57:34 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
QTAILQ_FOREACH(client, &exp->clients, next) {
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
if (client->nb_requests != 0) {
|
|
|
|
/*
|
|
|
|
* If there's a coroutine waiting for a request on nbd_read_eof()
|
|
|
|
* enter it here so we don't depend on the client to wake it up.
|
|
|
|
*
|
|
|
|
* Schedule a BH in the export AioContext to avoid missing the
|
|
|
|
* wake up due to the race between qio_channel_wake_read() and
|
|
|
|
* qio_channel_yield().
|
|
|
|
*/
|
|
|
|
if (client->recv_coroutine != NULL && client->read_yielding) {
|
|
|
|
aio_bh_schedule_oneshot(nbd_export_aio_context(client->exp),
|
|
|
|
nbd_wake_read_bh, client);
|
|
|
|
}
|
2014-06-20 23:57:34 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
return true;
|
|
|
|
}
|
2021-06-02 09:05:52 +03:00
|
|
|
}
|
|
|
|
}
|
2014-06-20 23:57:34 +04:00
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
return false;
|
2014-06-20 23:57:34 +04:00
|
|
|
}
|
|
|
|
|
2016-01-29 18:36:06 +03:00
|
|
|
static void nbd_eject_notifier(Notifier *n, void *data)
|
|
|
|
{
|
|
|
|
NBDExport *exp = container_of(n, NBDExport, eject_notifier);
|
2019-09-17 05:39:17 +03:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
assert(qemu_in_main_thread());
|
|
|
|
|
2020-09-24 18:27:03 +03:00
|
|
|
blk_exp_request_shutdown(&exp->common);
|
2016-01-29 18:36:06 +03:00
|
|
|
}
|
|
|
|
|
2020-09-24 18:26:53 +03:00
|
|
|
void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
|
|
|
|
{
|
|
|
|
NBDExport *nbd_exp = container_of(exp, NBDExport, common);
|
|
|
|
assert(exp->drv == &blk_exp_nbd);
|
|
|
|
assert(nbd_exp->eject_notifier_blk == NULL);
|
|
|
|
|
|
|
|
blk_ref(blk);
|
|
|
|
nbd_exp->eject_notifier_blk = blk;
|
|
|
|
nbd_exp->eject_notifier.notify = nbd_eject_notifier;
|
|
|
|
blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
|
|
|
|
}
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
static const BlockDevOps nbd_block_ops = {
|
|
|
|
.drained_begin = nbd_drained_begin,
|
|
|
|
.drained_end = nbd_drained_end,
|
|
|
|
.drained_poll = nbd_drained_poll,
|
|
|
|
};
|
|
|
|
|
2020-09-24 18:27:12 +03:00
|
|
|
static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
|
|
|
|
Error **errp)
|
2011-09-19 16:03:37 +04:00
|
|
|
{
|
2020-09-24 18:27:02 +03:00
|
|
|
NBDExport *exp = container_of(blk_exp, NBDExport, common);
|
2020-09-24 18:27:12 +03:00
|
|
|
BlockExportOptionsNbd *arg = &exp_args->u.nbd;
|
2022-11-04 19:06:51 +03:00
|
|
|
const char *name = arg->name ?: exp_args->node_name;
|
2020-09-24 18:27:09 +03:00
|
|
|
BlockBackend *blk = blk_exp->blk;
|
2020-09-24 18:26:52 +03:00
|
|
|
int64_t size;
|
2020-09-24 18:27:09 +03:00
|
|
|
uint64_t perm, shared_perm;
|
2020-09-24 18:27:12 +03:00
|
|
|
bool readonly = !exp_args->writable;
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
BlockDirtyBitmapOrStrList *bitmaps;
|
2020-10-27 08:05:52 +03:00
|
|
|
size_t i;
|
2017-01-13 21:02:32 +03:00
|
|
|
int ret;
|
2016-07-06 12:22:39 +03:00
|
|
|
|
2023-10-27 18:53:15 +03:00
|
|
|
GLOBAL_STATE_CODE();
|
2020-09-24 18:27:12 +03:00
|
|
|
assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
|
|
|
|
|
|
|
|
if (!nbd_server_is_running()) {
|
|
|
|
error_setg(errp, "NBD server not running");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2022-11-04 19:06:51 +03:00
|
|
|
if (strlen(name) > NBD_MAX_STRING_SIZE) {
|
|
|
|
error_setg(errp, "export name '%s' too long", name);
|
2020-09-24 18:27:12 +03:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (arg->description && strlen(arg->description) > NBD_MAX_STRING_SIZE) {
|
|
|
|
error_setg(errp, "description '%s' too long", arg->description);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2022-11-04 19:06:51 +03:00
|
|
|
if (nbd_export_find(name)) {
|
|
|
|
error_setg(errp, "NBD server already has export named '%s'", name);
|
2020-09-24 18:27:12 +03:00
|
|
|
return -EEXIST;
|
|
|
|
}
|
|
|
|
|
2020-09-24 18:27:09 +03:00
|
|
|
size = blk_getlength(blk);
|
2020-09-24 18:26:52 +03:00
|
|
|
if (size < 0) {
|
|
|
|
error_setg_errno(errp, -size,
|
|
|
|
"Failed to determine the NBD export's length");
|
2020-09-24 18:27:02 +03:00
|
|
|
return size;
|
2020-09-24 18:26:52 +03:00
|
|
|
}
|
|
|
|
|
2017-02-09 17:43:38 +03:00
|
|
|
/* Don't allow resize while the NBD server is running, otherwise we don't
|
|
|
|
* care what happens with the node. */
|
2020-09-24 18:27:09 +03:00
|
|
|
blk_get_perm(blk, &perm, &shared_perm);
|
|
|
|
ret = blk_set_perm(blk, perm, shared_perm & ~BLK_PERM_RESIZE, errp);
|
2017-01-13 21:02:32 +03:00
|
|
|
if (ret < 0) {
|
2020-09-24 18:27:09 +03:00
|
|
|
return ret;
|
2017-01-13 21:02:32 +03:00
|
|
|
}
|
2020-09-24 18:27:09 +03:00
|
|
|
|
2012-09-18 15:58:25 +04:00
|
|
|
QTAILQ_INIT(&exp->clients);
|
2022-11-04 19:06:51 +03:00
|
|
|
exp->name = g_strdup(name);
|
2020-09-24 18:27:12 +03:00
|
|
|
exp->description = g_strdup(arg->description);
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH |
|
|
|
|
NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE);
|
nbd/server: Allow MULTI_CONN for shared writable exports
According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.
We always satisfy these conditions in qemu - even when we support
multiple clients, ALL clients go through a single point of reference
into the block layer, with no local caching. The effect of one client
is instantly visible to the next client. Even if our backend were a
network device, we argue that any multi-path caching effects that
would cause inconsistencies in back-to-back actions not seeing the
effect of previous actions would be a bug in that backend, and not the
fault of caching in qemu. As such, it is safe to unconditionally
advertise CAN_MULTI_CONN for any qemu NBD server situation that
supports parallel clients.
Note, however, that we don't want to advertise CAN_MULTI_CONN when we
know that a second client cannot connect (for historical reasons,
qemu-nbd defaults to a single connection while nbd-server-add and QMP
commands default to unlimited connections; but we already have
existing means to let either style of NBD server creation alter those
defaults). This is visible by no longer advertising MULTI_CONN for
'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.
The harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server. It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.
Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220512004924.417153-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12 03:49:24 +03:00
|
|
|
|
|
|
|
if (nbd_server_max_connections() != 1) {
|
|
|
|
exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
|
|
|
|
}
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
if (readonly) {
|
|
|
|
exp->nbdflags |= NBD_FLAG_READ_ONLY;
|
|
|
|
} else {
|
2019-08-23 17:37:25 +03:00
|
|
|
exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
|
|
|
|
NBD_FLAG_SEND_FAST_ZERO);
|
nbd: Improve per-export flag handling in server
When creating a read-only image, we are still advertising support for
TRIM and WRITE_ZEROES to the client, even though the client should not
be issuing those commands. But seeing this requires looking across
multiple functions:
All callers to nbd_export_new() passed a single flag based solely on
whether the export allows writes. Later, we then pass a constant set
of flags to nbd_negotiate_options() (namely, the set of flags which we
always support, at least for writable images), which is then further
dynamically modified with NBD_FLAG_SEND_DF based on client requests
for structured options. Finally, when processing NBD_OPT_EXPORT_NAME
or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the
runtime set of flags we've built up over several functions.
Let's refactor things to instead compute a baseline of flags as soon
as possible which gets shared between multiple clients, in
nbd_export_new(), and changing the signature for the callers to pass
in a simpler bool rather than having to figure out flags. We can then
get rid of the 'myflags' parameter to various functions, and instead
refer to client for everything we need (we still have to perform a
bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and
NBD_OPT_EXPORT_GO, but it's easier to see what is being computed).
This lets us quit advertising senseless flags for read-only images, as
well as making the next patch for exposing FAST_ZERO support easier to
write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190823143726.27062-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: improve commit message, update iotest 223]
2019-08-23 17:37:22 +03:00
|
|
|
}
|
2019-01-17 22:36:42 +03:00
|
|
|
exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);
|
2015-02-25 21:08:21 +03:00
|
|
|
|
2023-10-27 18:53:15 +03:00
|
|
|
bdrv_graph_rdlock_main_loop();
|
|
|
|
|
2020-10-27 08:05:49 +03:00
|
|
|
for (bitmaps = arg->bitmaps; bitmaps; bitmaps = bitmaps->next) {
|
2020-10-27 08:05:52 +03:00
|
|
|
exp->nr_export_bitmaps++;
|
|
|
|
}
|
|
|
|
exp->export_bitmaps = g_new0(BdrvDirtyBitmap *, exp->nr_export_bitmaps);
|
|
|
|
for (i = 0, bitmaps = arg->bitmaps; bitmaps;
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
i++, bitmaps = bitmaps->next)
|
|
|
|
{
|
|
|
|
const char *bitmap;
|
2020-09-24 18:27:09 +03:00
|
|
|
BlockDriverState *bs = blk_bs(blk);
|
2019-01-11 22:47:19 +03:00
|
|
|
BdrvDirtyBitmap *bm = NULL;
|
|
|
|
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
switch (bitmaps->value->type) {
|
|
|
|
case QTYPE_QSTRING:
|
|
|
|
bitmap = bitmaps->value->u.local;
|
|
|
|
while (bs) {
|
|
|
|
bm = bdrv_find_dirty_bitmap(bs, bitmap);
|
|
|
|
if (bm != NULL) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
bs = bdrv_filter_or_cow_bs(bs);
|
2019-01-11 22:47:19 +03:00
|
|
|
}
|
|
|
|
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
if (bm == NULL) {
|
|
|
|
ret = -ENOENT;
|
|
|
|
error_setg(errp, "Bitmap '%s' is not found",
|
|
|
|
bitmaps->value->u.local);
|
|
|
|
goto fail;
|
|
|
|
}
|
2019-01-11 22:47:19 +03:00
|
|
|
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
if (readonly && bdrv_is_writable(bs) &&
|
|
|
|
bdrv_dirty_bitmap_enabled(bm)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
error_setg(errp, "Enabled bitmap '%s' incompatible with "
|
|
|
|
"readonly export", bitmap);
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case QTYPE_QDICT:
|
|
|
|
bitmap = bitmaps->value->u.external.name;
|
|
|
|
bm = block_dirty_bitmap_lookup(bitmaps->value->u.external.node,
|
|
|
|
bitmap, NULL, errp);
|
|
|
|
if (!bm) {
|
|
|
|
ret = -ENOENT;
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
abort();
|
2019-01-11 22:47:19 +03:00
|
|
|
}
|
|
|
|
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
assert(bm);
|
2019-03-12 19:05:48 +03:00
|
|
|
|
qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.
Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.
Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-15 00:32:25 +03:00
|
|
|
if (bdrv_dirty_bitmap_check(bm, BDRV_BITMAP_ALLOW_RO, errp)) {
|
2020-09-24 18:27:02 +03:00
|
|
|
ret = -EINVAL;
|
2019-01-11 22:47:19 +03:00
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:52 +03:00
|
|
|
exp->export_bitmaps[i] = bm;
|
2020-10-27 08:05:49 +03:00
|
|
|
assert(strlen(bitmap) <= BDRV_BITMAP_MAX_NAME_SIZE);
|
2019-01-11 22:47:19 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:52 +03:00
|
|
|
/* Mark bitmaps busy in a separate loop, to simplify roll-back concerns. */
|
|
|
|
for (i = 0; i < exp->nr_export_bitmaps; i++) {
|
|
|
|
bdrv_dirty_bitmap_set_busy(exp->export_bitmaps[i], true);
|
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:55 +03:00
|
|
|
exp->allocation_depth = arg->allocation_depth;
|
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
/*
|
|
|
|
* We need to inhibit request queuing in the block layer to ensure we can
|
|
|
|
* be properly quiesced when entering a drained section, as our coroutines
|
|
|
|
* servicing pending requests might enter blk_pread().
|
|
|
|
*/
|
|
|
|
blk_set_disable_request_queuing(blk, true);
|
|
|
|
|
2014-11-18 14:21:18 +03:00
|
|
|
blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
|
2016-01-29 18:36:06 +03:00
|
|
|
|
2021-06-02 09:05:52 +03:00
|
|
|
blk_set_dev_ops(blk, &nbd_block_ops, exp);
|
|
|
|
|
nbd: Merge nbd_export_set_name into nbd_export_new
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-11 22:47:16 +03:00
|
|
|
QTAILQ_INSERT_TAIL(&exports, exp, next);
|
2020-09-24 18:26:59 +03:00
|
|
|
|
2023-10-27 18:53:15 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
|
|
|
|
2020-09-24 18:27:02 +03:00
|
|
|
return 0;
|
2015-02-25 21:08:21 +03:00
|
|
|
|
|
|
|
fail:
|
2023-10-27 18:53:15 +03:00
|
|
|
bdrv_graph_rdunlock_main_loop();
|
2020-10-27 08:05:52 +03:00
|
|
|
g_free(exp->export_bitmaps);
|
nbd: Merge nbd_export_set_name into nbd_export_new
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-11 22:47:16 +03:00
|
|
|
g_free(exp->name);
|
|
|
|
g_free(exp->description);
|
2020-09-24 18:27:02 +03:00
|
|
|
return ret;
|
2011-09-19 16:03:37 +04:00
|
|
|
}
|
|
|
|
|
2012-08-22 17:59:23 +04:00
|
|
|
NBDExport *nbd_export_find(const char *name)
|
|
|
|
{
|
|
|
|
NBDExport *exp;
|
|
|
|
QTAILQ_FOREACH(exp, &exports, next) {
|
|
|
|
if (strcmp(name, exp->name) == 0) {
|
|
|
|
return exp;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2019-09-17 05:39:17 +03:00
|
|
|
AioContext *
|
|
|
|
nbd_export_aio_context(NBDExport *exp)
|
|
|
|
{
|
2020-09-24 18:27:00 +03:00
|
|
|
return exp->common.ctx;
|
2019-09-17 05:39:17 +03:00
|
|
|
}
|
|
|
|
|
2020-09-24 18:27:03 +03:00
|
|
|
static void nbd_export_request_shutdown(BlockExport *blk_exp)
|
2011-09-19 16:03:37 +04:00
|
|
|
{
|
2020-09-24 18:27:03 +03:00
|
|
|
NBDExport *exp = container_of(blk_exp, NBDExport, common);
|
2012-09-18 15:58:25 +04:00
|
|
|
NBDClient *client, *next;
|
2012-09-18 15:26:25 +04:00
|
|
|
|
2020-09-24 18:26:59 +03:00
|
|
|
blk_exp_ref(&exp->common);
|
nbd: Merge nbd_export_set_name into nbd_export_new
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-11 22:47:16 +03:00
|
|
|
/*
|
|
|
|
* TODO: Should we expand QMP NbdServerRemoveNode enum to allow a
|
|
|
|
* close mode that stops advertising the export to new clients but
|
|
|
|
* still permits existing clients to run to completion? Because of
|
|
|
|
* that possibility, nbd_export_close() can be called more than
|
|
|
|
* once on an export.
|
|
|
|
*/
|
2012-09-18 15:58:25 +04:00
|
|
|
QTAILQ_FOREACH_SAFE(client, &exp->clients, next, next) {
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
client_close(client, true);
|
2012-09-18 15:58:25 +04:00
|
|
|
}
|
nbd: Merge nbd_export_set_name into nbd_export_new
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-11 22:47:16 +03:00
|
|
|
if (exp->name) {
|
|
|
|
g_free(exp->name);
|
|
|
|
exp->name = NULL;
|
|
|
|
QTAILQ_REMOVE(&exports, exp, next);
|
|
|
|
}
|
2020-09-24 18:26:59 +03:00
|
|
|
blk_exp_unref(&exp->common);
|
2012-09-18 15:26:25 +04:00
|
|
|
}
|
|
|
|
|
2020-09-24 18:26:59 +03:00
|
|
|
static void nbd_export_delete(BlockExport *blk_exp)
|
2012-09-18 15:26:25 +04:00
|
|
|
{
|
2020-10-27 08:05:52 +03:00
|
|
|
size_t i;
|
2020-09-24 18:26:59 +03:00
|
|
|
NBDExport *exp = container_of(blk_exp, NBDExport, common);
|
2012-09-18 15:26:25 +04:00
|
|
|
|
2020-09-24 18:26:59 +03:00
|
|
|
assert(exp->name == NULL);
|
|
|
|
assert(QTAILQ_EMPTY(&exp->clients));
|
2015-09-16 11:35:46 +03:00
|
|
|
|
2020-09-24 18:26:59 +03:00
|
|
|
g_free(exp->description);
|
|
|
|
exp->description = NULL;
|
|
|
|
|
2022-12-02 01:49:57 +03:00
|
|
|
if (exp->eject_notifier_blk) {
|
|
|
|
notifier_remove(&exp->eject_notifier);
|
|
|
|
blk_unref(exp->eject_notifier_blk);
|
2020-09-24 18:26:59 +03:00
|
|
|
}
|
2022-12-02 01:49:57 +03:00
|
|
|
blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
|
|
|
|
blk_aio_detach, exp);
|
|
|
|
blk_set_disable_request_queuing(exp->common.blk, false);
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2020-10-27 08:05:52 +03:00
|
|
|
for (i = 0; i < exp->nr_export_bitmaps; i++) {
|
|
|
|
bdrv_dirty_bitmap_set_busy(exp->export_bitmaps[i], false);
|
2012-09-18 15:26:25 +04:00
|
|
|
}
|
2011-09-19 16:03:37 +04:00
|
|
|
}
|
|
|
|
|
2020-09-24 18:26:50 +03:00
|
|
|
const BlockExportDriver blk_exp_nbd = {
|
|
|
|
.type = BLOCK_EXPORT_TYPE_NBD,
|
2020-09-24 18:27:02 +03:00
|
|
|
.instance_size = sizeof(NBDExport),
|
2020-09-24 18:26:50 +03:00
|
|
|
.create = nbd_export_create,
|
2020-09-24 18:26:59 +03:00
|
|
|
.delete = nbd_export_delete,
|
2020-09-24 18:27:03 +03:00
|
|
|
.request_shutdown = nbd_export_request_shutdown,
|
2020-09-24 18:26:50 +03:00
|
|
|
};
|
|
|
|
|
2017-10-13 01:29:00 +03:00
|
|
|
static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
|
|
|
|
unsigned niov, Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
g_assert(qemu_in_coroutine());
|
|
|
|
qemu_co_mutex_lock(&client->send_lock);
|
|
|
|
client->send_coroutine = qemu_coroutine_self();
|
|
|
|
|
|
|
|
ret = qio_channel_writev_all(client->ioc, iov, niov, errp) < 0 ? -EIO : 0;
|
|
|
|
|
|
|
|
client->send_coroutine = NULL;
|
|
|
|
qemu_co_mutex_unlock(&client->send_lock);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-10-12 12:53:10 +03:00
|
|
|
static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
|
2023-06-08 16:56:34 +03:00
|
|
|
uint64_t cookie)
|
2017-10-12 12:53:10 +03:00
|
|
|
{
|
|
|
|
stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
|
|
|
|
stl_be_p(&reply->error, error);
|
2023-06-08 16:56:34 +03:00
|
|
|
stq_be_p(&reply->cookie, cookie);
|
2017-10-12 12:53:10 +03:00
|
|
|
}
|
|
|
|
|
2023-03-09 11:44:51 +03:00
|
|
|
static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
2023-03-09 11:44:51 +03:00
|
|
|
uint32_t error,
|
|
|
|
void *data,
|
2023-08-29 20:58:31 +03:00
|
|
|
uint64_t len,
|
2023-03-09 11:44:51 +03:00
|
|
|
Error **errp)
|
2011-09-19 16:25:30 +04:00
|
|
|
{
|
2017-10-13 01:29:00 +03:00
|
|
|
NBDSimpleReply reply;
|
2017-10-13 01:05:06 +03:00
|
|
|
int nbd_err = system_errno_to_nbd_errno(error);
|
2017-10-13 01:29:00 +03:00
|
|
|
struct iovec iov[] = {
|
|
|
|
{.iov_base = &reply, .iov_len = sizeof(reply)},
|
|
|
|
{.iov_base = data, .iov_len = len}
|
|
|
|
};
|
2017-07-07 18:29:17 +03:00
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
assert(!len || !nbd_err);
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(len <= NBD_MAX_BUFFER_SIZE);
|
2023-08-29 20:58:28 +03:00
|
|
|
assert(client->mode < NBD_MODE_STRUCTURED ||
|
|
|
|
(client->mode == NBD_MODE_STRUCTURED &&
|
|
|
|
request->type != NBD_CMD_READ));
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_simple_reply(request->cookie, nbd_err,
|
2023-06-08 16:56:33 +03:00
|
|
|
nbd_err_lookup(nbd_err), len);
|
2023-06-08 16:56:34 +03:00
|
|
|
set_be_simple_reply(&reply, nbd_err, request->cookie);
|
2011-09-19 17:19:27 +04:00
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
return nbd_co_send_iov(client, iov, 2, errp);
|
2011-09-19 16:25:30 +04:00
|
|
|
}
|
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
/*
|
|
|
|
* Prepare the header of a reply chunk for network transmission.
|
|
|
|
*
|
|
|
|
* On input, @iov is partially initialized: iov[0].iov_base must point
|
|
|
|
* to an uninitialized NBDReply, while the remaining @niov elements
|
|
|
|
* (if any) must be ready for transmission. This function then
|
|
|
|
* populates iov[0] for transmission.
|
|
|
|
*/
|
|
|
|
static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
|
|
|
|
size_t niov, uint16_t flags, uint16_t type,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request)
|
2017-10-27 13:40:32 +03:00
|
|
|
{
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
size_t i, length = 0;
|
|
|
|
|
|
|
|
for (i = 1; i < niov; i++) {
|
|
|
|
length += iov[i].iov_len;
|
|
|
|
}
|
|
|
|
assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData));
|
|
|
|
|
2023-09-25 22:22:33 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
NBDExtendedReplyChunk *chunk = iov->iov_base;
|
|
|
|
|
|
|
|
iov[0].iov_len = sizeof(*chunk);
|
|
|
|
stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
|
|
|
|
stw_be_p(&chunk->flags, flags);
|
|
|
|
stw_be_p(&chunk->type, type);
|
|
|
|
stq_be_p(&chunk->cookie, request->cookie);
|
|
|
|
stq_be_p(&chunk->offset, request->from);
|
|
|
|
stq_be_p(&chunk->length, length);
|
|
|
|
} else {
|
|
|
|
NBDStructuredReplyChunk *chunk = iov->iov_base;
|
|
|
|
|
|
|
|
iov[0].iov_len = sizeof(*chunk);
|
|
|
|
stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
|
|
|
|
stw_be_p(&chunk->flags, flags);
|
|
|
|
stw_be_p(&chunk->type, type);
|
|
|
|
stq_be_p(&chunk->cookie, request->cookie);
|
|
|
|
stl_be_p(&chunk->length, length);
|
|
|
|
}
|
2017-10-27 13:40:32 +03:00
|
|
|
}
|
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
Error **errp)
|
2017-11-09 00:57:03 +03:00
|
|
|
{
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
NBDReply hdr;
|
2017-11-09 00:57:03 +03:00
|
|
|
struct iovec iov[] = {
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
{.iov_base = &hdr},
|
2017-11-09 00:57:03 +03:00
|
|
|
};
|
|
|
|
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_chunk_done(request->cookie);
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
set_be_chunk(client, iov, 1, NBD_REPLY_FLAG_DONE,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBD_REPLY_TYPE_NONE, request);
|
2017-11-09 00:57:03 +03:00
|
|
|
return nbd_co_send_iov(client, iov, 1, errp);
|
|
|
|
}
|
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
static int coroutine_fn nbd_co_send_chunk_read(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
uint64_t offset,
|
|
|
|
void *data,
|
2023-08-29 20:58:31 +03:00
|
|
|
uint64_t size,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
bool final,
|
|
|
|
Error **errp)
|
2017-10-27 13:40:32 +03:00
|
|
|
{
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
NBDReply hdr;
|
2017-11-09 00:57:00 +03:00
|
|
|
NBDStructuredReadData chunk;
|
2017-10-27 13:40:32 +03:00
|
|
|
struct iovec iov[] = {
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
{.iov_base = &hdr},
|
2017-10-27 13:40:32 +03:00
|
|
|
{.iov_base = &chunk, .iov_len = sizeof(chunk)},
|
|
|
|
{.iov_base = data, .iov_len = size}
|
|
|
|
};
|
|
|
|
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(size && size <= NBD_MAX_BUFFER_SIZE);
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_chunk_read(request->cookie, offset, data, size);
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
set_be_chunk(client, iov, 3, final ? NBD_REPLY_FLAG_DONE : 0,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBD_REPLY_TYPE_OFFSET_DATA, request);
|
2017-10-27 13:40:32 +03:00
|
|
|
stq_be_p(&chunk.offset, offset);
|
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
return nbd_co_send_iov(client, iov, 3, errp);
|
2017-10-27 13:40:32 +03:00
|
|
|
}
|
2023-08-29 20:58:28 +03:00
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
static int coroutine_fn nbd_co_send_chunk_error(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
uint32_t error,
|
|
|
|
const char *msg,
|
|
|
|
Error **errp)
|
2018-03-08 21:46:32 +03:00
|
|
|
{
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
NBDReply hdr;
|
2018-03-08 21:46:32 +03:00
|
|
|
NBDStructuredError chunk;
|
|
|
|
int nbd_err = system_errno_to_nbd_errno(error);
|
|
|
|
struct iovec iov[] = {
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
{.iov_base = &hdr},
|
2018-03-08 21:46:32 +03:00
|
|
|
{.iov_base = &chunk, .iov_len = sizeof(chunk)},
|
|
|
|
{.iov_base = (char *)msg, .iov_len = msg ? strlen(msg) : 0},
|
|
|
|
};
|
|
|
|
|
|
|
|
assert(nbd_err);
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_chunk_error(request->cookie, nbd_err,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
nbd_err_lookup(nbd_err), msg ? msg : "");
|
|
|
|
set_be_chunk(client, iov, 3, NBD_REPLY_FLAG_DONE,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBD_REPLY_TYPE_ERROR, request);
|
2018-03-08 21:46:32 +03:00
|
|
|
stl_be_p(&chunk.error, nbd_err);
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
stw_be_p(&chunk.message_length, iov[2].iov_len);
|
2018-03-08 21:46:32 +03:00
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
return nbd_co_send_iov(client, iov, 3, errp);
|
2018-03-08 21:46:32 +03:00
|
|
|
}
|
|
|
|
|
2018-03-08 21:46:33 +03:00
|
|
|
/* Do a sparse read and send the structured reply to the client.
|
2022-11-28 17:23:27 +03:00
|
|
|
* Returns -errno if sending fails. blk_co_block_status_above() failure is
|
2018-03-08 21:46:33 +03:00
|
|
|
* reported to the client, at which point this function succeeds.
|
|
|
|
*/
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
uint64_t offset,
|
|
|
|
uint8_t *data,
|
2023-08-29 20:58:31 +03:00
|
|
|
uint64_t size,
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int ret = 0;
|
|
|
|
NBDExport *exp = client->exp;
|
|
|
|
size_t progress = 0;
|
|
|
|
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(size <= NBD_MAX_BUFFER_SIZE);
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
while (progress < size) {
|
|
|
|
int64_t pnum;
|
2022-11-28 17:23:27 +03:00
|
|
|
int status = blk_co_block_status_above(exp->common.blk, NULL,
|
|
|
|
offset + progress,
|
|
|
|
size - progress, &pnum, NULL,
|
|
|
|
NULL);
|
2017-11-07 06:09:12 +03:00
|
|
|
bool final;
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
|
|
|
|
if (status < 0) {
|
2018-03-08 21:46:33 +03:00
|
|
|
char *msg = g_strdup_printf("unable to check for holes: %s",
|
|
|
|
strerror(-status));
|
|
|
|
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_co_send_chunk_error(client, request, -status, msg, errp);
|
2018-03-08 21:46:33 +03:00
|
|
|
g_free(msg);
|
|
|
|
return ret;
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
}
|
|
|
|
assert(pnum && pnum <= size - progress);
|
2017-11-07 06:09:12 +03:00
|
|
|
final = progress + pnum == size;
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
if (status & BDRV_BLOCK_ZERO) {
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
NBDReply hdr;
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
NBDStructuredReadHole chunk;
|
|
|
|
struct iovec iov[] = {
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
{.iov_base = &hdr},
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
{.iov_base = &chunk, .iov_len = sizeof(chunk)},
|
|
|
|
};
|
|
|
|
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_chunk_read_hole(request->cookie,
|
2023-06-08 16:56:33 +03:00
|
|
|
offset + progress, pnum);
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
set_be_chunk(client, iov, 2,
|
|
|
|
final ? NBD_REPLY_FLAG_DONE : 0,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBD_REPLY_TYPE_OFFSET_HOLE, request);
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
stq_be_p(&chunk.offset, offset + progress);
|
|
|
|
stl_be_p(&chunk.length, pnum);
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
ret = nbd_co_send_iov(client, iov, 2, errp);
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
} else {
|
2023-03-09 11:44:51 +03:00
|
|
|
ret = blk_co_pread(exp->common.blk, offset + progress, pnum,
|
|
|
|
data + progress, 0);
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
error_setg_errno(errp, -ret, "reading from file failed");
|
|
|
|
break;
|
|
|
|
}
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_co_send_chunk_read(client, request, offset + progress,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
data + progress, pnum, final, errp);
|
nbd/server: Implement sparse reads atop structured reply
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-07 06:09:11 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret < 0) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
progress += pnum;
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-02-05 14:20:39 +03:00
|
|
|
typedef struct NBDExtentArray {
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
NBDExtent64 *extents;
|
2020-02-05 14:20:39 +03:00
|
|
|
unsigned int nb_alloc;
|
|
|
|
unsigned int count;
|
|
|
|
uint64_t total_length;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
bool extended;
|
2020-02-05 14:20:39 +03:00
|
|
|
bool can_add;
|
|
|
|
bool converted_to_be;
|
|
|
|
} NBDExtentArray;
|
|
|
|
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
|
|
|
|
NBDMode mode)
|
2020-02-05 14:20:39 +03:00
|
|
|
{
|
|
|
|
NBDExtentArray *ea = g_new0(NBDExtentArray, 1);
|
|
|
|
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
assert(mode >= NBD_MODE_STRUCTURED);
|
2020-02-05 14:20:39 +03:00
|
|
|
ea->nb_alloc = nb_alloc;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
ea->extents = g_new(NBDExtent64, nb_alloc);
|
|
|
|
ea->extended = mode >= NBD_MODE_EXTENDED;
|
2020-02-05 14:20:39 +03:00
|
|
|
ea->can_add = true;
|
|
|
|
|
|
|
|
return ea;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void nbd_extent_array_free(NBDExtentArray *ea)
|
|
|
|
{
|
|
|
|
g_free(ea->extents);
|
|
|
|
g_free(ea);
|
|
|
|
}
|
2022-03-12 01:22:02 +03:00
|
|
|
G_DEFINE_AUTOPTR_CLEANUP_FUNC(NBDExtentArray, nbd_extent_array_free)
|
2020-02-05 14:20:39 +03:00
|
|
|
|
|
|
|
/* Further modifications of the array after conversion are abandoned */
|
|
|
|
static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
assert(!ea->converted_to_be);
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
assert(ea->extended);
|
2020-02-05 14:20:39 +03:00
|
|
|
ea->can_add = false;
|
|
|
|
ea->converted_to_be = true;
|
|
|
|
|
|
|
|
for (i = 0; i < ea->count; i++) {
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
|
|
|
|
ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
|
2020-02-05 14:20:39 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
/* Further modifications of the array after conversion are abandoned */
|
|
|
|
static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
NBDExtent32 *extents = g_new(NBDExtent32, ea->count);
|
|
|
|
|
|
|
|
assert(!ea->converted_to_be);
|
|
|
|
assert(!ea->extended);
|
|
|
|
ea->can_add = false;
|
|
|
|
ea->converted_to_be = true;
|
|
|
|
|
|
|
|
for (i = 0; i < ea->count; i++) {
|
|
|
|
assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX);
|
|
|
|
extents[i].length = cpu_to_be32(ea->extents[i].length);
|
|
|
|
extents[i].flags = cpu_to_be32(ea->extents[i].flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
return extents;
|
|
|
|
}
|
|
|
|
|
2018-07-04 14:23:02 +03:00
|
|
|
/*
|
2020-02-05 14:20:39 +03:00
|
|
|
* Add extent to NBDExtentArray. If extent can't be added (no available space),
|
|
|
|
* return -1.
|
|
|
|
* For safety, when returning -1 for the first time, .can_add is set to false,
|
2021-12-04 02:15:26 +03:00
|
|
|
* and further calls to nbd_extent_array_add() will crash.
|
|
|
|
* (this avoids the situation where a caller ignores failure to add one extent,
|
|
|
|
* where adding another extent that would squash into the last array entry
|
|
|
|
* would result in an incorrect range reported to the client)
|
2018-07-04 14:23:02 +03:00
|
|
|
*/
|
2020-02-05 14:20:39 +03:00
|
|
|
static int nbd_extent_array_add(NBDExtentArray *ea,
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
uint64_t length, uint32_t flags)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
2020-02-05 14:20:39 +03:00
|
|
|
assert(ea->can_add);
|
|
|
|
|
|
|
|
if (!length) {
|
|
|
|
return 0;
|
|
|
|
}
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
if (!ea->extended) {
|
|
|
|
assert(length <= UINT32_MAX);
|
|
|
|
}
|
2020-02-05 14:20:39 +03:00
|
|
|
|
|
|
|
/* Extend previous extent if flags are the same */
|
|
|
|
if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
uint64_t sum = length + ea->extents[ea->count - 1].length;
|
2020-02-05 14:20:39 +03:00
|
|
|
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
/*
|
|
|
|
* sum cannot overflow: the block layer bounds image size at
|
|
|
|
* 2^63, and ea->extents[].length comes from the block layer.
|
|
|
|
*/
|
|
|
|
assert(sum >= length);
|
|
|
|
if (sum <= UINT32_MAX || ea->extended) {
|
2020-02-05 14:20:39 +03:00
|
|
|
ea->extents[ea->count - 1].length = sum;
|
|
|
|
ea->total_length += length;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ea->count >= ea->nb_alloc) {
|
|
|
|
ea->can_add = false;
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
ea->total_length += length;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
ea->extents[ea->count] = (NBDExtent64) {.length = length, .flags = flags};
|
2020-02-05 14:20:39 +03:00
|
|
|
ea->count++;
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2020-02-05 14:20:39 +03:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-11-28 17:23:27 +03:00
|
|
|
static int coroutine_fn blockstatus_to_extents(BlockBackend *blk,
|
2022-11-28 17:23:26 +03:00
|
|
|
uint64_t offset, uint64_t bytes,
|
|
|
|
NBDExtentArray *ea)
|
2020-02-05 14:20:39 +03:00
|
|
|
{
|
|
|
|
while (bytes) {
|
2018-03-12 18:21:21 +03:00
|
|
|
uint32_t flags;
|
|
|
|
int64_t num;
|
2022-11-28 17:23:27 +03:00
|
|
|
int ret = blk_co_block_status_above(blk, NULL, offset, bytes, &num,
|
|
|
|
NULL, NULL);
|
2018-07-04 14:23:02 +03:00
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
nbd: server: Report holes for raw images
When querying image extents for raw image, qemu-nbd reports holes as
zero:
$ qemu-nbd -t -r -f raw empty-6g.raw
$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": true, "offset": 0}]
$ qemu-img map --output json empty-6g.raw
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}]
Turns out that qemu-img map reports a hole based on BDRV_BLOCK_DATA, but
nbd server reports a hole based on BDRV_BLOCK_ALLOCATED.
The NBD protocol says:
NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and
future writes to that area may cause fragmentation or encounter an
NBD_ENOSPC error); if clear, the block is allocated or the server
could not otherwise determine its status.
qemu-img manual says:
whether the sectors contain actual data or not (boolean field data;
if false, the sectors are either unallocated or stored as
optimized all-zero clusters);
To me, data=false looks compatible with NBD_STATE_HOLE. From user point
of view, getting same results from qemu-nbd and qemu-img is more
important than being more correct about allocation status.
Changing nbd server to report holes using BDRV_BLOCK_DATA makes qemu-nbd
results compatible with qemu-img map:
$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 6442450944, "depth": 0, "zero": true, "data": false, "offset": 0}]
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-Id: <20210219160752.1826830-1-nsoffer@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-19 19:07:52 +03:00
|
|
|
flags = (ret & BDRV_BLOCK_DATA ? 0 : NBD_STATE_HOLE) |
|
|
|
|
(ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2020-02-05 14:20:39 +03:00
|
|
|
if (nbd_extent_array_add(ea, num, flags) < 0) {
|
|
|
|
return 0;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
2018-07-04 14:23:02 +03:00
|
|
|
|
2020-02-05 14:20:39 +03:00
|
|
|
offset += num;
|
|
|
|
bytes -= num;
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-11-28 17:23:27 +03:00
|
|
|
static int coroutine_fn blockalloc_to_extents(BlockBackend *blk,
|
2022-11-28 17:23:26 +03:00
|
|
|
uint64_t offset, uint64_t bytes,
|
|
|
|
NBDExtentArray *ea)
|
2020-10-27 08:05:54 +03:00
|
|
|
{
|
|
|
|
while (bytes) {
|
|
|
|
int64_t num;
|
2022-11-28 17:23:27 +03:00
|
|
|
int ret = blk_co_is_allocated_above(blk, NULL, false, offset, bytes,
|
|
|
|
&num);
|
2020-10-27 08:05:54 +03:00
|
|
|
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nbd_extent_array_add(ea, num, ret) < 0) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
offset += num;
|
|
|
|
bytes -= num;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-02-05 14:20:39 +03:00
|
|
|
/*
|
|
|
|
* nbd_co_send_extents
|
2018-06-09 18:17:56 +03:00
|
|
|
*
|
2020-02-05 14:20:39 +03:00
|
|
|
* @ea is converted to BE by the function
|
|
|
|
* @last controls whether NBD_REPLY_FLAG_DONE is sent.
|
2018-06-09 18:17:56 +03:00
|
|
|
*/
|
2023-03-09 11:44:51 +03:00
|
|
|
static int coroutine_fn
|
2023-06-08 16:56:33 +03:00
|
|
|
nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
|
2023-03-09 11:44:51 +03:00
|
|
|
bool last, uint32_t context_id, Error **errp)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
NBDReply hdr;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
NBDStructuredMeta meta;
|
|
|
|
NBDExtendedMeta meta_ext;
|
|
|
|
g_autofree NBDExtent32 *extents = NULL;
|
|
|
|
uint16_t type;
|
|
|
|
struct iovec iov[] = { {.iov_base = &hdr}, {0}, {0} };
|
2018-03-12 18:21:21 +03:00
|
|
|
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
type = NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
|
|
|
|
|
|
|
|
iov[1].iov_base = &meta_ext;
|
|
|
|
iov[1].iov_len = sizeof(meta_ext);
|
|
|
|
stl_be_p(&meta_ext.context_id, context_id);
|
|
|
|
stl_be_p(&meta_ext.count, ea->count);
|
|
|
|
|
|
|
|
nbd_extent_array_convert_to_be(ea);
|
|
|
|
iov[2].iov_base = ea->extents;
|
|
|
|
iov[2].iov_len = ea->count * sizeof(ea->extents[0]);
|
|
|
|
} else {
|
|
|
|
type = NBD_REPLY_TYPE_BLOCK_STATUS;
|
|
|
|
|
|
|
|
iov[1].iov_base = &meta;
|
|
|
|
iov[1].iov_len = sizeof(meta);
|
|
|
|
stl_be_p(&meta.context_id, context_id);
|
|
|
|
|
|
|
|
extents = nbd_extent_array_convert_to_narrow(ea);
|
|
|
|
iov[2].iov_base = extents;
|
|
|
|
iov[2].iov_len = ea->count * sizeof(extents[0]);
|
|
|
|
}
|
2020-02-05 14:20:39 +03:00
|
|
|
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_send_extents(request->cookie, ea->count, context_id,
|
2023-06-08 16:56:33 +03:00
|
|
|
ea->total_length, last);
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
set_be_chunk(client, iov, 3, last ? NBD_REPLY_FLAG_DONE : 0, type,
|
|
|
|
request);
|
2018-03-12 18:21:21 +03:00
|
|
|
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
return nbd_co_send_iov(client, iov, 3, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Get block status from the exported device and send it to the client */
|
2022-11-28 17:23:26 +03:00
|
|
|
static int
|
2023-06-08 16:56:33 +03:00
|
|
|
coroutine_fn nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
|
2022-11-28 17:23:27 +03:00
|
|
|
BlockBackend *blk, uint64_t offset,
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
uint64_t length, bool dont_fragment,
|
2022-11-28 17:23:26 +03:00
|
|
|
bool last, uint32_t context_id,
|
|
|
|
Error **errp)
|
2018-03-12 18:21:21 +03:00
|
|
|
{
|
|
|
|
int ret;
|
2019-05-10 18:17:35 +03:00
|
|
|
unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
g_autoptr(NBDExtentArray) ea =
|
|
|
|
nbd_extent_array_new(nb_extents, client->mode);
|
2018-03-12 18:21:21 +03:00
|
|
|
|
2020-10-27 08:05:54 +03:00
|
|
|
if (context_id == NBD_META_ID_BASE_ALLOCATION) {
|
2022-11-28 17:23:27 +03:00
|
|
|
ret = blockstatus_to_extents(blk, offset, length, ea);
|
2020-10-27 08:05:54 +03:00
|
|
|
} else {
|
2022-11-28 17:23:27 +03:00
|
|
|
ret = blockalloc_to_extents(blk, offset, length, ea);
|
2020-10-27 08:05:54 +03:00
|
|
|
}
|
2018-03-12 18:21:21 +03:00
|
|
|
if (ret < 0) {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_chunk_error(client, request, -ret,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
"can't get block status", errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_extents(client, request, ea, last, context_id, errp);
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2020-02-05 14:20:40 +03:00
|
|
|
/* Populate @ea from a dirty bitmap. */
|
2020-02-05 14:20:39 +03:00
|
|
|
static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
|
|
|
|
uint64_t offset, uint64_t length,
|
2020-02-05 14:20:40 +03:00
|
|
|
NBDExtentArray *es)
|
2018-06-09 18:17:56 +03:00
|
|
|
{
|
2020-02-05 14:20:40 +03:00
|
|
|
int64_t start, dirty_start, dirty_count;
|
|
|
|
int64_t end = offset + length;
|
|
|
|
bool full = false;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
int64_t bound = es->extended ? INT64_MAX : INT32_MAX;
|
2018-06-09 18:17:56 +03:00
|
|
|
|
|
|
|
bdrv_dirty_bitmap_lock(bitmap);
|
|
|
|
|
2020-02-05 14:20:40 +03:00
|
|
|
for (start = offset;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, bound,
|
2020-02-05 14:20:40 +03:00
|
|
|
&dirty_start, &dirty_count);
|
|
|
|
start = dirty_start + dirty_count)
|
|
|
|
{
|
|
|
|
if ((nbd_extent_array_add(es, dirty_start - start, 0) < 0) ||
|
|
|
|
(nbd_extent_array_add(es, dirty_count, NBD_STATE_DIRTY) < 0))
|
|
|
|
{
|
|
|
|
full = true;
|
2020-02-05 14:20:39 +03:00
|
|
|
break;
|
|
|
|
}
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2020-02-05 14:20:40 +03:00
|
|
|
if (!full) {
|
2020-11-06 23:36:11 +03:00
|
|
|
/* last non dirty extent, nothing to do if array is now full */
|
|
|
|
(void) nbd_extent_array_add(es, end - start, 0);
|
2020-02-05 14:20:40 +03:00
|
|
|
}
|
2018-06-09 18:17:56 +03:00
|
|
|
|
|
|
|
bdrv_dirty_bitmap_unlock(bitmap);
|
|
|
|
}
|
|
|
|
|
2023-06-08 16:56:33 +03:00
|
|
|
static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
|
|
|
|
NBDRequest *request,
|
|
|
|
BdrvDirtyBitmap *bitmap,
|
|
|
|
uint64_t offset,
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
uint64_t length, bool dont_fragment,
|
2023-06-08 16:56:33 +03:00
|
|
|
bool last, uint32_t context_id,
|
|
|
|
Error **errp)
|
2018-06-09 18:17:56 +03:00
|
|
|
{
|
2019-05-10 18:17:35 +03:00
|
|
|
unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
g_autoptr(NBDExtentArray) ea =
|
|
|
|
nbd_extent_array_new(nb_extents, client->mode);
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2020-02-05 14:20:40 +03:00
|
|
|
bitmap_to_extents(bitmap, offset, length, ea);
|
2018-06-09 18:17:56 +03:00
|
|
|
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_extents(client, request, ea, last, context_id, errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
}
|
|
|
|
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
/*
|
|
|
|
* nbd_co_block_status_payload_read
|
|
|
|
* Called when a client wants a subset of negotiated contexts via a
|
|
|
|
* BLOCK_STATUS payload. Check the payload for valid length and
|
|
|
|
* contents. On success, return 0 with request updated to effective
|
|
|
|
* length. If request was invalid but all payload consumed, return 0
|
|
|
|
* with request->len and request->contexts->count set to 0 (which will
|
|
|
|
* trigger an appropriate NBD_EINVAL response later on). Return
|
|
|
|
* negative errno if the payload was not fully consumed.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
uint64_t payload_len = request->len;
|
|
|
|
g_autofree char *buf = NULL;
|
|
|
|
size_t count, i, nr_bitmaps;
|
|
|
|
uint32_t id;
|
|
|
|
|
|
|
|
if (payload_len > NBD_MAX_BUFFER_SIZE) {
|
|
|
|
error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
|
|
|
|
request->len, NBD_MAX_BUFFER_SIZE);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(client->contexts.exp == client->exp);
|
|
|
|
nr_bitmaps = client->exp->nr_export_bitmaps;
|
|
|
|
request->contexts = g_new0(NBDMetaContexts, 1);
|
|
|
|
request->contexts->exp = client->exp;
|
|
|
|
|
|
|
|
if (payload_len % sizeof(uint32_t) ||
|
|
|
|
payload_len < sizeof(NBDBlockStatusPayload) ||
|
|
|
|
payload_len > (sizeof(NBDBlockStatusPayload) +
|
|
|
|
sizeof(id) * client->contexts.count)) {
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
|
|
|
|
buf = g_malloc(payload_len);
|
|
|
|
if (nbd_read(client->ioc, buf, payload_len,
|
|
|
|
"CMD_BLOCK_STATUS data", errp) < 0) {
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
trace_nbd_co_receive_request_payload_received(request->cookie,
|
|
|
|
payload_len);
|
|
|
|
request->contexts->bitmaps = g_new0(bool, nr_bitmaps);
|
|
|
|
count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
|
|
|
|
payload_len = 0;
|
|
|
|
|
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
|
|
|
|
if (id == NBD_META_ID_BASE_ALLOCATION) {
|
|
|
|
if (!client->contexts.base_allocation ||
|
|
|
|
request->contexts->base_allocation) {
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
request->contexts->base_allocation = true;
|
|
|
|
} else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
|
|
|
|
if (!client->contexts.allocation_depth ||
|
|
|
|
request->contexts->allocation_depth) {
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
request->contexts->allocation_depth = true;
|
|
|
|
} else {
|
|
|
|
unsigned idx = id - NBD_META_ID_DIRTY_BITMAP;
|
|
|
|
|
|
|
|
if (idx >= nr_bitmaps || !client->contexts.bitmaps[idx] ||
|
|
|
|
request->contexts->bitmaps[idx]) {
|
|
|
|
goto skip;
|
|
|
|
}
|
|
|
|
request->contexts->bitmaps[idx] = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
request->len = ldq_be_p(buf);
|
|
|
|
request->contexts->count = count;
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
skip:
|
|
|
|
trace_nbd_co_receive_block_status_payload_compliance(request->from,
|
|
|
|
request->len);
|
|
|
|
request->len = request->contexts->count = 0;
|
|
|
|
return nbd_drop(client->ioc, payload_len, errp);
|
|
|
|
}
|
|
|
|
|
2017-06-02 18:01:44 +03:00
|
|
|
/* nbd_co_receive_request
|
|
|
|
* Collect a client request. Return 0 if request looks valid, -EIO to drop
|
2020-12-14 20:05:18 +03:00
|
|
|
* connection right away, -EAGAIN to indicate we were interrupted and the
|
|
|
|
* channel should be quiesced, and any other negative value to report an error
|
|
|
|
* to the client (although the caller may still need to disconnect after
|
|
|
|
* reporting the error).
|
2017-06-02 18:01:44 +03:00
|
|
|
*/
|
2023-08-29 20:58:32 +03:00
|
|
|
static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
|
|
|
|
NBDRequest *request,
|
2023-03-09 11:44:51 +03:00
|
|
|
Error **errp)
|
2011-09-19 17:07:54 +04:00
|
|
|
{
|
2011-10-07 18:47:56 +04:00
|
|
|
NBDClient *client = req->client;
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
bool extended_with_payload;
|
2023-08-29 20:58:32 +03:00
|
|
|
bool check_length = false;
|
|
|
|
bool check_rofs = false;
|
|
|
|
bool allocate_buffer = false;
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
bool payload_okay = false;
|
|
|
|
uint64_t payload_len = 0;
|
2023-08-29 20:58:32 +03:00
|
|
|
int valid_flags = NBD_CMD_FLAG_FUA;
|
2020-12-14 20:05:18 +03:00
|
|
|
int ret;
|
2011-09-19 17:07:54 +04:00
|
|
|
|
2016-02-10 21:41:04 +03:00
|
|
|
g_assert(qemu_in_coroutine());
|
2020-12-14 20:05:18 +03:00
|
|
|
ret = nbd_receive_request(client, request, errp);
|
|
|
|
if (ret < 0) {
|
2021-12-04 02:15:26 +03:00
|
|
|
return ret;
|
2011-09-19 17:07:54 +04:00
|
|
|
}
|
|
|
|
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
|
2017-07-07 23:30:43 +03:00
|
|
|
nbd_cmd_lookup(request->type));
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
|
|
|
|
request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
|
|
|
|
if (extended_with_payload) {
|
|
|
|
payload_len = request->len;
|
|
|
|
check_length = true;
|
|
|
|
}
|
|
|
|
|
2023-08-29 20:58:32 +03:00
|
|
|
switch (request->type) {
|
|
|
|
case NBD_CMD_DISC:
|
2016-05-12 01:39:37 +03:00
|
|
|
/* Special case: we're going to disconnect without a reply,
|
|
|
|
* whether or not flags, from, or len are bogus */
|
2023-08-29 20:58:32 +03:00
|
|
|
req->complete = true;
|
2017-06-02 18:01:45 +03:00
|
|
|
return -EIO;
|
2016-05-12 01:39:37 +03:00
|
|
|
|
2023-08-29 20:58:32 +03:00
|
|
|
case NBD_CMD_READ:
|
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED) {
|
|
|
|
valid_flags |= NBD_CMD_FLAG_DF;
|
2016-01-07 16:32:42 +03:00
|
|
|
}
|
2023-08-29 20:58:32 +03:00
|
|
|
check_length = true;
|
|
|
|
allocate_buffer = true;
|
|
|
|
break;
|
2016-01-07 16:32:42 +03:00
|
|
|
|
2023-08-29 20:58:32 +03:00
|
|
|
case NBD_CMD_WRITE:
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
if (!extended_with_payload) {
|
|
|
|
/* The client is noncompliant. Trace it, but proceed. */
|
|
|
|
trace_nbd_co_receive_ext_payload_compliance(request->from,
|
|
|
|
request->len);
|
|
|
|
}
|
|
|
|
valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
|
|
|
|
}
|
|
|
|
payload_okay = true;
|
2023-08-29 20:58:32 +03:00
|
|
|
payload_len = request->len;
|
|
|
|
check_length = true;
|
|
|
|
allocate_buffer = true;
|
|
|
|
check_rofs = true;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_CMD_FLUSH:
|
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_CMD_TRIM:
|
|
|
|
check_rofs = true;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_CMD_CACHE:
|
|
|
|
check_length = true;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_CMD_WRITE_ZEROES:
|
|
|
|
valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
|
|
|
|
check_rofs = true;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case NBD_CMD_BLOCK_STATUS:
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
if (extended_with_payload) {
|
|
|
|
ret = nbd_co_block_status_payload_read(client, request, errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
/* payload now consumed */
|
|
|
|
check_length = false;
|
|
|
|
payload_len = 0;
|
|
|
|
valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
|
|
|
|
} else {
|
|
|
|
request->contexts = &client->contexts;
|
|
|
|
}
|
2023-08-29 20:58:32 +03:00
|
|
|
valid_flags |= NBD_CMD_FLAG_REQ_ONE;
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
/* Unrecognized, will fail later */
|
|
|
|
;
|
2013-05-02 16:23:08 +04:00
|
|
|
}
|
2019-07-25 13:05:50 +03:00
|
|
|
|
2023-08-29 20:58:32 +03:00
|
|
|
/* Payload and buffer handling. */
|
|
|
|
if (!payload_len) {
|
|
|
|
req->complete = true;
|
|
|
|
}
|
|
|
|
if (check_length && request->len > NBD_MAX_BUFFER_SIZE) {
|
|
|
|
/* READ, WRITE, CACHE */
|
|
|
|
error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
|
|
|
|
request->len, NBD_MAX_BUFFER_SIZE);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
if (payload_len && !payload_okay) {
|
|
|
|
/*
|
|
|
|
* For now, we don't support payloads on other commands; but
|
|
|
|
* we can keep the connection alive by ignoring the payload.
|
|
|
|
* We will fail the command later with NBD_EINVAL for the use
|
|
|
|
* of an unsupported flag (and not for access beyond bounds).
|
|
|
|
*/
|
|
|
|
assert(request->type != NBD_CMD_WRITE);
|
|
|
|
request->len = 0;
|
|
|
|
}
|
2023-08-29 20:58:32 +03:00
|
|
|
if (allocate_buffer) {
|
|
|
|
/* READ, WRITE */
|
|
|
|
req->data = blk_try_blockalign(client->exp->common.blk,
|
|
|
|
request->len);
|
|
|
|
if (req->data == NULL) {
|
|
|
|
error_setg(errp, "No memory");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (payload_len) {
|
nbd/server: Support a request payload
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations). Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested). Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.
According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available. So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk). Note that we do not support
the payload version of BLOCK_STATUS yet.
This patch also fixes a latent bug introduced in b2578459: once
request->len can be 64 bits, assigning it to a 32-bit payload_len can
cause wraparound to 0 which then sets req->complete prematurely;
thankfully, the bug was not possible back then (it takes this and
later patches to even allow request->len larger than 32 bits; and
since previously the only 'payload_len = request->len' assignment was
in NBD_CMD_WRITE which also sets check_length, which in turn rejects
lengths larger than 32M before relying on any possibly-truncated value
stored in payload_len).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-15-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: enhance comment on handling client error, fix type bug]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:31 +03:00
|
|
|
if (payload_okay) {
|
|
|
|
/* WRITE */
|
|
|
|
assert(req->data);
|
|
|
|
ret = nbd_read(client->ioc, req->data, payload_len,
|
|
|
|
"CMD_WRITE data", errp);
|
|
|
|
} else {
|
|
|
|
ret = nbd_drop(client->ioc, payload_len, errp);
|
|
|
|
}
|
2023-08-29 20:58:32 +03:00
|
|
|
if (ret < 0) {
|
2017-06-02 18:01:45 +03:00
|
|
|
return -EIO;
|
2011-09-19 17:07:54 +04:00
|
|
|
}
|
2016-05-12 01:39:37 +03:00
|
|
|
req->complete = true;
|
2023-06-08 16:56:34 +03:00
|
|
|
trace_nbd_co_receive_request_payload_received(request->cookie,
|
2023-08-29 20:58:32 +03:00
|
|
|
payload_len);
|
2011-09-19 17:07:54 +04:00
|
|
|
}
|
2016-05-12 01:39:37 +03:00
|
|
|
|
nbd/server: Fix error reporting for bad requests
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-16 00:35:56 +03:00
|
|
|
/* Sanity checks. */
|
2023-08-29 20:58:32 +03:00
|
|
|
if (client->exp->nbdflags & NBD_FLAG_READ_ONLY && check_rofs) {
|
|
|
|
/* WRITE, TRIM, WRITE_ZEROES */
|
nbd/server: Fix error reporting for bad requests
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-16 00:35:56 +03:00
|
|
|
error_setg(errp, "Export is read-only");
|
|
|
|
return -EROFS;
|
|
|
|
}
|
|
|
|
if (request->from > client->exp->size ||
|
2019-01-17 22:36:43 +03:00
|
|
|
request->len > client->exp->size - request->from) {
|
2023-08-29 20:58:31 +03:00
|
|
|
error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu64
|
2017-07-07 18:29:11 +03:00
|
|
|
", Size: %" PRIu64, request->from, request->len,
|
2019-01-17 22:36:43 +03:00
|
|
|
client->exp->size);
|
nbd/server: Fix error reporting for bad requests
The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).
The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.
Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).
Solve all of these issues by some code motion and improved
request validation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171115213557.3548-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-16 00:35:56 +03:00
|
|
|
return (request->type == NBD_CMD_WRITE ||
|
|
|
|
request->type == NBD_CMD_WRITE_ZEROES) ? -ENOSPC : -EINVAL;
|
2016-05-12 01:39:37 +03:00
|
|
|
}
|
nbd/server: Trace client noncompliance on unaligned requests
We've recently added traces for clients to flag server non-compliance;
let's do the same for servers to flag client non-compliance. According
to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is
promising to send all requests aligned to those boundaries. Of
course, if the client does not request NBD_INFO_BLOCK_SIZE, then it
made no promises so we shouldn't flag anything; and because we are
willing to handle clients that made no promises (the spec allows us to
use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already
have to handle unaligned requests (which the block layer already does
on our behalf). So even though the spec allows us to return EINVAL
for clients that promised to behave, it's easier to always answer
unaligned requests. Still, flagging non-compliance can be useful in
debugging a client that is trying to be maximally portable.
Qemu as client used to have one spot where it sent non-compliant
requests: if the server sends an unaligned reply to
NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire
disk, the next request would start at that unaligned point; this was
fixed in commit a39286dd when the client was taught to work around
server non-compliance; but is equally fixed if the server is patched
to not send unaligned replies in the first place (yes, qemu 4.0 as
server still has few such bugs, although they will be patched in
4.1). Fortunately, I did not find any more spots where qemu as client
was non-compliant. I was able to test the patch by using the following
hack to convince qemu-io to run various unaligned commands, coupled
with serving 512-byte alignment by intentionally omitting '-f raw' on
the server while viewing server traces.
| diff --git i/nbd/client.c w/nbd/client.c
| index 427980bdd22..1858b2aac35 100644
| --- i/nbd/client.c
| +++ w/nbd/client.c
| @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
| nbd_send_opt_abort(ioc);
| return -1;
| }
| + info->min_block = 1;//hack
| if (!is_power_of_2(info->min_block)) {
| error_setg(errp, "server minimum block size %" PRIu32
| " is not a power of two", info->min_block);
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190403030526.12258-3-eblake@redhat.com>
[eblake: address minor review nits]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-03 06:05:21 +03:00
|
|
|
if (client->check_align && !QEMU_IS_ALIGNED(request->from | request->len,
|
|
|
|
client->check_align)) {
|
|
|
|
/*
|
|
|
|
* The block layer gracefully handles unaligned requests, but
|
|
|
|
* it's still worth tracing client non-compliance
|
|
|
|
*/
|
|
|
|
trace_nbd_co_receive_align_compliance(nbd_cmd_lookup(request->type),
|
|
|
|
request->from,
|
|
|
|
request->len,
|
|
|
|
client->check_align);
|
|
|
|
}
|
2017-10-27 13:40:32 +03:00
|
|
|
if (request->flags & ~valid_flags) {
|
|
|
|
error_setg(errp, "unsupported flags for command %s (got 0x%x)",
|
|
|
|
nbd_cmd_lookup(request->type), request->flags);
|
2017-06-02 18:01:45 +03:00
|
|
|
return -EINVAL;
|
2016-10-14 21:33:17 +03:00
|
|
|
}
|
2016-05-12 01:39:37 +03:00
|
|
|
|
2017-06-02 18:01:45 +03:00
|
|
|
return 0;
|
2011-09-19 17:07:54 +04:00
|
|
|
}
|
|
|
|
|
2018-03-08 21:46:35 +03:00
|
|
|
/* Send simple reply without a payload, or a structured error
|
|
|
|
* @error_msg is ignored if @ret >= 0
|
|
|
|
* Returns 0 if connection is still live, -errno on failure to talk to client
|
|
|
|
*/
|
|
|
|
static coroutine_fn int nbd_send_generic_reply(NBDClient *client,
|
2023-06-08 16:56:33 +03:00
|
|
|
NBDRequest *request,
|
2018-03-08 21:46:35 +03:00
|
|
|
int ret,
|
|
|
|
const char *error_msg,
|
|
|
|
Error **errp)
|
|
|
|
{
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp);
|
2023-09-25 22:22:33 +03:00
|
|
|
} else if (client->mode >= NBD_MODE_EXTENDED) {
|
|
|
|
return nbd_co_send_chunk_done(client, request, errp);
|
2018-03-08 21:46:35 +03:00
|
|
|
} else {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
|
2018-03-08 21:46:35 +03:00
|
|
|
NULL, 0, errp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Handle NBD_CMD_READ request.
|
|
|
|
* Return -errno if sending fails. Other errors are reported directly to the
|
|
|
|
* client as an error reply. */
|
|
|
|
static coroutine_fn int nbd_do_cmd_read(NBDClient *client, NBDRequest *request,
|
|
|
|
uint8_t *data, Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
NBDExport *exp = client->exp;
|
|
|
|
|
2019-07-25 13:05:50 +03:00
|
|
|
assert(request->type == NBD_CMD_READ);
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(request->len <= NBD_MAX_BUFFER_SIZE);
|
2018-03-08 21:46:35 +03:00
|
|
|
|
|
|
|
/* XXX: NBD Protocol only documents use of FUA with WRITE */
|
|
|
|
if (request->flags & NBD_CMD_FLAG_FUA) {
|
2020-09-24 18:27:08 +03:00
|
|
|
ret = blk_co_flush(exp->common.blk);
|
2018-03-08 21:46:35 +03:00
|
|
|
if (ret < 0) {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-08 21:46:35 +03:00
|
|
|
"flush failed", errp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED &&
|
|
|
|
!(request->flags & NBD_CMD_FLAG_DF) && request->len)
|
2018-10-03 17:47:38 +03:00
|
|
|
{
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_sparse_read(client, request, request->from,
|
2018-03-08 21:46:35 +03:00
|
|
|
data, request->len, errp);
|
|
|
|
}
|
|
|
|
|
2023-03-09 11:44:51 +03:00
|
|
|
ret = blk_co_pread(exp->common.blk, request->from, request->len, data, 0);
|
2019-07-25 13:05:50 +03:00
|
|
|
if (ret < 0) {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-08 21:46:35 +03:00
|
|
|
"reading from file failed", errp);
|
|
|
|
}
|
|
|
|
|
2023-08-29 20:58:28 +03:00
|
|
|
if (client->mode >= NBD_MODE_STRUCTURED) {
|
2018-03-08 21:46:35 +03:00
|
|
|
if (request->len) {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_chunk_read(client, request, request->from, data,
|
nbd/server: Prepare for alternate-size headers
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-08 16:56:32 +03:00
|
|
|
request->len, true, errp);
|
2018-03-08 21:46:35 +03:00
|
|
|
} else {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_chunk_done(client, request, errp);
|
2018-03-08 21:46:35 +03:00
|
|
|
}
|
|
|
|
} else {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_co_send_simple_reply(client, request, 0,
|
2018-03-08 21:46:35 +03:00
|
|
|
data, request->len, errp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-07-25 13:05:50 +03:00
|
|
|
/*
|
|
|
|
* nbd_do_cmd_cache
|
|
|
|
*
|
|
|
|
* Handle NBD_CMD_CACHE request.
|
|
|
|
* Return -errno if sending fails. Other errors are reported directly to the
|
|
|
|
* client as an error reply.
|
|
|
|
*/
|
|
|
|
static coroutine_fn int nbd_do_cmd_cache(NBDClient *client, NBDRequest *request,
|
|
|
|
Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
NBDExport *exp = client->exp;
|
|
|
|
|
|
|
|
assert(request->type == NBD_CMD_CACHE);
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(request->len <= NBD_MAX_BUFFER_SIZE);
|
2019-07-25 13:05:50 +03:00
|
|
|
|
2020-09-24 18:27:08 +03:00
|
|
|
ret = blk_co_preadv(exp->common.blk, request->from, request->len,
|
2019-07-25 13:05:50 +03:00
|
|
|
NULL, BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
|
|
|
|
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2019-07-25 13:05:50 +03:00
|
|
|
"caching data failed", errp);
|
|
|
|
}
|
|
|
|
|
2018-03-13 01:14:28 +03:00
|
|
|
/* Handle NBD request.
|
|
|
|
* Return -errno if sending fails. Other errors are reported directly to the
|
|
|
|
* client as an error reply. */
|
|
|
|
static coroutine_fn int nbd_handle_request(NBDClient *client,
|
|
|
|
NBDRequest *request,
|
|
|
|
uint8_t *data, Error **errp)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
int flags;
|
|
|
|
NBDExport *exp = client->exp;
|
|
|
|
char *msg;
|
2020-10-27 08:05:52 +03:00
|
|
|
size_t i;
|
2018-03-13 01:14:28 +03:00
|
|
|
|
|
|
|
switch (request->type) {
|
2018-04-13 17:31:56 +03:00
|
|
|
case NBD_CMD_CACHE:
|
2019-07-25 13:05:50 +03:00
|
|
|
return nbd_do_cmd_cache(client, request, errp);
|
|
|
|
|
|
|
|
case NBD_CMD_READ:
|
2018-03-13 01:14:28 +03:00
|
|
|
return nbd_do_cmd_read(client, request, data, errp);
|
|
|
|
|
|
|
|
case NBD_CMD_WRITE:
|
|
|
|
flags = 0;
|
|
|
|
if (request->flags & NBD_CMD_FLAG_FUA) {
|
|
|
|
flags |= BDRV_REQ_FUA;
|
|
|
|
}
|
2023-08-29 20:58:31 +03:00
|
|
|
assert(request->len <= NBD_MAX_BUFFER_SIZE);
|
2023-03-09 11:44:51 +03:00
|
|
|
ret = blk_co_pwrite(exp->common.blk, request->from, request->len, data,
|
|
|
|
flags);
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-13 01:14:28 +03:00
|
|
|
"writing to file failed", errp);
|
|
|
|
|
|
|
|
case NBD_CMD_WRITE_ZEROES:
|
|
|
|
flags = 0;
|
|
|
|
if (request->flags & NBD_CMD_FLAG_FUA) {
|
|
|
|
flags |= BDRV_REQ_FUA;
|
|
|
|
}
|
|
|
|
if (!(request->flags & NBD_CMD_FLAG_NO_HOLE)) {
|
|
|
|
flags |= BDRV_REQ_MAY_UNMAP;
|
|
|
|
}
|
2019-08-23 17:37:25 +03:00
|
|
|
if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
|
|
|
|
flags |= BDRV_REQ_NO_FALLBACK;
|
|
|
|
}
|
2023-03-09 11:44:51 +03:00
|
|
|
ret = blk_co_pwrite_zeroes(exp->common.blk, request->from, request->len,
|
|
|
|
flags);
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-13 01:14:28 +03:00
|
|
|
"writing to file failed", errp);
|
|
|
|
|
|
|
|
case NBD_CMD_DISC:
|
|
|
|
/* unreachable, thanks to special case in nbd_co_receive_request() */
|
|
|
|
abort();
|
|
|
|
|
|
|
|
case NBD_CMD_FLUSH:
|
2020-09-24 18:27:08 +03:00
|
|
|
ret = blk_co_flush(exp->common.blk);
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-13 01:14:28 +03:00
|
|
|
"flush failed", errp);
|
|
|
|
|
|
|
|
case NBD_CMD_TRIM:
|
2021-11-17 20:02:30 +03:00
|
|
|
ret = blk_co_pdiscard(exp->common.blk, request->from, request->len);
|
2020-07-23 00:22:31 +03:00
|
|
|
if (ret >= 0 && request->flags & NBD_CMD_FLAG_FUA) {
|
2020-09-24 18:27:08 +03:00
|
|
|
ret = blk_co_flush(exp->common.blk);
|
nbd/server: Honor FUA request on NBD_CMD_TRIM
The NBD spec states that since trim requests can affect disk contents,
then they should allow for FUA semantics just like writes for ensuring
the disk has settled before returning. As bdrv_[co_]pdiscard() does
not support a flags argument, we can't pass FUA down the block layer
stack, and must therefore emulate it with a flush at the NBD layer.
Note that in all reality, generic well-behaved clients will never
send TRIM+FUA (in fact, qemu as a client never does, and we have no
intention to plumb flags into bdrv_pdiscard). This is because the
NBD protocol states that it is unspecified to READ a trimmed area
(you might read stale data, all zeroes, or even random unrelated
data) without first rewriting it, and even the experimental
BLOCK_STATUS extension states that TRIM need not affect reported
status. Thus, in the general case, a client cannot tell the
difference between an arbitrary server that ignores TRIM, a server
that had a power outage without flushing to disk, and a server that
actually affected the disk before returning; so waiting for the
trim actions to flush to disk makes little sense. However, for a
specific client and server pair, where the client knows the server
treats TRIM'd areas as guaranteed reads-zero, waiting for a flush
makes sense, hence why the protocol documents that FUA is valid on
trim. So, even though the NBD protocol doesn't have a way for the
server to advertise what effects (if any) TRIM will actually have,
and thus any client that relies on specific effects is probably
in error, we can at least support a client that requests TRIM+FUA.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180307225732.155835-1-eblake@redhat.com>
2018-03-08 01:57:32 +03:00
|
|
|
}
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, ret,
|
2018-03-13 01:14:28 +03:00
|
|
|
"discard failed", errp);
|
|
|
|
|
2018-03-12 18:21:21 +03:00
|
|
|
case NBD_CMD_BLOCK_STATUS:
|
2023-09-25 22:22:41 +03:00
|
|
|
assert(request->contexts);
|
nbd/server: Support 64-bit block status
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits. As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.
For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.
Note that we previously had some interesting size-juggling on call
chains, such as:
nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
-> bdrv_block_status_above(bytes, &uint64_t num)
-> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length
But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers). This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that. Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-09-25 22:22:34 +03:00
|
|
|
assert(client->mode >= NBD_MODE_EXTENDED ||
|
|
|
|
request->len <= UINT32_MAX);
|
2023-09-25 22:22:41 +03:00
|
|
|
if (request->contexts->count) {
|
2018-07-04 14:23:02 +03:00
|
|
|
bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
|
2023-09-25 22:22:41 +03:00
|
|
|
int contexts_remaining = request->contexts->count;
|
2018-07-04 14:23:02 +03:00
|
|
|
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
if (!request->len) {
|
|
|
|
return nbd_send_generic_reply(client, request, -EINVAL,
|
|
|
|
"need non-zero length", errp);
|
|
|
|
}
|
2023-09-25 22:22:41 +03:00
|
|
|
if (request->contexts->base_allocation) {
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_co_send_block_status(client, request,
|
2022-11-28 17:23:27 +03:00
|
|
|
exp->common.blk,
|
2020-09-24 18:27:08 +03:00
|
|
|
request->from,
|
2018-07-04 14:23:02 +03:00
|
|
|
request->len, dont_fragment,
|
2020-10-27 08:05:51 +03:00
|
|
|
!--contexts_remaining,
|
2018-06-09 18:17:56 +03:00
|
|
|
NBD_META_ID_BASE_ALLOCATION,
|
|
|
|
errp);
|
2020-02-06 20:38:32 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-09-25 22:22:41 +03:00
|
|
|
if (request->contexts->allocation_depth) {
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_co_send_block_status(client, request,
|
2022-11-28 17:23:27 +03:00
|
|
|
exp->common.blk,
|
2020-10-27 08:05:54 +03:00
|
|
|
request->from, request->len,
|
|
|
|
dont_fragment,
|
|
|
|
!--contexts_remaining,
|
|
|
|
NBD_META_ID_ALLOCATION_DEPTH,
|
|
|
|
errp);
|
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-09-25 22:22:41 +03:00
|
|
|
assert(request->contexts->exp == client->exp);
|
2020-10-27 08:05:52 +03:00
|
|
|
for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
|
2023-09-25 22:22:41 +03:00
|
|
|
if (!request->contexts->bitmaps[i]) {
|
2020-10-27 08:05:52 +03:00
|
|
|
continue;
|
|
|
|
}
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_co_send_bitmap(client, request,
|
2020-10-27 08:05:52 +03:00
|
|
|
client->exp->export_bitmaps[i],
|
2018-06-09 18:17:56 +03:00
|
|
|
request->from, request->len,
|
2020-10-27 08:05:51 +03:00
|
|
|
dont_fragment, !--contexts_remaining,
|
2020-10-27 08:05:52 +03:00
|
|
|
NBD_META_ID_DIRTY_BITMAP + i, errp);
|
2020-02-06 20:38:32 +03:00
|
|
|
if (ret < 0) {
|
|
|
|
return ret;
|
|
|
|
}
|
2018-06-09 18:17:56 +03:00
|
|
|
}
|
|
|
|
|
2020-10-27 08:05:51 +03:00
|
|
|
assert(!contexts_remaining);
|
|
|
|
|
2020-02-06 20:38:32 +03:00
|
|
|
return 0;
|
2023-09-25 22:22:41 +03:00
|
|
|
} else if (client->contexts.count) {
|
|
|
|
return nbd_send_generic_reply(client, request, -EINVAL,
|
|
|
|
"CMD_BLOCK_STATUS payload not valid",
|
|
|
|
errp);
|
2018-03-12 18:21:21 +03:00
|
|
|
} else {
|
2023-06-08 16:56:33 +03:00
|
|
|
return nbd_send_generic_reply(client, request, -EINVAL,
|
2018-03-12 18:21:21 +03:00
|
|
|
"CMD_BLOCK_STATUS not negotiated",
|
|
|
|
errp);
|
|
|
|
}
|
|
|
|
|
2018-03-13 01:14:28 +03:00
|
|
|
default:
|
|
|
|
msg = g_strdup_printf("invalid request type (%" PRIu32 ") received",
|
|
|
|
request->type);
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_send_generic_reply(client, request, -EINVAL, msg,
|
2018-03-13 01:14:28 +03:00
|
|
|
errp);
|
|
|
|
g_free(msg);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-02-13 16:52:24 +03:00
|
|
|
/* Owns a reference to the NBDClient passed as opaque. */
|
|
|
|
static coroutine_fn void nbd_trip(void *opaque)
|
2008-07-03 17:41:03 +04:00
|
|
|
{
|
2024-03-14 19:58:24 +03:00
|
|
|
NBDRequestData *req = opaque;
|
|
|
|
NBDClient *client = req->client;
|
2017-02-13 16:52:24 +03:00
|
|
|
NBDRequest request = { 0 }; /* GCC thinks it can be used uninitialized */
|
2017-06-02 18:01:42 +03:00
|
|
|
int ret;
|
2017-07-07 18:29:11 +03:00
|
|
|
Error *local_err = NULL;
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2023-12-21 22:24:51 +03:00
|
|
|
/*
|
|
|
|
* Note that nbd_client_put() and client_close() must be called from the
|
|
|
|
* main loop thread. Use aio_co_reschedule_self() to switch AioContext
|
|
|
|
* before calling these functions.
|
|
|
|
*/
|
|
|
|
|
2017-07-07 18:29:18 +03:00
|
|
|
trace_nbd_trip();
|
2023-12-21 22:24:52 +03:00
|
|
|
|
|
|
|
qemu_mutex_lock(&client->lock);
|
|
|
|
|
2012-08-22 20:45:12 +04:00
|
|
|
if (client->closing) {
|
2023-12-21 22:24:51 +03:00
|
|
|
goto done;
|
2012-08-22 20:45:12 +04:00
|
|
|
}
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2020-12-14 20:05:18 +03:00
|
|
|
if (client->quiescing) {
|
|
|
|
/*
|
|
|
|
* We're switching between AIO contexts. Don't attempt to receive a new
|
|
|
|
* request and kick the main context which may be waiting for us.
|
|
|
|
*/
|
|
|
|
client->recv_coroutine = NULL;
|
|
|
|
aio_wait_kick();
|
2023-12-21 22:24:51 +03:00
|
|
|
goto done;
|
2020-12-14 20:05:18 +03:00
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
/*
|
|
|
|
* nbd_co_receive_request() returns -EAGAIN when nbd_drained_begin() has
|
|
|
|
* set client->quiescing but by the time we get back nbd_drained_end() may
|
|
|
|
* have already cleared client->quiescing. In that case we try again
|
|
|
|
* because nothing else will spawn an nbd_trip() coroutine until we set
|
|
|
|
* client->recv_coroutine = NULL further down.
|
|
|
|
*/
|
|
|
|
do {
|
|
|
|
assert(client->recv_coroutine == qemu_coroutine_self());
|
|
|
|
qemu_mutex_unlock(&client->lock);
|
|
|
|
ret = nbd_co_receive_request(req, &request, &local_err);
|
|
|
|
qemu_mutex_lock(&client->lock);
|
|
|
|
} while (ret == -EAGAIN && !client->quiescing);
|
|
|
|
|
2017-06-02 18:01:45 +03:00
|
|
|
client->recv_coroutine = NULL;
|
2011-02-22 18:44:51 +03:00
|
|
|
|
2015-09-16 11:35:46 +03:00
|
|
|
if (client->closing) {
|
|
|
|
/*
|
|
|
|
* The client may be closed when we are blocked in
|
|
|
|
* nbd_co_receive_request()
|
|
|
|
*/
|
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
|
2020-12-14 20:05:18 +03:00
|
|
|
if (ret == -EAGAIN) {
|
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
|
nbd/server: fix: check client->closing before sending reply
Since the unchanged code has just set client->recv_coroutine to
NULL before calling nbd_client_receive_next_request(), we are
spawning a new coroutine unconditionally, but the first thing
that coroutine will do is check for client->closing, making it
a no-op if we have already detected that the client is going
away. Furthermore, for any error other than EIO (where we
disconnect, which itself sets client->closing), if the client
has already gone away, we'll probably encounter EIO later
in the function and attempt disconnect at that point. Logically,
as soon as we know the connection is closing, there is no need
to try a likely-to-fail a response or spawn a no-op coroutine.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: squash in further reordering: hoist check before spawning
next coroutine, and document rationale in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-08 21:46:34 +03:00
|
|
|
nbd_client_receive_next_request(client);
|
2023-12-21 22:24:52 +03:00
|
|
|
|
nbd/server: fix: check client->closing before sending reply
Since the unchanged code has just set client->recv_coroutine to
NULL before calling nbd_client_receive_next_request(), we are
spawning a new coroutine unconditionally, but the first thing
that coroutine will do is check for client->closing, making it
a no-op if we have already detected that the client is going
away. Furthermore, for any error other than EIO (where we
disconnect, which itself sets client->closing), if the client
has already gone away, we'll probably encounter EIO later
in the function and attempt disconnect at that point. Logically,
as soon as we know the connection is closing, there is no need
to try a likely-to-fail a response or spawn a no-op coroutine.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: squash in further reordering: hoist check before spawning
next coroutine, and document rationale in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-08 21:46:34 +03:00
|
|
|
if (ret == -EIO) {
|
|
|
|
goto disconnect;
|
|
|
|
}
|
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
qemu_mutex_unlock(&client->lock);
|
2023-03-24 13:47:20 +03:00
|
|
|
qio_channel_set_cork(client->ioc, true);
|
|
|
|
|
nbd/server: fix: check client->closing before sending reply
Since the unchanged code has just set client->recv_coroutine to
NULL before calling nbd_client_receive_next_request(), we are
spawning a new coroutine unconditionally, but the first thing
that coroutine will do is check for client->closing, making it
a no-op if we have already detected that the client is going
away. Furthermore, for any error other than EIO (where we
disconnect, which itself sets client->closing), if the client
has already gone away, we'll probably encounter EIO later
in the function and attempt disconnect at that point. Logically,
as soon as we know the connection is closing, there is no need
to try a likely-to-fail a response or spawn a no-op coroutine.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: squash in further reordering: hoist check before spawning
next coroutine, and document rationale in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-08 21:46:34 +03:00
|
|
|
if (ret < 0) {
|
2021-12-04 02:15:26 +03:00
|
|
|
/* It wasn't -EIO, so, according to nbd_co_receive_request()
|
2018-03-08 21:46:35 +03:00
|
|
|
* semantics, we should return the error to the client. */
|
|
|
|
Error *export_err = local_err;
|
|
|
|
|
|
|
|
local_err = NULL;
|
2023-06-08 16:56:33 +03:00
|
|
|
ret = nbd_send_generic_reply(client, &request, -EINVAL,
|
2018-03-08 21:46:35 +03:00
|
|
|
error_get_pretty(export_err), &local_err);
|
|
|
|
error_free(export_err);
|
2018-03-13 01:14:28 +03:00
|
|
|
} else {
|
|
|
|
ret = nbd_handle_request(client, &request, req->data, &local_err);
|
2017-10-27 13:40:32 +03:00
|
|
|
}
|
2023-09-25 22:22:41 +03:00
|
|
|
if (request.contexts && request.contexts != &client->contexts) {
|
|
|
|
assert(request.type == NBD_CMD_BLOCK_STATUS);
|
|
|
|
g_free(request.contexts->bitmaps);
|
|
|
|
g_free(request.contexts);
|
|
|
|
}
|
2023-12-21 22:24:52 +03:00
|
|
|
|
|
|
|
qio_channel_set_cork(client->ioc, false);
|
|
|
|
qemu_mutex_lock(&client->lock);
|
|
|
|
|
2017-10-27 13:40:32 +03:00
|
|
|
if (ret < 0) {
|
2017-07-07 18:29:12 +03:00
|
|
|
error_prepend(&local_err, "Failed to send reply: ");
|
2017-07-07 18:29:11 +03:00
|
|
|
goto disconnect;
|
|
|
|
}
|
|
|
|
|
nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
Allow a client to request a subset of negotiated meta contexts. For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).
Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads. Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.
This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.
Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-26-eblake@redhat.com>
[eblake: fix logic to reject unnegotiated contexts]
Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25 22:22:42 +03:00
|
|
|
/*
|
|
|
|
* We must disconnect after NBD_CMD_WRITE or BLOCK_STATUS with
|
|
|
|
* payload if we did not read the payload.
|
2017-06-02 18:01:50 +03:00
|
|
|
*/
|
2017-07-07 18:29:11 +03:00
|
|
|
if (!req->complete) {
|
|
|
|
error_setg(&local_err, "Request handling failed in intermediate state");
|
2017-06-02 18:01:50 +03:00
|
|
|
goto disconnect;
|
2011-02-22 18:44:51 +03:00
|
|
|
}
|
|
|
|
|
2012-03-05 12:10:35 +04:00
|
|
|
done:
|
2024-03-14 19:58:24 +03:00
|
|
|
nbd_request_put(req);
|
2023-12-21 22:24:52 +03:00
|
|
|
|
|
|
|
qemu_mutex_unlock(&client->lock);
|
|
|
|
|
2023-12-21 22:24:51 +03:00
|
|
|
if (!nbd_client_put_nonzero(client)) {
|
|
|
|
aio_co_reschedule_self(qemu_get_aio_context());
|
|
|
|
nbd_client_put(client);
|
|
|
|
}
|
2011-09-19 17:19:27 +04:00
|
|
|
return;
|
|
|
|
|
2017-06-02 18:01:50 +03:00
|
|
|
disconnect:
|
2017-07-07 18:29:11 +03:00
|
|
|
if (local_err) {
|
|
|
|
error_reportf_err(local_err, "Disconnect client, due to: ");
|
|
|
|
}
|
2023-12-21 22:24:52 +03:00
|
|
|
|
2011-10-07 18:47:56 +04:00
|
|
|
nbd_request_put(req);
|
2023-12-21 22:24:52 +03:00
|
|
|
qemu_mutex_unlock(&client->lock);
|
2023-12-21 22:24:51 +03:00
|
|
|
|
|
|
|
aio_co_reschedule_self(qemu_get_aio_context());
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
client_close(client, true);
|
2017-02-13 16:52:24 +03:00
|
|
|
nbd_client_put(client);
|
2008-05-28 01:13:40 +04:00
|
|
|
}
|
2011-09-19 16:03:37 +04:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
/*
|
|
|
|
* Runs in export AioContext and main loop thread. Caller must hold
|
|
|
|
* client->lock.
|
|
|
|
*/
|
2017-02-13 16:52:24 +03:00
|
|
|
static void nbd_client_receive_next_request(NBDClient *client)
|
2014-06-20 23:57:32 +04:00
|
|
|
{
|
2024-03-14 19:58:24 +03:00
|
|
|
NBDRequestData *req;
|
|
|
|
|
2020-12-14 20:05:18 +03:00
|
|
|
if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS &&
|
|
|
|
!client->quiescing) {
|
2017-02-13 16:52:24 +03:00
|
|
|
nbd_client_get(client);
|
2024-03-14 19:58:24 +03:00
|
|
|
req = nbd_request_get(client);
|
|
|
|
client->recv_coroutine = qemu_coroutine_create(nbd_trip, req);
|
2020-09-24 18:27:00 +03:00
|
|
|
aio_co_schedule(client->exp->common.ctx, client->recv_coroutine);
|
2014-06-20 23:57:32 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-01-14 11:41:03 +03:00
|
|
|
static coroutine_fn void nbd_co_client_start(void *opaque)
|
|
|
|
{
|
2017-06-02 18:01:46 +03:00
|
|
|
NBDClient *client = opaque;
|
2017-07-07 18:29:11 +03:00
|
|
|
Error *local_err = NULL;
|
2016-01-14 11:41:03 +03:00
|
|
|
|
2017-05-27 06:04:21 +03:00
|
|
|
qemu_co_mutex_init(&client->send_lock);
|
|
|
|
|
2017-07-07 18:29:11 +03:00
|
|
|
if (nbd_negotiate(client, &local_err)) {
|
|
|
|
if (local_err) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
}
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
client_close(client, false);
|
2017-06-02 18:01:46 +03:00
|
|
|
return;
|
2016-01-14 11:41:03 +03:00
|
|
|
}
|
2017-02-13 16:52:24 +03:00
|
|
|
|
2023-12-21 22:24:52 +03:00
|
|
|
WITH_QEMU_LOCK_GUARD(&client->lock) {
|
|
|
|
nbd_client_receive_next_request(client);
|
|
|
|
}
|
2016-01-14 11:41:03 +03:00
|
|
|
}
|
|
|
|
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
/*
|
2018-10-03 20:02:28 +03:00
|
|
|
* Create a new client listener using the given channel @sioc.
|
|
|
|
* Begin servicing it in a coroutine. When the connection closes, call
|
|
|
|
* @close_fn with an indication of whether the client completed negotiation.
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
*/
|
2018-10-03 20:02:28 +03:00
|
|
|
void nbd_client_new(QIOChannelSocket *sioc,
|
2016-02-10 21:41:11 +03:00
|
|
|
QCryptoTLSCreds *tlscreds,
|
qemu-nbd: add support for authorization of TLS clients
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-27 19:20:33 +03:00
|
|
|
const char *tlsauthz,
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
void (*close_fn)(NBDClient *, bool))
|
2011-09-19 16:03:37 +04:00
|
|
|
{
|
2011-09-19 16:33:23 +04:00
|
|
|
NBDClient *client;
|
2017-06-02 18:01:46 +03:00
|
|
|
Coroutine *co;
|
2016-01-14 11:41:03 +03:00
|
|
|
|
2017-10-07 02:49:16 +03:00
|
|
|
client = g_new0(NBDClient, 1);
|
2023-12-21 22:24:52 +03:00
|
|
|
qemu_mutex_init(&client->lock);
|
2011-09-19 16:33:23 +04:00
|
|
|
client->refcount = 1;
|
2016-02-10 21:41:11 +03:00
|
|
|
client->tlscreds = tlscreds;
|
|
|
|
if (tlscreds) {
|
|
|
|
object_ref(OBJECT(client->tlscreds));
|
|
|
|
}
|
qemu-nbd: add support for authorization of TLS clients
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-27 19:20:33 +03:00
|
|
|
client->tlsauthz = g_strdup(tlsauthz);
|
2016-02-10 21:41:04 +03:00
|
|
|
client->sioc = sioc;
|
2023-04-04 03:40:47 +03:00
|
|
|
qio_channel_set_delay(QIO_CHANNEL(sioc), false);
|
2016-02-10 21:41:04 +03:00
|
|
|
object_ref(OBJECT(client->sioc));
|
|
|
|
client->ioc = QIO_CHANNEL(sioc);
|
|
|
|
object_ref(OBJECT(client->ioc));
|
nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-09 01:26:17 +03:00
|
|
|
client->close_fn = close_fn;
|
2012-09-18 15:26:25 +04:00
|
|
|
|
2017-06-02 18:01:46 +03:00
|
|
|
co = qemu_coroutine_create(nbd_co_client_start, client);
|
|
|
|
qemu_coroutine_enter(co);
|
2011-09-19 16:03:37 +04:00
|
|
|
}
|