This covers both file descriptor callbacks and polling callbacks,
since they execute related code.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.
Fixes: e1123a3b40
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1484579832-18589-1-git-send-email-pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_handler() is used in the next patch.
Keep this code change separate due to the number of files it touches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Right now, the block layer rounds discard requests, so that
individual drivers are able to assert that discard requests
will never be unaligned. But there are some ISCSI devices
that track and coalesce multiple unaligned requests, turning it
into an actual discard if the requests eventually cover an
entire page, which implies that it is better to always pass
discard requests as low down the stack as possible.
In isolation, this patch has no semantic effect, since the
block layer currently never passes an unaligned request through.
But the block layer already has code that silently ignores
drivers that return -ENOTSUP for a discard request that cannot
be honored (as well as drivers that return 0 even when nothing
was done). But the next patch will update the block layer to
fragment discard requests, so that clients are guaranteed that
they are either dealing with an unaligned head or tail, or an
aligned core, making it similar to the block layer semantics of
write zero fragmentation.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
iSER is a new transport layer supported in Libiscsi,
iSER provides a zero-copy RDMA capable interface that can
improve performance.
In order to use the new iSER transport one need to have RDMA supported HW
and to choose iser as the protocol name in Libiscsi URI.
For now iSER memory buffers are pre-allocated and pre-registered,
hence in order to work with iSER from QEMU, one need to enable
MEMLOCK attribute in the VM to be large enough for all iSER buffers and RDMA
resources.
Signed-off-by: Roy Shterman <roysh@mellanox.com>
Message-Id: <1476000896-18632-3-git-send-email-roysh@mellanox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A new API to deploy zero-copy command submission. The new API takes I/O
vectors list and number of I/O vectors to submit as input parameters
when initiating the command. New API must be used if working with
iSER transport option.
Signed-off-by: Roy Shterman <roysh@mellanox.com>
Message-Id: <1476000896-18632-2-git-send-email-roysh@mellanox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This simplifies bottom half handlers by removing calls to qemu_bh_delete and
thus removing the need to stash the bottom half pointer in the opaque
datum.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Version: GnuPG v2
iQEcBAABCAAGBQJX5LZ0AAoJEMo1YkxqkXHGmgAIAKeDTKx0sA76ewIvH9fKdmUu
NNatJw59XnVX8lpfOU5yISkJ4BD6oBdN7tbWaOW8yzcAeYu1Ff5iUu4LBEUFb7eW
g6zqUCV58XjaCTLmTiAfa19Exfnh6pXZlZMRP4Hr3vUVSCHFmC0EyTEllfHxU/jW
aPHtAEge/p6EDAHygHJBTSQzsaXRdyJNyt/AKPreDtblNRT8VgapCDzZQPcCVGH1
F9grWVu0B/VVDS0mfgSRhT0UeF/vtiikuRW92sC4woVVB+brJyG4VwGT8oeUN8RU
30/tGo5p9fpqef3iP669uUrloLfmWcKcIJuPfQ4ZUlZh8kIV+lWK9kZuTVgocGw=
=xLJw
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/famz/tags/various-pull-request' into staging
# gpg: Signature made Fri 23 Sep 2016 05:58:28 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/various-pull-request: (23 commits)
docker: exec $CMD
docker: Terminate instances at SIGTERM and SIGHUP
docker: Support showing environment information
docker: Print used options before doing configure
docker: Flatten default target list in test-quick
docker: Update fedora image to latest
docker: Generate /packages.txt in ubuntu image
docker: Generate /packages.txt in fedora image
docker: Generate /packages.txt in centos6 image
tests: Ignore test-uuid
Add UUID files to MAINTAINERS
tests: Add uuid tests
uuid: Tighten uuid parse
vl: Switch qemu_uuid to QemuUUID
configure: Remove detection code for UUID
tests: No longer dependent on CONFIG_UUID
crypto: Switch to QEMU UUID API
vpc: Use QEMU UUID API
vdi: Use QEMU UUID API
vhdx: Use QEMU UUID API
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# tests/Makefile.include
A number of different places across the code base use CONFIG_UUID. Some
of them are soft dependency, some are not built if libuuid is not
available, some come with dummy fallback, some throws runtime error.
It is hard to maintain, and hard to reason for users.
Since UUID is a simple standard with only a small number of operations,
it is cleaner to have a central support in libqemuutil. This patch adds
qemu_uuid_* functions that all uuid users in the code base can
rely on. Except for qemu_uuid_generate which is new code, all other
functions are just copy from existing fallbacks from other files.
Note that qemu_uuid_parse is moved without updating the function
signature to use QemuUUID, to keep this patch simple.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-2-git-send-email-famz@redhat.com>
When qemu uses iscsi devices in sg mode, iscsilun->block_size
is left at 0. Prior to commits cf081fca and similar, when
block limits were tracked in sectors, this did not matter:
various block limits were just left at 0. But when we started
scaling by block size, this caused SIGFPE.
Then, in a later patch, commit a5b8dd2c added an assertion to
bdrv_open_common() that request_alignment is always non-zero;
which was not true for SG mode. Rather than relax that assertion,
we can just provide a sane value (we don't know of any SG device
with a block size smaller than qemu's default sizing of 512 bytes).
One possible solution for SG mode is to just blindly skip ALL
of iscsi_refresh_limits(), since we already short circuit so
many other things in sg mode. But this patch takes a slightly
more conservative approach, and merely guarantees that scaling
will succeed, while still using multiples of the original size
where possible. Resulting limits may still be zero in SG mode
(that is, we mostly only fix block_size used as a denominator
or which affect assertions, not all uses).
Reported-by: Holger Schranz <holger@fam-schranz.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
CC: qemu-stable@nongnu.org
Message-Id: <1473283640-15756-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit moves the initialization of the QemuOptsList qemu_iscsi_opts
struct out of block/iscsi.c in order to allow the iscsi module to be
dynamically loaded.
Signed-off-by: Colin Lord <clord@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1471008424-16465-2-git-send-email-clord@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Another step towards killing off sector-based block APIs.
Unlike write_zeroes, where we can be handed unaligned requests
and must fail gracefully with -ENOTSUP for a fallback, we are
guaranteed that discard requests are always aligned because the
block layer already ignored unaligned head/tail.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-13-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that the block layer honors max_request, we don't need to
bother with an EINVAL on overlarge requests, but can instead
assert that requests are well-behaved.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-7-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
until now the allocation map was used only as a hint if a cluster
is allocated or not. If a block was not allocated (or Qemu had
no info about the allocation status) a get_block_status call was
issued to check the allocation status and possibly avoid
a subsequent read of unallocated sectors. If a block known to be
allocated the get_block_status call was omitted. In the other case
a get_block_status call was issued before every read to avoid
the necessity for a consistent allocation map. To avoid the
potential overhead of calling get_block_status for each and
every read request this took only place for the bigger requests.
This patch enhances this mechanism to cache the allocation
status and avoid calling get_block_status for blocks where
the allocation status has been queried before. This allows
for bypassing the read request even for smaller requests and
additionally omits calling get_block_status for known to be
unallocated blocks.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
when setting clusters as alloacted the boundaries have
to be expanded. As Paolo pointed out the calculation of
the number of clusters is wrong:
Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6:
In the "mark allocated" case, you want to set 0..8, i.e.
cluster_num=0, nb_clusters=4.
0--.--2--.--4--.--6--.--8
<--|_________________|--> (<--> = expanded)
Instead you are setting nb_clusters=3, so that 6..8 is not marked.
0--.--2--.--4--.--6--.--8
<--|______________|!!! (! = wrong)
Cc: qemu-stable@nongnu.org
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tracked down with an ugly, brittle and probably buggy Perl script.
Also move includes converted to <...> up so they get included before
ours where that's obviously okay.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It makes more sense to have ALL block size limit constraints
in the same struct. Improve the documentation while at it.
Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment. Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code. The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.
pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits. We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length. Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation. Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.
When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x40000000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS. The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 94d047a added an assertion the the request alignment check.
This introduced 2 issues:
a) A off-by-one error since a request of BDRV_REQUEST_MAX_SECTORS
is actually allowed.
b) The bdrv_get_block_status call in the read path to check the allocation
status requests up to INT_MAX sectors which triggers the assertion.
Fixes: 94d047a35b
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1466414680-18383-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Another step on our continuing quest to switch to byte-based
interfaces.
As this is the first byte-based iscsi interface, convert
is_request_lun_aligned() into two versions, one for sectors
and one for bytes. Also, change from outright -EINVAL failure
on an unaligned request, to instead failing with -ENOTSUP to
trigger a read-modify-write fallback, particularly since the
block layer should be honoring bs->request_alignment to avoid
-EINVAL on read/write requests.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Another step towards removing sector-based interfaces: convert
the maximum write and minimum alignment values from sectors to
bytes. Rename the variables to let the compiler check that all
users are converted to the new semantics.
The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
is constrained by INT_MAX (this means that we can't even
support a 2G write_zeroes, but just under it) - changing
operation lengths to unsigned or to 64-bits is a much bigger
audit, and debatable if we even want to do it (since at the
core, a 32-bit platform will still have ssize_t as its
underlying limit on write()).
Meanwhile, alignment is changed to 'uint32_t', since it makes no
sense to have an alignment larger than the maximum write, and
less painful to use an unsigned type with well-defined behavior
in bit operations than to have to worry about what happens if
a driver mistakenly supplies a negative alignment.
Add an assert that no one was trying to use sectors to get a
write zeroes larger than 2G, and therefore that a later conversion
to bytes won't be impacted by keeping the limit at 32 bits.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If hardware does not advertise a minimum zero/discard
alignment, we still want to guarantee that the block layer
will align requests to our blocks, rather than the arbitrary
512-byte BDRV sector size.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
at least in the path via virtio-blk the maximum size is not
restricted.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.
SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16). But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.
Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.
The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag. It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags. Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().
Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.
The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing). Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).
Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.
Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags. But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer. But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).
The fix is to make supported flags as a per-BDS option, set during
.bdrv_open(). This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.
This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
This replaces the existing hack in the iscsi driver that sent the FUA
bit in writethrough mode and ignored the following flush in order to
optimise the number of roundtrips (see commit 73b5394e).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.
Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The iSCSI driver currently accepts the CHAP password in plain text
as a block driver property. This change adds a new "password-secret"
property that accepts the ID of a QCryptoSecret instance.
$QEMU \
-object secret,id=sec0,filename=/home/berrange/example.pw \
-drive driver=iscsi,url=iscsi://example.com/target-foo/lun1,\
user=dan,password-secret=sec0
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-4-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.
Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.
The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When play with Dell MD3000 target, for sure it
is a TYPE_DISK, but readcapacity16 would fail.
Then we find that readcapacity10 succeeded. It
looks like the target just support readcapacity10
even through it is a TYPE_DISK or have some
TYPE_ROM characteristics.
This patch can give a chance to send
readcapacity16 when readcapacity10 failed.
This patch is not harmful to original pathes
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Message-Id: <1451359934-9236-1-git-send-email-lszhu@suse.com>
[Don't fall through on UNIT ATTENTION. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now the callback is not used any more, drop the field along with all
implementations in block drivers, which are iscsi and raw.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
iscsi_ioctl emulates SG_GET_VERSION_NUM and SG_GET_SCSI_ID. Now that
bdrv_ioctl() will be emulated with .bdrv_aio_ioctl, replicate the logic
into iscsi_aio_ioctl to make them consistent.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-5-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously we return -EIO blindly when anything goes wrong. Add a helper
function to parse sense fields and try to make the return code more
meaningful.
This also fixes the default werror configuration (enospc) when we're
using qcow2 on an iscsi lun. The old -EIO not being treated as out of
space error failed to trigger vm stop.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1446699609-11376-1-git-send-email-famz@redhat.com>
[libiscsi 1.9 compatibility - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All callers pass in false, and the real external ones will switch to
true in coming patches.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It has been reported that at least tgtd returns a block size of 0
for LUN 0. To avoid running into divide by zero later on and protect
against other problematic block sizes validate the block size right
at connection time.
Cc: qemu-stable@nongnu.org
Reported-by: Andrey Korolyov <andrey@xdel.ru>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1439552016-8557-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RHEL7 and others are stuck with libiscsi 1.9.0 since there
unfortunately was an ABI breakage after that release.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1435313881-19366-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
libiscsi starting with 1.15 will properly support timeout of iscsi
commands. The default will remain no timeout, but this can
be changed via cmdline parameters, e.g.:
qemu -iscsi timeout=30 -drive file=iscsi://...
If a timeout occurs a reconnect is scheduled and the timed out command
will be requeued for processing after a successful reconnect.
The required API call iscsi_set_timeout is present since libiscsi
1.10 which was released in October 2013. However, due to some bugs
in the libiscsi code the use is not recommended before version 1.15.
Please note that this patch bumps the libiscsi requirement to 1.10
to have all function and macros defined. The patch fixes also a
off-by-one error in the NOP timeout calculation which was fixed
while touching these code parts.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1434455107-19328-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
During migration, QEMU uses fsync()/fdatasync() on the open file
descriptor for read-write block devices to flush data just before
stopping the VM.
However, fsync() on a scsi-generic device returns -EINVAL which
causes the migration to fail. This patch skips flushing data in case
of an SG device, since submitting SCSI commands directly via an SG
character device (e.g. /dev/sg0) bypasses the page cache completely,
anyway.
Note that fsync() not only flushes the page cache but also the disk
cache. The scsi-generic device never sends flushes, and for
migration it assumes that the same SCSI device is used by the
destination host, so it does not issue any SCSI SYNCHRONIZE CACHE
(10) command.
Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since
this is now redundant (we flush the underlying protocol at the end
of bdrv_co_flush() which, with this patch, we never reach).
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of checking bs->sg use bdrv_is_sg() consistently throughout
the code.
Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
raw_bsd already has QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != 512), so iscsi
should relax.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
the allocationmap has only a hint character. The driver always
double checks that blocks marked unallocated in the cache are
still unallocated before taking the fast path and return zeroes.
So using the allocationmap is migration safe and can
also be enabled with cache.direct=on.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-10-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-9-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
a target may issue a SCSI_STATUS_TASK_SET_FULL status
if there is more than one "BUSY" command queued already.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-8-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The idea is that a command is retried in a BUSY condition
up a time of approx. 60 seconds before it is failed. This should
be far higher than any command timeout in the guest.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-7-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
SCSI allowes to tell the target to not return from a write command
if the date is not written to the disk. Use this so called FUA
bit if it is supported to optimize WRITE commands if writeback is
not allowed.
In this case qemu always issues a WRITE followed by a FLUSH. This
is 2 round trip times. If we set the FUA bit we can ignore the
following FLUSH.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-6-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-5-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-4-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-3-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We actually were always impolitely dropping the connection and
not cleanly logging out.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1429193313-4263-2-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
newer libiscsi versions may return zero events from iscsi_which_events.
In this case iscsi_service will return immediately without any progress.
To avoid busy waiting for iscsi_which_events to change we deregister all
read and write handlers in this case and schedule a timer to periodically
check iscsi_which_events for changed events.
Next libiscsi version will introduce async reconnects and zero events
are returned while libiscsi is waiting for a reconnect retry.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1428437295-29577-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The variable user in struct iscsi_url is a character array, not a pointer.
Therefore its address will never be NULL.
clang reports this error:
block/iscsi.c:1329:20: warning:
comparison of array 'iscsi_url->user' not equal to a null pointer
is always true [-Wtautological-pointer-compare]
Reviewed-by: Peter Lieven <pl@kamp.de>
Acked-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <1425719670-5486-1-git-send-email-sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Save the write protected flag and check before reopen.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1424839208-5195-1-git-send-email-famz@redhat.com>
[Fixed typo in the name of the new field. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Version: GnuPG v1
iQEcBAABAgAGBQJUV2wdAAoJEJykq7OBq3PIjcAH/29rl938ETw1wjXxYe3uH+R6
K2yFEiPh9/cOJSH0mJ+gD8DZIN+iyR4eoQGP2s5ALFPcX3bkYxRLlUeYK0BCp883
esc7gO6XPhLvTVqP0xgACRCdUwH2I0VTToDlHjXXZogyI/DuDX3gzWJufE3x1DGs
WNTMOp5n/uYkWH3rI3DkInmbSddEz3pgX65a8BuYtw0V/RSeSRnHKDYHMygvJBRL
EVfWRNeOIrZ730CyJry0t8ITjsZxiBDKXR5glNSwaIfQUfGkTSWi9YNSurNYkUDr
aMS2rgvOVlrOUDKTHUj9oS3jgoGWcDtlk9E1MeSoyIptbRoMhdFVl1AUJZsrMJU=
=Mfbu
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Mon 03 Nov 2014 11:50:53 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/block-pull-request: (53 commits)
block: declare blockjobs and dataplane friends!
block: let commit blockjob run in BDS AioContext
block: let mirror blockjob run in BDS AioContext
block: let stream blockjob run in BDS AioContext
block: let backup blockjob run in BDS AioContext
block: add bdrv_drain()
blockjob: add block_job_defer_to_main_loop()
blockdev: add note that block_job_cb() must be thread-safe
blockdev: acquire AioContext in blockdev_mark_auto_del()
blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
block: acquire AioContext in generic blockjob QMP commands
iotests: Expand test 061
block/qcow2: Simplify shared L2 handling in amend
block/qcow2: Make get_refcount() global
block/qcow2: Implement status CB for amend
qemu-img: Fix insignificant memleak
qemu-img: Add progress output for amend
block: Add status callback to bdrv_amend_options()
block: qemu-iotest 107 supports NFS
iotests: Add test for qcow2's bdrv_make_empty
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cancel oversized requests early. They would generate
an iSCSI protocol error anyway; after having transferred
possibly a lot of data over the wire.
Suggested-By: Max Reitz <mreitz@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As Max pointed out there is a hidden cast from int64_t to int for all
limits. So use the newly introduced sector_limits_lun2qemu for all
limits received from the target.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Copy the max_xfer_len from the BlockLimits VPD or use the
maximum value fitting in the CDB.
The helper function sector_limits_lun2qemu is introduced to convert
and cap the limits from the VPD to the maximum power of two fitting
in an integer; integer is the range for nb_sectors throughout
the block layer.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before, when a write protected iSCSI target is attached as scsi-disk
with BDRV_O_RDWR, we report it as writable, while in fact all writes
will fail.
One way to improve this is to report write protect flag as true to
guest, but a even better way is to refuse using a write protected LUN to
guest.
Target write protect flag is checked with a mode sense query.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I'll use it with block backends shortly, and the name is going to fit
badly there. It's a block layer thing anyway, not just a block driver
thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there. It's a block layer thing anyway, not just a
block driver thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating an anonymous BDS can't fail. Make that obvious.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
yet 100% thread-safe, though, which makes it really, really
experimental. It also brings asynchronous cancellation to
the SCSI subsystem and implements it in virtio-scsi. This
is a pretty important feature. Almost all the work here
was done by Fam Zheng.
I also included the virtio refcount fixes from Gonglei,
because they had a small conflict with virtio-scsi dataplane.
This pull request is using the new subkey 4E6B09D7.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUKpR2AAoJEBRUblpOawnXNLAH/RBeF66ZqWc29dl78JKEbv0+
C5pL61GhlI5vFIjSbPU3/iaZQifw3E4NLvX3SCN5ImsLzBw4r3qerapP2Ut96K/j
5CYdWTF1oqE32oCefvlWhJulHmE1vxGN53BvOz3HHxoehdF1/tJ0wUoZyfztGTOF
tiW85VMewi6CKm47/ns5tSNfGMVzWHqnUg67z/mwN6ZmPFU1dXBlgmiIv8Znahrn
B1AOAeMjWaKvOS+tiYNVG6k0GENWGoiypxiTR3ZXLQKxOYdkh/X0ARULqLMonASX
YsT772nzO9KZDIsdLj9QZZmM7vxs7UhW0MgQlvcSWP9vfZa5SeuRSgoXorPDj3Q=
=54T3
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
This update brings dataplane to virtio-scsi (NOT
yet 100% thread-safe, though, which makes it really, really
experimental. It also brings asynchronous cancellation to
the SCSI subsystem and implements it in virtio-scsi. This
is a pretty important feature. Almost all the work here
was done by Fam Zheng.
I also included the virtio refcount fixes from Gonglei,
because they had a small conflict with virtio-scsi dataplane.
This pull request is using the new subkey 4E6B09D7.
# gpg: Signature made Tue 30 Sep 2014 12:31:02 BST using RSA key ID 4E6B09D7
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream: (39 commits)
block/iscsi: handle failure on malloc of the allocationmap
util: introduce bitmap_try_new
virtio-scsi: Handle TMF request cancellation asynchronously
scsi: Introduce scsi_req_cancel_async
scsi: Introduce scsi_req_cancel_complete
scsi: Drop SCSIReqOps.cancel_io
scsi: Unify request unref in scsi_req_cancel
scsi-generic: Handle canceled request in scsi_command_complete
scsi: Drop scsi_req_abort
virtio-scsi: Process ".iothread" property
virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling
virtio-scsi: Batched prepare for cmd reqs
virtio-scsi: Two stages processing of cmd request
virtio-scsi: Add migration state notifier for dataplane code
virtio-scsi: Hook up with dataplane
virtio-scsi-dataplane: Code to run virtio-scsi on iothread
virtio-scsi: Add VirtIOSCSIVring in VirtIOSCSIReq
virtio-scsi: Add 'iothread' property to virtio-scsi
virtio: add a wrapper for virtio-backend initialization
virtio-9p: fix virtio-9p child refcount in transports
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
iscsi_aio_write16_cb, iscsi_aio_writev, iscsi_aio_read16_cb, and
iscsi_aio_readv have not not been in use since commit
063c3378a9 ("block/iscsi: introduce
bdrv_co_{readv, writev, flush_to_disk}").
These were the only trace events in block/iscsi.c so drop the the
trace.h include.
Cc: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-4-git-send-email-stefanha@redhat.com
Currently the file size requested by user is rounded down to nearest
sector, causing the actual file size could be a bit less than the size
user requested. Since some formats (like qcow2) record virtual disk
size in bytes, this can make the last few bytes cannot be accessed.
This patch fixes it by rounding up file size to nearest sector so that
the actual file size is no less than the requested file size.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the iscsi block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
during rebasing the changed init value for the
retry counter was missed. This resulted in no retries
being performed at all.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch lifts the minimum supported libiscsi version from 1.4.0 to
1.9.0 since the BUSY patch required that change.
On one this allows us to remove all #ifdefs from the code which
makes the code easier to maintain and read. On the other hand
I would not recommend libiscsi prior to 1.8.0 for production use
because the following important libiscsi fixes for deadlocks and
protocol errors are missing prior to 1.8.0:
dbe9a1e SOCKET queue cmd PDUs directly in waitpdu queue
30df192 DATA-OUT set pdu->cmdsn appropriately
548bd22 ISCSI fix broken send logic in iscsi_scsi_async_command
14bee10 RECONNECT do not increase CmdSN for immediate PDUs
1f4a66a PDU queue out PDUs in order of itt.
562dd46 PDU avoid incrementing itt to 0xffffffff
cd09c0f PDU use serial32 arithmetic for cmdsn, maxcmdsn and expcmdsn.
89e918e SOCKET validate data_size in in_pdu header
91267f5 Limit immediate and unsolicited data to FirstBurstLength
Note that libiscsi 1.9.0 was released on Feb 24th, 2013, about
one month after 1.8.0.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch changes the driver to uses 16 Byte CDBs for
READ/WRITE only if the target requires 64bit lba addressing.
On one hand this saves 6 bytes in each PDU on the other
hand it seems that 10 Byte CDBs seems to be much better
supported and tested as a recent issue I had with a
major storage supplier lined out.
For WRITESAME the logic is a bit more tricky as WRITESAME10
with UNMAP was added really late. Thus a fallback to WRITESAME16
is possible if it supports UNMAP and WRITESAME10 not.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
it might happen in the future that a function directly invokes its callback.
In this case we end up in a segfault because the iTask is gone when the BH
is scheduled.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch adds handling of BUSY status reponse from an iSCSI target.
Currently, we fail with -EIO in case of SCSI_STATUS_BUSY while the
obvious reaction would be to retry the operation after some time.
The retry time is randomly choosen from a range with exponential
growth increasing with each retry.
This patch includes most of the changes by a an upcoming patch
from Stefan Hajnoczi:
iscsi: implement .bdrv_detach/attach_aio_context()
because I also need the reference to the aio_context for
the retry timer to work. I included the changes to maintain
better mergeability.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drop the assumption that we're using the main AioContext for Linux
AIO. Convert qemu_aio_set_fd_handler() to aio_set_fd_handler() and
timer_new_ms() to aio_timer_new().
The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the fd and timer from the old to the new AioContext.
Cc: Peter Lieven <pl@kamp.de>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
* remotes/bonzini/scsi-next:
megasas: remove buildtime strings
block: iscsi build fix if LIBISCSI_FEATURE_IOVECTOR is not defined
virtio-scsi: Plug memory leak on virtio_scsi_push_event() error path
scsi: Document intentional fall through in scsi_req_length()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit b03c380 introduced the function
iscsi_allocationmap_is_allocated(), however it is only used within a
code block that is conditionally compiled. This produces a warning
(error with -werror) of "defined but not used" for the the function, if
LIBISCSI_FEATURE_IOVECTOR is not defined.
This wraps iscsi_allocationmap_is_allocated() in the same conditional.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* remotes/bonzini/scsi-next:
[PATCH] block/iscsi: bump year in copyright notice
block/iscsi: allow cluster_size of 4K and greater
block/iscsi: clarify the meaning of ISCSI_CHECKALLOC_THRES
block/iscsi: speed up read for unallocated sectors
block/iscsi: allow fall back to WRITE SAME without UNMAP
MAINTAINERS: mark megasas as maintained
megasas: Add MSI support
megasas: Enable MSI-X support
megasas: Implement LD_LIST_QUERY
scsi: Improve error messages more
scsi-disk: Improve error messager if can't get version number
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
depending on the target the opt_unmap_gran might be as low
as 4K. As we know use this also as a knob to activate the allocationmap
feature lower the barrier. The limit 4K (and not 512) is choosen
to avoid a potentially too big allocationmap.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch implements a cache that tracks if a page on the
iscsi target is allocated or not. The cache is implemented in
a way that it allows for false positives
(e.g. pretending a page is allocated, but it isn't), but
no false negatives.
The cached allocation info is then used to speed up the
read process for unallocated sectors by issueing a GET_LBA_STATUS
request for all sectors that are not yet known to be allocated.
If the read request is confirmed to fall into an unallocated
range we directly return zeroes and do not transfer the
data over the wire.
Tests have shown that a relatively small amount of GET_LBA_STATUS
requests happens a vServer boot time to fill the allocation cache
(all those blocks are not queried again).
Not to transfer all the data of unallocated sectors saves a lot
of time, bandwidth and storage I/O load during block jobs or storage
migration and it saves a lot of bandwidth as well for any big sequential
read of the whole disk (e.g. block copy or speed tests) if a significant
number of blocks is unallocated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
if the iscsi driver receives a write zeroes request with
the BDRV_REQ_MAY_UNMAP flag set it fails with -ENOTSUP
if the iscsi target does not support WRITE SAME with
UNMAP. However, the BDRV_REQ_MAY_UNMAP is only a hint
and writing zeroes with WRITE SAME will still be
better than falling back to writing zeroes with WRITE16.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using error_is_set(errp) that way can sweep programming errors under
the carpet when we get called incorrectly with an error set.
Commit 24d3bd6 added a broken error path to iscsi_do_inquiry(): it
first calls error_setg(), then jumps to the preexisting error label,
where error_setg() gets called again, triggering an assertion failure.
Commit cbee81f fixed this by guarding the second error_setg() with an
error_is_set().
Replace this fix by a simpler and safer one: jump right behind the
second error_setg().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.
Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This eliminates the possible assertion failure in error_setg().
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max WRITE SAME length is also used when the UNMAP bit is zero, so it
should be queried even if LBPWS=0. Same for the optimal transfer
length.
However, the write_zeroes_alignment only matters for UNMAP=1 so we
still restrict it to LBPWS=1.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Non-block SCSI devices do not support flushing, but we may still send
them requests via bdrv_flush_all. Just ignore them.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some targets may return "invalid field" as the ASCQ from WRITE SAME
if they support the command only without the UNMAP field. Recognize
that, and return ENOTSUP just like for "invalid operation code".
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current iscsi block driver code makes the rather arbitrary decision
that TYPE_MEDIUM_CHANGER and TYPE_TAPE devices have bs->sg = 1 and all
other device types are disks.
Instead of this, check for TYPE_DISK to expose the disk interface and
make everything else bs->sg = 1. In particular, this includes devices
with TYPE_STORAGE_ARRAY, which is what LUN 0 of an iscsi target is.
(See https://bugzilla.redhat.com/show_bug.cgi?id=1067784 for the exact
scenario.)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
commit fa6252b0 introduced a segfault because it tries
to read iTask.task->sense after iTask.task has been
freed.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch ensures that we only query for block provisioning and
block limits vpd pages if they are advertised. It also cleans
up the inquiry code and eliminates some redundant code.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the retry logic was broken because the complete status
of the task structure was not reset. this resulted in
an infinite loop retrying the command over and over.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before:
$ ./qemu-io-old
qemu-io-old> open -r -o file.driver=iscsi,file.filename=foo
Failed to parse URL : foo
qemu-io-old: can't open device (null): Could not open 'foo': Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o file.driver=iscsi,file.filename=foo
qemu-io: can't open device (null): Failed to parse URL : foo
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Dumb it down to
obvious.
Gets rid of several dozen Coverity false positives.
Note that the obvious form is already used in many places.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
iSCSI currently does not need to do any actions to support the
current usage of bdrv_reopen(). However, it is important to note
a couple of things: 1.) A connection will not be re-established to
an iSCSI target, and 2.) If iscsi_open() is changed to parse 'flags',
then iscsi_reopen_prepare() may need to be more than a stub.
In light of the above, this commit adds comments above both of the
functions to bring attention to these facts.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
the opt_transfer_length has nothing to do with logical
block provisioning stuff so always copy it from
the block limits VPD page.
Reported-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* bonzini/scsi-next:
scsi: Support TEST UNIT READY in the dummy LUN0
block: add .bdrv_reopen_prepare() stub for iscsi
virtio-scsi: Prevent assertion on missed events
virtio-scsi: Cleanup of I/Os that never started
scsi: Assign cancel_io vector for scsi_disk_emulate_ops
Conflicts:
block/iscsi.c
aliguori: resolve trivial merge conflict in block/iscsi.c
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
The iSCSI backend already gets the block size from the READ CAPACITY
command it sends. Save it so that the generic block layer gets it
too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
if an async libiscsi call fails directly it can only be due
to an out of memory condition. All other errors are returned
through the callback.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To suppport reopen(), the .bdrv_reopen_prepare() stub must exist.
iSCSI does not have anything that needs to be done to support reopen,
so we can just implement the _prepare() stub.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* bonzini/scsi-next:
scsi-disk: add UNMAP limits to block limits VPD page
block/iscsi: use a bh to schedule co reentrance
Message-id: 1387720926-11421-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
This is a boiler-plate _nofail variant of qemu_opts_create. Remove and
use error_abort in call sites.
null/0 arguments needs to be added for the id and fail_if_exists fields
in affected callsites due to argument inconsistency between the normal and
no_fail variants.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Function iscsi_read10_task got additional parameters starting with version
libiscsi 1.5.0.
libiscsi 1.4.0 is still widely used (Debian wheezy, jessie and other Linux
distributions currently provide packages for QEMU which use it), so we
still need support for this older API.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
this fixes a potential segfault and performance regression.
If the coroutine is reentered directly in the iscsi_co_generic_cb
iscsi_process_{read,write} are interrupted and reentered any
time later. One the one hand this could happen after an iscsi_close
where the iscsi context is already gone (segfault). On the
other hand this limits the number of processed callbacks
in each aio_dispatch to one (potential performance regression).
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this converts read, write and flush functions from aio to coroutines
eliminating almost 200 lines of code.
The requirement for libiscsi is bumped to version 1.4.0 which was
released in may 2012.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch aims to set bdi->cluster_size to the internal page size
of the iscsi target so that enabled callers can align requests
properly.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The current check is right for MAY_UNMAP=1. For MAY_UNMAP=0, just
try and fall back to regular writes as soon as a WRITE SAME command
fails.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
added myself to reflect recent work on the iscsi block driver.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
since commit 3ac21627 the default value changed to 0.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
now that bdrv_co_discard can handle limits we do not need
the request split logic here anymore.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
compatiblity -> compatibility
continously -> continuously
existance -> existence
usefull -> useful
shoudl -> should
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit f35c934a accidently disabled iscsi_co_get_block_status for all
libiscsi versions. Its not possible to check for enumeration constants
in the C preprocessor. This patch changes the check to the preprocessor
constant LIBISCSI_FEATURE_IOVECTOR which was introduced shortly after
get_lba_status support was added to libiscsi.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.
The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.
The first exception to this rule will be the quorum driver.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
# By Hervé Poussineau (5) and Stefan Weil (1)
# Via Paolo Bonzini
* bonzini/scsi-next:
block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi
lsi: add 53C810 variant
lsi: remove todo
lsi: ignore write accesses to CTEST0 registers
lsi: check ssid versus sdid only if ssid is valid
lsi: use constant name instead of its value
Debian wheezy includes libiscsi-dev 1.4.0 which does not provide
SCSI_PROVISIONING_TYPE_DEALLOCATED. Drop iscsi_co_get_block_status
in this case to allow compilation without errors.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
# By Max Reitz (16) and others
# Via Kevin Wolf
* kwolf/for-anthony: (33 commits)
qemu-iotests: Fix test 038
block: Assert validity of BdrvActionOps
qemu-iotests: Cleanup test image in test number 007
qemu-img: fix invalid JSON
coroutine: add ./configure --disable-coroutine-pool
qemu-iotests: Adjustments due to error propagation
qcow2: Use Error parameter
qemu-img create: Emit filename on error
block: Error parameter for create functions
block: Error parameter for open functions
bdrv: Use "Error" for creating images
bdrv: Use "Error" for opening images
qemu-iotests: add 057 internal snapshot for block device test case
hmp: add interface hmp_snapshot_delete_blkdev_internal
hmp: add interface hmp_snapshot_blkdev_internal
qmp: add interface blockdev-snapshot-delete-internal-sync
qmp: add interface blockdev-snapshot-internal-sync
qmp: add internal snapshot support in qmp_transaction
snapshot: distinguish id and name in snapshot delete
snapshot: new function bdrv_snapshot_find_by_id_and_name()
...
Message-id: 1379073063-14963-1-git-send-email-kwolf@redhat.com
Replace .bdrv_aio_discard with .bdrv_co_discard so that discard
requests can be split in multiple parts, each for a small amount
of sectors.
This is useful because we expose a generic API with no limit
on the amount of sectors that can be unmapped in one request.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>