The former is not actually used by the explicit backend, so instead of
silently ignoring the option in an invalid context, exit with an error.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200409134133.11339-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the constraints on operand sizes have been relaxed, so adjust the
documentation.
Deprecate atomic_mb_read and atomic_mb_set; it is not really possible to
use them correctly because they do not interoperate with sequentially-consistent
RMW operations.
Finally, extend the memory barrier pairing section to cover acquire and
release semantics in general, roughly based on the KVM Forum 2016 talk,
"<atomic.h> weapons".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OPC_SYNC_WMB, OPC_SYNC_MB, OPC_SYNC_ACQUIRE, OPC_SYNC_RELEASE and
OPC_SYNC_RMB have the wrong encoding. According to the MIPS manual,
their encoding should be 'OPC_SYNC | 0x?? << 6' rather than
'OPC_SYNC | 0x?? << 5'. The wrong encoding can lead to illegal
instruction errors. These instructions often appear in multi-threaded
simulations.
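As a sketch of the corrected encodings (stype values per the MIPS manual;
the exact enum in tcg/mips may differ):

    OPC_SYNC_WMB     = OPC_SYNC | 0x04 << 6,
    OPC_SYNC_MB      = OPC_SYNC | 0x10 << 6,
    OPC_SYNC_ACQUIRE = OPC_SYNC | 0x11 << 6,
    OPC_SYNC_RELEASE = OPC_SYNC | 0x12 << 6,
    OPC_SYNC_RMB     = OPC_SYNC | 0x13 << 6,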
Fixes: 6f0b99104a ("tcg/mips: Add support for fence")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: lixinyu <precinct@mail.ustc.edu.cn>
Message-Id: <20200411124612.12560-1-precinct@mail.ustc.edu.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast,
qemu_cond_broadcast may be called before all touch page threads enter
qemu_cond_wait. In this case, the touch page threads wait forever for the
main thread to wake them up, causing a deadlock.
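A minimal sketch of the fix, assuming illustrative names page_mutex,
page_cond and threads_created_flag: the broadcast is only issued while
holding the mutex that the workers use with qemu_cond_wait(), so a worker
either sees the flag already set or is already blocked on the condvar.

    /* main thread, after spawning the touch page threads */
    qemu_mutex_lock(&page_mutex);
    threads_created_flag = true;         /* predicate checked by workers */
    qemu_cond_broadcast(&page_cond);
    qemu_mutex_unlock(&page_mutex);

    /* touch page thread */
    qemu_mutex_lock(&page_mutex);
    while (!threads_created_flag) {
        qemu_cond_wait(&page_cond, &page_mutex);
    }
    qemu_mutex_unlock(&page_mutex);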
Signed-off-by: Bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With QEMU 4.0 an incompatible change was added to pc_piix, which makes it
practically impossible to migrate domUs started with qemu2 or qemu3 to
newer qemu versions. Commit 7fccf2a068
added and enabled a new member "smbus_no_migration_support". In commit
4ab2f2a8aa the vmstate_acpi got new
elements, which are conditionally filled. As a result, an incoming
migration expected smbus related data unless smbus migration was
disabled for a given MachineClass. Since the first commit forgot to handle
'xenfv', domUs started with QEMU 4.x are incompatible with their QEMU
siblings.
Using other existing machine types, such as 'pc-i440fx-3.1', is not
possible because 'xenfv' creates the 'xen-platform' PCI device at
00:02.0, while all other variants to run a domU would create it at
00:04.0.
To cover both the existing and the broken case of 'xenfv' in a single
qemu binary, a new compatibility variant 'xenfv-4.2' is added, which
targets domUs started with QEMU 4.2. The existing 'xenfv' restores
compatibility of QEMU 5.x with qemu 3.1.
Host admins who started domUs with QEMU 4.x (preferably QEMU 4.2)
have to use a wrapper script which appends '-machine xenfv-4.2' to
the device-model command line. This is only required if there is no
maintenance window that allows temporarily shutting down the domU and
restarting it with a fixed device-model.
The wrapper script is as simple as this:
#!/bin/sh
exec /usr/bin/qemu-system-i386 "$@" -machine xenfv-4.2
With xl this script will be enabled with device_model_override=, see
xl.cfg(5). To live migrate a domU, adjust the existing domU.cfg and pass
it to xl migrate or xl save/restore:
xl migrate -C new-domU.cfg domU remote-host
xl save domU CheckpointFile new-domU.cfg
xl restore new-domU.cfg CheckpointFile
With libvirt this script will be enabled with the <emulator> element in
domU.xml. Use 'virsh edit' prior to 'virsh migrate' to change the existing
<emulator> element to point to the wrapper script.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Message-Id: <20200327151841.13877-1-olaf@aepfle.de>
[Adjust tests for blacklisted machine types, simplifying the one in
qom-test. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using C11 atomics, non-seqcst reads and writes do not participate
in the total order of seqcst operations. In util/async.c and util/aio-posix.c,
in particular, the pattern that we use
    (polling thread)              (notifying thread)
    write ctx->notify_me          write bh->scheduled
    read bh->scheduled            read ctx->notify_me
    if !bh->scheduled, sleep      if ctx->notify_me, notify
needs to use seqcst operations for both the write and the read. In
general this is something that we do not want, because there can be
many sources that are polled in addition to bottom halves. The
alternative is to place a seqcst memory barrier between the write
and the read. This also comes with a disadvantage, in that the
memory barrier is implicit on strongly-ordered architectures and
it wastes a few dozen clock cycles.
Fortunately, ctx->notify_me is never written concurrently by two
threads, so we can assert that and relax the writes to ctx->notify_me.
The resulting solution works and performs well on both aarch64 and x86.
Note that the atomic_set/atomic_read combination is not an atomic
read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
on x86, ATOMIC_RELAXED compiles to a locked operation.
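A standalone C11 sketch of the resulting pattern (illustrative only; QEMU
uses its own atomic_* wrappers and the variable names here are simplified):

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic int  notify_me;   /* only written by the polling thread */
    static _Atomic bool scheduled;

    /* polling thread: announce that we are about to sleep */
    static bool poller_may_sleep(void)
    {
        int v = atomic_load_explicit(&notify_me, memory_order_relaxed);
        atomic_store_explicit(&notify_me, v + 2, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* pairs with fence below */
        return !atomic_load_explicit(&scheduled, memory_order_relaxed);
    }

    /* scheduling thread: decide whether the poller must be woken */
    static bool scheduler_must_notify(void)
    {
        atomic_store_explicit(&scheduled, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&notify_me, memory_order_relaxed) != 0;
    }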
Analyzed-by: Ying Fang <fangying1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Ying Fang <fangying1@huawei.com>
Message-Id: <20200407140746.8041-6-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Any thread that is not an iothread returns NULL for qemu_get_current_aio_context().
As a result, it would also return true for
in_aio_context_home_thread(qemu_get_aio_context()), causing
AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect
if the BQL is not held, because aio_poll() does not expect to
run concurrently from multiple threads, and it can actually
happen when savevm writes to the vmstate file from the
migration thread.
Therefore, restrict in_aio_context_home_thread to return true
for the main AioContext only if the BQL is held.
The function is moved to aio-wait.h because it is mostly used
there and to avoid a circular reference between main-loop.h
and block/aio.h.
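A sketch of the restricted check, following the description above (the
exact code in include/block/aio-wait.h may differ):

    static inline bool in_aio_context_home_thread(AioContext *ctx)
    {
        if (ctx == qemu_get_current_aio_context()) {
            return true;
        }
        if (ctx == qemu_get_aio_context()) {
            /* main AioContext: only "home" if we also hold the BQL */
            return qemu_mutex_iothread_locked();
        }
        return false;
    }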
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200407140746.8041-5-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The io_uring_enter(2) syscall returns with errno=EINTR when interrupted
by a signal. Retry the syscall in this case.
It's essential to do this in the io_uring_submit_and_wait() case. My
interpretation of the Linux v5.5 io_uring_enter(2) code is that it
shouldn't affect the io_uring_submit() case, but there is no guarantee
this will always be the case. Let's check for -EINTR around both APIs.
Note that the liburing APIs have -errno return values.
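A sketch of the retry loop (the ring variable and surrounding code are
illustrative):

    int ret;

    do {
        ret = io_uring_submit_and_wait(ring, 1);   /* returns -errno */
    } while (ret == -EINTR);

    do {
        ret = io_uring_submit(ring);
    } while (ret == -EINTR);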
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200408091139.273851-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Coverity reported a missing fall-through comment; add it.
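Purely as an illustration of the kind of annotation Coverity expects (the
case labels here are made up, not the actual rx translation code):

    switch (opcode) {
    case OP_FOO:
        handle_foo(ctx);
        /* fall through */
    case OP_BAR:
        handle_bar(ctx);
        break;
    }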
Fixes: e5918d7d7f ("target/rx: TCG translation")
Reported-by: Coverity (CID 1422222 MISSING_BREAK)
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200403184419.28556-1-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rather than dynamically allocate, and risk failing to free
when we longjmp out of the translator, allocate the maximum
buffer size based on the maximum supported instruction length.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Richard Henderson <richard.henderson@linaro.org>
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- Fix crashes and hangs related to iothreads, bdrv_drain and block jobs:
- Fix some AIO context locking in jobs
- Fix blk->in_flight during blk_wait_while_drained()
- vpc: Don't round up already aligned BAT sizes
# gpg: Signature made Tue 07 Apr 2020 15:25:24 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
vpc: Don't round up already aligned BAT sizes
block: Fix blk->in_flight during blk_wait_while_drained()
block: Increase BB.in_flight for coroutine and sync interfaces
block-backend: Reorder flush/pdiscard function definitions
backup: don't acquire aio_context in backup_clean
replication: assert we own context before job_cancel_sync
job: take each job's lock individually in job_txn_apply
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When %gs cannot be used, we use register offset addressing.
This path is almost never used, so it was clearly not tested.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200406174803.8192-1-richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Without -Werror, the probe may succeed, but then compilation fails
later when -Werror is added for other reasons. This shows up on Windows,
where the compiler complains about -fPIC.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200401214756.6559-1-richard.henderson@linaro.org>
Message-Id: <20200403191150.863-13-alex.bennee@linaro.org>
The https://makecode.microbit.org/#editor generates slightly weird
.hex files which work fine on a real microbit but cause QEMU to
choke. The reason is extraneous data after the EOF record, which causes
the loader to attempt to write a bigger file than it should to the
"rom". According to the HEX file spec an EOF record really should be the
last thing we process, so let's do that.
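An illustrative fragment of the idea (parse_record/write_rom are assumed
helper names, not the actual loader code):

    /* Intel HEX record type 0x01 marks End Of File */
    while (parse_record(&rec, &cursor)) {
        if (rec.type == 0x01) {
            break;      /* EOF: ignore any trailing data from the editor */
        }
        write_rom(rom, rec.addr, rec.data, rec.len);
    }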
Reported-by: Ursula Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200403191150.863-12-alex.bennee@linaro.org>
Don't use magic spaces; calculate the justification for the file
field like the kernel does with seq_pad.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200403191150.863-10-alex.bennee@linaro.org>
Unfortunately reading /proc/self/maps is still considered the gold
standard for a process finding out about its own memory layout. As we
will want this data in other contexts soon, factor out the code to read
and parse the data. Rather than just blindly copying the existing
sscanf-based code, we use a more modern glib version of the parsing
code to build a more general-purpose map structure.
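A minimal sketch of the glib-based parsing, assuming an illustrative
MapInfo structure and read_self_maps() helper (path and flag handling
elided):

    #include <glib.h>

    typedef struct {
        guint64 start;
        guint64 end;
        gchar *perms;
    } MapInfo;

    static GSList *read_self_maps(void)
    {
        GSList *list = NULL;
        g_autofree gchar *data = NULL;

        if (!g_file_get_contents("/proc/self/maps", &data, NULL, NULL)) {
            return NULL;
        }

        g_auto(GStrv) lines = g_strsplit(data, "\n", -1);
        for (int i = 0; lines[i] != NULL; i++) {
            gchar *p = lines[i];
            MapInfo *e;

            if (*p == '\0') {
                continue;                    /* trailing blank line */
            }
            e = g_new0(MapInfo, 1);
            e->start = g_ascii_strtoull(p, &p, 16);  /* "start-end ..." */
            e->end   = g_ascii_strtoull(p + 1, &p, 16);
            e->perms = g_strndup(p + 1, 4);          /* e.g. "r-xp" */
            list = g_slist_prepend(list, e);
        }
        return g_slist_reverse(list);
    }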
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200403191150.863-9-alex.bennee@linaro.org>
All other calls to normalize*Subnormal detect zero input before
the call -- this is the only outlier. This case can happen with
+0.0 + +0.0 = +0.0 or -0.0 + -0.0 = -0.0, so return a zero of
the correct sign.
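A sketch of the early return described (softfloat-style names are
illustrative, not the exact code):

    if (a.frac == 0 && b.frac == 0) {
        /* +0.0 + +0.0 = +0.0 and -0.0 + -0.0 = -0.0 */
        a.cls = float_class_zero;
        return a;
    }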
Reported-by: Coverity (CID 1421991)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200327232042.10008-1-richard.henderson@linaro.org>
Message-Id: <20200403191150.863-8-alex.bennee@linaro.org>
./gdbstub.c: In function ‘handle_query_thread_extra’:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:10:
error: ‘cpu_name’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
g_free (*pp);
^
./gdbstub.c:2063:26: note: ‘cpu_name’ was declared here
g_autofree char *cpu_name;
^
cc1: all warnings being treated as errors
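The warning points at the straightforward fix: a g_autofree variable must
be initialised before any path that can run its cleanup. A sketch (the CPU
name formatting is illustrative):

    g_autofree char *cpu_name = NULL;    /* never indeterminate */

    cpu_name = g_strdup_printf("CPU#%d", cpu->cpu_index);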
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <20200326151407.25046-1-dplotnikov@virtuozzo.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
Reported-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Miroslav Rezanina <mrezanin@redhat.com>
Message-Id: <20200325092137.24020-1-kuhn.chenqun@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200403191150.863-7-alex.bennee@linaro.org>
Dynamically allocating a new structure within the DisasContext can
potentially leak as we can longjmp out of the translation loop (see
test_phys_mem). The proper fix would be to use static allocation
within the DisasContext, but as the Xtensa translator imports its code
from elsewhere I leave that as an exercise for the maintainer.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Message-Id: <20200403191150.863-6-alex.bennee@linaro.org>
Searching for memory space can cause problems, so let's extend the
CPU_LOG_PAGE output so you can watch init_guest_space fail to
allocate memory. A more involved fix is actually required to make this
function play nicely with the large guard pages the sanitiser likes to
use.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20200403191150.863-5-alex.bennee@linaro.org>
We are not using them and they just get in the way.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200403191150.863-4-alex.bennee@linaro.org>
Checking TARGET_ABI_BITS is sketchy - we should check for the presence
of the define to be sure. Also clean up the white space while we are
there.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200403191150.863-3-alex.bennee@linaro.org>
It's perfectly possible to have no function symbols in your ELF file,
and if so the undefined behaviour sanitizer rightly complains about
us passing NULL to qsort. Check nsyms before we go ahead.
While we are at it, let's drop the unchecked return value and clean up
the fail leg by use of g_autoptr.
Another fix was proposed 101 weeks ago in:
Message-Id: 20180421232120.22208-1-f4bug@amsat.org
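A sketch of the guard (variable names follow the commit message rather
than the actual code):

    if (nsyms == 0) {
        g_free(syms);
        return;      /* no function symbols: nothing to sort or keep */
    }
    qsort(syms, nsyms, sizeof(*syms), symcmp);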
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200403191150.863-2-alex.bennee@linaro.org>
Some GitHub users try to open pull requests against the GitHub
mirror. Unfortunately these get ignored until eventually someone
notices and closes the request.
Enable the 'Repo Lockdown' [*] 3rd party bot which can autorespond
to pull requests with a friendly comment, close the request, and
then lock it to prevent further comments.
[*] https://github.com/dessant/repo-lockdown
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20200406214125.18538-1-f4bug@amsat.org>
[AJB: s/fill/file/ and point at canonical qemu.org/contribute]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20200406165043.1447837-1-anthony.perard@citrix.com>
Since 7f5d9b206d ("object-add: don't create return value if
failed"), qmp_object_add() don't write any value in 'ret_data', thus
has random data. Then qobject_unref() fails and abort().
Fix by initialising 'ret_data' properly.
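A sketch of the fix (the calling code shown is illustrative):

    QObject *ret_data = NULL;   /* initialised: qmp_object_add() may not set it */

    qmp_object_add(qdict, &ret_data, &err);
    qobject_unref(ret_data);    /* safe even if nothing was assigned */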
Fixes: 5f07c4d60d ("qapi: Flatten object-add")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200406164207.1446817-1-anthony.perard@citrix.com>
The function usbback_packet_complete() currently takes a USBPacket*,
which must be a pointer to the packet field within a struct
usbback_req; the function uses container_of() to get the struct
usbback_req* given the USBPacket*.
This is unnecessarily confusing (and in particular it confuses the
Coverity Scan analysis, resulting in the false positive CID 1421919
where it thinks that we write off the end of the structure). Since
both callsites already have the pointer to the struct usbback_req,
just pass that in directly.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20200323164318.26567-1-peter.maydell@linaro.org>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
As reported on Launchpad, Azure apparently doesn't accept images for
upload that are not both aligned to 1 MB blocks and have a BAT size that
matches the image size exactly.
As far as I can tell, there is no real reason why we create a BAT that
is one entry longer than necessary for aligned image sizes, so change
that.
(Even though the condition is only mentioned as a "should" in the spec,
and previous products accepted larger BATs, we'll try to maintain
compatibility with as many of Microsoft's ever-changing interpretations
of the VHD spec as possible.)
Fixes: https://bugs.launchpad.net/bugs/1870098
Reported-by: Tobias Witek
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200402093603.2369-1-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Waiting in blk_wait_while_drained() while blk->in_flight is increased
for the current request is wrong because it will cause the drain
operation to deadlock.
This patch makes sure that blk_wait_while_drained() is called with
blk->in_flight increased exactly once for the current request, and that
it temporarily decreases the counter while it waits.
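A sketch of the resulting behaviour (based on this description; the actual
code in block/block-backend.c may differ in details):

    static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
    {
        if (blk->quiesce_counter && !blk->disable_request_queuing) {
            blk_dec_in_flight(blk);   /* drop our reference while waiting */
            qemu_co_queue_wait(&blk->queued_requests, NULL);
            blk_inc_in_flight(blk);   /* restore it for the actual request */
        }
    }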
Fixes: cf3129323f
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407121259.21350-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
External callers of blk_co_*() and of the synchronous blk_*() functions
don't currently increase the BlockBackend.in_flight counter, but calls
from blk_aio_*() do, so there is an inconsistency whether the counter
has been increased or not.
This patch moves the actual operations to static functions that can
later know they will always be called with in_flight increased exactly
once, even for external callers using the blk_co_*() coroutine
interfaces.
If the public blk_co_*() interface is unused, remove it.
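A sketch of the resulting structure, using flush as an example (the
function names are illustrative): the public coroutine wrapper takes the
in_flight reference, while the static worker assumes it is already held.

    static int coroutine_fn blk_do_flush(BlockBackend *blk)
    {
        blk_wait_while_drained(blk);
        /* ... perform the actual flush ... */
        return 0;
    }

    int coroutine_fn blk_co_flush(BlockBackend *blk)
    {
        int ret;

        blk_inc_in_flight(blk);
        ret = blk_do_flush(blk);
        blk_dec_in_flight(blk);
        return ret;
    }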
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200407121259.21350-3-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move all variants of the flush/pdiscard functions to a single place and
put the blk_co_*() version first because it is called by all other
variants (and will become static in the next patch).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407121259.21350-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All code-paths leading to backup_clean (via job_clean) have the job's
context already acquired. The job's context is guaranteed to be the same
as the one used by backup_top via backup_job_create.
Since the previous logic effectively acquired the lock twice, this
broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
once, thus deadlocking with the IO thread.
This is a partial revert of 0abf258171.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200407115651.69472-4-s.reiter@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
job_cancel_sync requires the job's lock to be held, all other callers
already do this (replication_stop, drive_backup_abort,
blockdev_backup_abort, job_cancel_sync_all, cancel_common).
In this case we're in a BlockDriver handler, so we already have a lock,
just assert that it is the same as the one used for the commit_job.
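A sketch of the assertion (the exact expression in block/replication.c may
differ):

    /* we run in a BlockDriver handler, so we already hold this context */
    assert(commit_job->aio_context == qemu_get_current_aio_context());
    job_cancel_sync(commit_job);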
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20200407115651.69472-3-s.reiter@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers of job_txn_apply hold a single job's lock, but different
jobs within a transaction can have different contexts, thus we need to
lock each one individually before applying the callback function.
Similar to job_completed_txn_abort this also requires releasing the
caller's context before and reacquiring it after to avoid recursive
locks which might break AIO_WAIT_WHILE in the callback. This is safe, since
existing code would already have to take this into account, lest
job_completed_txn_abort might have broken.
This also brings to light a different issue: When a callback function in
job_txn_apply moves its job to a different AIO context, callers will
try to release the wrong lock (now that we re-acquire the lock
correctly, previously it would just continue with the old lock, leaving
the job unlocked for the rest of the return path). Fix this by not caching
the job's context.
This is only necessary for qmp_block_job_finalize, qmp_job_finalize and
job_exit, since everyone else calls through job_exit.
One test needed adapting, since it calls job_finalize directly, so it
manually needs to acquire the correct context.
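A rough sketch of the locking pattern described above (simplified; the
real job_txn_apply also iterates with reference counting and error
handling):

    AioContext *outer_ctx = job->aio_context;
    Job *other_job;

    /* release the caller's lock to avoid recursive locking in fn() */
    aio_context_release(outer_ctx);

    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
        AioContext *ctx = other_job->aio_context;

        aio_context_acquire(ctx);
        fn(other_job);
        aio_context_release(ctx);
    }

    /* re-read job->aio_context: fn() may have moved the job */
    aio_context_acquire(job->aio_context);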
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Message-Id: <20200407115651.69472-2-s.reiter@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit a31ca6801c ("qemu/queue.h: clear linked list pointers on
remove") revealed that a request was removed twice from a list, once
in xen_block_finish_request() and a second time in
xen_block_release_request() when both functions are called from
xen_block_complete_aio(). But also, the `requests_inflight' counter is
decreased twice, and thus becomes negative.
This is a bug that was introduced in bfd0d63660 ("xen-block: improve
response latency"), where a `finished' list was removed.
That commit also introduced a leak of requests in xen_block_do_aio():
that function calls xen_block_finish_request(), but the request is
never released after that.
To fix both issues, we make two changes:
- we squash finish_request() and release_request() together, as we want
to remove a request from the 'inflight' list and add it to the 'freelist'.
- before releasing a request, we need to let the other end know the
result, thus we should call xen_block_send_response() before
releasing a request.
The first change fixes the double QLIST_REMOVE() as we remove the extra
call. The second change makes the leak go away because if we want to
call finish_request(), we need to call a function that does all of
finish, send response, and release.
Fixes: bfd0d63660 ("xen-block: improve response latency")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20200406140217.1441858-1-anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
[mreitz: Amended commit message as per Paul's suggestions]
Signed-off-by: Max Reitz <mreitz@redhat.com>
From time to time, my shell decides to replace the bracketed numbers here
with the numbers inside (i.e., "=== Clusters to be compressed [1]" is
printed as "=== Clusters to be compressed 1"). That makes tests that
use common.pattern fail. Prevent that from happening by quoting the
arguments to all echos in common.pattern.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200403101134.805871-1-mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When issuing a compressed write request the number of bytes must be a
multiple of the cluster size or reach the end of the last cluster.
With the current code such requests are allowed and we hit an
assertion:
$ qemu-img create -f qcow2 img.qcow2 1M
$ qemu-io -c 'write -c 0 32k' img.qcow2
qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task:
Assertion `bytes == s->cluster_size || (bytes < s->cluster_size &&
(offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed.
Aborted
This patch fixes a regression introduced in 0d483dce38.
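A sketch of the added validity check (following the assertion quoted
above; the real code may use helpers such as offset_into_cluster()):

    if (bytes % s->cluster_size &&
        (offset + bytes) != (bs->total_sectors << BDRV_SECTOR_BITS)) {
        return -EINVAL;   /* neither cluster-aligned nor the final cluster */
    }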
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200406143401.26854-1-berto@igalia.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>