Commit 8119334918 ("block: Don't
block_job_pause_all() in bdrv_drain_all()") removed the only callers of
block_job_pause/resume_all().
Pausing and resuming now happens in child_job_drained_begin/end() so
it's no longer necessary to globally pause/resume jobs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180424085240.5798-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
QAPI generator provide #define helpers for looking up enum string.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180327153011.29569-1-marcandre.lureau@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This fixes leaks found by ASAN such as:
GTESTER tests/test-blockjob
=================================================================
==31442==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f88483cba38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
#1 0x7f8845e1bd77 in g_malloc0 ../glib/gmem.c:129
#2 0x7f8845e1c04b in g_malloc0_n ../glib/gmem.c:360
#3 0x5584d2732498 in block_job_txn_new /home/elmarco/src/qemu/blockjob.c:172
#4 0x5584d2739b28 in block_job_create /home/elmarco/src/qemu/blockjob.c:973
#5 0x5584d270ae31 in mk_job /home/elmarco/src/qemu/tests/test-blockjob.c:34
#6 0x5584d270b1c1 in do_test_id /home/elmarco/src/qemu/tests/test-blockjob.c:57
#7 0x5584d270b65c in test_job_ids /home/elmarco/src/qemu/tests/test-blockjob.c:118
#8 0x7f8845e40b69 in test_case_run ../glib/gtestutils.c:2255
#9 0x7f8845e40f29 in g_test_run_suite_internal ../glib/gtestutils.c:2339
#10 0x7f8845e40fd2 in g_test_run_suite_internal ../glib/gtestutils.c:2351
#11 0x7f8845e411e9 in g_test_run_suite ../glib/gtestutils.c:2426
#12 0x7f8845e3fe72 in g_test_run ../glib/gtestutils.c:1692
#13 0x5584d270d6e2 in main /home/elmarco/src/qemu/tests/test-blockjob.c:377
#14 0x7f8843641f29 in __libc_start_main (/lib64/libc.so.6+0x20f29)
Add an assert to make sure that the job doesn't have associated txn before free().
[Jeff Cody: N.B., used updated patch provided by John Snow]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
When doing drive mirror to a low speed shared storage, if there was heavy
BLK IO write workload in VM after the 'ready' event, drive mirror block job
can't be canceled immediately, it would keep running until the heavy BLK IO
workload stopped in the VM.
Libvirt depends on the current block-job-cancel semantics, which is that
when used without a flag after the 'ready' event, the command blocks
until data is in sync. However, these semantics are awkward in other
situations, for example, people may use drive mirror for realtime
backups while still wanting to use block live migration. Libvirt cannot
start a block live migration while another drive mirror is in progress,
but the user would rather abandon the backup attempt as broken and
proceed with the live migration than be stuck waiting for the current
drive mirror backup to finish.
The drive-mirror command already includes a 'force' flag, which libvirt
does not use, although it documented the flag as only being useful to
quit a job which is paused. However, since quitting a paused job has
the same effect as abandoning a backup in a non-paused job (namely, the
destination file is not in sync, and the command completes immediately),
we can just improve the documentation to make the force flag obviously
useful.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Reported-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Expose the "manual" property via QAPI for the backup-related jobs.
As of this commit, this allows the management API to request the
"concluded" and "dismiss" semantics for backup jobs.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of automatically transitioning from PENDING to CONCLUDED, gate
the .prepare() and .commit() phases behind an explicit acknowledgement
provided by the QMP monitor if auto_finalize = false has been requested.
This allows us to perform graph changes in prepare and/or commit so that
graph changes do not occur autonomously without knowledge of the
controlling management layer.
Transactions that have reached the "PENDING" state together can all be
moved to invoke their finalization methods by issuing block_job_finalize
to any one job in the transaction.
Jobs in a transaction with mixed job->auto_finalize settings will all
remain stuck in the "PENDING" state, as if the entire transaction was
specified with auto_finalize = false. Jobs that specified
auto_finalize = true, however, will still not emit the PENDING event.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs utilizing the new manual workflow, we intend to prohibit
them from modifying the block graph until the management layer provides
an explicit ACK via block-job-finalize to move the process forward.
To distinguish this runstate from "ready" or "waiting," we add a new
"pending" event and status.
For now, the transition from PENDING to CONCLUDED/ABORTING is automatic,
but a future commit will add the explicit block-job-finalize step.
Transitions:
Waiting -> Pending: Normal transition.
Pending -> Concluded: Normal transition.
Pending -> Aborting: Late transactional failures and cancellations.
Removed Transitions:
Waiting -> Concluded: Jobs must go to PENDING first.
Verbs:
Cancel: Can be applied to a pending job.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+-----------------+
| +--+----+ |
| | |
| +--+----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
| +--v----+ | |
+---------+WAITING<---------------+ |
| +--+----+ |
| | |
| +--v----+ |
+---------+PENDING| |
| +--+----+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED| |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<--------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs that are stuck waiting on others in a transaction, it would
be nice to know that they are no longer "running" in that sense, but
instead are waiting on other jobs in the transaction.
Jobs that are "waiting" in this sense cannot be meaningfully altered
any longer as they have left their running loop. The only meaningful
user verb for jobs in this state is "cancel," which will cancel the
whole transaction, too.
Transitions:
Running -> Waiting: Normal transition.
Ready -> Waiting: Normal transition.
Waiting -> Aborting: Transactional cancellation.
Waiting -> Concluded: Normal transition.
Removed Transitions:
Running -> Concluded: Jobs must go to WAITING first.
Ready -> Concluded: Jobs must go to WAITING first.
Verbs:
Cancel: Can be applied to WAITING jobs.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+-----------------+
| +--+----+ |
| | |
| +--v----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
| +--v----+ | |
+---------+WAITING<---------------+ |
| +--+----+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED| |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<--------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some jobs upon finalization may need to perform some work that can
still fail. If these jobs are part of a transaction, it's important
that these callbacks fail the entire transaction.
We allow for a new callback in addition to commit/abort/clean that
allows us the opportunity to have fairly late-breaking failures
in the transactional process.
The expected flow is:
- All jobs in a transaction converge to the PENDING state,
added in a forthcoming commit.
- Upon being finalized, either automatically or explicitly
by the user, jobs prepare to complete.
- If any job fails preparation, all jobs call .abort.
- Otherwise, they succeed and call .commit.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Simply apply a function transaction-wide.
A few more uses of this in forthcoming patches.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The completed_single function is getting a little mucked up with
checking to see which callbacks exist, so let's factor them out.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Presently, even if a job is canceled post-completion as a result of
a failing peer in a transaction, it will still call .commit because
nothing has updated or changed its return code.
The reason why this does not cause problems currently is because
backup's implementation of .commit checks for cancellation itself.
I'd like to simplify this contract:
(1) Abort is called if the job/transaction fails
(2) Commit is called if the job/transaction succeeds
To this end: A job's return code, if 0, will be forcibly set as
-ECANCELED if that job has already concluded. Remove the now
redundant check in the backup job implementation.
We need to check for cancellation in both block_job_completed
AND block_job_completed_single, because jobs may be cancelled between
those two calls; for instance in transactions. This also necessitates
an ABORTING -> ABORTING transition to be allowed.
The check in block_job_completed could be removed, but there's no
point in starting to attempt to succeed a transaction that we know
in advance will fail.
This does NOT affect mirror jobs that are "canceled" during their
synchronous phase. The mirror job itself forcibly sets the canceled
property to false prior to ceding control, so such cases will invoke
the "commit" callback.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs that have reached their CONCLUDED state, prior to having their
last reference put down (meaning jobs that have completed successfully,
unsuccessfully, or have been canceled), allow the user to dismiss the
job's lingering status report via block-job-dismiss.
This gives management APIs the chance to conclusively determine if a job
failed or succeeded, even if the event broadcast was missed.
Note: block_job_do_dismiss and block_job_decommission happen to do
exactly the same thing, but they're called from different semantic
contexts, so both aliases are kept to improve readability.
Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she
has a friend coming in a future patch to fill the hole where 0x02 is.
Verbs:
Dismiss: operates on CONCLUDED jobs only.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new state that specifically demarcates when we begin to permanently
demolish a job after it has performed all work. This makes the transition
explicit in the STM table and highlights conditions under which a job may
be demolished.
Alongside this state, add a new helper command "block_job_decommission",
which transitions to the NULL state and puts down our implicit reference.
This separates instances in the code for "block_job_unref" which merely
undo a matching "block_job_ref" with instances intended to initiate the
full destruction of the object.
This decommission action also sets a number of fields to make sure that
block internals or external users that are holding a reference to a job
to see when it "finishes" are convinced that the job object is "done."
This is necessary, for instance, to do a block_job_cancel_sync on a
created object which will not make any progress.
Now, all jobs must go through block_job_decommission prior to being
freed, giving us start-to-finish state machine coverage for jobs.
Transitions:
Created -> Null: Early failure event before the job is started
Concluded -> Null: Standard transition.
Verbs:
None. This should not ever be visible to the monitor.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+------------------+
| +--+----+ |
| | |
| +--v----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
+--v-----+ +--v------+ | |
|ABORTING+--->CONCLUDED<-------------+ |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<---------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
add a new state "CONCLUDED" that identifies a job that has ceased all
operations. The wording was chosen to avoid any phrasing that might
imply success, error, or cancellation. The task has simply ceased all
operation and can never again perform any work.
("finished", "done", and "completed" might all imply success.)
Transitions:
Running -> Concluded: normal completion
Ready -> Concluded: normal completion
Aborting -> Concluded: error and cancellations
Verbs:
None as of this commit. (a future commit adds 'dismiss')
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED|
| +--+----+
| |
| +--v----+ +------+
+---------+RUNNING<----->PAUSED|
| +--+-+--+ +------+
| | |
| | +------------------+
| | |
| +--v--+ +-------+ |
+---------+READY<------->STANDBY| |
| +--+--+ +-------+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED<-------------+
+--------+ +---------+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new state ABORTING.
This makes transitions from normative states to error states explicit
in the STM, and serves as a disambiguation for which states may complete
normally when normal end-states (CONCLUDED) are added in future commits.
Notably, Paused/Standby jobs do not transition directly to aborting,
as they must wake up first and cooperate in their cancellation.
Transitions:
Created -> Aborting: can be cancelled (by the system)
Running -> Aborting: can be cancelled or encounter an error
Ready -> Aborting: can be cancelled or encounter an error
Verbs:
None. The job must finish cleaning itself up and report its final status.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED|
| +--+----+
| |
| +--v----+ +------+
+---------+RUNNING<----->PAUSED|
| +--+----+ +------+
| |
| +--v--+ +-------+
+---------+READY<------->STANDBY|
| +-----+ +-------+
|
+--v-----+
|ABORTING|
+--------+
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.
As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.
A recurring theme is that no verb will apply to an 'undefined' job.
Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.
=====
Verbs
=====
Cancel: Any state except undefined.
Pause: Any state except undefined;
'created': Requests that the job pauses as it starts.
'running': Normal usage. (PAUSED)
'paused': The job may be paused for internal reasons,
but the user may wish to force an indefinite
user-pause, so this is allowed.
'ready': Normal usage. (STANDBY)
'standby': Same logic as above.
Resume: Any state except undefined;
'created': Will lift a user's pause-on-start request.
'running': Will lift a pause request before it takes effect.
'paused': Normal usage.
'ready': Will lift a pause request before it takes effect.
'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete: Only a 'ready' job may accept a complete request.
=======
Changes
=======
(1)
To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:
- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.
(2)
block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.
iotests have been adjusted to address this new behavior.
(3)
block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.
(4)
test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The state transition table has mostly been implied. We're about to make
it a bit more complex, so let's make the STM explicit instead.
Perform state transitions with a function that for now just asserts the
transition is appropriate.
Transitions:
Undefined -> Created: During job initialization.
Created -> Running: Once the job is started.
Jobs cannot transition from "Created" to "Paused"
directly, but will instead synchronously transition
to running to paused immediately.
Running -> Paused: Normal workflow for pauses.
Running -> Ready: Normal workflow for jobs reaching their sync point.
(e.g. mirror)
Ready -> Standby: Normal workflow for pausing ready jobs.
Paused -> Running: Normal resume.
Standby -> Ready: Resume of a Standby job.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
|CREATED|
+--+----+
|
+--v----+ +------+
|RUNNING<----->PAUSED|
+--+----+ +------+
|
+--v--+ +-------+
|READY<------->STANDBY|
+-----+ +-------+
Notably, there is no state presently defined as of this commit that
deals with a job after the "running" or "ready" states, so this table
will be adjusted alongside the commits that introduce those states.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're about to add several new states, and booleans are becoming
unwieldly and difficult to reason about. It would help to have a
more explicit bookkeeping of the state of blockjobs. To this end,
add a new "status" field and add our existing states in a redundant
manner alongside the bools they are replacing:
UNDEFINED: Placeholder, default state. Not currently visible to QMP
unless changes occur in the future to allow creating jobs
without starting them via QMP.
CREATED: replaces !!job->co && paused && !busy
RUNNING: replaces effectively (!paused && busy)
PAUSED: Nearly redundant with info->paused, which shows pause_count.
This reports the actual status of the job, which almost always
matches the paused request status. It differs in that it is
strictly only true when the job has actually gone dormant.
READY: replaces job->ready.
STANDBY: Paused, but job->ready is true.
New state additions in coming commits will not be quite so redundant:
WAITING: Waiting on transaction. This job has finished all the work
it can until the transaction converges, fails, or is canceled.
PENDING: Pending authorization from user. This job has finished all the
work it can until the job or transaction is finalized via
block_job_finalize. This implies the transaction has converged
and left the WAITING phase.
ABORTING: Job has encountered an error condition and is in the process
of aborting.
CONCLUDED: Job has ceased all operations and has a return code available
for query and may be dismissed via block_job_dismiss.
NULL: Job has been dismissed and (should) be destroyed. Should never
be visible to QMP.
Some of these states appear somewhat superfluous, but it helps define the
expected flow of a job; so some of the states wind up being synchronous
empty transitions. Importantly, jobs can be in only one of these states
at any given time, which helps code and external users alike reason about
the current condition of a job unambiguously.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
model all independent jobs as single job transactions.
It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If speed is '0' it's not actually "less than" the previous speed.
Kick the job in this case too.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-19-armbru@redhat.com>
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
Block jobs already paused themselves when their main BlockBackend
entered a drained section. This is not good enough: We also want to
pause a block job and may not submit new requests if, for example, the
mirror target node should be drained.
This implements .drained_begin/end callbacks in child_job in order to
consider all block nodes related to the job, and removes the
BlockBackend callbacks which are unnecessary now because the root of the
job main BlockBackend is always referenced with a child_job, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If users set an unreasonably low speed (like one byte per second), the
calculated delay may exceed many hours. While we like to punish users
for asking for stupid things, we do also like to allow users to correct
their wicked ways.
When a user provides a new speed, kick the job to allow it to recalculate
its delay.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171213204611.26276-1-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Starting from commit 40840e419b we are
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
them from finishing and removing nodes from the graph while they are
being reopened.
It turns out that pausing a block job doesn't necessarily prevent it
from finishing: a paused block job can still run its exit function
from the main loop and call block_job_completed(). The mirror block
job in particular always goes to the main loop while it is paused (by
virtue of the bdrv_drained_begin() call in mirror_run()).
Destroying a paused block job during bdrv_reopen_multiple() has two
consequences:
1) The references to the nodes involved in the job are released,
possibly destroying some of them. If those nodes were in the
reopen queue this would trigger the problem originally described
in commit 40840e419b, crashing QEMU.
2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
not be doing all necessary bdrv_parent_drained_end() calls.
I can reproduce problem 1) easily with iotest 030 by increasing
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
the iotest like in this example:
https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
This patch keeps an additional reference to all block jobs between
block_job_pause_all() and block_job_resume_all(), guaranteeing that
they are kept alive.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
coroutine double entry or entry-after-completion", 2017-11-21)
This fixed the symptom of a bug rather than the root cause. Canceling the
wait on a sleeping blockjob coroutine is generally fine, we just need to
make it work correctly across AioContexts. To do so, use a QEMUTimer
that calls block_job_enter. Use a mutex to ensure that block_job_enter
synchronizes correctly with block_job_sleep_ns.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hide the clearing of job->busy in a single function, and set it
in block_job_enter. This lets block_job_do_yield verify that
qemu_coroutine_enter is not used while job->busy = false.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When destroying a block job in block_job_unref() we should remove it
from the job list before calling block_job_remove_all_bdrv().
This is because removing the BDSs can trigger an aio_poll() and wake
up other jobs that might attempt to use the block job list. If that
happens the job we're currently destroying should not be in that list
anymore.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When block_job_sleep_ns() is called, the co-routine is scheduled for
future execution. If we allow the job to be re-entered prior to the
scheduled time, we present a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.
The job->busy flag is used by blockjobs when a coroutine is busy
executing. The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set. If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.
This changes the prior behavior of block_job_cancel() being able to
immediately wake up and cancel a job; in practice, this should not be an
issue, as the coroutine sleep times are generally very small, and the
cancel will occur the next time the coroutine wakes up.
This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
img_commit could fall into an infinite loop calling run_block_job() if
its blockjob fails on any I/O error, fix this already known problem.
Signed-off-by: sochin.jiang <sochin.jiang@huawei.com>
Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
All block jobs are using block_job_defer_to_main_loop as the final
step just before the coroutine terminates. At this point,
block_job_enter should do nothing, but currently it restarts
the freed coroutine.
Now, the job->co states should probably be changed to an enum
(e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming
block_job_started, job->deferred_to_main_loop and job->busy.
For now, this patch eliminates the problematic reenter by
removing the reset of job->deferred_to_main_loop (which served
no purpose, as far as I could see) and checking the flag in
block_job_enter.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170508141310.8674-12-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This splits the part that touches job states from the part that invokes
callbacks. It will make the code simpler to understand once job states will
be protected by a different mutex than the AioContext lock.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170508141310.8674-11-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Yet another pure code movement patch, preparing for the next change.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170508141310.8674-9-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The new functions helps respecting the invariant that the coroutine
is entered with false user_resume, zero pause count and no error
recorded in the iostatus.
Resetting the iostatus is now common to all of block_job_cancel_async,
block_job_user_resume and block_job_iostatus_reset, albeit with slight
differences:
- block_job_cancel_async resets the iostatus, and resumes the job if
there was an error, but the coroutine is not restarted immediately.
For example the caller may continue with a call to block_job_finish_sync.
- block_job_user_resume resets the iostatus. It wants to resume the job
unconditionally, even if there was no error.
- block_job_iostatus_reset doesn't resume the job at all. Maybe that's
a bug but it should be fixed separately.
block_job_iostatus_reset does the least common denominator, so add some
checking but otherwise leave it as the entry point for resetting the
iostatus.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170508141310.8674-8-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Outside blockjob.c, the block_job_iostatus_reset function is used once
in the monitor and once in BlockBackend. When we introduce the block
job mutex, block_job_iostatus_reset's client is going to be the block
layer (for which blockjob.c will take the block job mutex) rather than
the monitor (which will take the block job mutex by itself).
The monitor's call to block_job_iostatus_reset from the monitor comes
just before the sole call to block_job_user_resume, so reset the
iostatus directly from block_job_iostatus_reset. This will avoid
the need to introduce separate block_job_iostatus_reset and
block_job_iostatus_reset_locked APIs.
After making this change, move the function together with the others
that were moved in the previous patch.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-7-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
We have two different headers for block job operations, blockjob.h
and blockjob_int.h. The former contains APIs called by the monitor,
the latter contains APIs called by the block job drivers and the
block layer itself.
Keep the two APIs separate in the blockjob.c file too. This will
be useful when transitioning away from the AioContext lock, because
there will be locking policies for the two categories, too---the
monitor will have to call new block_job_lock/unlock APIs, while blockjob
APIs will take care of this for the users.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170508141310.8674-6-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Remove use of block_job_pause/resume from outside blockjob.c, thus
making them static. The new functions are used by the block layer,
so place them in blockjob_int.h.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-5-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Outside blockjob.c, block_job_unref is only used when a block job fails
to start, and block_job_ref is not used at all. The reference counting
thus is pretty well hidden. Introduce a separate function to be used
by block jobs; because block_job_ref and block_job_unref now become
static, move them earlier in blockjob.c.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-4-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This is unused since commit 66a0fae ("blockjob: Don't touch BDS iostatus",
2016-05-19).
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-3-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
!job is always checked prior to the call, drop it from here.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Resuming and especially starting of the block job coroutine, could be issued in
the main thread. However the coroutine's "home" ctx should be set to the same
context as job->blk. Use bdrv_coroutine_enter to ensure that.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
This lets us hook into drained_begin and drained_end requests from the
backend level, which is particularly useful for making sure that all
jobs associated with a particular node (whether the source or the target)
receive a drain request.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170316212351.13797-4-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The purpose of this shim is to allow us to pause pre-started jobs.
The purpose of *that* is to allow us to buffer a pause request that
will be able to take effect before the job ever does any work, allowing
us to create jobs during a quiescent state (under which they will be
automatically paused), then resuming the jobs after the critical section
in any order, either:
(1) -block_job_start
-block_job_resume (via e.g. drained_end)
(2) -block_job_resume (via e.g. drained_end)
-block_job_start
The problem that requires a startup wrapper is the idea that a job must
start in the busy=true state only its first time-- all subsequent entries
require busy to be false, and the toggling of this state is otherwise
handled during existing pause and yield points.
The wrapper simply allows us to mandate that a job can "start," set busy
to true, then immediately pause only if necessary. We could avoid
requiring a wrapper, but all jobs would need to do it, so it's been
factored out here.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170316212351.13797-2-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Streaming or any other block job hangs when performed on a block device
that has a non-default iothread. This happens because the AioContext
is acquired twice by block_job_defer_to_main_loop_bh and then released
only once by BDRV_POLL_WHILE. (Insert rants on recursive mutexes, which
unfortunately are a temporary but necessary evil for iothreads at the
moment).
Luckily, the reason for the double acquisition is simple; the function
acquires the AioContext for both the job iothread and the BDS iothread,
in case the BDS iothread was changed while the job was running. It
is therefore enough to skip the second acquisition when the two
AioContexts are one and the same.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490118490-5597-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
In some cases, we want to remove op blockers on intermediate nodes
before the whole block job transaction has completed (because they block
restoring the final graph state during completion). Provide a function
for this.
The whole block job lifecycle is a bit messed up and it's hard to
actually do all things in the right order, but I'll leave simplifying
this for another day.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>