Commit Graph

43 Commits

Author SHA1 Message Date
Kevin Wolf
d8b3afd597 test-bdrv-drain: Test draining job source child and parent
For the block job drain test, don't only test draining the source and
the target node, but create a backing chain for the source
(source_backing <- source <- source_overlay) and test draining each of
the nodes in it.

When using iothreads, the source node (and therefore the job) is in a
different AioContext than the drain, which happens from the main
thread. This way, the main thread waits in AIO_WAIT_WHILE() for the
iothread to make process and aio_wait_kick() is required to notify it.
The test validates that calling bdrv_wakeup() for a child or a parent
node will actually notify AIO_WAIT_WHILE() instead of letting it hang.

Increase the sleep time a bit (to 1 ms) because the test case is racy
and with the shorter sleep, it didn't reproduce the bug it is supposed
to test for me under 'rr record -n'.

This was because bdrv_drain_invoke_entry() (in the main thread) was only
called after the job had already reached the pause point, so we got a
bdrv_dec_in_flight() from the main thread and the additional
aio_wait_kick() when the job becomes idle (that we really wanted to test
here) wasn't even necessary any more to make progress.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
5599c162c3 test-bdrv-drain: Fix outdated comments
Commit 89bd030533 changed the test case from using job_sleep_ns() to
using qemu_co_sleep_ns() instead. Also, block_job_sleep_ns() became
job_sleep_ns() in commit 5d43e86e11.

In both cases, some comments in the test case were not updated. Do that
now.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
d49725af46 test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort
callbacks. Both reasons why .abort could be called for a single job are
tested: Either .run or .prepare could return an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
ecc1a5c790 test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
This is a regression test for a deadlock that could occur in callbacks
called from the aio_poll() in bdrv_drain_poll_top_level(). The
AioContext lock wasn't released and therefore would be taken a second
time in the callback. This would cause a possible AIO_WAIT_WHILE() in
the callback to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
ae23dde9dd test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
This is a regression test for a deadlock that occurred in block job
completion callbacks (via job_defer_to_main_loop) because the AioContext
lock was taken twice: once in job_finish_sync() and then again in
job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25 15:50:15 +02:00
Kevin Wolf
f62c172959 test-bdrv-drain: Drain with block jobs in an I/O thread
This extends the existing drain test with a block job to include
variants where the block job runs in a different AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25 15:50:15 +02:00
John Snow
eb23654dbe jobs: utilize job_exit shim
Utilize the job_exit shim by not calling job_defer_to_main_loop, and
where applicable, converting the deferred callback into the job_exit
callback.

This converts backup, stream, create, and the unit tests all at once.
Most of these jobs do not see any changes to the order in which they
clean up their resources, except the test-blockjob-txn test, which
now puts down its bs before job_completed is called.

This is safe for the same reason the reordering in the mirror job is
safe, because job_completed no longer runs under two locks, making
the unref safe even if it causes a flush.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-7-jsnow@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
3d1f8b07a4 jobs: canonize Error object
Jobs presently use both an Error object in the case of the create job,
and char strings in the case of generic errors elsewhere.

Unify the two paths as just j->err, and remove the extra argument from
job_completed. The integer error code for job_completed is kept for now,
to be removed shortly in a separate patch.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180830015734.19765-3-jsnow@redhat.com
[mreitz: Dropped a superfluous g_strdup()]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
John Snow
f67432a201 jobs: change start callback to run callback
Presently we codify the entry point for a job as the "start" callback,
but a more apt name would be "run" to clarify the idea that when this
function returns we consider the job to have "finished," except for
any cleanup which occurs in separate callbacks later.

As part of this clarification, change the signature to include an error
object and a return code. The error ptr is not yet used, and the return
code while captured, will be overwritten by actions in the job_completed
function.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180830015734.19765-2-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
Marc-André Lureau
7b43db3cd0 tests: fix bdrv-drain leak
Spotted by ASAN:

=================================================================
==5378==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x7f788f83bc48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f788c9923c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x5622a1fe37bc in coroutine_trampoline /home/elmarco/src/qq/util/coroutine-ucontext.c:116
    #3 0x7f788a15d75f in __correctly_grouped_prefixwc (/lib64/libc.so.6+0x4c75f)

(Broken in commit 4c8158e359d.)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20180809114417.28718-3-marcandre.lureau@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-08-31 16:28:33 +02:00
Kevin Wolf
b994c5bc51 test-bdrv-drain: Test bdrv_append() to drained node
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10 10:36:15 +02:00
Kevin Wolf
19f7a7e574 test-bdrv-drain: Test graph changes in drain_all section
This tests both adding and remove a node between bdrv_drain_all_begin()
and bdrv_drain_all_end(), and enabled the existing detach test for
drain_all.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
57320ca961 test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll
This adds a test case that goes wrong if bdrv_drain_invoke() calls
aio_poll().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
231281ab42 test-bdrv-drain: Graph change through parent callback
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
ebd3183761 test-bdrv-drain: Test node deletion in subtree recursion
If bdrv_do_drained_begin() polls during its subtree recursion, the graph
can change and mess up the bs->children iteration. Test that this
doesn't happen.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Max Reitz
4c8158e359 test-bdrv-drain: Add test for node deletion
This patch adds two bdrv-drain tests for what happens if some BDS goes
away during the drainage.

The basic idea is that you have a parent BDS with some child nodes.
Then, you drain one of the children.  Because of that, the party who
actually owns the parent decides to (A) delete it, or (B) detach all its
children from it -- both while the child is still being drained.

A real-world case where this can happen is the mirror block job, which
may exit if you drain one of its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
89bd030533 block: Really pause block jobs on drain
We already requested that block jobs be paused in .bdrv_drained_begin,
but no guarantee was made that the job was actually inactive at the
point where bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
uses it to make bdrv_drain_poll() consider block jobs using the node to
be drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
6d0252f2f9 tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now
Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
coroutine context is automatically left with a BH, preventing the
deadlocks that made bdrv_drain_all*() unsafe in coroutine context. Now
that we even removed the old polling code as dead code, it's obvious
that it's compatible now.

Enable the coroutine test cases for bdrv_drain_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
79ab8b21dc block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
bdrv_do_drain_begin/end() implement already everything that
bdrv_drain_all_begin/end() need and currently still do manually: Disable
external events, call parent drain callbacks, call block driver
callbacks.

It also does two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving different from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so basically doing an early first iteration
doesn't hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
bb67568954 test-bdrv-drain: bdrv_drain() works with cross-AioContext events
As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
1266c9b9f5 job: Add error message for failing jobs
So far we relied on job->ret and strerror() to produce an error message
for failed jobs. Not surprisingly, this tends to result in completely
useless messages.

This adds a Job.error field that can contain an error string for a
failing job, and a parameter to job_completed() that sets the field. As
a default, if NULL is passed, we continue to use strerror(job->ret).

All existing callers are changed to pass NULL. They can be improved in
separate patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:01 +02:00
Kevin Wolf
2e1795b581 job: Add job_transition_to_ready()
The transition to the READY state was still performed in the BlockJob
layer, in the same function that sent the BLOCK_JOB_READY QMP event.

This patch brings the state transition to the Job layer and implements
the QMP event using a notifier called from the Job layer, like we
already do for other events related to state transitions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
3d70ff53b6 job: Move completion and cancellation to Job
This moves the top-level job completion and cancellation functions from
BlockJob to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
3453d97243 job: Move .complete callback to Job
This moves the .complete callback that tells a READY job to complete
from BlockJobDriver to JobDriver. The wrapper function job_complete()
doesn't require anything block job specific any more and can be moved
to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
b69f777dd9 job: Add job_drain()
block_job_drain() contains a blk_drain() call which cannot be moved to
Job, so add a new JobDriver callback JobDriver.drain which has a common
implementation for all BlockJobs. In addition to this we keep the
existing BlockJobDriver.drain callback that is called by the common
drain implementation for all block jobs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
b15de82867 job: Move pause/resume functions to Job
While we already moved the state related to job pausing to Job, the
functions to do were still BlockJob only. This commit moves them over to
Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
5d43e86e11 job: Add job_sleep_ns()
There is nothing block layer specific about block_job_sleep_ns(), so
move the function to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
da01ff7f38 job: Move coroutine and related code to Job
This commit moves some core functions for dealing with the job coroutine
from BlockJob to Job. This includes primarily entering the coroutine
(both for the first and reentering) and yielding explicitly and at pause
points.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
1908a5590c job: Move defer_to_main_loop to Job
Move the defer_to_main_loop functionality from BlockJob to Job.

The code can be simplified because we can use job->aio_context in
job_defer_to_main_loop_bh() now, instead of having to access the
BlockDriverState.

Probably taking the data->aio_context lock in addition was already
unnecessary in the old code because we didn't actually make use of
anything protected by the old AioContext except getting the new
AioContext, in case it changed between scheduling the BH and running it.
But it's certainly unnecessary now that the BDS isn't accessed at all
any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
80fa2c756b job: Add reference counting
This moves reference counting from BlockJob to Job.

In order to keep calling the BlockJob cleanup code when the job is
deleted via job_unref(), introduce a new JobDriver.free callback. Every
block job must use block_job_free() for this callback, this is asserted
in block_job_create().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
33e9e9bd62 job: Create Job, JobDriver and job_create()
This is the first step towards creating an infrastructure for generic
background jobs that aren't tied to a block device. For now, Job only
stores its ID and JobDriver, the rest stays in BlockJob.

The following patches will move over more parts of BlockJob to Job if
they are meaningful outside the context of a block job.

BlockJob.driver is now redundant, but this patch leaves it around to
avoid unnecessary churn. The next patches will get rid of almost all of
its uses anyway so that it can be removed later with much less churn.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
John Snow
0ec4dfb8d6 blockjobs: add block_job_verb permission table
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.

As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.

A recurring theme is that no verb will apply to an 'undefined' job.

Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.

=====
Verbs
=====

Cancel:    Any state except undefined.
Pause:     Any state except undefined;
           'created': Requests that the job pauses as it starts.
           'running': Normal usage. (PAUSED)
           'paused':  The job may be paused for internal reasons,
                      but the user may wish to force an indefinite
                      user-pause, so this is allowed.
           'ready':   Normal usage. (STANDBY)
           'standby': Same logic as above.
Resume:    Any state except undefined;
           'created': Will lift a user's pause-on-start request.
           'running': Will lift a pause request before it takes effect.
           'paused':  Normal usage.
           'ready':   Will lift a pause request before it takes effect.
           'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete:  Only a 'ready' job may accept a complete request.

=======
Changes
=======

(1)

To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:

- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.

(2)

block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.

iotests have been adjusted to address this new behavior.

(3)

block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.

(4)

test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
75859b9420 blockjobs: model single jobs as transactions
model all independent jobs as single job transactions.

It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
Kevin Wolf
acebcf8de8 test-bdrv-drain: Test graph changes in drained section
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
27e64474a3 test-bdrv-drain: Recursive draining with multiple parents
Test that drain sections are correctly propagated through the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
0582eb1006 test-bdrv-drain: Test behaviour in coroutine context
If bdrv_do_drained_begin/end() are called in coroutine context, they
first use a BH to get out of the coroutine context. Call some existing
tests again from a coroutine to cover this code path.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
d2a85d0f42 test-bdrv-drain: Tests for bdrv_subtree_drain
Add a subtree drain version to the existing test cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
6c429a6a97 test-bdrv-drain: Test nested drain sections
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
8119334918 block: Don't block_job_pause_all() in bdrv_drain_all()
Block jobs are already paused using the BdrvChildRole drain callbacks,
so we don't need an additional block_job_pause_all() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
7253220de4 test-bdrv-drain: Test drain vs. block jobs
Block jobs must be paused if any of the involved nodes are drained.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
89a6ceab46 test-bdrv-drain: Test bs->quiesce_counter
This is currently only working correctly for bdrv_drain(), not for
bdrv_drain_all(). Leave a comment for the drain_all case, we'll address
it later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf
86e1c840ec test-bdrv-drain: Test callback for bdrv_drain
The existing test is for bdrv_drain_all_begin/end() only. Generalise the
test case so that it can be run for the other variants as well. At the
moment this is only bdrv_drain_begin/end(), but in a while, we'll add
another one.

Also, add a backing file to the test node to test whether the operations
work recursively.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:31 +01:00
Kevin Wolf
881cfd17c7 test-bdrv-drain: Test BlockDriver callbacks for drain
This adds a test case that the BlockDriver callbacks for drain are
called in bdrv_drained_all_begin/end(), and that both of them are called
exactly once.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-12-22 15:03:41 +01:00