mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Vladimir Sementsov-Ogievskiy	8d9648cbf3	blockjob: fix user pause in block_job_error_action A job (especially mirror) may call block_job_error_action several times before it actually pauses if it has several in-flight requests. In this case block_job_error_action calls job_pause more than once, so a following block-job-resume QMP command can't actually resume the job. Fix it by not increasing the pause level in block_job_error_action if user_paused is already set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Kevin Wolf	cfe29d8294	block: Use a single global AioWait When draining a block node, we recurse to its parent and for subtree drains also to its children. A single AIO_WAIT_WHILE() is then used to wait for bdrv_drain_poll() to become true, which depends on all of the nodes we recursed to. However, if the respective child or parent becomes quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent is checked, while AIO_WAIT_WHILE() depends on the AioWait of the original node. Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE(). This may mean that the draining thread gets a few more unnecessary wakeups because an unrelated operation got completed, but we already wake it up when something _could_ have changed rather than only if it has certainly changed. Apart from that, drain is a slow path anyway. In theory it would be possible to use wakeups more selectively and still correctly, but the gains are likely not worth the additional complexity. In fact, this patch is a nice simplification for some places in the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Kevin Wolf	b5a7a05735	blockjob: Lie better in child_job_drained_poll() Block jobs claim in .drained_poll() that they are in a quiescent state as soon as job->deferred_to_main_loop is true. This is obviously wrong, they still have a completion BH to run. We only get away with this because commit `91af091f92` added an unconditional aio_poll(false) to the drain functions, but this is bypassing the regular drain mechanisms. However, just removing this and telling that the job is still active doesn't work either: The completion callbacks themselves call drain functions (directly, or indirectly with bdrv_reopen), so they would deadlock then. As a better lie, tell that the job is active as long as the BH is pending, but falsely call it quiescent from the point in the BH when the completion callback is called. At this point, nested drain calls won't deadlock because they ignore the job, and outer drains will wait for the job to really reach a quiescent state because the callback is already running. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Kevin Wolf	34dc97b9a0	blockjob: Wake up BDS when job becomes idle In the context of draining a BDS, the .drained_poll callback of block jobs is called. If this returns true (i.e. there is still some activity pending), the drain operation may call aio_poll() with blocking=true to wait for completion. As soon as the pending activity is completed and the job finally arrives in a quiescent state (i.e. its coroutine either yields with busy=false or terminates), the block job must notify the aio_poll() loop to wake up, otherwise we get a deadlock if both are running in different threads. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:50:15 +02:00
Peter Xu	3ab72385b2	qapi: Drop qapi_event_send_FOO()'s Error argument The generated qapi_event_send_FOO() take an Error argument. They can't actually fail, because all they do with the argument is passing it to functions that can't fail: the QObject output visitor, and the @qmp_emit callback, which is either monitor_qapi_event_queue() or event_test_emit(). Drop the argument, and pass &error_abort to the QObject output visitor and @qmp_emit instead. Suggested-by: Eric Blake <eblake@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815133747.25032-4-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message rewritten, update to qapi-code-gen.txt corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-28 18:21:38 +02:00
Kevin Wolf	89bd030533	block: Really pause block jobs on drain We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	9f6bb4c004	blockjob: Remove BlockJob.driver BlockJob.driver is redundant with Job.driver and only used in very few places any more. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	30a5c887bf	job: Move progress fields to Job BlockJob has fields .offset and .len, which are actually misnomers today because they are no longer tied to block device sizes, but just progress counters. As such they make a lot of sense in generic Jobs. This patch moves the fields to Job and renames them to .progress_current and .progress_total to describe their function better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	2e1795b581	job: Add job_transition_to_ready() The transition to the READY state was still performed in the BlockJob layer, in the same function that sent the BLOCK_JOB_READY QMP event. This patch brings the state transition to the Job layer and implements the QMP event using a notifier called from the Job layer, like we already do for other events related to state transitions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	df956ae201	job: Add job_is_ready() Instead of having a 'bool ready' in BlockJob, add a function that derives its value from the job status. At the same time, this fixes the behaviour to match what the QAPI documentation promises for query-block-job: 'true if the job may be completed'. When the ready flag was introduced in commit `ef6dbf1e46`, the flag never had to be reset to match the description because after being ready, the jobs would immediately complete and disappear. Job transactions and manual job finalisation were introduced only later. With these changes, jobs may stay around even after having completed (and they are not ready to be completed a second time), however their patches forgot to reset the ready flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	5f9a6a08e8	job: Add job_dismiss() This moves block_job_dismiss() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	198c49cc8d	job: Add job_yield() This moves block_job_yield() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	3d70ff53b6	job: Move completion and cancellation to Job This moves the top-level job completion and cancellation functions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	7eaa8fb57d	job: Move transactions to Job This moves the logic that implements job transactions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	62c9e4162a	job: Switch transactions to JobTxn This doesn't actually move any transaction code to Job yet, but it renames the type for transactions from BlockJobTxn to JobTxn and makes them contain Jobs rather than BlockJobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	6a74c075ac	job: Move job_finish_sync() to Job block_job_finish_sync() doesn't contain anything block job specific any more, so it can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	3453d97243	job: Move .complete callback to Job This moves the .complete callback that tells a READY job to complete from BlockJobDriver to JobDriver. The wrapper function job_complete() doesn't require anything block job specific any more and can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b69f777dd9	job: Add job_drain() block_job_drain() contains a blk_drain() call which cannot be moved to Job, so add a new JobDriver callback JobDriver.drain which has a common implementation for all BlockJobs. In addition to this we keep the existing BlockJobDriver.drain callback that is called by the common drain implementation for all block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	004e95df98	job: Convert block_job_cancel_async() to Job block_job_cancel_async() did two things that were still block job specific: * Setting job->force. This field makes sense on the Job level, so we can just move it. While at it, rename it to job->force_cancel to make its purpose more obvious. * Resetting the I/O status. This can't be moved because generic Jobs don't have an I/O status. What the function really implements is a user resume, except without entering the coroutine. Consequently, it makes sense to call the .user_resume driver callback here which already resets the I/O status. The old block_job_cancel_async() has two separate if statements that check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused. However, the former condition always implies the latter (as is asserted in block_job_iostatus_reset()), so changing the explicit call of block_job_iostatus_reset() on the former condition with the .user_resume callback on the latter condition is equivalent and doesn't need to access any BlockJob specific state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	4ad351819b	job: Move single job finalisation to Job This moves the finalisation of a single job from BlockJob to Job. Because part of this code depends on job transactions, and job transactions call this code, we introduce some temporary calls from Job functions to BlockJob ones. This will be fixed once transactions move to Job, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	139a9f020d	job: Add job_event_*() Go through the Job layer in order to send QMP events. For the moment, these functions only call a notifier in the BlockJob layer that sends the existing commands. This uses notifiers rather than JobDriver callbacks because internal users of jobs won't receive QMP events, but might still be interested in getting notified for the events. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	5d4f376998	blockjob: Split block_job_event_pending() block_job_event_pending() doesn't only send a QMP event, but it also transitions to the PENDING state. Split the function so that we get one part only sending the event (like other block_job_event_* functions) and another part that does the state transition. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	bb02b65c7d	job: Move BlockJobCreateFlags to Job This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL checks to job_create() and the auto_{finalize,dismiss} fields from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	dbe5e6c1f7	job: Replace BlockJob.completed with job_is_completed() Since we introduced an explicit status to block job, BlockJob.completed is redundant because it can be derived from the status. Remove the field from BlockJob and add a function to derive it from the status at the Job level. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b15de82867	job: Move pause/resume functions to Job While we already moved the state related to job pausing to Job, the functions to do so were still BlockJob only. This commit moves them over to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	5d43e86e11	job: Add job_sleep_ns() There is nothing block layer specific about block_job_sleep_ns(), so move the function to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	da01ff7f38	job: Move coroutine and related code to Job This commit moves some core functions for dealing with the job coroutine from BlockJob to Job. This includes primarily entering the coroutine (both for the first and reentering) and yielding explicitly and at pause points. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	1908a5590c	job: Move defer_to_main_loop to Job Move the defer_to_main_loop functionality from BlockJob to Job. The code can be simplified because we can use job->aio_context in job_defer_to_main_loop_bh() now, instead of having to access the BlockDriverState. Probably taking the data->aio_context lock in addition was already unnecessary in the old code because we didn't actually make use of anything protected by the old AioContext except getting the new AioContext, in case it changed between scheduling the BH and running it. But it's certainly unnecessary now that the BDS isn't accessed at all any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	08be6fe26f	job: Add Job.aio_context When block jobs need an AioContext, they just take it from their main block node. Generic jobs don't have a main block node, so we need to assign them an AioContext explicitly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	daa7f2f946	job: Move cancelled to Job We cannot yet move the whole logic around job cancelling to Job because it depends on quite a few other things that are still only in BlockJob, but we can move the cancelled field at least. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	80fa2c756b	job: Add reference counting This moves reference counting from BlockJob to Job. In order to keep calling the BlockJob cleanup code when the job is deleted via job_unref(), introduce a new JobDriver.free callback. Every block job must use block_job_free() for this callback, this is asserted in block_job_create(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	a50c2ab858	job: Move state transitions to Job This moves BlockJob.status and the closely related functions (block_)job_state_transition() and (block_)job_apply_verb to Job. The two QAPI enums are renamed to JobStatus and JobVerb. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	e7c1d78bbd	job: Maintain a list of all jobs This moves the job list from BlockJob to Job. Now we can check for duplicate IDs in job_create(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	fd61a701f1	job: Add job_delete() This moves freeing the Job object and its fields from block_job_unref() to job_delete(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	252291eaea	job: Add JobDriver.job_type This moves the job_type field from BlockJobDriver to JobDriver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	8e4c87000f	job: Rename BlockJobType into JobType QAPI types aren't externally visible, so we can rename them without causing problems. Before we add a job type to Job, rename the enum so it can be used for more than just block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	33e9e9bd62	job: Create Job, JobDriver and job_create() This is the first step towards creating an infrastructure for generic background jobs that aren't tied to a block device. For now, Job only stores its ID and JobDriver, the rest stays in BlockJob. The following patches will move over more parts of BlockJob to Job if they are meaningful outside the context of a block job. BlockJob.driver is now redundant, but this patch leaves it around to avoid unnecessary churn. The next patches will get rid of almost all of its uses anyway so that it can be removed later with much less churn. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Stefan Hajnoczi	4c7e813ce9	blockjob: do not cancel timer in resume Currently the timer is cancelled and the block job is entered by block_job_resume(). This behavior causes drain to run extra blockjob iterations when the job was sleeping due to the ratelimit. This patch leaves the job asleep when block_job_resume() is called. Jobs can still be forcibly woken up using block_job_enter(), which is used to cancel jobs. After this patch drain no longer runs extra blockjob iterations. This is the expected behavior that qemu-iotests 185 used to rely on. We temporarily changed the 185 test output to make it pass for the QEMU 2.12 release but now it's time to address this issue. Cc: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Message-id: 20180508135436.30140-3-stefanha@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-16 13:37:33 -04:00
Kevin Wolf	bd21935b50	blockjob: Add block_job_driver() The backup block job directly accesses the driver field in BlockJob. Add a wrapper for getting it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	dee81d5111	blockjob: Introduce block_job_ratelimit_get_delay() This gets rid of more direct accesses to BlockJob fields from the job drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	18bb69287e	blockjob: Implement block_job_set_speed() centrally All block job drivers support .set_speed and all of them duplicate the same code to implement it. Move that code to blockjob.c and remove the now useless callback. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	05df8a6a2b	blockjob: Wrappers for progress counter access Block job drivers are not expected to mess with the internals of the BlockJob object, so provide wrapper functions for one of the cases where they still do it: Updating the progress counter. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:49 +02:00
Kevin Wolf	37aa19b63c	blockjob: Fix assertion in block_job_finalize() Every job gets a non-NULL job->txn on creation, but it doesn't necessarily keep it until it is decommissioned: Finalising a job removes it from its transaction. Therefore, calling 'blockdev-job-finalize' a second time on an already concluded job causes an assertion failure. Remove job->txn from the assertion in block_job_finalize() to fix this. block_job_do_finalize() still has the same assertion, but if a job is already removed from its transaction, block_job_apply_verb() will already error out before we run into that assertion. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:49 +02:00
John Snow	ab9ba61455	blockjob: expose error string via query When we've reached the concluded state, we need to expose the error state if applicable. Add the new field. This should be sufficient for determining if a job completed successfully or not after concluding; if we want to discriminate based on how it failed more mechanically, we can always add an explicit return code enumeration later. I didn't bother to make it only show up if we are in the concluded state; I don't think it's necessary. Cc: qemu-stable@nongnu.org Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Stefan Hajnoczi	23d702d898	blockjob: drop block_job_pause/resume_all() Commit `8119334918` ("block: Don't block_job_pause_all() in bdrv_drain_all()") removed the only callers of block_job_pause/resume_all(). Pausing and resuming now happens in child_job_drained_begin/end() so it's no longer necessary to globally pause/resume jobs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180424085240.5798-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-05-10 10:41:12 +01:00
Marc-André Lureau	604343ced7	blockjob: use qapi enum helpers The QAPI generator provides #define helpers for looking up enum strings. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180327153011.29569-1-marcandre.lureau@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-04-03 09:56:55 -04:00
Marc-André Lureau	a865cebb82	blockjob: leak fix, remove from txn when failing early This fixes leaks found by ASAN such as: GTESTER tests/test-blockjob ================================================================= ==31442==ERROR: LeakSanitizer: detected memory leaks Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f88483cba38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7f8845e1bd77 in g_malloc0 ../glib/gmem.c:129 #2 0x7f8845e1c04b in g_malloc0_n ../glib/gmem.c:360 #3 0x5584d2732498 in block_job_txn_new /home/elmarco/src/qemu/blockjob.c:172 #4 0x5584d2739b28 in block_job_create /home/elmarco/src/qemu/blockjob.c:973 #5 0x5584d270ae31 in mk_job /home/elmarco/src/qemu/tests/test-blockjob.c:34 #6 0x5584d270b1c1 in do_test_id /home/elmarco/src/qemu/tests/test-blockjob.c:57 #7 0x5584d270b65c in test_job_ids /home/elmarco/src/qemu/tests/test-blockjob.c:118 #8 0x7f8845e40b69 in test_case_run ../glib/gtestutils.c:2255 #9 0x7f8845e40f29 in g_test_run_suite_internal ../glib/gtestutils.c:2339 #10 0x7f8845e40fd2 in g_test_run_suite_internal ../glib/gtestutils.c:2351 #11 0x7f8845e411e9 in g_test_run_suite ../glib/gtestutils.c:2426 #12 0x7f8845e3fe72 in g_test_run ../glib/gtestutils.c:1692 #13 0x5584d270d6e2 in main /home/elmarco/src/qemu/tests/test-blockjob.c:377 #14 0x7f8843641f29 in __libc_start_main (/lib64/libc.so.6+0x20f29) Add an assert to make sure that the job doesn't have associated txn before free(). [Jeff Cody: N.B., used updated patch provided by John Snow] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-04-03 09:56:55 -04:00
Liang Li	b76e4458b1	block/mirror: change the semantic of 'force' of block-job-cancel When mirroring a drive to slow shared storage, if the VM has a heavy block I/O write workload after the 'ready' event, the drive mirror block job can't be canceled immediately; it keeps running until the heavy I/O workload in the VM stops. Libvirt depends on the current block-job-cancel semantics, which is that when used without a flag after the 'ready' event, the command blocks until data is in sync. However, these semantics are awkward in other situations, for example, people may use drive mirror for realtime backups while still wanting to use block live migration. Libvirt cannot start a block live migration while another drive mirror is in progress, but the user would rather abandon the backup attempt as broken and proceed with the live migration than be stuck waiting for the current drive mirror backup to finish. The drive-mirror command already includes a 'force' flag, which libvirt does not use, although it documented the flag as only being useful to quit a job which is paused. However, since quitting a paused job has the same effect as abandoning a backup in a non-paused job (namely, the destination file is not in sync, and the command completes immediately), we can just improve the documentation to make the force flag obviously useful. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: John Snow <jsnow@redhat.com> Reported-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Liang Li <liliangleo@didichuxing.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
John Snow	b40dacdc7c	blockjobs: Expose manual property Expose the "manual" property via QAPI for the backup-related jobs. As of this commit, this allows the management API to request the "concluded" and "dismiss" semantics for backup jobs. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	11b61fbc0d	blockjobs: add block-job-finalize Instead of automatically transitioning from PENDING to CONCLUDED, gate the .prepare() and .commit() phases behind an explicit acknowledgement provided by the QMP monitor if auto_finalize = false has been requested. This allows us to perform graph changes in prepare and/or commit so that graph changes do not occur autonomously without knowledge of the controlling management layer. Transactions that have reached the "PENDING" state together can all be moved to invoke their finalization methods by issuing block_job_finalize to any one job in the transaction. Jobs in a transaction with mixed job->auto_finalize settings will all remain stuck in the "PENDING" state, as if the entire transaction was specified with auto_finalize = false. Jobs that specified auto_finalize = true, however, will still not emit the PENDING event. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	5f241594c4	blockjobs: add PENDING status and event For jobs utilizing the new manual workflow, we intend to prohibit them from modifying the block graph until the management layer provides an explicit ACK via block-job-finalize to move the process forward. To distinguish this runstate from "ready" or "waiting," we add a new "pending" event and status. For now, the transition from PENDING to CONCLUDED/ABORTING is automatic, but a future commit will add the explicit block-job-finalize step. Transitions: Waiting -> Pending: Normal transition. Pending -> Concluded: Normal transition. Pending -> Aborting: Late transactional failures and cancellations. Removed Transitions: Waiting -> Concluded: Jobs must go to PENDING first. Verbs: Cancel: Can be applied to a pending job. State diagram: UNDEFINED -> CREATED; CREATED -> RUNNING; RUNNING <-> PAUSED; RUNNING -> READY; RUNNING -> WAITING; READY <-> STANDBY; READY -> WAITING; WAITING -> PENDING; PENDING -> CONCLUDED; CREATED, RUNNING, READY, WAITING, PENDING -> ABORTING; ABORTING -> CONCLUDED; CONCLUDED -> NULL; CREATED -> NULL. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	e8af5686ff	blockjobs: add waiting status For jobs that are stuck waiting on others in a transaction, it would be nice to know that they are no longer "running" in that sense, but instead are waiting on other jobs in the transaction. Jobs that are "waiting" in this sense cannot be meaningfully altered any longer as they have left their running loop. The only meaningful user verb for jobs in this state is "cancel," which will cancel the whole transaction, too. Transitions: Running -> Waiting: Normal transition. Ready -> Waiting: Normal transition. Waiting -> Aborting: Transactional cancellation. Waiting -> Concluded: Normal transition. Removed Transitions: Running -> Concluded: Jobs must go to WAITING first. Ready -> Concluded: Jobs must go to WAITING first. Verbs: Cancel: Can be applied to WAITING jobs. State diagram: UNDEFINED -> CREATED; CREATED -> RUNNING; RUNNING <-> PAUSED; RUNNING -> READY; RUNNING -> WAITING; READY <-> STANDBY; READY -> WAITING; WAITING -> CONCLUDED; CREATED, RUNNING, READY, WAITING -> ABORTING; ABORTING -> CONCLUDED; CONCLUDED -> NULL; CREATED -> NULL. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	2da4617a54	blockjobs: add prepare callback Some jobs upon finalization may need to perform some work that can still fail. If these jobs are part of a transaction, it's important that these callbacks fail the entire transaction. We allow for a new callback in addition to commit/abort/clean that allows us the opportunity to have fairly late-breaking failures in the transactional process. The expected flow is: - All jobs in a transaction converge to the PENDING state, added in a forthcoming commit. - Upon being finalized, either automatically or explicitly by the user, jobs prepare to complete. - If any job fails preparation, all jobs call .abort. - Otherwise, they succeed and call .commit. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	efe4d4b7b2	blockjobs: add block_job_txn_apply function Simply apply a function transaction-wide. A few more uses of this in forthcoming patches. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	43628d9336	blockjobs: add commit, abort, clean helpers The completed_single function is getting a little mucked up with checking to see which callbacks exist, so let's factor them out. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	35d6b368f2	blockjobs: ensure abort is called for cancelled jobs Presently, even if a job is canceled post-completion as a result of a failing peer in a transaction, it will still call .commit because nothing has updated or changed its return code. The reason why this does not cause problems currently is because backup's implementation of .commit checks for cancellation itself. I'd like to simplify this contract: (1) Abort is called if the job/transaction fails (2) Commit is called if the job/transaction succeeds To this end: A job's return code, if 0, will be forcibly set as -ECANCELED if that job has already concluded. Remove the now redundant check in the backup job implementation. We need to check for cancellation in both block_job_completed AND block_job_completed_single, because jobs may be cancelled between those two calls; for instance in transactions. This also necessitates an ABORTING -> ABORTING transition to be allowed. The check in block_job_completed could be removed, but there's no point in starting to attempt to succeed a transaction that we know in advance will fail. This does NOT affect mirror jobs that are "canceled" during their synchronous phase. The mirror job itself forcibly sets the canceled property to false prior to ceding control, so such cases will invoke the "commit" callback. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
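The simplified contract in commit 35d6b368f2 (abort on failure, commit on success, with a zero return code forced to -ECANCELED for a cancelled job) might be sketched as follows. This is a minimal illustration, not QEMU's code: the `Job` struct, its counters and both function names are hypothetical, and "already concluded" is read here as "cancelled after finishing its own work".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block job; not QEMU's BlockJob. */
typedef struct {
    int ret;          /* 0 on success, negative errno on failure */
    bool cancelled;   /* set when the job or its transaction is cancelled */
    int commits;      /* how often the .commit path ran */
    int aborts;       /* how often the .abort path ran */
} Job;

/* A successful return code is overridden if the job was cancelled. */
static void job_update_rc(Job *job)
{
    if (job->ret == 0 && job->cancelled) {
        job->ret = -ECANCELED;
    }
}

/* Exactly one of commit/abort runs, chosen purely by the return code. */
static void job_completed_single(Job *job)
{
    job_update_rc(job);
    if (job->ret == 0) {
        job->commits++;
    } else {
        job->aborts++;
    }
}
```

Because the return code is re-checked at the point where the callback is chosen, a job cancelled between finishing and finalization still takes the abort path.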
John Snow	75f710599f	blockjobs: add block_job_dismiss For jobs that have reached their CONCLUDED state, prior to having their last reference put down (meaning jobs that have completed successfully, unsuccessfully, or have been canceled), allow the user to dismiss the job's lingering status report via block-job-dismiss. This gives management APIs the chance to conclusively determine if a job failed or succeeded, even if the event broadcast was missed. Note: block_job_do_dismiss and block_job_decommission happen to do exactly the same thing, but they're called from different semantic contexts, so both aliases are kept to improve readability. Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she has a friend coming in a future patch to fill the hole where 0x02 is. Verbs: Dismiss: operates on CONCLUDED jobs only. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	3925cd3bc7	blockjobs: add NULL state Add a new state that specifically demarcates when we begin to permanently demolish a job after it has performed all work. This makes the transition explicit in the STM table and highlights conditions under which a job may be demolished. Alongside this state, add a new helper command "block_job_decommission", which transitions to the NULL state and puts down our implicit reference. This separates instances in the code for "block_job_unref" which merely undo a matching "block_job_ref" with instances intended to initiate the full destruction of the object. This decommission action also sets a number of fields to make sure that block internals or external users that are holding a reference to a job to see when it "finishes" are convinced that the job object is "done." This is necessary, for instance, to do a block_job_cancel_sync on a created object which will not make any progress. Now, all jobs must go through block_job_decommission prior to being freed, giving us start-to-finish state machine coverage for jobs. Transitions: Created -> Null: Early failure event before the job is started Concluded -> Null: Standard transition. Verbs: None. This should not ever be visible to the monitor.
                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
       +---------+CREATED+------------------+
       |         +--+----+                  |
       |            |                       |
       |         +--v----+     +------+     |
       +---------+RUNNING<----->PAUSED|     |
       |         +--+-+--+     +------+     |
       |            | |                     |
       |            | +------------------+  |
       |            |                    |  |
       |         +--v--+       +-------+ |  |
       +---------+READY<------->STANDBY| |  |
       |         +--+--+       +-------+ |  |
       |            |                    |  |
    +--v-----+   +--v------+             |  |
    |ABORTING+--->CONCLUDED<-------------+  |
    +--------+   +--+------+                |
                    |                       |
                 +--v-+                     |
                 |NULL<---------------------+
                 +----+
Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	e0cf03647a	blockjobs: add CONCLUDED state add a new state "CONCLUDED" that identifies a job that has ceased all operations. The wording was chosen to avoid any phrasing that might imply success, error, or cancellation. The task has simply ceased all operation and can never again perform any work. ("finished", "done", and "completed" might all imply success.) Transitions: Running -> Concluded: normal completion Ready -> Concluded: normal completion Aborting -> Concluded: error and cancellations Verbs: None as of this commit. (a future commit adds 'dismiss')
                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
       +---------+CREATED|
       |         +--+----+
       |            |
       |         +--v----+     +------+
       +---------+RUNNING<----->PAUSED|
       |         +--+-+--+     +------+
       |            | |
       |            | +------------------+
       |            |                    |
       |         +--v--+       +-------+ |
       +---------+READY<------->STANDBY| |
       |         +--+--+       +-------+ |
       |            |                    |
    +--v-----+   +--v------+             |
    |ABORTING+--->CONCLUDED<-------------+
    +--------+   +---------+
Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	10a3fbb0f7	blockjobs: add ABORTING state Add a new state ABORTING. This makes transitions from normative states to error states explicit in the STM, and serves as a disambiguation for which states may complete normally when normal end-states (CONCLUDED) are added in future commits. Notably, Paused/Standby jobs do not transition directly to aborting, as they must wake up first and cooperate in their cancellation. Transitions: Created -> Aborting: can be cancelled (by the system) Running -> Aborting: can be cancelled or encounter an error Ready -> Aborting: can be cancelled or encounter an error Verbs: None. The job must finish cleaning itself up and report its final status.
                 +---------+
                 |UNDEFINED|
                 +--+------+
                    |
                 +--v----+
       +---------+CREATED|
       |         +--+----+
       |            |
       |         +--v----+     +------+
       +---------+RUNNING<----->PAUSED|
       |         +--+----+     +------+
       |            |
       |         +--v--+       +-------+
       +---------+READY<------->STANDBY|
       |         +-----+       +-------+
       |
    +--v-----+
    |ABORTING|
    +--------+
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	0ec4dfb8d6	blockjobs: add block_job_verb permission table Which commands ("verbs") are appropriate for jobs in which state is also somewhat burdensome to keep track of. As of this commit, it looks rather useless, but begins to look more interesting the more states we add to the STM table. A recurring theme is that no verb will apply to an 'undefined' job. Further, it's not presently possible to restrict the "pause" or "resume" verbs any more than they are in this commit because of the asynchronous nature of how jobs enter the PAUSED state; justifications for some seemingly erroneous applications are given below. ===== Verbs ===== Cancel: Any state except undefined. Pause: Any state except undefined; 'created': Requests that the job pauses as it starts. 'running': Normal usage. (PAUSED) 'paused': The job may be paused for internal reasons, but the user may wish to force an indefinite user-pause, so this is allowed. 'ready': Normal usage. (STANDBY) 'standby': Same logic as above. Resume: Any state except undefined; 'created': Will lift a user's pause-on-start request. 'running': Will lift a pause request before it takes effect. 'paused': Normal usage. 'ready': Will lift a pause request before it takes effect. 'standby': Normal usage. Set-speed: Any state except undefined, though ready may not be meaningful. Complete: Only a 'ready' job may accept a complete request. ======= Changes ======= (1) To facilitate "nice" error checking, all five major block-job verb interfaces in blockjob.c now support an errp parameter: - block_job_user_cancel is added as a new interface. - block_job_user_pause gains an errp parameter - block_job_user_resume gains an errp parameter - block_job_set_speed already had an errp parameter. - block_job_complete already had an errp parameter. (2) block-job-pause and block-job-resume will no longer no-op when trying to pause an already paused job, or trying to resume a job that isn't paused. These functions will now report that they did not perform the action requested because it was not possible. iotests have been adjusted to address this new behavior. (3) block-job-complete doesn't worry about checking !block_job_started, because the permission table guards against this. (4) test-bdrv-drain's job implementation needs to announce that it is 'ready' now, in order to be completed. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
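The verb permission table described in commit 0ec4dfb8d6 can be pictured as a small lookup indexed by verb and state. The sketch below only encodes the rules listed in that message (every verb is rejected for an undefined job, and only a ready job may be completed); the enum names, `verb_allowed` and `apply_verb` are hypothetical, and the `const char **` error slot is a simplified stand-in for QEMU's `Error **errp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the states are those that existed as of this commit. */
typedef enum {
    ST_UNDEFINED, ST_CREATED, ST_RUNNING, ST_PAUSED, ST_READY, ST_STANDBY,
    ST_MAX
} Status;

typedef enum {
    VERB_CANCEL, VERB_PAUSE, VERB_RESUME, VERB_SET_SPEED, VERB_COMPLETE,
    VERB_MAX
} Verb;

/* Columns: undefined, created, running, paused, ready, standby. */
static const bool verb_allowed[VERB_MAX][ST_MAX] = {
    [VERB_CANCEL]    = { false, true,  true,  true,  true,  true  },
    [VERB_PAUSE]     = { false, true,  true,  true,  true,  true  },
    [VERB_RESUME]    = { false, true,  true,  true,  true,  true  },
    [VERB_SET_SPEED] = { false, true,  true,  true,  true,  true  },
    [VERB_COMPLETE]  = { false, false, false, false, true,  false },
};

/* Returns 0 if the verb may be applied, otherwise -1 and an error text. */
static int apply_verb(Status st, Verb verb, const char **err)
{
    assert(st < ST_MAX && verb < VERB_MAX);
    if (verb_allowed[verb][st]) {
        return 0;
    }
    if (err) {
        *err = "verb cannot be applied to a job in this state";
    }
    return -1;
}
```

Keeping the rules in one table means each verb entry point only has to make a single lookup before acting, which is what lets block-job-complete drop its own started check.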
John Snow	c9de40505f	blockjobs: add state transition table The state transition table has mostly been implied. We're about to make it a bit more complex, so let's make the STM explicit instead. Perform state transitions with a function that for now just asserts the transition is appropriate. Transitions: Undefined -> Created: During job initialization. Created -> Running: Once the job is started. Jobs cannot transition from "Created" to "Paused" directly, but will instead synchronously transition to running to paused immediately. Running -> Paused: Normal workflow for pauses. Running -> Ready: Normal workflow for jobs reaching their sync point. (e.g. mirror) Ready -> Standby: Normal workflow for pausing ready jobs. Paused -> Running: Normal resume. Standby -> Ready: Resume of a Standby job.
    +---------+
    |UNDEFINED|
    +--+------+
       |
    +--v----+
    |CREATED|
    +--+----+
       |
    +--v----+     +------+
    |RUNNING<----->PAUSED|
    +--+----+     +------+
       |
    +--v--+       +-------+
    |READY<------->STANDBY|
    +-----+       +-------+
Notably, there is no state presently defined as of this commit that deals with a job after the "running" or "ready" states, so this table will be adjusted alongside the commits that introduce those states. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
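A minimal sketch of the explicit transition table from commit c9de40505f, assuming a boolean matrix indexed by the old and new state. It encodes only the seven transitions listed in that message; `JobStatus`, `job_stt` and both helper names are hypothetical and do not reproduce QEMU's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum covering only the states known at this commit. */
typedef enum {
    JOB_UNDEFINED, JOB_CREATED, JOB_RUNNING, JOB_PAUSED, JOB_READY,
    JOB_STANDBY, JOB_STATUS_MAX
} JobStatus;

/* job_stt[from][to] is true when the transition is permitted. */
static const bool job_stt[JOB_STATUS_MAX][JOB_STATUS_MAX] = {
    [JOB_UNDEFINED][JOB_CREATED] = true,  /* job initialization */
    [JOB_CREATED][JOB_RUNNING]   = true,  /* job is started */
    [JOB_RUNNING][JOB_PAUSED]    = true,  /* normal pause */
    [JOB_RUNNING][JOB_READY]     = true,  /* sync point reached */
    [JOB_READY][JOB_STANDBY]     = true,  /* pause of a ready job */
    [JOB_PAUSED][JOB_RUNNING]    = true,  /* normal resume */
    [JOB_STANDBY][JOB_READY]     = true,  /* resume of a standby job */
};

static bool job_transition_allowed(JobStatus from, JobStatus to)
{
    return job_stt[from][to];
}

/* For now the transition function only asserts that the move is legal. */
static void job_state_transition(JobStatus *status, JobStatus to)
{
    assert(job_transition_allowed(*status, to));
    *status = to;
}
```

A created job that should begin paused therefore makes two legal steps, Created -> Running and then Running -> Paused, rather than one direct jump.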
John Snow	58b295ba52	blockjobs: add status enum We're about to add several new states, and booleans are becoming unwieldy and difficult to reason about. It would help to have a more explicit bookkeeping of the state of blockjobs. To this end, add a new "status" field and add our existing states in a redundant manner alongside the bools they are replacing: UNDEFINED: Placeholder, default state. Not currently visible to QMP unless changes occur in the future to allow creating jobs without starting them via QMP. CREATED: replaces !!job->co && paused && !busy RUNNING: replaces effectively (!paused && busy) PAUSED: Nearly redundant with info->paused, which shows pause_count. This reports the actual status of the job, which almost always matches the paused request status. It differs in that it is strictly only true when the job has actually gone dormant. READY: replaces job->ready. STANDBY: Paused, but job->ready is true. New state additions in coming commits will not be quite so redundant: WAITING: Waiting on transaction. This job has finished all the work it can until the transaction converges, fails, or is canceled. PENDING: Pending authorization from user. This job has finished all the work it can until the job or transaction is finalized via block_job_finalize. This implies the transaction has converged and left the WAITING phase. ABORTING: Job has encountered an error condition and is in the process of aborting. CONCLUDED: Job has ceased all operations and has a return code available for query and may be dismissed via block_job_dismiss. NULL: Job has been dismissed and (should) be destroyed. Should never be visible to QMP. Some of these states appear somewhat superfluous, but it helps define the expected flow of a job; so some of the states wind up being synchronous empty transitions. Importantly, jobs can be in only one of these states at any given time, which helps code and external users alike reason about the current condition of a job unambiguously. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	75859b9420	blockjobs: model single jobs as transactions model all independent jobs as single job transactions. It's one less case we have to worry about when we add more states to the transition machine. This way, we can just treat all job lifetimes exactly the same. This helps tighten assertions of the STM graph and removes some conditionals that would have been needed in the coming commits adding a more explicit job lifetime management API. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	d4fce18844	blockjobs: fix set-speed kick If speed is '0' it's not actually "less than" the previous speed. Kick the job in this case too. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
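The condition fixed by commit d4fce18844 might look like the sketch below. It assumes that a speed of 0 means "no rate limit", which the commit message implies but does not spell out, and the function name is hypothetical. A job is kicked whenever its pending delay could have become too long, that is, when the limit is raised or removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a sleeping job should be woken after a speed change.
 * Assumption: speed 0 means unlimited, so numerically comparing it
 * with the old speed would wrongly classify it as a slowdown.
 */
static bool speed_change_needs_kick(int64_t old_speed, int64_t new_speed)
{
    if (new_speed == 0) {
        return true;                /* limit removed: recompute the delay */
    }
    return new_speed > old_speed;   /* limit raised: delay is now too long */
}
```

Lowering the limit needs no kick in this sketch, since the job merely wakes a little earlier than the new rate would require.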
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Markus Armbruster	bbcad965bf	Drop superfluous includes of qapi/qmp/qjson.h Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-19-armbru@redhat.com>	2018-02-09 13:52:15 +01:00
Markus Armbruster	abb297ed44	Include qmp-commands.h exactly where needed Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-7-armbru@redhat.com> [OSX breakage fixed]	2018-02-09 13:52:10 +01:00
Markus Armbruster	e688df6bc4	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit `34e304e975` resolved, OSX breakage fixed]	2018-02-09 13:50:17 +01:00
Kevin Wolf	ad90febaf2	blockjob: Pause job on draining any job BDS Block jobs already paused themselves when their main BlockBackend entered a drained section. This is not good enough: We also want to pause a block job and may not submit new requests if, for example, the mirror target node should be drained. This implements .drained_begin/end callbacks in child_job in order to consider all block nodes related to the job, and removes the BlockBackend callbacks which are unnecessary now because the root of the job main BlockBackend is always referenced with a child_job, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-22 15:05:32 +01:00
John Snow	aa9ef2e65b	blockjob: kick jobs on set-speed If users set an unreasonably low speed (like one byte per second), the calculated delay may exceed many hours. While we like to punish users for asking for stupid things, we do also like to allow users to correct their wicked ways. When a user provides a new speed, kick the job to allow it to recalculate its delay. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171213204611.26276-1-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Alberto Garcia	3d5d319e12	blockjob: Make block_job_pause_all() keep a reference to the jobs Starting from commit `40840e419b` we are pausing all block jobs during bdrv_reopen_multiple() to prevent any of them from finishing and removing nodes from the graph while they are being reopened. It turns out that pausing a block job doesn't necessarily prevent it from finishing: a paused block job can still run its exit function from the main loop and call block_job_completed(). The mirror block job in particular always goes to the main loop while it is paused (by virtue of the bdrv_drained_begin() call in mirror_run()). Destroying a paused block job during bdrv_reopen_multiple() has two consequences: 1) The references to the nodes involved in the job are released, possibly destroying some of them. If those nodes were in the reopen queue this would trigger the problem originally described in commit `40840e419b`, crashing QEMU. 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would not be doing all necessary bdrv_parent_drained_end() calls. I can reproduce problem 1) easily with iotest 030 by increasing STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking the iotest like in this example: https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html This patch keeps an additional reference to all block jobs between block_job_pause_all() and block_job_resume_all(), guaranteeing that they are kept alive. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-12-04 17:44:51 +01:00
Paolo Bonzini	fc24908e7d	blockjob: reimplement block_job_sleep_ns to allow cancellation This reverts the effects of commit `4afeffc857` ("blockjob: do not allow coroutine double entry or entry-after-completion", 2017-11-21) This fixed the symptom of a bug rather than the root cause. Canceling the wait on a sleeping blockjob coroutine is generally fine, we just need to make it work correctly across AioContexts. To do so, use a QEMUTimer that calls block_job_enter. Use a mutex to ensure that block_job_enter synchronizes correctly with block_job_sleep_ns. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:26:21 +01:00
Paolo Bonzini	356f59b875	blockjob: introduce block_job_do_yield Hide the clearing of job->busy in a single function, and set it in block_job_enter. This lets block_job_do_yield verify that qemu_coroutine_enter is not used while job->busy = false. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:11:14 +01:00
Paolo Bonzini	5bf1d5a73a	blockjob: remove clock argument from block_job_sleep_ns All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to support more than one clock when block_job_sleep_ns switches to a single timer stored in the BlockJob struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:11:02 +01:00
Alberto Garcia	0a3e155f3f	blockjob: Remove the job from the list earlier in block_job_unref() When destroying a block job in block_job_unref() we should remove it from the job list before calling block_job_remove_all_bdrv(). This is because removing the BDSs can trigger an aio_poll() and wake up other jobs that might attempt to use the block job list. If that happens the job we're currently destroying should not be in that list anymore. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-28 16:59:24 +01:00
Jeff Cody	4afeffc857	blockjob: do not allow coroutine double entry or entry-after-completion When block_job_sleep_ns() is called, the co-routine is scheduled for future execution. If we allow the job to be re-entered prior to the scheduled time, we present a race condition in which a coroutine can be entered recursively, or even entered after the coroutine is deleted. The job->busy flag is used by blockjobs when a coroutine is busy executing. The function 'block_job_enter()' obeys the busy flag, and will not enter a coroutine if set. If we sleep a job, we need to leave the busy flag set, so that subsequent calls to block_job_enter() are prevented. This changes the prior behavior of block_job_cancel() being able to immediately wake up and cancel a job; in practice, this should not be an issue, as the coroutine sleep times are generally very small, and the cancel will occur the next time the coroutine wakes up. This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708 Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-21 11:51:18 -05:00
Markus Armbruster	977c736f80	qapi: Mechanically convert FOO_lookup[...] to FOO_str(...) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-09-04 13:09:13 +02:00
sochin.jiang	4172a00373	fix: avoid an infinite loop or a dangling pointer problem in img_commit img_commit could fall into an infinite loop calling run_block_job() if its blockjob fails on any I/O error, fix this already known problem. Signed-off-by: sochin.jiang <sochin.jiang@huawei.com> Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Paolo Bonzini	eb05e011e2	blockjob: use deferred_to_main_loop to indicate the coroutine has ended All block jobs are using block_job_defer_to_main_loop as the final step just before the coroutine terminates. At this point, block_job_enter should do nothing, but currently it restarts the freed coroutine. Now, the job->co states should probably be changed to an enum (e.g. BEFORE_START, STARTED, YIELDED, COMPLETED) subsuming block_job_started, job->deferred_to_main_loop and job->busy. For now, this patch eliminates the problematic reenter by removing the reset of job->deferred_to_main_loop (which served no purpose, as far as I could see) and checking the flag in block_job_enter. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-12-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	4fb588e95b	blockjob: reorganize block_job_completed_txn_abort This splits the part that touches job states from the part that invokes callbacks. It will make the code simpler to understand once job states will be protected by a different mutex than the AioContext lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-11-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	c8ab5c2dde	blockjob: group BlockJob transaction functions together Yet another pure code movement patch, preparing for the next change. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-9-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	4c241cf5d6	blockjob: introduce block_job_cancel_async, check iostatus invariants The new function helps respect the invariant that the coroutine is entered with false user_resume, zero pause count and no error recorded in the iostatus. Resetting the iostatus is now common to all of block_job_cancel_async, block_job_user_resume and block_job_iostatus_reset, albeit with slight differences: - block_job_cancel_async resets the iostatus, and resumes the job if there was an error, but the coroutine is not restarted immediately. For example the caller may continue with a call to block_job_finish_sync. - block_job_user_resume resets the iostatus. It wants to resume the job unconditionally, even if there was no error. - block_job_iostatus_reset doesn't resume the job at all. Maybe that's a bug but it should be fixed separately. block_job_iostatus_reset does the least common denominator, so add some checking but otherwise leave it as the entry point for resetting the iostatus. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-8-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	2caf63a903	blockjob: move iostatus reset inside block_job_user_resume Outside blockjob.c, the block_job_iostatus_reset function is used once in the monitor and once in BlockBackend. When we introduce the block job mutex, block_job_iostatus_reset's client is going to be the block layer (for which blockjob.c will take the block job mutex) rather than the monitor (which will take the block job mutex by itself). The monitor's call to block_job_iostatus_reset from the monitor comes just before the sole call to block_job_user_resume, so reset the iostatus directly from block_job_iostatus_reset. This will avoid the need to introduce separate block_job_iostatus_reset and block_job_iostatus_reset_locked APIs. After making this change, move the function together with the others that were moved in the previous patch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-7-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	88691b37f8	blockjob: separate monitor and blockjob APIs We have two different headers for block job operations, blockjob.h and blockjob_int.h. The former contains APIs called by the monitor, the latter contains APIs called by the block job drivers and the block layer itself. Keep the two APIs separate in the blockjob.c file too. This will be useful when transitioning away from the AioContext lock, because there will be locking policies for the two categories, too---the monitor will have to call new block_job_lock/unlock APIs, while blockjob APIs will take care of this for the users. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170508141310.8674-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	f321dcb57f	blockjob: introduce block_job_pause/resume_all Remove use of block_job_pause/resume from outside blockjob.c, thus making them static. The new functions are used by the block layer, so place them in blockjob_int.h. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	05b0d8e3b8	blockjob: introduce block_job_early_fail Outside blockjob.c, block_job_unref is only used when a block job fails to start, and block_job_ref is not used at all. The reference counting thus is pretty well hidden. Introduce a separate function to be used by block jobs; because block_job_ref and block_job_unref now become static, move them earlier in blockjob.c. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	9f086abbe4	blockjob: remove iostatus_reset callback This is unused since commit `66a0fae` ("blockjob: Don't touch BDS iostatus", 2016-05-19). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	6573d9c638	blockjob: remove unnecessary check !job is always checked prior to the call, drop it from here. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Fam Zheng	aef4278c5a	blockjob: Use bdrv_coroutine_enter to start coroutine Resuming and especially starting of the block job coroutine, could be issued in the main thread. However the coroutine's "home" ctx should be set to the same context as job->blk. Use bdrv_coroutine_enter to ensure that. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
John Snow	600ac6a0ef	blockjob: add devops to blockjob backends This lets us hook into drained_begin and drained_end requests from the backend level, which is particularly useful for making sure that all jobs associated with a particular node (whether the source or the target) receive a drain request. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-4-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
John Snow	e3796a245a	blockjob: add block_job_start_shim The purpose of this shim is to allow us to pause pre-started jobs. The purpose of that is to allow us to buffer a pause request that will be able to take effect before the job ever does any work, allowing us to create jobs during a quiescent state (under which they will be automatically paused), then resuming the jobs after the critical section in any order, either: (1) -block_job_start -block_job_resume (via e.g. drained_end) (2) -block_job_resume (via e.g. drained_end) -block_job_start The problem that requires a startup wrapper is the idea that a job must start in the busy=true state only its first time-- all subsequent entries require busy to be false, and the toggling of this state is otherwise handled during existing pause and yield points. The wrapper simply allows us to mandate that a job can "start," set busy to true, then immediately pause only if necessary. We could avoid requiring a wrapper, but all jobs would need to do it, so it's been factored out here. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-2-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
Paolo Bonzini	d79df2a2ce	blockjob: avoid recursive AioContext locking Streaming or any other block job hangs when performed on a block device that has a non-default iothread. This happens because the AioContext is acquired twice by block_job_defer_to_main_loop_bh and then released only once by BDRV_POLL_WHILE. (Insert rants on recursive mutexes, which unfortunately are a temporary but necessary evil for iothreads at the moment). Luckily, the reason for the double acquisition is simple; the function acquires the AioContext for both the job iothread and the BDS iothread, in case the BDS iothread was changed while the job was running. It is therefore enough to skip the second acquisition when the two AioContexts are one and the same. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490118490-5597-1-git-send-email-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
Kevin Wolf	bbc02b90bc	blockjob: Factor out block_job_remove_all_bdrv() In some cases, we want to remove op blockers on intermediate nodes before the whole block job transaction has completed (because they block restoring the final graph state during completion). Provide a function for this. The whole block job lifecycle is a bit messed up and it's hard to actually do all things in the right order, but I'll leave simplifying this for another day. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	76d554e20b	blockjob: Add permissions to block_job_add_bdrv() Block jobs don't actually do I/O through the reference they create with block_job_add_bdrv(), but they might want to use the permission system to express what the block job does to intermediate nodes. This adds permissions to block_job_add_bdrv() to provide the means to request permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	c6cc12bfa7	blockjob: Add permissions to block_job_create() This function creates a BlockBackend internally, so the block jobs need to tell it what they want to do with the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	d7086422b1	block: Add error parameter to blk_insert_bs() Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6d0eb64d5c	block: Add permissions to blk_new() We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Daniel P. Berrange	0ab8ed18a6	trace: switch to modular code generation for sub-directories Introduce rules in the top level Makefile that are able to generate trace.[ch] files in every subdirectory which has a trace-events file. The top level directory is handled specially, so instead of creating trace.h, it creates trace-root.h. This allows sub-directories to include the top level trace-root.h file, without ambiguity wrt the trace.h file in the current sub-dir. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170125161417.31949-7-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:11:18 +00:00
John Snow	5ccac6f186	blockjob: add block_job_start Instead of automatically starting jobs at creation time via backup_start et al, we'd like to return a job object pointer that can be started manually at a later point in time. For now, add the block_job_start mechanism and start the jobs automatically as we have been doing, with conversions job-by-job coming in later patches. Of note: cancellation of unstarted jobs will perform all the normal cleanup as if the job had started, particularly abort and clean. The only difference is that we will not emit any events, because the job never actually started. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
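
Note on d79df2a2ce (avoid recursive AioContext locking): the fix described there, skipping the second acquisition when the job context and the node context are one and the same, can be illustrated with a minimal sketch. This is not QEMU code. `DemoContext`, `demo_acquire`, `demo_release` and `demo_run_with_both` are hypothetical names invented for this illustration, and a plain pthread mutex stands in for the AioContext lock.

```c
#include <pthread.h>

/* Hypothetical stand-in for an AioContext: a lock plus a hold counter. */
typedef struct DemoContext {
    pthread_mutex_t lock;
    int depth;              /* how many times the lock is currently held */
} DemoContext;

void demo_context_init(DemoContext *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    ctx->depth = 0;
}

void demo_acquire(DemoContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->depth++;
}

void demo_release(DemoContext *ctx)
{
    ctx->depth--;
    pthread_mutex_unlock(&ctx->lock);
}

/*
 * Hold both contexts around some work, but take the second lock only
 * when it is a different object from the first. Returns the number of
 * acquisitions performed, so callers can observe which path was taken.
 */
int demo_run_with_both(DemoContext *job_ctx, DemoContext *node_ctx)
{
    int acquired = 1;

    demo_acquire(job_ctx);
    if (node_ctx != job_ctx) {
        demo_acquire(node_ctx);
        acquired++;
    }

    /* ... work that needs both contexts would go here ... */

    if (node_ctx != job_ctx) {
        demo_release(node_ctx);
    }
    demo_release(job_ctx);
    return acquired;
}
```

Because every acquisition is paired with exactly one release, the hold count returns to zero whether or not the two pointers alias, which is the property the commit message says was violated before the fix.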


228 Commits