Commit Graph

362 Commits

Author SHA1 Message Date
Fam Zheng
ee7d7aabda nbd: Always call "close_fn" in nbd_client_new
Rename the parameter "close" to "close_fn" to disambiguous with
close(2).

This unifies error handling paths of NBDClient allocation:
nbd_client_new will shutdown the socket and call the "close_fn" callback
if negotiation failed, so the caller don't need a different path than
the normal close.

The returned pointer is never used, make it void in preparation for the
next patch.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1452760863-25350-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-15 18:58:01 +01:00
Fam Zheng
e40e5027f6 block: Add check on mirror target
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450932306-13717-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-01-07 21:30:18 +01:00
Fam Zheng
05e4d14bf3 block: Rename BLOCK_OP_TYPE_MIRROR to BLOCK_OP_TYPE_MIRROR_SOURCE
It's necessary to distinguish source and target before we can add
blockdev-mirror, because we would want a concrete type of operation to
check on target bs before starting.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450932306-13717-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-01-07 21:30:17 +01:00
Stefan Hajnoczi
bd44feb754 block: add BlockLimits.max_iov field
The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix and iSCSI protocols have a maximum of
IOV_MAX but others could have different values.

Cc: Peter Lieven <pl@kamp.de>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Fam Zheng
bbe1ef2686 block: Remove prototype of bdrv_swap from header
The function has gone.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz
8b13976d3f block: Add opaque value to the amend CB
Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
145f598e4a block: Introduce bs->explicit_options
bs->options doesn't only contain options that the user explicitly
requested, but also option that were derived from flags, the filename or
inherited from the parent node.

For reopen, it is important to know the difference because reopening the
parent can change inherited values in child nodes, but it shouldn't
change any options that were explicitly specified for the child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
8e2160e2c7 block: Add infrastructure for option inheritance
Options are not actually inherited from the parent node yet, but this
commit lays the grounds for doing so.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
4cdd01d32e block: Pass driver-specific options to .bdrv_refresh_filename()
In order to decide whether a blkdebug: filename can be produced or a
json: one is necessary, blkdebug checked whether bs->options had more
options than just "config", "x-image" or "image" (the latter including
nested options). That doesn't work well when generic block layer options
are present.

This patch passes an option QDict to the driver that contains only
driver-specific options, i.e. the options for the general block layer as
well as child nodes are already filtered out. Works much better this
way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
260fecf13b block: Exclude nested options only for children in append_open_options()
Some drivers have nested options (e.g. blkdebug rule arrays), which
don't belong to a child node and shouldn't be removed. Don't remove all
options with "." in their name, but check for the complete prefixes of
actually existing child nodes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
d9b7b05703 block: Allow references for backing files
For bs->file, using references to existing BDSes has been possible for a
while already. This patch enables the same for bs->backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18 14:34:42 +01:00
Kevin Wolf
5365f44dfa qcow2: Add .bdrv_join_options callback
qcow2 accepts a few driver-specific options that overlap semantically
(e.g. "overlap-check" is an alias of "overlap-check.template", and any
missing cache size option is derived from the given ones).

When bdrv_reopen() merges the set of updated options with left out
options that should be kept at their old value, we need to consider this
and filter out any duplicates (which would generally cause errors
because new and old value would contradict each other).

This patch adds a .bdrv_join_options callback to BlockDriver and
implements it for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-12-18 14:34:42 +01:00
Eric Blake
a31939e6c8 blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum
No need to keep two separate enums, where editing one is likely
to forget the other.  Now that we can specify a qapi enum prefix,
we don't even have to change the bulk of the uses.

get_event_by_name() could perhaps be replaced by qapi_enum_parse(),
but I left that for another day.

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-12-17 08:21:27 +01:00
Fam Zheng
61408b250e block: Don't wait serialising for non-COR read requests
The assertion problem was noticed in 06c3916b35, but it wasn't
completely fixed, because even though the req is not marked as
serialising, it still gets serialised by wait_serialising_requests
against other serialising requests, which could lead to the same
assertion failure.

Fix it by even more explicitly skipping the serialising for this
specific case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00
Denis V. Lunev
7cb1448149 migration: implement bdrv_all_find_vmstate_bs helper
The patch also ensures proper locking for the operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
a9085f9b55 snapshot: create bdrv_all_create_snapshot helper
to create snapshot for all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
723ccda1a0 snapshot: create bdrv_all_find_snapshot helper
to check that snapshot is available for all loaded block drivers.
The check bs != bs1 in hmp_info_snapshots is an optimization. The check
for availability of this snapshot will return always true as the list
of snapshots was collected from that image.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
4c1cdbaad0 snapshot: create bdrv_all_goto_snapshot helper
to switch to snapshot on all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
9b00ea376d snapshot: create bdrv_all_delete_snapshot helper
to delete snapshots from all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
25af925fff snapshot: return error code from bdrv_snapshot_delete_by_id_or_name
this will make code better in the next patch

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
e9ff957ac2 snapshot: create helper to test that block drivers supports snapshots
The patch enforces proper locking for this operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Alberto Garcia
aece5edc96 block: Update copyright of the accounting code
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 80a2278e3ec2dafd5daab20a7cb2d6a9b83371e4.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
96e4dedaff block: Add average I/O queue depth to BlockDeviceTimedStats
This patch adds two new fields to BlockDeviceTimedStats that track the
average number of pending read and write requests for a block device.

The values are calculated for the period of time defined for that
interval.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: fd31fef53e2714f2f30d59ed58ca2f67ec9ab926.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
979e9b03fc block: Compute minimum, maximum and average I/O latencies
This patch keeps track of the minimum, maximum and average latencies
of I/O operations during a certain interval of time.

The values are exposed in the BlockDeviceTimedStats structure.

An option to define the intervals to collect these statistics will be
added in a separate patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c7382dc89622c64f918d09f32815827772628f8e.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
362e9299b3 block: Allow configuring whether to account failed and invalid ops
This patch adds two options, "stats-account-invalid" and
"stats-account-failed", that can be used to decide whether invalid and
failed I/O operations must be used when collecting statistics for
latency and last access time.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: ebc7e5966511a342cad428a392c5f5ad56b15213.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
7ee12dafe9 block: Add statistics for failed and invalid I/O operations
This patch adds the block_acct_failed() and block_acct_invalid()
functions to allow keeping track of failed and invalid I/O operations.

The number of failed and invalid operations is exposed in
BlockDeviceStats.

We don't keep track of the time spent on invalid operations because
they are cancelled immediately when they are started.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a7256ccb883a86356b1c6c46b5a29ed5448546a5.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
cb38fffbc9 block: Add idle_time_ns to BlockDeviceStats
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats
structure, indicating the time that has passed since the previous I/O
operation.

It also adds the block_acct_idle_time_ns() call, to ensure that all
references to the clock type used for accounting are in the same
place. This will later allow us to use a different clock for iotests.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 7d8cfcf931453e1a2443e6626e8c1edc347c7c8a.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
John Snow
78f51fde88 block: Add BlockJobTxn support to backup_run
Allow a BlockJobTxn to be passed into backup_run, which
will allow the job to join a transactional group if present.

Propagate this new parameter outward into new QMP helper
functions in blockdev.c to allow transaction commands to
pass forward their BlockJobTxn object in a forthcoming patch.

[split up from a patch originally by Stefan and Fam. --js]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-12-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
c55a832fdd block: Add block job transactions
Sometimes block jobs must execute as a transaction group.  Finishing
jobs wait until all other jobs are ready to complete successfully.
Failure or cancellation of one job cancels the other jobs in the group.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-10-git-send-email-jsnow@redhat.com
[Rewrite the implementation which is now contained in block_job_completed.
--Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
a689dbf2df blockjob: Add "completed" and "ret" in BlockJob
They are set when block_job_completed is called.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-8-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
57901ecb8e blockjob: Add .commit and .abort block job actions
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-7-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
18930ba3d1 blockjob: Introduce reference count and fix reference to job->bs
Add reference count to block job, meanwhile move the ownership of the
reference to job->bs from the caller (which is released in two
completion callbacks) to the block job itself. It is necessary for
block_job_complete_sync to work, because block job shouldn't live longer
than its bs, as asserted in bdrv_delete.

Now block_job_complete_sync can be simplified.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-6-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
df9a681dc9 qed: Implement .bdrv_drain
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. In
compliance to the bdrv_drain semantics we should make sure it remains
deleted once .bdrv_drain is called.

We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy
the assertion.  Do the "plug" and "flush" calls manually.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
67da1dc5ce block: Introduce BlockDriver.bdrv_drain callback
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Like bdrv_requests_pending(), we should consider all the children of bs.
Before, the while loop just works, as bdrv_requests_pending() already
tracks its children; now we mustn't miss the callback, so recurse down
explicitly.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
83c98d7b92 block: Drop BlockDriver.bdrv_ioctl
Now the callback is not used any more, drop the field along with all
implementations in block drivers, which are iscsi and raw.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
8b45f6878d block: Add ioctl parameter fields to BlockRequest
The two fields that will be used by ioctl handling code later are added
as union, because it's used exclusively by ioctl code which dosn't need
the four fields in the other struct of the union.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-6-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng
ebde595ce6 block: Add more types for tracked request
We'll track more request types besides read and write, change the
boolean field to an enum.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:08 +01:00
Alberto Garcia
a0d64a61db throttle: Use bs->throttle_state instead of bs->io_limits_enabled
There are two ways to check for I/O limits in a BlockDriverState:

- bs->throttle_state: if this pointer is not NULL, it means that this
  BDS is member of a throttling group, its ThrottleTimers structure
  has been initialized and its I/O limits are ready to be applied.

- bs->io_limits_enabled: if true it means that the throttle_state
  pointer is valid _and_ the limits are currently enabled.

The latter is used in several places to check whether a BDS has I/O
limits configured, but what it really checks is whether requests
are being throttled or not. For example, io_limits_enabled can be
temporarily set to false in cases like bdrv_read_unthrottled() without
otherwise touching the throtting configuration of that BDS.

This patch replaces bs->io_limits_enabled with bs->throttle_state in
all cases where what we really want to check is the existence of I/O
limits, not whether they are currently enabled or not.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Max Reitz
c69a4dd899 block: Make bdrv_states public
When inserting a BDS tree into a BB, we will need to add the root BDS to
this list. Since we will want to do that in the blockdev-insert-medium
implementation in blockdev.c, we will need access to it there.

This patch is not exactly elegant, but bdrv_states will be removed in
the future anyway because we no longer need it since we have BBs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Fam Zheng
fbe3fc5cb3 aio: Introduce aio-epoll.c
To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has both
compile-time and run-time switchability.

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a big number of fds, or when more devices are
hot plugged, epoll kicks in when the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty based stdio. In this
case, it falls back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
enabled from 64 onward). Numbers are in MB/s.

===============================================
             |     master     |     epoll
             |                |
scsi disks # | read    randrw | read    randrw
-------------|----------------|----------------
1            | 86      36     | 92      45
8            | 87      43     | 86      41
64           | 71      32     | 70      38
128          | 48      24     | 58      31
256          | 37      19     | 57      28
===============================================

To comply with aio_{disable,enable}_external, we always use ppoll when
aio_external_disabled() is true.

[Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration
since the field is also referenced outside CONFIG_EPOLL code.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:47 +00:00
Fam Zheng
37fcee5d11 aio: Introduce aio_context_setup
This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Fam Zheng
5ceb9e3928 aio: Introduce aio_external_disabled
This allows AioContext users to check the enable/disable state of
external clients.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Pavel Dovgalyuk
df281b80b9 bottom halves: introduce bh call function
This patch introduces aio_bh_call function. It is used to execute
bottom halves as callbacks without adding them to the queue.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162450.8676.56980.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Fam Zheng
51288d7917 block: Introduce "drained begin/end" API
The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
c1e1e5fa8f aio: introduce aio_{disable,enable}_external
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
dca21ef23b aio: Add "is_external" flag for event handlers
All callers pass in false, and the real external ones will switch to
true in coming patches.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Alberto Garcia
d87d01e16a throttle: Remove throttle_group_lock/unlock()
The group throttling code was always meant to handle its locking
internally. However, bdrv_swap() was touching the ThrottleGroup
structure directly and therefore needed an API for that.

Now that bdrv_swap() no longer exists there's no need for the
throttle_group_lock() API anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
281d22d86c block: Add BlockBackendRootState
This structure will store some of the state of the root BDS if the BDS
tree is removed, so that state can be restored once a new BDS tree is
inserted.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
973f2ddf7b block/throttle-groups: Make incref/decref public
Throttle groups are not necessarily referenced by BDSs alone; a later
patch will essentially allow BBs to reference them, too. Make the
ref/unref functions public so that reference can be properly accounted
for.

Their interface is slightly adjusted in that they return and take a
ThrottleState pointer, respectively, instead of a ThrottleGroup pointer.
Functionally, they are equivalent, but since ThrottleGroup is not meant
to be used outside of block/throttle-groups.c, ThrottleState is easier
to handle.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
373340b26c block: Move I/O status and error actions into BB
These options are only relevant for the user of a whole BDS tree (like a
guest device or a block job) and should thus be moved into the
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00