mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Anthony Liguori	90c45b3031	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (32 commits) osdep: Less restrictive F_SEFL in qemu_dup_flags() qemu-iotests: add testcases for mirroring on-source-error/on-target-error qmp: add pull_event function mirror: add support for on-source-error/on-target-error iostatus: forward block_job_iostatus_reset to block job qemu-iotests: add mirroring test case mirror: implement completion qmp: add drive-mirror command mirror: introduce mirror job block: introduce BLOCK_JOB_READY event block: add block-job-complete block: rename block_job_complete to block_job_completed block: export dirty bitmap information in query-block block: introduce new dirty bitmap functionality block: add bdrv_open_backing_file block: add bdrv_query_stats block: add bdrv_query_info qemu-config: Add new -add-fd command line option monitor: Prevent removing fd from set during init monitor: Enable adding an inherited fd to an fd set ... Conflicts: vl.c Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-10-29 10:34:05 -05:00
Paolo Bonzini	b952b5589a	mirror: add support for on-source-error/on-target-error Error management is important for mirroring; otherwise, an error on the target (even something as "innocent" as ENOSPC) requires to start again with a full copy. Similar to on_read_error/on_write_error, two separate knobs are provided for on_source_error (reads) and on_target_error (writes). The default is 'report' for both. The 'ignore' policy will leave the sector dirty, so that it will be retried later. Thus, it will not cause corruption. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-10-24 10:26:22 +02:00
Paolo Bonzini	893f7ebafe	mirror: introduce mirror job This patch adds the implementation of a new job that mirrors a disk to a new image while letting the guest continue using the old image. The target is treated as a "black box" and data is copied from the source to the target in the background. This can be used for several purposes, including storage migration, continuous replication, and observation of the guest I/O in an external program. It is also a first step in replacing the inefficient block migration code that is part of QEMU. The job is possibly never-ending, but it is logically structured into two phases: 1) copy all data as fast as possible until the target first gets in sync with the source; 2) keep target in sync and ensure that reopening to the target gets a correct (full) copy of the source data. The second phase is indicated by the progress in "info block-jobs" reporting the current offset to be equal to the length of the file. When the job is cancelled in the second phase, QEMU will run the job until the source is clean and quiescent, then it will report successful completion of the job. In other words, the BLOCK_JOB_CANCELLED event means that the target may _not_ be consistent with a past state of the source; the BLOCK_JOB_COMPLETED event means that the target is consistent with a past state of the source. (Note that it could already happen that management lost the race against QEMU and got a completion event instead of cancellation). It is not yet possible to complete the job and switch over to the target disk. The next patches will fix this and add many refinements to the basic idea introduced here. These include improved error management, some tunable knobs and performance optimizations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-10-24 10:26:19 +02:00
Paolo Bonzini	d7d512f609	block: add close notifiers The first user of close notifiers will be the embedded NBD server. It would be possible to use them to do some of the ad hoc processing (e.g. for block jobs and I/O limits) that is currently done by bdrv_close. Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-23 22:39:32 +02:00
Paolo Bonzini	1d809098aa	stream: add on-error argument This patch adds support for error management to streaming. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:40:56 +02:00
Paolo Bonzini	32c81a4a6e	block: introduce block job error The following behaviors are possible: 'report': The behavior is the same as in 1.1. An I/O error, respectively during a read or a write, will complete the job immediately with an error code. 'ignore': An I/O error, respectively during a read or a write, will be ignored. For streaming, the job will complete with an error and the backing file will be left in place. For mirroring, the sector will be marked again as dirty and re-examined later. 'stop': The job will be paused and the job iostatus will be set to failed or nospace, while the VM will keep running. This can only be specified if the block device has rerror=stop and werror=stop or enospc. 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others. In all cases, even for 'report', the I/O error is reported as a QMP event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR. It is possible that while stopping the VM a BLOCK_IO_ERROR event will be reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa. This is not really avoidable since stopping the VM completes all pending I/O requests. In fact, it is already possible now that a series of BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop calls bdrv_drain_all and this can generate further errors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:40:56 +02:00
Paolo Bonzini	92aa5c6d77	iostatus: move BlockdevOnError declaration to QAPI This will let block-stream reuse the enum. Places that used the enums are renamed accordingly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:40:26 +02:00
Paolo Bonzini	ff06f5f351	iostatus: rename BlockErrorAction, BlockQMPEventAction We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers; drivers should only be told whether to stop/report/ignore the error. On the other hand, we want to keep using the nicer BlockErrorAction name in the drivers. So rename the enums, while leaving aside the names of the enum values for now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:14:32 +02:00
Paolo Bonzini	2f0c9fe64c	block: move job APIs to separate files Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:14:26 +02:00
Paolo Bonzini	7e03a9342f	block: fix documentation of block_job_cancel_sync Do this in a separate commit before we move the functions to blockjob.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 19:08:27 +02:00
Jeff Cody	747ff60263	block: add live block commit functionality This adds the live commit coroutine. This iteration focuses on the commit only below the active layer, and not the active layer itself. The behaviour is similar to block streaming; the sectors are walked through, and anything that exists above 'base' is committed back down into base. At the end, intermediate images are deleted, and the chain stitched together. Images are restored to their original open flags upon completion. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-28 18:23:12 +02:00
Jeff Cody	dc1c13d969	block: remove keep_read_only flag from BlockDriverState struct The keep_read_only flag is no longer used, in favor of the bdrv flag BDRV_O_ALLOW_RDWR. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-24 15:15:13 +02:00
Jeff Cody	e971aa1273	block: Framework for reopening files safely This is based on Supriya Kannery's bdrv_reopen() patch series. This provides a transactional method to reopen multiple images files safely. Image files are queue for reopen via bdrv_reopen_queue(), and the reopen occurs when bdrv_reopen_multiple() is called. Changes are staged in bdrv_reopen_prepare() and in the equivalent driver level functions. If any of the staged images fails a prepare, then all of the images left untouched, and the staged changes for each image abandoned. Block drivers are passed a reopen state structure, that contains: * BDS to reopen * flags for the reopen * opaque pointer for any driver-specific data that needs to be persistent from _prepare to _commit/_abort * reopen queue pointer, if the driver needs to queue additional BDS for a reopen Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-09-24 15:15:11 +02:00
Luiz Capitulino	9aeaddff26	block: block_int: include qerror.h Several block/ files are relying on qerror.h being provided by qapi-types.h. Fix this, as a future commit will change qapi-types.h not to provide qerror.h. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>	2012-08-13 13:20:50 -03:00
Stefan Hajnoczi	bfe8043e92	qcow2: implement lazy refcounts Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened. Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization. The lazy refcount optimization must be enabled at image creation time: qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-06 22:39:14 +02:00
Markus Armbruster	2b584959ed	block: Geometry and translation hints are now useless, purge them There are two producers of these hints: drive_init() on behalf of -drive, and hd_geometry_guess(). The only consumer of the hint is hd_geometry_guess(). The callers of hd_geometry_guess() call it only when drive_init() didn't set the hints. Therefore, drive_init()'s hints are never used. Thus, hd_geometry_guess() only ever sees hints it produced itself in a prior call. Only the first call computes something, subsequent calls just repeat the first call's results. However, hd_geometry_guess() is never called more than once: the device models don't, and the block device is destroyed on unplug. Thus, dropping the repeat feature doesn't break anything now. If a block device wasn't destroyed on unplug and could be reused with a new device, then repeating old results would be wrong. Thus, dropping the repeat feature prevents future breakage. This renders the hints unused. Purge them from the block layer. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-17 16:48:31 +02:00
Kevin Wolf	4534ff5426	qemu-img check -r for repairing images The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img, but the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Jim Meyering	eba25057b9	block: prevent snapshot mode $TMPDIR symlink attack In snapshot mode, bdrv_open creates an empty temporary file without checking for mkstemp or close failure, and ignoring the possibility of a buffer overrun given a surprisingly long $TMPDIR. Change the get_tmp_filename function to return int (not void), so that it can inform its two callers of those failures. Also avoid the risk of buffer overrun and do not ignore mkstemp or close failure. Update both callers (in block.c and vvfat.c) to propagate temp-file-creation failure to their callers. get_tmp_filename creates and closes an empty file, while its callers later open that presumed-existing file with O_CREAT. The problem was that a malicious user could provoke mkstemp failure and race to create a symlink with the selected temporary file name, thus causing the qemu process (usually root owned) to open through the symlink, overwriting an attacker-chosen file. This addresses CVE-2012-2652. http://bugzilla.redhat.com/CVE-2012-2652 Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-05-30 14:48:40 +08:00
Paolo Bonzini	fa4478d5c8	block: wait for job callback in block_job_cancel_sync The limitation on not having I/O after cancellation cannot really be kept. Even streaming has a very small race window where you could cancel a job and have it report completion. If this window is hit, bdrv_change_backing_file() will yield and possibly cause accesses to dangling pointers etc. So, let's just assume that we cannot know exactly what will happen after the coroutine has set busy to false. We can set a very lax condition: - if we cancel the job, the coroutine won't set it to false again (and hence will not call co_sleep_ns again). - block_job_cancel_sync will wait for the coroutine to exit, which pretty much ensures no race. Instead, we track the coroutine that executes the job and put very strict conditions on what to do while it is quiescent (busy = false). First of all, the coroutine must never set busy = false while the job has been cancelled. Second, the coroutine can be reentered arbitrarily while it is quiescent, so you cannot really do anything but co_sleep_ns at that time. This condition is obeyed by the block_job_sleep_ns function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Paolo Bonzini	4513eafe92	block: add block_job_sleep_ns This function abstracts the pretty complex semantics of the "busy" member of BlockJob. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Paolo Bonzini	e023b2e244	block: fix snapshot on QED QED's opaque data includes a pointer back to the BlockDriverState. This breaks when bdrv_append shuffles data between bs_new and bs_top. To avoid this, add a "rebind" function that tells the driver about the new relationship between the BlockDriverState and its opaque. The patch also adds rebind to VVFAT for completeness, even though it is not used with live snapshots. Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Stefan Hajnoczi	c83c66c3b5	block: add 'speed' optional parameter to block-stream Allow streaming operations to be started with an initial speed limit. This eliminates the window of time between starting streaming and issuing block-job-set-speed. Users should use the new optional 'speed' parameter instead so that speed limits are in effect immediately when the job starts. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	882ec7ce53	block: change block-job-set-speed argument from 'value' to 'speed' Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	9e6636c72d	block: use Error mechanism instead of -errno for block_job_set_speed() There are at least two different errors that can occur in block_job_set_speed(): the job might not support setting speeds or the value might be invalid. Use the Error mechanism to report the error where it occurs. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	fd7f8c6537	block: use Error mechanism instead of -errno for block_job_create() The block job API uses -errno return values internally and we convert these to Error in the QMP functions. This is ugly because the Error should be created at the point where we still have all the relevant information. More importantly, it is hard to add new error cases to this case since we quickly run out of -errno values without losing information. Go ahead and use Error directly and don't convert later. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Kevin Wolf	6744cbab8c	qcow2: Version 3 images This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and checks feature bits. It still misses support for zero clusters, so this is not a fully compliant implementation of v3 yet. The default for creating new images stays at v2 for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Paolo Bonzini	dc534f8fc0	block: document job API I am not sure that these are really proper GtkDoc, but they follow the existing documentation in block_int.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	3e914655f2	block: fix streaming/closing race Streaming can issue I/O while qcow2_close is running. This causes the L2 caches to become very confused or, alternatively, could cause a segfault when the streaming coroutine is reentered after closing its block device. The fix is to cancel streaming jobs when closing their underlying device. The cancellation must be synchronous, on the other hand qemu_aio_wait will not restart a coroutine that is sleeping in co_sleep. So add a flag saying whether streaming has in-flight I/O. If the busy flag is false, the coroutine is quiescent and, when cancelled, will not issue any new I/O. This protects streaming against closing, but not against deleting. We have a reference count protecting us against concurrent deletion, but I still added an assertion to ensure nothing bad happens. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	85e8dab1ef	aio: move BlockDriverAIOCB to qemu-aio.h And remove several block_int.h inclusions that should not be there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:39 +02:00
Jeff Cody	8802d1fdd4	qapi: Introduce blockdev-group-snapshot-sync command This is a QAPI/QMP only command to take a snapshot of a group of devices. This is similar to the blockdev-snapshot-sync command, except blockdev-group-snapshot-sync accepts a list devices, filenames, and formats. It is attempted to keep the snapshot of the group atomic; if the creation or open of any of the new snapshots fails, then all of the new snapshots are abandoned, and the name of the snapshot image that failed is returned. The failure case should not interrupt any operations. Rather than use bdrv_close() along with a subsequent bdrv_open() to perform the pivot, the original image is never closed and the new image is placed 'in front' of the original image via manipulation of the BlockDriverState fields. Thus, once the new snapshot image has been successfully created, there are no more failure points before pivoting to the new snapshot. This allows the group of disks to remain consistent with each other, even across snapshot failures. Signed-off-by: Jeff Cody <jcody@redhat.com> Acked-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-29 15:48:33 +01:00
Paolo Bonzini	b6a127a156	block: drop aio_multiwrite in BlockDriver These were never used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-29 12:48:47 +01:00
Paolo Bonzini	56116a1469	block: remove unused fields in BlockDriverState sync_aiocb is unused since commit `ce1a14d` (Dynamically allocate AIO Completion Blocks., 2006-08-07). private is unused since commit `56a1493` (drive cleanup fixes., 2009-09-25). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-29 12:48:47 +01:00
Luiz Capitulino	f36f394952	block: bdrv_eject(): Make eject_flag a real bool Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2012-02-22 17:23:05 -02:00
Stefan Hajnoczi	f08f2ddae0	block: add .bdrv_co_write_zeroes() interface The ability to zero regions of an image file is a useful primitive for higher-level features such as image streaming or zero write detection. Image formats may support an optimized metadata representation instead of writing zeroes into the image file. This allows zero writes to be potentially faster than regular write operations and also preserve sparseness of the image file. The .bdrv_co_write_zeroes() interface should be implemented by block drivers that wish to provide efficient zeroing. Note that this operation is different from the discard operation, which may leave the contents of the region indeterminate. That means discarded blocks are not guaranteed to contain zeroes and may contain junk data instead. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:50 +01:00
Marcelo Tosatti	c8c3080f4a	block: add support for partial streaming Add support for streaming data from an intermediate section of the image chain (see patch and documentation for details). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 14:49:18 +01:00
Stefan Hajnoczi	4f1043b4ff	block: add image streaming block job Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:45:26 +01:00
Stefan Hajnoczi	eeec61f291	block: add BlockJob interface for long-running operations Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:45:26 +01:00
Stefan Hajnoczi	470c05047a	block: make copy-on-read a per-request flag Previously copy-on-read could only be enabled for all requests to a block device. This means requests coming from the guest as well as QEMU's internal requests would perform copy-on-read when enabled. For image streaming we want to support finer-grained behavior than just populating the image file from its backing image. Image streaming supports partial streaming where a common backing image is preserved. In this case guest requests should not perform copy-on-read because they would indiscriminately copy data which should be left in a backing image from the backing chain. Introduce a per-request flag for copy-on-read so that a block device can process both regular and copy-on-read requests. Overlapping reads and writes still need to be serialized for correctness when copy-on-read is happening, so add an in-flight reference count to track this. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:45:26 +01:00
Stefan Hajnoczi	53fec9d3fd	block: add interface to toggle copy-on-read The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can be used to programmatically enable or disable copy-on-read for a block device. Later patches add the actual copy-on-read logic. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:38 +01:00
Stefan Hajnoczi	dbffbdcfff	block: add request tracking The block layer does not know about pending requests. This information is necessary for copy-on-read since overlapping requests must be serialized to prevent races that corrupt the image. The BlockDriverState gets a new tracked_request list field which contains all pending requests. Each request is a BdrvTrackedRequest record with sector_num, nb_sectors, and is_write fields. Note that request tracking is always enabled but hopefully this extra work is so small that it doesn't justify adding an enable/disable flag. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:38 +01:00
Stefan Hajnoczi	6aebab140d	block: drop .bdrv_is_allocated() interface Now that all block drivers have been converted to .bdrv_co_is_allocated() we can drop .bdrv_is_allocated(). Note that the public bdrv_is_allocated() interface is still available but is in fact a synchronous wrapper around .bdrv_co_is_allocated(). Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:37 +01:00
Stefan Hajnoczi	376ae3f1cb	block: add .bdrv_co_is_allocated() This patch adds the .bdrv_co_is_allocated() interface which is identical to .bdrv_is_allocated() but runs in coroutine context. Running in coroutine context implies that other coroutines might be performing I/O at the same time. Therefore it must be safe to run while the following BlockDriver functions are in-flight: .bdrv_co_readv() .bdrv_co_writev() .bdrv_co_flush() .bdrv_co_is_allocated() The new .bdrv_co_is_allocated() interface is useful because it can be used when a VM is running, whereas .bdrv_is_allocated() is a synchronous interface that does not cope with parallel requests. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:36 +01:00
Zhi Yong Wu	98f90dba5e	block: add I/O throttling algorithm Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:35 +01:00
Zhi Yong Wu	0563e19151	block: add the blockio limits command line support Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:35 +01:00
Anthony Liguori	0f15423c32	block: allow migration to work with image files (v3) Image files have two types of data: immutable data that describes things like image size, backing files, etc. and mutable data that includes offset and reference count tables. Today, image formats aggressively cache mutable data to improve performance. In some cases, this happens before a guest even starts. When dealing with live migration, since a file is open on two machines, the caching of meta data can lead to data corruption. This patch addresses this by introducing a mechanism to invalidate any cached mutable data a block driver may have which is then used by the live migration code. NB, this still requires coherent shared storage. Addressing migration without coherent shared storage (i.e. NFS) requires additional work. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2011-11-21 14:58:48 -06:00
Kevin Wolf	eb489bb1ec	block: Introduce bdrv_co_flush_to_os qcow2 has a writeback metadata cache, so flushing a qcow2 image actually consists of writing back that cache to the protocol and only then flushes the protocol in order to get everything stable on disk. This introduces a separate bdrv_co_flush_to_os to reflect the split. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-11-11 14:02:59 +01:00
Kevin Wolf	c68b89acd6	block: Rename bdrv_co_flush to bdrv_co_flush_to_disk There are two different types of flush that you can do: Flushing one level up to the OS (i.e. writing data to the host page cache) or flushing it all the way down to the disk. The existing functions flush to the disk, reflect this in the function name. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-11-11 14:02:59 +01:00
Luiz Capitulino	b202381800	qapi: Convert query-block Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2011-10-27 11:48:47 -02:00
Luiz Capitulino	d6bf279e7a	block: iostatus: Drop BDRV_IOS_INVAL A future commit will convert bdrv_info() to the QAPI and it won't provide IOS_INVAL. Luckily all we have to do is to add a new 'iostatus_enabled' member to BlockDriverState and use it instead. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2011-10-27 11:48:47 -02:00
Paolo Bonzini	6db39ae2e2	block: change discard to co_discard Since coroutine operation is now mandatory, convert both bdrv_discard implementations to coroutines. For qcow2, this means taking the lock around the operation. raw-posix remains synchronous. The bdrv_discard callback is then unused and can be eliminated. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-10-21 17:34:14 +02:00

1 2 3

147 Commits