mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Jim Meyering	a7e47d4bfc	sheepdog: don't leak socket file descriptor upon connection failure Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-08-22 10:47:14 -05:00
Paolo Bonzini	1bd075f29e	iscsi: fix races between task completion and abort This patch fixes two main issues with block/iscsi.c: 1) iscsi_task_mgmt_abort_task_async calls iscsi_scsi_task_cancel which was also directly called in iscsi_aio_cancel 2) a race between task completion and task abortion could happen cause the scsi_free_scsi_task were done before iscsi_schedule_bh has finished. To fix this, all the freeing of IscsiTasks and releasing of the AIOCBs is centralized in iscsi_bh_cb, independent of whether the SCSI command has completed or was cancelled. 3) iscsi_aio_cancel was not synchronously waiting for the end of the command. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-20 15:58:47 +02:00
Paolo Bonzini	cfb3f5064a	iscsi: simplify iscsi_schedule_bh It is always used with the same callback, remove the argument. And its return value is never used, assume allocation succeeds. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-20 15:58:47 +02:00
Paolo Bonzini	27cbd828c6	iscsi: move iscsi_schedule_bh and iscsi_readv_writev_bh_cb Put these functions at the beginning, to avoid forward references in the next patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-20 15:58:47 +02:00
Paolo Bonzini	b209091957	Revert "iscsi: Fix NULL dereferences / races between task completion and abort" This reverts commit `64e69e8092`. The commit returned immediately from iscsi_aio_cancel, risking corruption in case the following happens: guest qemu target ========================================================================= send write 1 --------> send write 1 --------> cancel write 1 ------> cancel write 1 ------> <------------------ cancellation processed send write 2 --------> send write 2 --------> <---------------- completed write 2 <------------------ completed write 2 <---------------- completed write 1 <---------------- cancellation not done Here, the guest would see write 2 superseding write 1, when in fact the outcome could have been the opposite. The right behavior is to return only after the target says whether the cancellation was done or not, and it will be implemented by the next three patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-20 15:50:45 +02:00
Kevin Wolf	65bd155c73	vmdk: Read footer for streamOptimized images The footer takes precedence over the header when it exists. It contains the real grain directory offset that is missing in the header. Without this patch, streamOptimized images with a footer cannot be read. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2012-08-17 13:27:02 +02:00
Kevin Wolf	7a736bfa4e	vmdk: Fix header structure Commit `bb45ded9` swapped gd_offset and rgd_offset. This is wrong. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-17 11:14:19 +02:00
Stefan Priebe	64e69e8092	iscsi: Fix NULL dereferences / races between task completion and abort Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-15 13:16:22 +02:00
Corey Bryant	2e1e79dae7	block: Convert close calls to qemu_close This patch converts all block layer close calls, that correspond to qemu_open calls, to qemu_close. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-15 10:48:57 +02:00
Corey Bryant	6165f4d85d	block: Convert open calls to qemu_open This patch converts all block layer open calls to qemu_open. Note that this adds the O_CLOEXEC flag to the changed open paths when the O_CLOEXEC macro is defined. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-15 10:48:57 +02:00
Corey Bryant	e174082835	block: Prevent detection of /dev/fdset/ as floppy Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-15 10:48:57 +02:00
Anthony Liguori	53810bab3a	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: qemu-iotests: skip 039 with ./check -nocache block: add BLOCK_O_CHECK for qemu-img check qcow2: mark image clean after repair succeeds qed: mark image clean after repair succeeds blockdev: flip default cache mode from writethrough to writeback virtio-blk: disable write cache if not negotiated virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE qemu-iotests: Save some sed processes ahci: Fix sglist memleak in ahci_dma_rw_buf() ahci: Fix ahci cdrom read corruptions for reads > 128k virtio-blk: fix use-after-free while handling scsi commands	2012-08-11 19:48:50 -05:00
Anthony Liguori	312942619a	Merge remote-tracking branch 'bonzini/scsi-next' into staging * bonzini/scsi-next: scsi-disk: add support for the UNMAP command scsi-disk: improve out-of-range LBA detection for WRITE SAME scsi-disk: more assertions and resets for aiocb virtio-scsi: do not compare 32-bit QEMU tags against 64-bit virtio-scsi tags iscsi: Pick default initiator-name based on the name of the VM iscsi: reorganize code for parse_initiator_name iscsi: do not leak initiator_name	2012-08-11 17:11:23 -05:00
Stefan Hajnoczi	058f8f16db	block: add BLOCK_O_CHECK for qemu-img check Image formats with a dirty bit, like qed and qcow2, repair dirty image files upon open with BDRV_O_RDWR. Performing automatic repair when qemu-img check runs is not ideal because the bdrv_open() call repairs the image before the actual bdrv_check() call from qemu-img.c. Fix this "double repair" since it leads to confusing output from qemu-img check. Tell the block driver that this image is being opened just for bdrv_check(). This skips automatic repair and qemu-img.c can invoke it manually with bdrv_check(). Update the golden output for qemu-iotests 039 to reflect the new qemu-img check output. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-10 10:25:12 +02:00
Stefan Hajnoczi	acbe59829e	qcow2: mark image clean after repair succeeds The dirty bit is cleared after image repair succeeds in qcow2_open(). Move this into qcow2_check() so that all callers benefit from this behavior when fix mode is enabled. This is necessary so qemu-img check can call .bdrv_check() and mark the image clean. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-10 10:25:12 +02:00
Stefan Hajnoczi	b10170aca0	qed: mark image clean after repair succeeds The dirty bit is cleared after image repair succeeds in qed_open(). Move this into qed_check() so that all callers benefit from this behavior when fix=true. This is necessary so qemu-img check can call .bdrv_check() and mark the image clean. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-10 10:25:12 +02:00
Ronnie Sahlberg	31459f463a	iscsi: Pick default initiator-name based on the name of the VM This patch updates the iscsi layer to automatically pick a 'unique' initiator-name based on the name of the vm in case the user has not set an explicit iqn-name to use. Create a new function qemu_get_vm_name() that returns the name of the VM, if specified. This way we can thus create default names to use as the initiator name based on the guest session. If the VM is not named via the '-name' command line argument, the iscsi initiator-name used wiull simply be iqn.2008-11.org.linux-kvm If a name for the VM was specified with the '-name' option, iscsi will use a default initiatorname of iqn.2008-11.org.linux-kvm:<name> These names are just the default iscsi initiator name that qemu will generate/use only when the user has not set an explicit initiator name via the commandlines or config files. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>	2012-08-09 15:04:09 +02:00
Paolo Bonzini	f2ef4a6dd9	iscsi: reorganize code for parse_initiator_name Merge the occurrences of the "iqn.2008-11.org.linux-kvm" string to avoid duplication. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-08 14:51:59 +02:00
Paolo Bonzini	b93c94f7ec	iscsi: do not leak initiator_name The argument of iscsi_create_context is never freed by libiscsi, which in fact calls strdup on it. Avoid a leak. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-08-08 14:51:59 +02:00
Stefan Hajnoczi	bfe8043e92	qcow2: implement lazy refcounts Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened. Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization. The lazy refcount optimization must be enabled at image creation time: qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-06 22:39:14 +02:00
Stefan Hajnoczi	c61d0004bc	qcow2: introduce dirty bit This patch adds an incompatible feature bit to mark images that have not been closed cleanly. When a dirty image file is opened a consistency check and repair is performed. Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-08-06 22:39:14 +02:00
Markus Armbruster	4480e0f924	vvfat: Do not clobber the user's geometry vvfat creates a virtual VFAT filesystem with a certain logical geometry that depends on its options. It sets the "geometry hint" to this geometry. It is the only block driver to do this. The geometry hint is about about physical geometry, and used only by certain hard disk device models. vvfat's hint is normally invisible for device models, because bdrv_open() puts a raw format on top of vvfat's fat protocol. That raw format is where drive_init() puts the user's geometry (if any), and where the device model gets it from. Nobody complained, because the default physical geometry is the same as vvfat's logical geometry: opts LCHS def. PCHS 1024,16,63 same :32: 1024,16,63 same :16: 1024,16,63 same :12: 64,16,63 same Except when you specify :floppy: opts LCHS def. PCHS :floppy: 80, 2,36 5,16,63 :32:floppy: 80, 2,36 5,16,63 :16:floppy: 80, 2,36 5,16,63 :12:floppy: 80, 2,18 2,16,63 Silly thing to do for use with a hard disk. However, the "raw" format can be suppressed by adding an redundant-looking "format=vvfat" to "file=fat:FOO". Then, vvfat's hint clobbers the user's geometry, i.e. -drive options cyls, heads, secs get silently ignored. Don't do that. No change without format=vvfat. With it, the user's hard disk geometry (-drive options cyls, heads, secs) is now obeyed, and the default hard disk geometry with :floppy: now matches the one without format=vvfat. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-17 16:48:30 +02:00
Markus Armbruster	f91cbefe2d	vvfat: Fix partition table Unless parameter ":floppy:" is given, vvfat creates a virtual image with DOS MBR defining a single partition which holds the FAT file system. The size of the virtual image depends on the width of the FAT: 32 MiB (CHS 64, 16, 63) for 12 bit FAT, 504 MiB (CHS 1024, 16, 63) for 16 and 32 bit FAT, leaving (6416-1)63 = 64449 and (102416-1)64 = 1032129 sectors for the partition. However, it screws up the end of the partition in the MBR: FAT width param. start CHS end CHS start LBA size :32: 0,1,1 1023,14,63 63 1032065 :16: 0,1,1 1023,14,55 63 1032057 :12: 0,1,1 63,14,55 63 64377 The actual FAT file system nevertheless assumes the partition has 1032129 or 64449 sectors. Oops. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-17 16:48:30 +02:00
Christoph Hellwig	19db9b9042	sheepdog: do not blindly memset all read buffers Only buffers that map to unallocated blocks need to be zeroed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-17 16:48:29 +02:00
MORITA Kazutaka	cddd4ac7a2	sheepdog: always use coroutine-based network functions This reduces some code duplication. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-17 16:48:29 +02:00
Anthony Liguori	23797df3d9	Merge remote-tracking branch 'mjt/mjt-iov2' into staging * mjt/mjt-iov2: rewrite iov_send_recv() and move it to iov.c cleanup qemu_co_sendv(), qemu_co_recvv() and friends export iov_send_recv() and use it in iov_send() and iov_recv() rename qemu_sendv to iov_send, change proto and move declarations to iov.h change qemu_iovec_to_buf() to match other to,from_buf functions consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent allow qemu_iovec_from_buffer() to specify offset from which to start copying consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset() rewrite iov_* functions change iov_* function prototypes to be more appropriate virtio-serial-bus: use correct lengths in control_out() message Conflicts: tests/Makefile Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-07-09 12:35:06 -05:00
Anthony Liguori	715cc00ce1	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (24 commits) block: Factor bdrv_read_unthrottled() out of guess_disk_lchs() qtest: Tidy up temporary files properly fdc: Drop broken code for user-defined floppy geometry fdc_test: introduce test_sense_interrupt fdc_test: update media_change test fdc: fix interrupt handling fdc: rewrite seek and DSKCHG bit handling block: introduce bdrv_swap, implement bdrv_append on top of it block: copy over job and dirty bitmap fields in bdrv_append raw: hook into blkdebug blkdebug: optionally tie errors to a specific sector blkdebug: store list of active rules blkdebug: pass getlength to underlying file blkdebug: tiny cleanup blkdebug: remove sync i/o events sheepdog: traverse pending_list from the first for each time sheepdog: split outstanding list into inflight and pending sheepdog: make sure we don't free aiocb before sending all requests sheepdog: use coroutine based socket functions in coroutine context sheepdog: restart I/O when socket becomes ready in do_co_req() ...	2012-07-09 10:29:40 -05:00
Paolo Bonzini	5c171afa4c	raw: hook into blkdebug Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
Paolo Bonzini	e4780db429	blkdebug: optionally tie errors to a specific sector This makes blkdebug scripts more powerful, and independent of the exact sequence of operations performed by streaming. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
Paolo Bonzini	571cd43e57	blkdebug: store list of active rules This prepares for the next patch, where some active rules may actually not trigger depending on input to readv/writev. Store the active rules in a SIMPLEQ (so that it can be emptied easily with QSIMPLEQ_INIT), and fetch the errno/once/immediately arguments from there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
Paolo Bonzini	e130225587	blkdebug: pass getlength to underlying file This is required when using blkdebug with raw format. Unlike qcow2/QED, raw asks blkdebug for the length of the file, it doesn't get it from a header. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
Paolo Bonzini	368e8dd10a	blkdebug: tiny cleanup Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
Paolo Bonzini	820100fd15	blkdebug: remove sync i/o events These are unused, except (by mistake more or less) in QED. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
MORITA Kazutaka	7dc1cde05b	sheepdog: traverse pending_list from the first for each time The pending list can be modified in other coroutine context sd_co_rw_vector, so we need to traverse the list from the first again after we send the pending request. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
MORITA Kazutaka	c292ee6a67	sheepdog: split outstanding list into inflight and pending outstanding_list_head is used for both pending and inflight requests. This patch splits it and improves readability. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:02 +02:00
MORITA Kazutaka	1d732d7d7c	sheepdog: make sure we don't free aiocb before sending all requests This patch increments the pending counter before sending requests, and make sures that aiocb is not freed while sending them. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
MORITA Kazutaka	b97564f4c5	sheepdog: use coroutine based socket functions in coroutine context This removes blocking network I/Os in coroutine context. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
MORITA Kazutaka	2dfcca3b68	sheepdog: restart I/O when socket becomes ready in do_co_req() Currently, no one reenters the yielded coroutine. This fixes it. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
MORITA Kazutaka	1b6ac9985a	sheepdog: fix dprintf format strings This fixes warnings about dprintf format in debug mode. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
Stefan Hajnoczi	206e6d8551	qcow2: preserve free_byte_offset when qcow2_alloc_bytes() fails When qcow2_alloc_clusters() error handling code was introduced in commit `5d757b563d`, the value of free_byte_offset was clobbered in the error case. This patch keeps free_byte_offset at 0 so we will try to allocate clusters again next time this function is called. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
Stefan Hajnoczi	b35278f754	qcow2: fix #ifdef'd qcow2_check_refcounts() callers The DEBUG_ALLOC qcow2.h macro enables additional consistency checks throughout the code. This makes it easier to spot corruptions that are introduced during development. Since consistency check is an expensive operation the DEBUG_ALLOC macro is used to compile checks out in normal builds and qcow2_check_refcounts() calls missed the addition of a new function argument. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-07-09 15:53:01 +02:00
Ronnie Sahlberg	622695a458	ISCSI: force use of sg for SMC and SSC devices If the device we open is a SMC or SSC device, then force the use of sg. We dont have any medium changer or tape emulation so only passthrough via real sg or scsi-generic via iscsi would work anyway. Forcing sg also makes qemu skip trying to read from the device to guess the image format by reading from the device (find_image_format()). SMC devices do not implement READ6/10/12/16 so it is not possible to read from them (SSC have different CDBs). With this patch I can successfully manage a SMC device wiht iscsi in passthrough mode. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> [Added TYPE_TAPE handling - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-07-02 10:18:41 +02:00
Ronnie Sahlberg	983924532f	ISCSI: Add SCSI passthrough via scsi-generic to libiscsi Update iscsi to allow passthrough of SG_IO scsi commands when the iscsi device is forced to be scsi-generic. Implement both bdrv_ioctl() and bdrv_aio_ioctl() in the iscsi backend, emulate the SG_IO ioctl and pass the SCSI commands across to the iscsi target. This allows end-to-end passthrough of SCSI all the way from the guest, to qemu, via scsi-generic, then libiscsi all the way to the iscsi target. To activate this you need to specify that the iscsi lun should be treated as a scsi-generic device. Example: -device lsi -device scsi-generic,drive=MyISCSI \ -drive file=iscsi://10.1.1.125/iqn.ronnie.test/1,if=none,id=MyISCSI Note, you can currently not boot a qemu guest from a scsi device. Note, This only works when the host is linux, since the emulation relies on definitions of SG_IO from the scsi-generic implementation in the linux kernel. It should be fairly easy to re-implement some structures similar enough for non-linux hosts to do the same style of passthrough via a fake scsi generic layer and libiscsi if need be. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-07-02 10:18:41 +02:00
Kevin Wolf	94282e7146	raw-posix: Fix build without is_allocated support Move the declaration of s into the #ifdef sections that actually make use of it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-06-24 01:04:45 +02:00
Stefan Hajnoczi	af7b708db2	qcow2: fix autoclear image header update The autoclear feature bits can be used for qcow2 file format features that are safe to "drop" by old programs that do not understand the feature. Upon opening the image file unknown autoclear feature bits are cleared and the image file header is rewritten, but this was happening too early in the code when critical header fields were not yet loaded. Process autoclear feature bits after all necessary header information has been loaded. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
Kevin Wolf	b7ab0fea37	qcow2: Fix avail_sectors in cluster allocation code avail_sectors should really be the number of sectors from the start of the allocation, not from the start of the write request. We're lucky enough that this mistake didn't cause any real bug. avail_sectors is only used in the intialiser of QCowL2Meta: .nb_available = MIN(requested_sectors, avail_sectors), m->nb_available in turn is only used for COW at the end of the allocation. A COW occurs only if the request wasn't cluster aligned, which in turn would imply that requested_sectors was less than avail_sectors (both in the original and in the fixed version). In this case avail_sectors is ignored and therefore the mistake doesn't cause any misbehaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
Kevin Wolf	cdba7fee1d	qcow2: Simplify calculation for COW area at the end copy_sectors() always uses the sum (cluster_offset + n_start) or (start_sect + n_start), so if some value is added to both cluster_offset and start_sect, and subtracted from n_start, it's cancelled out anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
Paolo Bonzini	6af4e9ead4	qcow2: always operate caches in writeback mode Writethrough does not need special-casing anymore in the qcow2 caches. The block layer adds flushes after every guest-initiated data write, and these will also flush the qcow2 caches to the OS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
MORITA Kazutaka	e0d93a89b9	sheepdog: add coroutine_fn markers to coroutine functions Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Josh Durgin	b11f38fcdf	rbd: hook up cache options Writeback caching was added in Ceph 0.46, and writethrough will be in 0.47. These are controlled by general config options, so there's no need to check for librbd version. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Kevin Wolf	166acf546f	qcow2: Support for fixing refcount inconsistencies Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Kevin Wolf	ccf34716ee	qemu-img check: Print fixed clusters and recheck When any inconsistencies have been fixed, print the statistics and run another check to make sure everything is correct now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Kevin Wolf	4534ff5426	qemu-img check -r for repairing images The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img, but the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Paolo Bonzini	6ef228fc0d	stream: move rate limiting to a separate header file Make the code reusable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Paolo Bonzini	188a7bbf94	stream: move is_allocated_above to block.c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Paolo Bonzini	f9749f28b7	stream: tweak usage of bdrv_co_is_allocated is_allocated_base has complex semantics that are not really usable outside streaming. Split the check in two parts, where the allocated state for the top bs is moved to the caller. The resulting function is more generally useful. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Paolo Bonzini	5500316ded	block: implement is_allocated for raw Either FIEMAP, or SEEK_DATA+SEEK_HOLE can be used to implement the is_allocated callback for raw files. On Linux ext4, btrfs and XFS all support it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Zhi Yong Wu	87267753a3	qcow2: fix endianness conversion Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Zhi Yong Wu	833e40858c	qcow2: remove a line of unnecessary code Commit `3948d1d4` removed the pointer argument we filled in with l2_offset but forgot to remove the unnecessary l2_offset assignment. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:42 +02:00
Kevin Wolf	1417d7e40e	qcow2: Silence false warning Some gcc versions seem not to be able to figure out that the switch statement covers all possible values and that c is therefore always initialised. Add a default branch for them. Reported-by: malc <av1474@comtv.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: malc <av1474@comtv.ru>	2012-06-15 15:52:45 +04:00
Michael Tokarev	2fc8ae1dd7	cleanup qemu_co_sendv(), qemu_co_recvv() and friends The same as for non-coroutine versions in previous patches: rename arguments to be more obvious, change type of arguments from int to size_t where appropriate, and use common code for send and receive paths (with one extra argument) since these are exactly the same. Use common iov_send_recv() directly. qemu_co_sendv(), qemu_co_recvv(), and qemu_co_recv() are now trivial #define's merely adding one extra arg. qemu_co_sendv() and qemu_co_recvv() callers are converted to different argument order and extra `iov_cnt' argument. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2012-06-11 23:12:11 +04:00
Michael Tokarev	d5e6b1619c	change qemu_iovec_to_buf() to match other to,from_buf functions It now allows specifying offset within qiov to start from and amount of bytes to copy. Actual implementation is just a call to iov_to_buf(). Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2012-06-11 23:12:11 +04:00
Michael Tokarev	1b093c480a	consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent qemu_iovec_concat() is currently a wrapper for qemu_iovec_copy(), use the former (with extra "0" arg) in a few places where it is used. Change skip argument of qemu_iovec_copy() from uint64_t to size_t, since size of qiov itself is size_t, so there's no way to skip larger sizes. Rename it to soffset, to make it clear that the offset is applied to src. Also change the only usage of uint64_t in hw/9pfs/virtio-9p.c, in v9fs_init_qiov_from_pdu() - all callers of it actually uses size_t too, not uint64_t. One added restriction: as for all other iovec-related functions, soffset must point inside src. Order of argumens is already good: qemu_iovec_memset(QEMUIOVector qiov, size_t offset, int c, size_t bytes) vs: qemu_iovec_concat(QEMUIOVector dst, QEMUIOVector *src, size_t soffset, size_t sbytes) (note soffset is after _src_ not dst, since it applies to src; for memset it applies to qiov). Note that in many places where this function is used, the previous call is qemu_iovec_reset(), which means many callers actually want copy (replacing dst content), not concat. So we may want to add a wrapper like qemu_iovec_copy() with the same arguments but which calls qemu_iovec_reset() before _concat(). Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2012-06-11 23:12:11 +04:00
Michael Tokarev	03396148bc	allow qemu_iovec_from_buffer() to specify offset from which to start copying Similar to qemu_iovec_memset(QEMUIOVector qiov, size_t offset, int c, size_t bytes); the new prototype is: qemu_iovec_from_buf(QEMUIOVector qiov, size_t offset, const void *buf, size_t bytes); The processing starts at offset bytes within qiov. This way, we may copy a bounce buffer directly to a middle of qiov. This is exactly the same function as iov_from_buf() from iov.c, so use the existing implementation and rename it to qemu_iovec_from_buf() to be shorter and to match the utility function. As with utility implementation, we now assert that the offset is inside actual iovec. Nothing changed for current callers, because `offset' parameter is new. While at it, stop using "bounce-qiov" in block/qcow2.c and copy decrypted data directly from cluster_data instead of recreating a temp qiov for doing that. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2012-06-11 23:12:11 +04:00
Michael Tokarev	3d9b49254f	consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset() This patch combines two functions into one, and replaces the implementation with already existing iov_memset() from iov.c. The new prototype of qemu_iovec_memset(): size_t qemu_iovec_memset(qiov, size_t offset, int fillc, size_t bytes) It is different from former qemu_iovec_memset_skip(), and I want to make other functions to be consistent with it too: first how much to skip, second what, and 3rd how many of it. It also returns actual number of bytes filled in, which may be less than the requested `bytes' if qiov is smaller than offset+bytes, in the same way iov_memset() does. While at it, use utility function iov_memset() from iov.h in posix-aio-compat.c, where qiov was used. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2012-06-11 23:07:44 +04:00
Paolo Bonzini	7456e4ce8d	build: move block/ objects to nested Makefile.objs Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-06-07 09:21:13 +02:00
Jim Meyering	c2d76497b6	block: prevent snapshot mode $TMPDIR symlink attack In snapshot mode, bdrv_open creates an empty temporary file without checking for mkstemp or close failure, and ignoring the possibility of a buffer overrun given a surprisingly long $TMPDIR. Change the get_tmp_filename function to return int (not void), so that it can inform its two callers of those failures. Also avoid the risk of buffer overrun and do not ignore mkstemp or close failure. Update both callers (in block.c and vvfat.c) to propagate temp-file-creation failure to their callers. get_tmp_filename creates and closes an empty file, while its callers later open that presumed-existing file with O_CREAT. The problem was that a malicious user could provoke mkstemp failure and race to create a symlink with the selected temporary file name, thus causing the qemu process (usually root owned) to open through the symlink, overwriting an attacker-chosen file. This addresses CVE-2012-2652. http://bugzilla.redhat.com/CVE-2012-2652 Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-30 10:18:20 +02:00
MORITA Kazutaka	6f3c714eb7	sheepdog: fix return value of do_load_save_vm_state bdrv_save_vmstate and bdrv_load_vmstate should return the vmstate size on success, and -errno on error. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-30 09:58:39 +02:00
Anthony Liguori	306761537f	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy fdc: fix media detection fdc: floppy drive should be visible after start without media qemu-iotests: mark 035 qcow2-only qcow2: Check qcow2_alloc_clusters_at() return value sheepdog: use heap instead of stack for BDRVSheepdogState sheepdog: return -errno on error sheepdog: mark image as snapshot when tag is specified qemu-img: Explain how rebase operation can be used to perform a 'diff' operation. qcow2: don't leak buffer for unexpected qcow_version in header	2012-05-29 04:30:49 -05:00
Ronnie Sahlberg	f4dfa67f04	ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN This allows using LUNs bigger than 2TB. Keep using READ10 for other device types such as MMC. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>	2012-05-28 14:04:16 +02:00
Ronnie Sahlberg	6bcd1346bb	ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMC Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>	2012-05-28 14:04:15 +02:00
Ronnie Sahlberg	dbfff6d776	ISCSI: get device type at connection time This is needed to avoid READ CAPACITY(16) for MMC devices. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-05-28 14:04:14 +02:00
Paolo Bonzini	c7b4a95202	ISCSI: change num_blocks to 64-bit Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-05-28 14:04:14 +02:00
Ronnie Sahlberg	c9b9f6824f	ISCSI: redo how we set up the events Call qemu_notify_event() after updating events. Otherwise, If we add an event for -is-writeable but the socket is already writeable there may be a delay before the event callback is actually triggered. Those delays would in particular hurt performance during BIOS boot and when the GRUB bootloader reads the kernel and initrd. But first call out to the socket write functions directly, and only set up the write event if the socket is full. This will happen very rarely and this improves performance. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>	2012-05-28 14:04:06 +02:00
Kevin Wolf	df02179189	qcow2: Check qcow2_alloc_clusters_at() return value When using qcow2_alloc_clusters_at(), the cluster allocation code checked the wrong variable for an error code. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-25 18:12:54 +02:00
MORITA Kazutaka	b6fc8245e9	sheepdog: use heap instead of stack for BDRVSheepdogState bdrv_create() is called in coroutine context now, so we cannot use more stack than 1 MB in the function if we use ucontext coroutine. This patch allocates BDRVSheepdogState, whose size is 4 MB, on the heap in sd_create(). Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-25 18:12:54 +02:00
MORITA Kazutaka	cb595887cc	sheepdog: return -errno on error On error, BlockDriver APIs should return -errno instead of -1. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-25 18:12:54 +02:00
MORITA Kazutaka	622b6057be	sheepdog: mark image as snapshot when tag is specified When a snapshot tag is specified in the filename, the opened image is a snapshot. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-25 18:12:54 +02:00
Jim Meyering	b6c147622d	qcow2: don't leak buffer for unexpected qcow_version in header Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-25 18:12:54 +02:00
Kevin Wolf	c44bfe4637	qcow2: Don't ignore failure to clear autoclear flags Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-14 17:02:19 +02:00
Anthony Liguori	04120e3bb0	block: fix warning introduced in `efcc7a23` Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-05-10 09:10:42 -05:00
Paolo Bonzini	efcc7a2324	stream: do not copy unallocated sectors from the base Unallocated sectors should really never be accessed by the guest, so there's no need to copy them during the streaming process. If they are read by the guest during streaming, guest-initiated copy-on-read will copy them (we're in the base == NULL case, which enables copy on read). If they are read after we disconnect the image from the base, they will read as zeroes anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 11:01:59 +02:00
Paolo Bonzini	b21d677ee9	stream: fix ratelimiting corner case This fixes inability to make progress in streaming if the quota is set to less than the amount of data that an I/O operation has to write. In this case, limit->dispatched + n will always be above the quota and, due to the "goto retry" to recheck cancellation and allocation, streaming will livelock. This can be reproduced with "block_job_set_speed ide0-hd0 1b". Of course, with this patch the requested limit will not be obeyed. That could be done with another patch that caps is_allocated's n argument by the slice quota. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 11:01:59 +02:00
Paolo Bonzini	f6133def92	stream: pass new base image format to bdrv_change_backing_file When an image is modified to point to the new backing file, the backing file format is set to NULL, which means auto-probe. This is wrong, in fact it is a small security problem. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 11:01:59 +02:00
Paolo Bonzini	fa4478d5c8	block: wait for job callback in block_job_cancel_sync The limitation on not having I/O after cancellation cannot really be kept. Even streaming has a very small race window where you could cancel a job and have it report completion. If this window is hit, bdrv_change_backing_file() will yield and possibly cause accesses to dangling pointers etc. So, let's just assume that we cannot know exactly what will happen after the coroutine has set busy to false. We can set a very lax condition: - if we cancel the job, the coroutine won't set it to false again (and hence will not call co_sleep_ns again). - block_job_cancel_sync will wait for the coroutine to exit, which pretty much ensures no race. Instead, we track the coroutine that executes the job and put very strict conditions on what to do while it is quiescent (busy = false). First of all, the coroutine must never set busy = false while the job has been cancelled. Second, the coroutine can be reentered arbitrarily while it is quiescent, so you cannot really do anything but co_sleep_ns at that time. This condition is obeyed by the block_job_sleep_ns function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Paolo Bonzini	4513eafe92	block: add block_job_sleep_ns This function abstracts the pretty complex semantics of the "busy" member of BlockJob. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Paolo Bonzini	e023b2e244	block: fix snapshot on QED QED's opaque data includes a pointer back to the BlockDriverState. This breaks when bdrv_append shuffles data between bs_new and bs_top. To avoid this, add a "rebind" function that tells the driver about the new relationship between the BlockDriverState and its opaque. The patch also adds rebind to VVFAT for completeness, even though it is not used with live snapshots. Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:12 +02:00
Paolo Bonzini	469ef350e1	block: update in-memory backing file and format These are needed to print "info block" output correctly. QCOW2 does this because it needs it to write the header, but QED does not, and common code is the right place to do it. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:11 +02:00
Paolo Bonzini	5f3777945d	block: push bdrv_change_backing_file error checking up from drivers This check applies to all drivers, but QED lacks it. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-10 10:32:11 +02:00
Anthony Liguori	7c652c1eaf	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: fdc: simplify media change handling qcow2: lock on prealloc block: make bdrv_create adopt coroutine qcow2: Limit COW to where it's needed sheepdog: switch to writethrough mode if cluster doesn't support flush	2012-05-08 09:38:41 -05:00
Zhi Yong Wu	15552c4ad3	qcow2: lock on prealloc preallocate() will be locked. This is required because qcow2_alloc_cluster_link_l2() assumes that it runs under a lock that it can drop while COW is being performed. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-07 19:33:18 +02:00
Kevin Wolf	54e6814360	qcow2: Limit COW to where it's needed This fixes a regression introduced in commit `250196f1`. The bug leads to data corruption, found during an Autotest run with a Fedora 8 guest. Consider a write request whose first part is covered by an already allocated cluster, but additional clusters need to be newly allocated. When counting the number of clusters to allocate, the qcow2 code would decide to do COW for all remaining clusters of the write request, even if some of them are already allocated. If during this COW operation another write request is issued that touches the same cluster, it will still refer to the old cluster. When the COW completes, the first request will update the L2 table and the second write request will be lost. Note that the requests need not overlap, it's enough for them to touch the same cluster. This patch ensures that only clusters that really require COW are considered for allocation. In this case any other request writing to the same cluster will be an allocating write and gets serialised. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-07 19:33:18 +02:00
MORITA Kazutaka	115c2b5a68	sheepdog: switch to writethrough mode if cluster doesn't support flush This is necessary for qemu to work with the older version of Sheepdog which doesn't support SD_OP_FLUSH_VDI. Signed-off-by: MORITA Kazutaka <morita.kazutaka@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-07 19:33:18 +02:00
Ronnie Sahlberg	fa6acb0c2f	ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs Update the configure test for libiscsi support to detect version 1.3 or later. Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands. Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB. Update to implement bdrv_aio_discard() using the UNMAP command. This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> [squashed in subsequent patch from Ronnie to fix off-by-one in LBA count] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-05-04 10:39:18 +02:00
Josh Durgin	787f31330e	rbd: add discard support Change the write flag to an operation type in RBDAIOCB, and make the buffer optional since discard doesn't use it. Discard is first included in librbd 0.1.2 (which is in Ceph 0.46). If librbd is too old, leave out qemu_rbd_aio_discard entirely, so the old behavior is preserved. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-02 18:41:42 +02:00
Zhi Yong Wu	647cc47223	qcow2: fix the return value -ENOENT -> -EEXIST Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-02 18:39:39 +02:00
Kevin Wolf	7242411460	qcow2: Don't hold cache references across yield If cache references are held while the coroutine has yielded, the cache may get used up and abort() when it can't find a free entry. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-02 18:39:39 +02:00
Kevin Wolf	60651f901a	qcow2: Remove unused parameter in do_alloc_cluster_offset Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-02 18:39:39 +02:00
Stefan Weil	b9531b6eed	block/qcow2: Add missing GCC_FMT_ATTR to function report_unsupported() Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-05-02 18:39:39 +02:00
Pavel Borzenkov	83affaa622	raw-posix: Do not use CONFIG_COCOA macro Use __APPLE__ and __MACH__ macros instead of CONFIG_COCOA to detect Mac OS X host. The patch is based on Ben Leslie's patch: http://patchwork.ozlabs.org/patch/97859/ Signed-off-by: Ben Leslie <benno@benno.id.au> Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de>	2012-05-01 00:16:58 +02:00
Anthony Liguori	a8b69b8e24	Merge remote-tracking branch 'qmp/queue/qmp' into staging * qmp/queue/qmp: qapi: fix qmp_balloon() conversion qemu-iotests: add block-stream speed value test case block: add 'speed' optional parameter to block-stream block: change block-job-set-speed argument from 'value' to 'speed' block: use Error mechanism instead of -errno for block_job_set_speed() block: use Error mechanism instead of -errno for block_job_create()	2012-04-27 12:00:06 -05:00
Stefan Hajnoczi	c83c66c3b5	block: add 'speed' optional parameter to block-stream Allow streaming operations to be started with an initial speed limit. This eliminates the window of time between starting streaming and issuing block-job-set-speed. Users should use the new optional 'speed' parameter instead so that speed limits are in effect immediately when the job starts. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	882ec7ce53	block: change block-job-set-speed argument from 'value' to 'speed' Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	9e6636c72d	block: use Error mechanism instead of -errno for block_job_set_speed() There are at least two different errors that can occur in block_job_set_speed(): the job might not support setting speeds or the value might be invalid. Use the Error mechanism to report the error where it occurs. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Stefan Hajnoczi	fd7f8c6537	block: use Error mechanism instead of -errno for block_job_create() The block job API uses -errno return values internally and we convert these to Error in the QMP functions. This is ugly because the Error should be created at the point where we still have all the relevant information. More importantly, it is hard to add new error cases to this case since we quickly run out of -errno values without losing information. Go ahead and use Error directly and don't convert later. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2012-04-27 11:44:50 -03:00
Kevin Wolf	b3adf53a3a	nbd: Fix uninitialised use of s->sock s->sock is assigned only afterwards, so we're really registering an aio_fd_handler for file descriptor 0 here. Not exactly what we intended. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-26 17:54:22 +02:00
Anthony Liguori	1f8bcac09a	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (38 commits) qemu-iotests: Fix test 031 for qcow2 v3 support qemu-iotests: Add -o and make v3 the default for qcow2 qcow2: Zero write support qemu-iotests: Test backing file COW with zero clusters qemu-iotests: add a simple test for write_zeroes qcow2: Support for feature table header extension qcow2: Support reading zero clusters qcow2: Version 3 images qcow2: Ignore reserved bits in check_refcounts qcow2: Ignore reserved bits in refcount table entries qcow2: Simplify count_cow_clusters qcow2: Refactor qcow2_free_any_clusters qcow2: Ignore reserved bits in L1/L2 entries qcow2: Fail write_compressed when overwriting data qcow2: Ignore reserved bits in count_contiguous_clusters() qcow2: Ignore reserved bits in get_cluster_offset qcow2: Save disk size in snapshot header Specification for qcow2 version 3 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at() iotests: Resolve test failures caused by hostname ...	2012-04-23 14:27:04 -05:00
Kevin Wolf	621f058940	qcow2: Zero write support Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:30 +02:00
Kevin Wolf	cfcc4c62ff	qcow2: Support for feature table header extension Instead of printing an ugly bitmask, qemu can now print a more helpful string even for yet unknown features. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	6377af48b0	qcow2: Support reading zero clusters This adds support for reading zero clusters in version 3 images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	6744cbab8c	qcow2: Version 3 images This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and checks feature bits. It still misses support for zero clusters, so this is not a fully compliant implementation of v3 yet. The default for creating new images stays at v2 for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	afdf0abe77	qcow2: Ignore reserved bits in check_refcounts Also don't infer the cluster type directly from the L2 entries, but use qcow2_get_cluster_type() to keep everything in a single place. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	76dc9e0c8f	qcow2: Ignore reserved bits in refcount table entries Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:29 +02:00
Kevin Wolf	143550a83e	qcow2: Simplify count_cow_clusters count_cow_clusters() tries to reuse existing functions, and all it achieves is to make things much more complicated than they really are: Everything needs COW, unless it's a normal cluster with refcount 1. This patch implements the obvious way of doing this, and by using qcow2_get_cluster_type() it gets rid of all flag magic. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:28 +02:00
Kevin Wolf	c7a4c37a0f	qcow2: Refactor qcow2_free_any_clusters Zero clusters will add another cluster type. Refactor the open-coded cluster type detection into a switch of QCOW2_CLUSTER_* options so that the detection is in a single place. This makes it easier to add new cluster types. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:28 +02:00
Kevin Wolf	8e37f681d5	qcow2: Ignore reserved bits in L1/L2 entries This changes the still existing places that assume that the only flags are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out reserved bits. It does not convert bdrv_check yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:28 +02:00
Kevin Wolf	b0b6862e5e	qcow2: Fail write_compressed when overwriting data qcow2_alloc_compressed_cluster_offset() already fails if the copied flag is set, because qcow2_write_compressed() doesn't perform COW as it would have to do to allow this. However, what we really want to check here is whether the cluster is allocated or not. With internal snapshots the copied flag may not be set on allocated clusters. Check the cluster offset instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	2bfcc4a0a0	qcow2: Ignore reserved bits in count_contiguous_clusters() Until now, count_contiguous_clusters() has an argument that allowed to specify flags that should be ignored in the comparison, i.e. that are allowed to change between contiguous clusters. This patch changes the function so that it ignores all flags by default now and you need to pass the flags on which it should stop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	68d000a390	qcow2: Ignore reserved bits in get_cluster_offset With this change, reading from a qcow2 image ignores all reserved bits that are set in an L1 or L2 table entry. Now get_cluster_offset() assigns *cluster_offset only the offset without any other flags. The cluster type is not longer encoded in the offset, but a positive return value in case of success. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	90b277593d	qcow2: Save disk size in snapshot header This allows that different snapshots of an image can have different sizes, which is a requirement for enabling image resizing even with images that have internal snapshots. We don't do the actual support for it now, but make sure that the additional field is present and not completely ignored in all version 3 images. When trying to load a snapshot of different size, it returns an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:57:27 +02:00
Kevin Wolf	f24423bd90	qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at() Refcount block allocation and refcount table growth rely on s->free_cluster_index pointing to somewhere after the current allocation. Change qcow2_alloc_cluster_at() to fulfill this assumption. Without this change it could happen that a newly allocated refcount block and the allocated data block point to the same area in the image file, causing data corruption in the long run. This fixes a bug that became first visible after commit `250196f1`. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-20 15:56:19 +02:00
Paolo Bonzini	bafbd6a1c6	aio: remove process_queue callback and qemu_aio_process_queue Both unused after the previous patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-19 16:37:53 +02:00
Paolo Bonzini	7fe7b68b32	nbd: do not block in nbd_wr_sync if no data at all is available Right now, nbd_wr_sync will hang if no data at all is available on the socket and the other side is not going to provide any. Relax this by making it loop only for writes or partial reads. This fixes a race where one thread is executing qemu_aio_wait() and another is executing main_loop_wait(). Then, the select() call in main_loop_wait() can return stale data and call the "readable" callback with no data in the socket. Reported-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-19 16:36:43 +02:00
Paolo Bonzini	185b43386a	nbd: consistently return negative errno values In the next patch we need to look at the return code of nbd_wr_sync. To avoid percolating the socket_error() ugliness all around, let's handle errors by returning negative errno values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-19 16:36:43 +02:00
Paolo Bonzini	fc19f8a02e	nbd: consistently check for <0 or >=0 This prepares for the following patch, which changes -1 return values to negative errno. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-19 16:36:43 +02:00
Paolo Bonzini	dd3e8ac413	nbd: avoid out of bounds access to recv_coroutine array This can happen with a buggy or malicious server. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-19 16:36:42 +02:00
Kevin Wolf	2795ecf681	qcow2: Fix return value of alloc_refcount_block Someone forgot something in commit 29c1a730... Documenting the right return value is not enough, you also need to actually return it in the code. This bug sometimes causes error return values even when everything has succeeded: The new offset of the refcount block is truncated to 32 bits and interpreted as signed. At least with small cluster sizes it's easy to get a negative return value this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2012-04-19 16:03:27 +02:00
Kevin Wolf	8dc0a5e7a0	qcow2: Fix error handling in qcow2_alloc_cluster_offset If do_alloc_cluster_offset() fails, the error handling code tried to remove the request from the in-flight queue, to which it wasn't added yet, resulting in a NULL pointer dereference. m->nb_clusters really only becomes != 0 when the request is in the list. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-19 16:03:27 +02:00
Stefan Weil	4e35b92a51	block: Fix spelling in comment (ineffcient -> inefficient) Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-19 15:48:52 +02:00
Anthony Liguori	bb5d8dd757	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (46 commits) qed: remove incoming live migration blocker qed: honor BDRV_O_INCOMING for incoming live migration migration: clear BDRV_O_INCOMING flags on end of incoming live migration qed: add bdrv_invalidate_cache to be called after incoming live migration blockdev: open images with BDRV_O_INCOMING on incoming live migration block: add a function to clear incoming live migration flags block: Add new BDRV_O_INCOMING flag to notice incoming live migration block stream: close unused files and update ->backing_hd qemu-iotests: Fix call syntax for qemu-io qemu-iotests: Fix call syntax for qemu-img qemu-iotests: Test unknown qcow2 header extensions qemu-iotests: qcow2.py sheepdog: fix send req helpers sheepdog: implement SD_OP_FLUSH_VDI operation block: bdrv_append() fixes qed: track dirty flag status qemu-img: add dirty flag status qed: image fragmentation statistics qemu-img: add image fragmentation statistics block: document job API ...	2012-04-10 08:16:12 -05:00
Benoît Canet	50d30c2675	qed: remove incoming live migration blocker Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 16:29:12 +02:00
Benoît Canet	2d1f3c2360	qed: honor BDRV_O_INCOMING for incoming live migration From original commit with Patchwork-id: 31108 by Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> "The QED image format includes a file header bit to mark images dirty. QED normally checks dirty images on open and fixes inconsistent metadata. This is undesirable during live migration since the dirty bit may be set if the source host is modifying the image file. The check should be postponed until migration completes. Skip operations that modify the image file if the BDRV_O_INCOMING flag is set." Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 16:29:04 +02:00
Benoît Canet	c82954e529	qed: add bdrv_invalidate_cache to be called after incoming live migration The QED image is reopened to flush metadata and check consistency. Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 16:28:27 +02:00
Marcelo Tosatti	5a67a1048e	block stream: close unused files and update ->backing_hd Close the now unused images that were part of the previous backing file chain and adjust ->backing_hd, backing_filename and backing_format properly. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=801449 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 15:11:37 +02:00
Liu Yuan	eb09218077	sheepdog: fix send req helpers We should return if reading of the header fails. Cc: Kevin Wolf <kwolf@redhat.com> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:41 +02:00
Liu Yuan	47622c44d0	sheepdog: implement SD_OP_FLUSH_VDI operation Flush operation is supposed to flush the write-back cache of sheepdog cluster. By issuing flush operation, we can assure the Guest of data reaching the sheepdog cluster storage. Cc: Kevin Wolf <kwolf@redhat.com> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:41 +02:00
Dong Xu Wang	d68dbee80e	qed: track dirty flag status Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:41 +02:00
Dong Xu Wang	11c9c615c8	qed: image fragmentation statistics Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	9f25eccc1c	block: set job->speed in block_set_speed There is no need to do this in every implementation of set_speed (even though there is only one right now). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	3e914655f2	block: fix streaming/closing race Streaming can issue I/O while qcow2_close is running. This causes the L2 caches to become very confused or, alternatively, could cause a segfault when the streaming coroutine is reentered after closing its block device. The fix is to cancel streaming jobs when closing their underlying device. The cancellation must be synchronous, on the other hand qemu_aio_wait will not restart a coroutine that is sleeping in co_sleep. So add a flag saying whether streaming has in-flight I/O. If the busy flag is false, the coroutine is quiescent and, when cancelled, will not issue any new I/O. This protects streaming against closing, but not against deleting. We have a reference count protecting us against concurrent deletion, but I still added an assertion to ensure nothing bad happens. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	eb9566d13e	vdi: change goto to loop Finally reindent all code and change goto statements to a loop. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	4eea78e634	vdi: do not create useless iovecs Reads and writes to the underlying file can also occur with the simple non-vectored I/O interfaces. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	a7a43aa199	vdi: leave bounce buffering to block layer vdi.c really works as if it implemented bdrv_read and bdrv_write. However, because only vector I/O is supported by the asynchronous callbacks, it went through extra pain to bounce-buffer the I/O. This can be handled by the block layer now that the format is coroutine-based. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	bfc45fc183	vdi: move aiocb fields to locals Most of the AIOCB really holds local variables that need to persist across callback invocation. It can go away now. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	4de659e8eb	vdi: merge aio_read_cb and aio_write_cb into callers Now inline the former AIO callbacks into vdi_co_readv and vdi_co_writev. While many cleanups are possible, the code now really looks synchronous. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	0c7bfc321b	vdi: move end-of-I/O handling at the end The next step is to take code that only triggers after the first operation, and move it at the end of vdi_aio_read_cb and vdi_aio_write_cb. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	3d46a75aa5	vdi: basic conversion to coroutines Even a basic conversion changing the bdrv_aio_readv/bdrv_aio_writev calls to bdrv_co_readv/bdrv_co_writev, and callbacks to goto statements can eliminate a lot of code. This is because error handling is simplified and indirections through bottom halves can go away. After this patch, I/O to the underlying file already happens via coroutines, but the code still looks a lot like if asynchronous I/O was being used. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Zhang Shengju	c088b69136	block/vpc: write checksum back to footer after check After validation check, the 'checksum' is not written back to footer, which leave it with zero. This results in errors while loadding it under Microsoft's Hyper-V environment, and also errors from utilities like Citrix's vhd-util. Signed-off-by: Zhang Shengju <sean_zhang@trendmicro.com.cn> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:40 +02:00
Paolo Bonzini	29cdb2513c	block: push recursive flushing up from drivers Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-04-05 14:54:39 +02:00
Kevin Wolf	3948d1d487	qcow2: Remove unused parameter in get_cluster_table() Since everything goes through the cache, callers don't use the L2 table offset any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-04-05 14:54:39 +02:00
Stefan Weil	fb7c8e8a2d	block/curl: Replace usleep by g_usleep The function usleep is not available for all supported platforms: at least some versions of MinGW don't support it. usleep was also declared obsolete by POSIX.1-2001. The function g_usleep is part of glib2.0, so it is available for all supported platforms. Using nanosleep would also be possible but needs more code. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-04-03 09:34:34 +01:00
Kevin Wolf	250196f19c	qcow2: Reduce number of I/O requests If the first part of a write request is allocated, but the second isn't and it can be allocated so that the resulting area is contiguous, handle it at once. This is a common case for sequential writes. After this patch, alloc_cluster_offset() only checks if the clusters are already allocated or how many new clusters can be allocated contigouosly. The actual cluster allocation is split off into a new function do_alloc_cluster_offset(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:07 +01:00
Kevin Wolf	256900b16b	qcow2: Add qcow2_alloc_clusters_at() This function allows to allocate clusters at a given offset in the image file. This is useful if you want to allocate the second part of an area that must be contiguous. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:07 +01:00
Kevin Wolf	bf319ece56	qcow2: Factor out count_cow_clusters Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:07 +01:00
Kevin Wolf	259b217310	qcow2: Add error messages in qcow2_truncate qemu-img resize has some limitations with qcow2, but the user is only told that "this image format does not support resize". Quite confusing, so add some more detailed error_report() calls and change "this image format" into "this image". Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:06 +01:00
Kevin Wolf	3cce16f44d	qcow2: Add some tracing Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:06 +01:00
Stefan Hajnoczi	14fe292d86	qed: do not evict in-use L2 table cache entries The L2 table cache reduces QED metadata reads that would be required when translating LBAs to offsets into the image file. Since requests execute in parallel it is possible to share an L2 table between multiple requests. There is a potential data corruption issue when an in-use L2 table is evicted from the cache because the following situation occurs: 1. An allocating write performs an update to L2 table "A". 2. Another request needs L2 table "B" and causes table "A" to be evicted. 3. A new read request needs L2 table "A" but it is not cached. As a result the L2 update from #1 can overlap with the L2 fetch from #3. We must avoid doing overlapping I/O requests here since the worst case outcome is that the L2 fetch completes before the L2 update and yields stale data. In that case we would effectively discard the L2 update and lose data clusters! Thanks to Benoît Canet <benoit.canet@gmail.com> for extensive testing and debugging which lead to discovery of this bug. Reported-by: Benoît Canet <benoit.canet@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Tested-by: Benoît Canet <benoit.canet@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-03-12 15:14:06 +01:00
Stefan Weil	75d1234103	block/vmdk: Fix warning from splint (comparision of unsigned value) l1_entry_sectors will never be less than 0. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-07 13:03:51 +00:00
Kevin Wolf	64ca6aee4f	qcow2: Reject too large header extensions Image files that make qemu-img info read several gigabytes into the unknown header extensions list are bad. Just fail opening the image if an extension claims to be larger than the header extension area. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-02-29 12:48:47 +01:00
Kevin Wolf	fd29b4bbef	qcow2: Fix offset in qcow2_read_extensions The spec says that the length of extensions is padded to 8 bytes, not the offset. Currently this is the same because the header size is a multiple of 8, so this is only about compatibility with future changes to the header size. While touching it, move the calculation to a common place instead of duplicating it for each header extension type. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-02-29 12:48:47 +01:00
Kevin Wolf	423477e556	qcow2: Fix build with DEBUG_EXT enabled Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-29 12:48:47 +01:00
Luiz Capitulino	f36f394952	block: bdrv_eject(): Make eject_flag a real bool Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2012-02-22 17:23:05 -02:00
MORITA Kazutaka	6d1acda8f1	sheepdog: fix co_recv coroutine context The co_recv coroutine has two things that will try to enter it: 1. The select(2) read callback on the sheepdog socket. 2. The aio_add_request() blocking operations, including a coroutine mutex. This patch fixes it by setting NULL to co_recv before sending data. In future, we should make the sheepdog driver fully coroutine-based and simplify request handling. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Kevin Wolf	75bab85ca0	qcow2: Keep unknown header extension when rewriting header If we want header extensions to work as compatible extensions, we can't destroy yet unknown header extensions when rewriting the header (e.g. for changing the backing file). Save all unknown header extensions in a list of blobs and include them in a new header. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Kevin Wolf	e24e49e619	qcow2: Update whole header at once In order to switch the backing file, qcow2 issues multiple write requests that only changed a part of the image header. Any failure after the first one would leave the header in an corrupted state. With this patch, the whole header is written at once, so we can't fail in the middle. At the same time, this gives us a reusable functions that updates all fields of the qcow2 header and not only the backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Kevin Wolf	ecd880d9ee	vpc: Round up image size during fixed image creation The geometry calculation algorithm from the VHD spec rounds the image size down if it doesn't exactly match a geometry. During image conversion, this causes the image to be truncated. For dynamic images, we already have code in place to round up instead, let's do the same for fixed images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Charles Arnold	24da78dbb5	vpc: Add support for Fixed Disk type The Virtual Hard Disk Image Format Specification allows for three types of hard disk formats, Fixed, Dynamic, and Differencing. Qemu currently only supports Dynamic disks. This patch adds support for the Fixed Disk format. Usage: Example 1: qemu-img create -f vpc -o type=fixed <filename> [size] Example 2: qemu-img convert -O vpc -o type=fixed <input filename> <output filename> While it is also allowed to specify '-o type=dynamic', the default disk type remains Dynamic and is what is used when the type is left unspecified. Signed-off-by: Charles Arnold <carnold@suse.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:51 +01:00
Ronnie Sahlberg	f9dadc9855	iSCSI: add configuration variables for iSCSI This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C\|CRC32C-NONE\|NONE-CRC32C\|NONE user = CHAP username password = CHAP password If you use several different targets, you can also configure this on a per target basis by using a group name: [iscsi "iqn.target.name"] ... The configuration file can be read using -readconfig. Example : qemu-system-i386 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 -readconfig iscsi.conf Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:50 +01:00
Stefan Hajnoczi	0e71be1932	qed: add .bdrv_co_write_zeroes() support Zero writes are a dedicated interface for writing regions of zeroes into the image file. If clusters are not yet allocated it is possible to use an efficient metadata representation which keeps the image file compact and does not store individual zero bytes. Implementing this for the QED image format is fairly straightforward. The only issue is that when a zero write touches an existing cluster we have to allocate a bounce buffer and perform a regular write. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:50 +01:00
Stefan Hajnoczi	6e4f59bd0d	qed: replace is_write with flags field Per-request attributes like read/write are currently implemented as bool fields in the QEDAIOCB struct. This becomes unwiedly as the number of attributes grows. For example, the qed_aio_setup() function would have to take multiple bool arguments and at call sites it would be hard to distinguish the meaning of each bool. Instead use a flags field with bitmask constants. This will be used when zero write support is added. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-02-09 16:17:50 +01:00
Li Zhi Hui	2b16c9ffb2	qcow: Use bdrv functions to replace file operation Since common file operation functions lack of error detection and use much more I/O syscalls, so change them to bdrv series functions and reduce I/O request. Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 14:49:18 +01:00
Li Zhi Hui	84b0ec020f	qcow: Return real error code in qcow_open Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 14:49:18 +01:00
Stefan Weil	641543b76b	block/vdi: Zero unused parts when allocating a new block (fix #919242 ) The new block was filled with zero when it was allocated by g_malloc0, but when it was reused later and only partially used, data from the previously allocated block were still present and written to the new block. This caused the problems reported by bug #919242 (https://bugs.launchpad.net/qemu/+bug/919242). Now the unused parts of the new block which are before and after the data are always filled with zero, so it is no longer necessary to zero the whole block with g_malloc0. I also updated the copyright comment. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 14:49:18 +01:00
Marcelo Tosatti	c8c3080f4a	block: add support for partial streaming Add support for streaming data from an intermediate section of the image chain (see patch and documentation for details). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 14:49:18 +01:00
Stefan Hajnoczi	5094a6c016	block: rate-limit streaming operations This patch implements rate-limiting for image streaming. If we've exceeded the bandwidth quota for a 100 ms time slice we sleep the coroutine until the next slice begins. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:45:26 +01:00
Stefan Hajnoczi	4f1043b4ff	block: add image streaming block job Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:45:26 +01:00
Stefan Hajnoczi	031380d877	block: replace unchecked strdup/malloc/calloc with glib Most of the codebase as been converted to use glib memory allocation functions. There are still a few instances of malloc/calloc in the block layer and qemu-io. Replace them, especially since they do not check the strdup/malloc/calloc return value. Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:39:03 +01:00
Gregory Farnum	bd60324706	rbd: wire up snapshot removal and rollback functionality Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-01-26 11:39:03 +01:00
Paolo Bonzini	6b620ca3b0	prepare for future GPLv2+ relicensing All files under GPLv2 will get GPLv2+ changes starting tomorrow. event_notifier.c and exec-obsolete.h were only ever touched by Red Hat employees and can be relicensed now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-01-13 10:55:56 -06:00
Stefan Hajnoczi	8d98734651	vvfat: avoid leaking file descriptor in commit_one_file() Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-01-13 10:36:59 +00:00
Paolo Bonzini	128aa58947	move corking functions to osdep.c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:58 +01:00
Paolo Bonzini	7a706633e9	nbd: add support for NBD_CMD_TRIM Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	1486d04a1b	nbd: add support for NBD_CMD_FLUSH Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	2c7989a9b1	nbd: add support for NBD_CMD_FLAG_FUA Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	ecda3447d1	nbd: allow multiple in-flight requests Allow sending up to 16 requests, and drive the replies to the coroutine that did the request. The code is written to be exactly the same as before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex and state). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	d9b09f13ca	nbd: split requests qemu-nbd has a limit of slightly less than 1M per request. Work around this in the nbd block driver. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	ae255e523c	nbd: switch to asynchronous operation Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:57 +01:00
Paolo Bonzini	8c5135f90e	sheepdog: move coroutine send/recv function to generic code Outside coroutines, avoid busy waiting on EAGAIN by temporarily making the socket blocking. The API of qemu_recvv/qemu_sendv is slightly different from do_readv/do_writev because they do not handle coroutines. It returns the number of bytes written before encountering an EAGAIN. The specificity of yielding on EAGAIN is entirely in qemu-coroutine.c. Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2011-12-22 11:53:53 +01:00
Li Zhi Hui	16d2fc002a	block/cow: Return real error code Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:33 +01:00
Kevin Wolf	c2c9a46609	qcow2: Allow >4 GB VM state This is a compatible extension to the snapshot header format that allows saving a 64 bit VM state size. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:33 +01:00
Josh Durgin	b9c532903f	rbd: always set out parameter in qemu_rbd_snap_list The caller expects psn_tab to be NULL when there are no snapshots or an error occurs. This results in calling g_free on an invalid address. Reported-by: Oliver Francke <Oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:08 +01:00
Li Zhi Hui	28c1202ba6	block/qcow2.c: call qcow2_free_snapshots in the function of qcow2_close Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:08 +01:00
Paolo Bonzini	91977c2e5f	block: qemu_aio_get does not return NULL Initially done with the following semantic patch: @ rule1 @ expression E; statement S; @@ E = qemu_aio_get (...); ( - if (E == NULL) { ... } \| - if (E) { <... S ...> } ) which however missed occurrences in linux-aio.c and posix-aio-compat.c. Those were done by hand. The change in vdi_aio_setup's caller was also done by hand. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:08 +01:00
Paolo Bonzini	ad54ae80c7	block: bdrv_aio_* do not return NULL Initially done with the following semantic patch: @ rule1 @ expression E; statement S; @@ E = ( bdrv_aio_readv \| bdrv_aio_writev \| bdrv_aio_flush \| bdrv_aio_discard \| bdrv_aio_ioctl ) (...); ( - if (E == NULL) { ... } \| - if (E) { <... S ...> } ) which however missed the occurrence in block/blkverify.c (as it should have done), and left behind some unused variables. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-15 12:40:07 +01:00
Dong Xu Wang	3a93113a00	fix typo: delete redundant semicolon Double semicolons should be single. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2011-12-06 09:56:41 +00:00
Anthony Liguori	eb5d5beaeb	Merge remote-tracking branch 'kwolf/for-anthony' into staging	2011-12-05 09:39:25 -06:00
Stefan Hajnoczi	e94d138733	cow: use bdrv_co_is_allocated() Now that bdrv_co_is_allocated() is available we can use it instead of the synchronous bdrv_is_allocated() interface. This is a follow-up that Kevin Wolf <kwolf@redhat.com> pointed out after applying the series that introduces bdrv_co_is_allocated(). It is safe to make cow_read() a coroutine_fn because its only caller is a coroutine_fn. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:38 +01:00
Stefan Hajnoczi	e8ee5e4c47	coroutine: add qemu_co_queue_restart_all() It's common to wake up all waiting coroutines. Introduce the qemu_co_queue_restart_all() function to do this instead of looping over qemu_co_queue_next() in every caller. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:38 +01:00
Stefan Hajnoczi	81145834d3	cow: convert to .bdrv_co_is_allocated() The cow block driver does not keep internal state for cluster lookups. This means it is safe to perform cluster lookups in coroutine context without risk of race conditions that corrupt internal state. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:37 +01:00
Stefan Hajnoczi	e850b35a1f	vdi: convert to .bdrv_co_is_allocated() It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vdi_is_allocated() does not block. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-12-05 14:51:37 +01:00

... 2 3 4 5 6 ...

818 Commits