mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Paolo Bonzini	2125e5ea6e	curl: convert CURLAIOCB to byte values This is in preparation for the conversion from bdrv_aio_readv to bdrv_co_preadv, and it also requires changing some of the size_t values to uint64_t. This was broken before for disks > 2TB, but now it would break at 4GB. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:50 -04:00
Paolo Bonzini	3ce6a729b5	curl: split curl_find_state/curl_init_state If curl_easy_init fails, a CURLState is left with s->in_use = 1. Split curl_init_state in two, so that we can distinguish the two failures and call curl_clean_state if needed. While at it, simplify curl_find_state, removing a dummy loop. The aio_poll loop is moved to the sole caller that needs it. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:45 -04:00
Paolo Bonzini	456af34629	curl: avoid recursive locking of BDRVCURLState mutex The curl driver has a ugly hack where, if it cannot find an empty CURLState, it just uses aio_poll to wait for one to be empty. This is probably buggy when used together with dataplane, and the simplest way to fix it is to use coroutines instead. A more immediate effect of the bug however is that it can cause a recursive call to curl_readv_bh_cb and recursively taking the BDRVCURLState mutex. This causes a deadlock. The fix is to unlock the mutex around aio_poll, but for cleanliness we should also take the mutex around all calls to curl_init_state, even if reaching the unlock/lock pair is impossible. The same is true for curl_clean_state. Reported-by: Kun Wei <kuwei@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170515100059.15795-4-pbonzini@redhat.com Cc: qemu-stable@nongnu.org Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:17 -04:00
Paolo Bonzini	34db05e7ff	curl: never invoke callbacks with s->mutex held All curl callbacks go through curl_multi_do, and hence are called with s->mutex held. Note that with comments, and make curl_read_cb drop the lock before invoking the callback. Likewise for curl_find_buf, where the callback can be invoked by the caller. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:17 -04:00
Paolo Bonzini	675a775633	curl: strengthen assertion in curl_clean_state curl_clean_state should only be called after all AIOCBs have been completed. This is not so obvious for the call from curl_detach_aio_context, so assert that. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:03 -04:00
Peter Krempa	327c8ebd70	block: curl: Allow passing cookies via QCryptoSecret Since cookies can contain sensitive data (session ID, etc ...) it is desired to hide them from the prying eyes of users. Add a possibility to pass them via the secret infrastructure. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1447413 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: f4a22cdebdd0bca6a13a43a2a6deead7f2ec4bb3.1493906281.git.pkrempa@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:31:08 -04:00
Stefan Hajnoczi	3a8760664d	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZFcjPAAoJEJykq7OBq3PIj0AH/iHH1SJthVn/4psSG2v99WkZ wPEUA72u48jZzA5/anf6B2EIXyJczBpMHwlNhrPZ4QuZqiP3E2ivyBPT4eKn8f06 dFm+9Up2WccFRAB3BMUJ+x3aUFCnBXTY5zHsDeE5pGp2eOi+07xdzLGI2j660ouB ltiJGHuxfnQs1vwcx69gh7Q9o1Eeq+3jSBHHIWllwPg9hT0eZxZinu2xSbw+ATFl 3NeWD9kF8afIqvUg0p66Jpmyp2oFijIWjcRU03pGP1NUAxq4FU22JJhGz0casFji Sur6EWQ0nxXbEHXVwuvbq84fktXWFHMhXeKIAU3aQykqi6zZyMxB0OazzQVSMn4= =qACI -----END PGP SIGNATURE----- Merge tag 'tracing-pull-request' into staging # gpg: Signature made Fri 12 May 2017 10:38:07 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'tracing-pull-request': trace: add sanity check Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:35 -04:00
Stefan Hajnoczi	b54933eed5	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZFciYAAoJEJykq7OBq3PINCgH/1IXhfLR6AtPKAa0b2oABWEW lD90HiHIxeqYIDiVnppD6CEnIZ6VBmImSyRr6V9Fk7OTkCucVRLOXFZF63d7ETDo PBF8+FJ8YgdClJ80f3HkmPu6vzvnRzcs5mUBrrKhZPAy2adxNmIpfir/78p+Aomd HxCYl3jF6aVUtFEtzTwAmDjXU2NUNPQABAAKiGT5jwSUcksxogg7LSIM/ZlFzNhc ZMI9h6doRP0v3kLyh429qVEgXpCh7VaD9Swv6H3/sSHJdCBP5MILrTADrK+45tmL nIxHruOyt63Jgd8acGDdvYz1oHjPXb2Cn/87SiONsqwoyFJmkqbVXvOg4gqPuJ8= =yXgr -----END PGP SIGNATURE----- Merge tag 'block-pull-request' into staging # gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'block-pull-request': aio: add missing aio_notify() to aio_enable_external() block: Simplify BDRV_BLOCK_RAW recursion coroutine: remove GThread implementation Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:23 -04:00
Stefan Hajnoczi	3753e255da	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZFHXJAAoJEH8JsnLIjy/WvdsP/3mTGDZJO4M0HXWBWYQBwdGE EuMyGxn8Y0IUIBNx8sogAY9yQ/aQUh1b3bLWg7MAvZvQpi2S448Ak/2agaRFTWXJ vVv1nP0uigfgElSZZKTyj2SyhghrGkBou98hyYOd05YAApXmxoBKvTgC1di0Ec5K 86Mc91Cr3udh4Q5w5Ss2IxiN+fNoHhHdOOm8gkWM4iQl6fmDJ8lBBxoiU+K2uXX+ /flAVRgogFglKb0EAq8eSdy4qFRyp1G43yzOJKNiM3BuYeXpipnyO5mY3ynrEv5Y 3213hJ/rriY15xHcKtkPxEqd32iX2qCRGFXWMMiq4r527JG/B40SEJQaxfHu33U2 Et3+0MxoDHFseXE2sSjs8c0DKQiU5/hf7RkzqOMPMNluIX4ykrSyOdKzj4EFEowr GGLobm4Hr/rv8YDhRhllyKZ6Xzrjihy5WH5T9+HVfOlf35AxjJGZYqfCouhCiP15 0Nxz3qWfDH/U4zvhY+PIfQ51kvGl02oMZF6uu48kvzwdbIKgnImKg5HbG5R85mYQ BFqCjaEWjSUWxrqbZ/uRBUqavCLJm9C3mEWLCzGBmXNNs4Th8kXftuo1br+QHtr5 aT7xoxUJs/nmBoCKYhlCSrxjqCa0clenvNeiO4Q44UxWmOFQi9SM5myeSuCIaa+S qnYHZLZ74gVZpQaoPkjE =SA0O -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: (58 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:08 -04:00
Anthony Xu	5651743c90	trace: add sanity check If trace backend is set to TRACE_NOP, trace_get_vcpu_event_count returns 0, cause bitmap_new call abort. The abort can be triggered as follows: $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G (gdb) bt #0 0x00007ffff04e25f7 in raise () from /lib64/libc.so.6 #1 0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6 #2 0x00005555559de905 in bitmap_new (nbits=<optimized out>) at /home/root/git/qemu2.git/include/qemu/bitmap.h:96 #3 cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399 #4 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341 #5 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341 #6 0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560, type=type@entry=0x55555656bd30) at qom/object.c:376 #7 0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484 #8 0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu") at qom/object.c:494 #9 0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0, errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101 #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc.c:1184 #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121 #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562 Signed-off-by: Anthony Xu <anthony.xu@intel.com> Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:37:40 -04:00
Stefan Hajnoczi	321d1dba8b	aio: add missing aio_notify() to aio_enable_external() The main loop uses aio_disable_external()/aio_enable_external() to temporarily disable processing of external AioContext clients like device emulation. This allows monitor commands to quiesce I/O and prevent the guest from submitting new requests while a monitor command is in progress. The aio_enable_external() API is currently broken when an IOThread is in aio_poll() waiting for fd activity when the main loop re-enables external clients. Incrementing ctx->external_disable_cnt does not wake the IOThread from ppoll(2) so fd processing remains suspended and leads to unresponsive emulated devices. This patch adds an aio_notify() call to aio_enable_external() so the IOThread is kicked out of ppoll(2) and will re-arm the file descriptors. The bug can be reproduced as follows: $ qemu -M accel=kvm -m 1024 \ -object iothread,id=iothread0 \ -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \ -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \ -device scsi-hd,id=scsi-hd0,drive=drive0 \ -qmp tcp::5555,server,nowait $ scripts/qmp/qmp-shell localhost:5555 (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2 mode=absolute-paths format=qcow2 After blockdev-snapshot-sync completes the SCSI disk will be unresponsive. This leads to request timeouts inside the guest. Reported-by: Qianqian Zhu <qizhu@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170508180705.20609-1-stefanha@redhat.com Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:36:46 -04:00
Eric Blake	ee29d6adef	block: Simplify BDRV_BLOCK_RAW recursion Since we are already in coroutine context during the body of bdrv_co_get_block_status(), we can shave off a few layers of wrappers when recursing to query the protocol when a format driver returned BDRV_BLOCK_RAW. Note that we are already using the correct recursion later on in the same function, when probing whether the protocol layer is sparse in order to find out if we can add BDRV_BLOCK_ZERO to an existing BDRV_BLOCK_DATA\|BDRV_BLOCK_OFFSET_VALID. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170504173745.27414-1-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:36:46 -04:00
Daniel P. Berrange	33c53c54e4	coroutine: remove GThread implementation The GThread implementation is not functional enough to actually run QEMU reliably. While it was potentially useful for debugging, we have a scripts/qemugdb/coroutine.py to enable tracing of ucontext coroutines in GDB, so that removes the only reason for GThread to exist. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Acked-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:36:46 -04:00
Laurent Vivier	ecc1f5adee	maintainers: Add myself as linux-user reviewer I volunteer to review linux-user patches. Adding myself will help to not miss some of them. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Acked-by: Riku Voipio <riku.voipio@linaro.org> Message-id: 20170510153950.29343-1-laurent@vivier.eu Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-11 13:31:11 -04:00
Kevin Wolf	d541e201bd	Block patches for the block queue. -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAlkUWPkSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AQuYIAKu7UHbUc3rL4FOErA2kor/Lp55ApCfi WbBaeh06jNPXQ5/S7jie30WAlc0DT+wwfWlFTl7gnYsNmXI3yrBkbEbnZMA2uSEz qz+MEEx81v3BS7MDM5sKjoJVEt76zCS3f7MbrLYLrkdHg3AGo2tfMtIlABFM+I6T lkfxYqosR0pvN8bSPAyoAQvVKYefLdi+O0poG6WruCMcF58dmZn8GzwVfnncGbqz vsd2wcm983S+a7PgGT2VBNXfyBqZ0tHqn/gl5Bc5leZTEhV9DQSh8Z0DKBh+o5Cv 8iyrmklpGdr7sI+YHHkB1zecQxI854HaB/4+Hv8KudHXI9hVmqXyXaQ= =JcaJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block Block patches for the block queue. # gpg: Signature made Thu May 11 14:28:41 2017 CEST # gpg: using RSA key 0xF407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * mreitz/tags/pull-block-2017-05-11: (22 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 14:34:56 +02:00
Max Reitz	8dd30c86dd	MAINTAINERS: Add qemu-progress to the block layer util/qemu-progress.c is currently unmaintained. The only user of its functionality is qemu-img, so it effectively is part of the block layer. Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170428165517.30341-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	d2cb36af2b	qcow2: Discard/zero clusters by byte count Passing a byte offset, but sector count, when we ultimately want to operate on cluster granularity, is madness. Clean up the external interfaces to take both offset and count as bytes, while still keeping the assertion added previously that the caller must align the values to a cluster. Then rename things to make sure backports don't get confused by changed units: instead of qcow2_discard_clusters() and qcow2_zero_clusters(), we now have qcow2_cluster_discard() and qcow2_cluster_zeroize(). The internal functions still operate on clusters at a time, and return an int for number of cleared clusters; but on an image with 2M clusters, a single L2 table holds 256k entries that each represent a 2M cluster, totalling well over INT_MAX bytes if we ever had a request for that many bytes at once. All our callers currently limit themselves to 32-bit bytes (and therefore fewer clusters), but by making this function 64-bit clean, we have one less place to clean up if we later improve the block layer to support 64-bit bytes through all operations (with the block layer auto-fragmenting on behalf of more-limited drivers), rather than the current state where some interfaces are artificially limited to INT_MAX at a time. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-13-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	f10ee139ad	qcow2: Assert that cluster operations are aligned We already audited (in commit `0c1bd469`) that qcow2_discard_clusters() is only passed cluster-aligned start values; but we can further tighten the assertion that the only unaligned end value is at EOF. Recent commits have taken advantage of an unaligned tail cluster, for both discard and write zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-12-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fbaa6bb3d3	qcow2: Optimize write zero of unaligned tail cluster We've already improved discards to operate efficiently on the tail of an unaligned qcow2 image; it's time to make a similar improvement to write zeroes. The special case is only valid at the tail cluster of a file, where we must recognize that any sectors beyond the image end would implicitly read as zero, and therefore should not penalize our logic for widening a partial cluster into writing the whole cluster as zero. However, note that for now, the special case of end-of-file is only recognized if there is no backing file, or if the backing file has the same length; that's because when the backing file is shorter than the active layer, we don't have code in place to recognize that reads of a sector unallocated at the top and beyond the backing end-of-file are implicitly zero. It's not much of a real loss, because most people don't use images that aren't cluster-aligned, or where the active layer is a different size than the backing layer (especially where the difference falls within a single cluster). Update test 154 to cover the new scenarios, using two images of intentionally differing length. While at it, fix the test to gracefully skip when run as ./check -qcow2 -o compat=0.10 154 since the older format lacks zero clusters already required earlier in the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-11-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	e249d51952	iotests: Add test 179 to cover write zeroes with unmap No tests were covering write zeroes with unmap. Additionally, I needed to prove that my previous patches for correct status reporting and write zeroes optimizations actually had an impact. The test works for cluster_size between 8k and 2M (for smaller sizes, it fails because our allocation patterns are not contiguous with small clusters - in part, the largest consecutive allocation we tend to get is often bounded by the size covered by one L2 table). Note that testing for zero clusters is tricky: 'qemu-io map' reports whether data comes from the current layer of the image (useful for sniffing out which regions of the file have QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings; while 'qemu-img map' sees "zero":true for both unallocated and zero clusters for any qcow2 with no backing layer (so less useful at detecting true zero clusters), but reliably shows mappings. So we have to rely on both queries side-by-side at each point of the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-10-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	d9ca2214bd	iotests: Improve _filter_qemu_img_map Although _filter_qemu_img_map documents that it scrubs offsets, it was only doing so for human mode. Of the existing tests using the filter (97, 122, 150, 154, 176), two of them are affected, but it does not hurt the validity of the tests to not require particular mappings (another test, 66, uses offsets but intentionally does not pass through _filter_qemu_img_map, because it checks that offsets are unchanged before and after an operation). Another justification for this patch is that it will allow a future patch to utilize 'qemu-img map --output=json' to check the status of preallocated zero clusters without regards to the mapping (since the qcow2 mapping can be very sensitive to the chosen cluster size, when preallocation is not in use). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	06cc5e2b2d	qcow2: Optimize zero_single_l2() to minimize L2 churn Similar to discard_single_l2(), we should try to avoid dirtying the L2 cache when the cluster we are changing already has the right characteristics. Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP is a requirement to unallocate a cluster (this is because the block layer clears that flag if discard.* flags during open requested that we never punch holes - see the conversation around commit `170f4b2e`, https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html). Therefore, this patch can only reuse a zero cluster as-is if either unmapping is not requested, or if the zero cluster was not associated with an allocation. Technically, there are some cases where an unallocated cluster already reads as all zeroes (namely, when there is no backing file [easy: check bs->backing], or when the backing file also reads as zeroes [harder: we can't check bdrv_get_block_status since we are already holding the lock]), where the guest would not immediately see a difference if we left that cluster unallocated. But if the user did not request unmapping, leaving an unallocated cluster is wrong; and even if the user DID request unmapping, keeping a cluster unallocated risks a subtle semantic change of guest-visible contents if a backing file is later added, and it is not worth auditing whether all internal uses such as mirror properly avoid an unmap request. Thus, this patch is intentionally limited to just clusters that are already marked as zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-8-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fdfab37dfe	qcow2: Make distinction between zero cluster types obvious Treat plain zero clusters differently from allocated ones, so that we can simplify the logic of checking whether an offset is present. Do this by splitting QCOW2_CLUSTER_ZERO into two new enums, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC. I tried to arrange the enum so that we could use 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although I didn't actually end up taking advantage of the layout. In many cases, this leads to simpler code, by properly combining cases (sometimes, both zero types pair together, other times, plain zero is more like unallocated while allocated zero is more like normal). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-7-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	3ef9521893	qcow2: Name typedef for cluster type Although it doesn't add all that much type safety (this is C, after all), it does add a bit of legibility to use the name QCow2ClusterType instead of a plain int. In particular, qcow2_get_cluster_offset() has an overloaded return type; a QCow2ClusterType on success, and -errno on failure; keeping the cluster type in a separate variable makes it slightly easier for the next patch to make further computations based on the type. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-6-eblake@redhat.com [mreitz: Use the new type in two more places (one of them pulled from the next patch)] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4341df8a83	qcow2: Correctly report status of preallocated zero clusters We were throwing away the preallocation information associated with zero clusters. But we should be matching the well-defined semantics in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO \| BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved, while still reminding the user that reading from that offset is likely to read garbage. count_contiguous_clusters_by_type() is now used only for unallocated cluster runs, hence it gets renamed and tightened. Making this change lets us see which portions of an image are zero but preallocated, when using qemu-img map --output=json. The --output=human side intentionally ignores all zero clusters, whether or not they are preallocated. The fact that there is no change to qemu-iotests './check -qcow2' merely means that we aren't yet testing this aspect of qemu-img; a later patch will add a test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4c41cb4955	block: Update comments on BDRV_BLOCK_* meanings We had some conflicting documentation: a nice 8-way table that described all possible combinations of DATA, ZERO, and OFFSET_VALID, contrasted with text that implied that OFFSET_VALID always meant raw data could be read directly. Furthermore, the text refers a lot to bs->file, even though the interface was updated back in `67a0fd2a` to let the driver pass back a specific BDS (not necessarily bs->file). As the 8-way table is the intended semantics, simplify the rest of the text to get rid of the confusion. ALLOCATED is always set by the block layer for convenience (drivers do not have to worry about it). RAW is used only internally, but by more than the raw driver. Document these additional items on the driver callback. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-4-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	bbd995d830	qcow2: Use consistent switch indentation Fix a couple of inconsistent indentations, before an upcoming patch further tweaks the switch statements. (best viewed with 'git diff -b'). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-3-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	b32cbae111	qcow2: Nicer variable names in qcow2_update_snapshot_refcount() In order to keep checkpatch happy when the next patch changes indentation, we first have to shorten some long lines. The easiest approach is to use a new variable in place of 'offset & L2E_OFFSET_MASK', except that 'offset' is the best name for that variable. Change '[old_]offset' to '[old_]entry' to make room. While touching things, also fix checkpatch warnings about unusual 'for' statements. Suggested by Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-2-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	40812d9373	tests: Add coverage for recent block geometry fixes Use blkdebug's new geometry constraints to emulate setups that have needed past regression fixes: write zeroes asserting when running through a loopback block device with max-transfer smaller than cluster size, and discard rounding away portions of requests not aligned to preferred boundaries. Also, add coverage that the block layer is honoring max transfer limits. For now, a single iotest performs all actions, with the idea that we can add future blkdebug constraint test cases in the same file; but it can be split into multiple iotests if we find reason to run one portion of the test in more setups than what are possible in the other. For reference, the final portion of the test (checking whether discard passes as much as possible to the lowest layers of the stack) works as follows: qemu-io: discard 30M at 80000001, passed to blkdebug blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than blkdebug's 512 align) blkdebug: discard 14371328 bytes at 80000512, passed to qcow2 qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than qcow2's 1M align) qcow2: discard 13M bytes at 77M, succeeds blkdebug: discard 15M bytes at 90M, passed to qcow2 qcow2: discard 15M bytes at 90M, succeeds blkdebug: discard 1356800 bytes at 105M, passed to qcow2 qcow2: discard 1M at 105M, succeeds qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's 1M align) blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than blkdebug's 512 align) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-10-eblake@redhat.com [mreitz: For cooperation with image locking, add -r to the qemu-io invocation which verifies the image content] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	430b26a82d	blkdebug: Add ability to override unmap geometries Make it easier to simulate various unusual hardware setups (for example, recent commits `3482b9b` and `b8d0a98` affect the Dell Equallogic iSCSI with its 15M preferred and maximum unmap and write zero sizing, or `b2f95fe` deals with the Linux loopback block device having a max_transfer of 64k), by allowing blkdebug to wrap any other device with further restrictions on various alignments. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	3dc834f879	blkdebug: Simplify override logic Rather than store into a local variable, then copy to the struct if the value is valid, then reporting errors otherwise, it is simpler to just store into the struct and report errors if the value is invalid. This however requires that the struct store a 64-bit number, rather than a narrower type. Likewise, setting a sane errno value in ret prior to the sequence of parsing and jumping to out: on error makes it easier for the next patch to add a chain of similar checks. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-8-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	63188c2450	blkdebug: Add pass-through write_zero and discard support In order to test the effects of artificial geometry constraints on operations like write zero or discard, we first need blkdebug to manage these actions. It also allows us to inject errors on those operations, just like we can for read/write/flush. We can also test the contract promised by the block layer; namely, if a device has specified limits on alignment or maximum size, then those limits must be obeyed (for now, the blkdebug driver merely inherits limits from whatever it is wrapping, but the next patch will further enhance it to allow specific limit overrides). This patch intentionally refuses to service requests smaller than the requested alignments; this is because an upcoming patch adds a qemu-iotest to prove that the block layer is correctly handling fragmentation, but the test only works if there is a way to tell the difference at artificial alignment boundaries when blkdebug is using a larger-than-default alignment. If we let the blkdebug layer always defer to the underlying layer, which potentially has a smaller granularity, the iotest will be thwarted. Tested by setting up an NBD server with export 'foo', then invoking: $ ./qemu-io qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo qemu-io> d 0 15M qemu-io> w -z 0 15M Pre-patch, the server never sees the discard (it was silently eaten by the block layer); post-patch it is passed across the wire. Likewise, pre-patch the write is always passed with NBD_WRITE (with 15M of zeroes on the wire), while post-patch it can utilize NBD_WRITE_ZEROES (for less traffic). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-7-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	d157ed5f72	blkdebug: Refactor error injection Rather than repeat the logic at each caller of checking if a Rule exists that warrants an error injection, fold that logic into inject_error(); and rename it to rule_check() for legibility. This will help the next patch, which adds two more callers that need to check rules for the potential of injecting errors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-6-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	e0ef439588	blkdebug: Sanity check block layer guarantees Commits `04ed95f4` and `1a62d0ac` updated the block layer to auto-fragment any I/O to fit within device boundaries. Additionally, when using a minimum alignment of 4k, we want to ensure the block layer does proper read-modify-write rather than requesting I/O on a slice of a sector. Let's enforce that the contract is obeyed when using blkdebug. For now, blkdebug only allows alignment overrides, and just inherits other limits from whatever device it is wrapping, but a future patch will further enhance things. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	6f3c90af3c	qemu-io: Switch 'map' output to byte-based reporting Mixing byte offset and sector allocation counts is a bit confusing. Also, reporting n/m sectors, where m decreases according to the remaining size of the file, isn't really adding any useful information; and reporting an offset at both the front and end of the line, with large amounts of whitespace, is pointless. Update the output to use byte counts and shorter lines, then adjust the affected tests (./check -qcow2 102, ./check -vpc 146). Note that 'qemu-io map' is MUCH weaker than 'qemu-img map'; the former only shows which regions of the active layer are allocated, without regards to where the allocation comes from or whether the allocated portion is known to read as zero (because it is using the weaker bdrv_is_allocated()); while the latter (especially in --output=json mode) reports more details from bdrv_get_block_status(). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-4-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4401fdc77c	qemu-io: Switch 'alloc' command to byte-based length For the 'alloc' command, accepting an offset in bytes but a length in sectors, and reporting output in sectors, is confusing. Do everything in bytes, and adjust the expected output accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-3-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:05 +02:00
Eric Blake	1bce6b4ce3	qemu-io: Improve alignment checks Several copy-and-pasted alignment checks exist in qemu-io, which could use some minor improvements: - Manual comparison against 0x1ff is not as clean as using our alignment macros (QEMU_IS_ALIGNED) from osdep.h. - The error messages aren't quite grammatically correct. Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-2-eblake@redhat.com Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:05 +02:00
John Snow	698bdfa07d	blockdev: use drained_begin/end for qmp_block_resize Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551 If one tries to issue a block_resize while a guest is busy accessing the disk, it is possible that qemu may deadlock when invoking aio_poll from both the main loop and the iothread. Replace another instance of bdrv_drain_all that doesn't quite belong. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Christoph Hellwig	c03e7ef12a	nvme: Implement Write Zeroes Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: ported over from qemu-nvme.git to mainline] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Anton Nefedov	b91127edd0	qemu-img: wait for convert coroutines to complete On error path (like i/o error in one of the coroutines), it's required to - wait for coroutines completion before cleaning the common structures - reenter dependent coroutines so they ever finish Introduced in `2d9187bc65`. Cc: qemu-stable@nongnu.org Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	22d5cd82e9	file-posix: Remove .bdrv_inactivate/invalidate_cache Now that the block layer takes care to request a lot less permissions for inactive nodes, the special-casing in file-posix isn't necessary any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	9c5e6594f1	block: Fix write/resize permissions for inactive images Format drivers for inactive nodes don't need write/resize permissions on their bs->file and can share write/resize with another VM (in fact, this is the whole point of keeping images inactive). Represent this fact in the op blocker system, so that image locking does the right thing without special-casing inactive images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	38701b6aef	block: Inactivate parents before children The proper order for inactivating block nodes is that first the parents get inactivated and then the children. If we do things in this order, we can assert that we didn't accidentally leave a parent activated when one of its child nodes is inactive. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	ace21a5875	migration: Unify block node activation error handling Migration code activates all block driver nodes on the destination when the migration completes. It does so by calling bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is one code path for precopy and one for postcopy migration, resulting in four function calls, which used to have three different failure modes. This patch unifies the behaviour so that failure to activate all block nodes is non-fatal, but the error message is logged and the VM isn't automatically started. 'cont' will retry activating the block nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	aa93c834f9	iotests: Extend test 066 066 was supposed to be a test "for discarding preallocated zero clusters", but it did so incompletely: While it did check the image file's integrity after the operation, it did not confirm that the clusters are indeed freed. This patch adds this test. In addition, new cases for writing to preallocated zero clusters are added. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	293073a56c	qcow2: Discard preallocated zero clusters In discard_single_l2(), we completely discard normal clusters instead of simply turning them into preallocated zero clusters. That means we should probably do the same with such preallocated zero clusters: Discard them instead of keeping them allocated. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	564a6b6938	qcow2: Reuse preallocated zero clusters Instead of just freeing preallocated zero clusters and completely allocating them from scratch, reuse them. We cannot do this in handle_copied(), however, since this is a COW operation. Therefore, we have to add the new logic to handle_alloc() and simply return the existing offset if it exists. The only catch is that we have to convince qcow2_alloc_cluster_link_l2() not to free the old clusters (because we have reused them). Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	92413c16be	qcow2: Fix preallocation size formula When calculating the number of reftable entries, we should actually use the number of refblocks and not (wrongly[1]) re-calculate it. [1] "Wrongly" means: Dividing the number of clusters by the number of entries per refblock and rounding down instead of up. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00

1 2 3 4 5 ...

53185 Commits