Commit Graph

30525 Commits

Author SHA1 Message Date
Vadim Rozenfeld
1c90ef2619 kvm: make hyperv hypercall and guest os id MSRs migratable.
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:55 +01:00
Paolo Bonzini
7bc3d711b4 kvm: make availability of Hyper-V enlightenments dependent on KVM_CAP_HYPERV
The MS docs specify HV_X64_MSR_HYPERCALL as a mandatory interface,
thus we must provide the MSRs even if the user only specified
features that, like relaxed timing, in principle don't require them.
And the MSRs are only there if the hypervisor has KVM_CAP_HYPERV.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:55 +01:00
Paolo Bonzini
234cc64796 KVM: fix coexistence of KVM and Hyper-V leaves
kvm_arch_init_vcpu's initialization of the KVM leaves at 0x40000100
is broken, because KVM_CPUID_FEATURES is left at 0x40000001.  Move
it to 0x40000101 if Hyper-V is enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:55 +01:00
Radim Krčmář
977c7b6d89 kvm: print suberror on all internal errors
KVM introduced internal error exit reason and suberror at the same time,
and later extended it with internal error data.
QEMU does not report suberror on hosts between these two events because
we check for the extension. (half a year in 2009, but it is misleading)

Fix by removing KVM_CAP_INTERNAL_ERROR_DATA condition on printf.

(partially improved by bb44e0d12d and ba4047cf84 in the past)

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
f0b9b11164 target-i386: kvm_check_features_against_host(): Kill feature word array
We don't need the ft[] array on kvm_check_features_against_host()
anymore, as we can simply use the feature_word_info[] array, that has
everything we need.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
2bc65d2b02 target-i386: kvm_cpu_fill_host(): Fill feature words in a loop
Now that the kvm_cpu_fill_host() code is simplified, we can simply set
the feature word array using a simple loop.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
2a573259eb target-i386: kvm_cpu_fill_host(): Set all feature words at end of function
Reorder the code so all the code that sets x86_cpu_def->features is at
the end of the function.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
803a932706 target-i386: kvm_cpu_fill_host(): No need to check xlevel2
There's no need to check CPU xlevel2 before calling
kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX), because:

 * The kernel won't return any entry for 0xC0000000 if host CPU vendor
   is not Centaur (See kvm_dev_ioctl_get_supported_cpuid() on the kernel
   code)
 * Similarly, the kernel won't return any entry for 0xC0000001 if
   CPUID[0xC0000000].EAX is < 0xC0000001
 * kvm_arch_get_supported_cpuid() will return 0 if no entry is returned
   by the kernel for the requested leaf

For similar reasons, we can simply set x86_cpu_def->xlevel2 directly
instead of making it conditional, because it will be set to 0 CPU vendor
is not Centaur.

This will simplify the kvm_cpu_fill_host() code a little.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[Remove unparseable comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
b73dcfb16f target-i386: kvm_cpu_fill_host(): No need to check CPU vendor
There's no need to check CPU vendor before calling
kvm_arch_get_supported_cpuid(s, 0xC0000000, 0, R_EAX), because:

 * The kernel won't return any entry for 0xC0000000 if host CPU vendor
   is not Centaur (See kvm_dev_ioctl_get_cpuid() on the kernel code);
 * kvm_arch_get_supported_cpuid() will return 0 if no entry is returned
   by the kernel for the requested leaf.

This will simplify the kvm_cpu_fill_host() code a little.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
7171a3933f target-i386: kvm_cpu_fill_host(): No need to check level
There's no need to check level (CPUID[0].EAX) before calling
kvm_arch_get_supported_cpuid(s, 0x7, 0, R_EBX), because:

 * The kernel won't return any entry for CPUID 7 if CPUID[0].EAX is < 7
   on the host (See kvm_dev_ioctl_get_cpuid() on the kernel code);
 * kvm_arch_get_supported_cpuid() will return 0 if no entry is returned
   by the kernel for the requested leaf.

This will simplify the kvm_cpu_fill_host() code a little.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Eduardo Habkost
81e207707e target-i386: kvm_cpu_fill_host(): Kill unused code
Those host_cpuid() calls are useless. They are leftovers from when the
old code using host_cpuid() was removed.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-03 17:33:54 +01:00
Anthony Liguori
0169c51155 Merge remote-tracking branch 'qemu-kvm/uq/master' into staging
* qemu-kvm/uq/master:
  kvm: always update the MPX model specific register
  KVM: fix addr type for KVM_IOEVENTFD
  KVM: Retry KVM_CREATE_VM on EINTR
  mempath prefault: fix off-by-one error
  kvm: x86: Separately write feature control MSR on reset
  roms: Flush icache when writing roms to guest memory
  target-i386: clear guest TSC on reset
  target-i386: do not special case TSC writeback
  target-i386: Intel MPX

Conflicts:
	exec.c

aliguori: fix trivial merge conflict in exec.c

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:52:44 -08:00
Anthony Liguori
1c51e68b18 Merge remote-tracking branch 'otubo/seccomp' into staging
* otubo/seccomp:
  seccomp: add some basic shared memory syscalls to the whitelist
  seccomp: add mkdir() and fchmod() to the whitelist

Message-id: 1390231004-18392-1-git-send-email-otubo@linux.vnet.ibm.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:52:16 -08:00
Anthony Liguori
7d64b2c2e2 Initial patch for QEMU GTK support on Windows
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJS3XsWAAoJEC+DHmz61iBpbSoH/05bF5kfogHUuqXdF/pGPB8H
 LmnaodaC0dnigKnEpwQrpiq5ZwkzRek4J35txrRKZcD+M5awjEnZEUJ9fP1sDTMi
 rxgewWIbUnISVtvw9pxz4CJumH3nRaVz7wmtcMd+4LNEuRhgwszfedDAQRLUKCmB
 xViGQeuauqUm0m12qx4F7rE7Dq5ru8uOl8Zf0u9QPpIreLmN8f/VZzE5NzApw0YV
 EZKpMsOPWs6mBUO978yjOH/wjncBoJp/dE2wkZm/LzCQtiNGMxTBlU7qvrxaID/O
 dVHtQGCRarStM3lkJhq3to3EXS9PtuLJ0zeWwAgncdGG+tw0becNfBEA4XLkudY=
 =CmlU
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'sweil/tags/for_anthony' into staging

Initial patch for QEMU GTK support on Windows

# gpg: Signature made Mon 20 Jan 2014 11:37:58 AM PST using RSA key ID FAD62069
# gpg: Can't check signature: public key not found

* sweil/tags/for_anthony:
  gtk: Support keyboard translation for hosts running Windows

Message-id: 1390246909-18757-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:52:09 -08:00
Anthony Liguori
14ac4febb2 hda-codec: disable streams on reset
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3kkoAAoJEEy22O7T6HE4WJ4QAJYlYsrVThAi9bnpFB7qLve+
 uP8MWwK+Wxc+O2CrN38BOoIyb273xk+U6Vj1FEfh9C1rN2dcZ4fBsUZfqqGEOYpM
 LNgFWzTUwIXDHnxVVc8Dyp+DyPbRd4CE+ii/ukY7Xq1QiHJmcdxweMPQnIZQl9UZ
 4wTWWubK1u/We+J87udQPfE3ZT2+5ndcbKM2FcCW3j8yBLvZav6/DNL6alr5eJ2L
 WGB8mOyCNu5hjaOWP1mya8cRHO6Z5IbFXqKNGU2m78C9WBO3LaZY7IL+s6mPvTzq
 0SaOk/XfK4XCwpgbvfFQ8lnJ5tdx4dagjGpwZOHPE2+Sy1l/XZztqr+caju/maqF
 PnRrosGUvYnVvPxPJKpN02mbohBizGUwL0nbal5Knx+2quQ+5sHnTJb3a3nonfuC
 /pfqvOD4thb3L7994RKu2bKfTq6TK0/n64JanmXADmEKebb3bb5AijFTModcMd8z
 uW4NejmpqsNYKXWUQ3WUktZWCEaNE7CbxTuxgr7DTSY0b2gl82cLiDS0ZAiJ4zGd
 ddwsvkEadIXHWbsdMMM3kCQQvlO/gVNYB7xozUPqOJdghlTSQAtkMi8ReR+Hxqq+
 PVS7dM276sPjRdqqaJSdk3JiE10NzUSKyBiJZtCZ5dqqIuHPlaHMgx77MzBwlVNK
 PkkqSNr9UpmwHu8WVGhC
 =yVv9
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kraxel/tags/pull-audio-2' into staging

hda-codec: disable streams on reset

# gpg: Signature made Tue 21 Jan 2014 02:17:12 AM PST using RSA key ID D3E87138
# gpg: Can't check signature: public key not found

* kraxel/tags/pull-audio-2:
  hda-codec: disable streams on reset

Message-id: 1390299589-5082-1-git-send-email-kraxel@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:51:39 -08:00
Anthony Liguori
f4b27793a8 usb core+hid: add support for microsoft os descriptors
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS3kopAAoJEEy22O7T6HE4PjkP/0NZqOSU3e9GmgeBfEpvgFfP
 4axBVEycXJtGc9nNfahmEIyUkk1+DMa8oMHnVSoVzelrdm2VNHs6qk9yC9EPa9YG
 HTXZuLvHbuIVL/NXY5b+loh4fOjbOPWEthdUd+RpNfH545Azoi2T/cYgtNz8BTyf
 eO3vCZy69NJjcsuk0zlRhEEAhnhKb+jkYgzARecL6JrAC7uuDrIhdz9peT1Du8k8
 lUnCLHFXrDaZUprkLrgvB+TTHJw9IgRq1g0gV20Gu+hDzBOJsG3cKXAl4X2b3lzO
 Ih+jcORu4ubURvOJSqTpcnObsQWKJHDR3sPeMAVD2ZzCd82ZOeH/+cEhfVBp+dZX
 qV2XbFU6yEiQBbzIuZEbsE4XWQVEAsfa8MaP/SdodJ4WueW21Rq8KDlnWzalPfDc
 T0hofDyLLcpkr2O3AM7OfBfLa37B4joV5FnuinvOo5r6byMMNgAYp7gbDVeIzEXJ
 Cs8N6KF0m8wX0KNyFms49VMTQnKm0MbuTYFbDBkW5jnxKHZ1ZLXdIP1HvYCvWp3w
 or0hxGze6Mrv8Aya9HUu7K0BEEbz8OLhVISkpOi35PGii2PcSm3lYHHhsohRYOcW
 EHilDyCn3Q20RkwcHi6YMy43T3eSJJnYwN4VWEb4lnDT5sjh/ehthxdtjCQHt8+f
 O1cBVvdDyh/BZt6SSuy8
 =sFbE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kraxel/tags/pull-usb-2' into staging

usb core+hid: add support for microsoft os descriptors

# gpg: Signature made Tue 21 Jan 2014 02:21:29 AM PST using RSA key ID D3E87138
# gpg: Can't check signature: public key not found

* kraxel/tags/pull-usb-2:
  usb-hid: add microsoft os descriptor support
  usb: add support for microsoft os descriptors

Message-id: 1390299772-5368-1-git-send-email-kraxel@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:51:23 -08:00
Anthony Liguori
e9f526ab7b Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
  scsi: Support TEST UNIT READY in the dummy LUN0
  block: add .bdrv_reopen_prepare() stub for iscsi
  virtio-scsi: Prevent assertion on missed events
  virtio-scsi: Cleanup of I/Os that never started
  scsi: Assign cancel_io vector for scsi_disk_emulate_ops

Conflicts:
	block/iscsi.c

aliguori: resolve trivial merge conflict in block/iscsi.c

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:50:14 -08:00
Anthony Liguori
0d688cf7d8 Block patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS4peVAAoJEH8JsnLIjy/WIfcP/0fezyrowrBSMNpqpbOI9bpy
 mjbo7tI1ZdvR2GlAsxZ8ollMNYtC611MHbnq48ywhRHJPMBPN9CYPJqwGtyzpA00
 f3gw2Lct9YMPRu0wiRclFaSSdPXVohqdL16AFRcsz8FVTD8MQkgAMu0q4+dHhs6p
 G2BvIyo3fjuS7jLJuXUmyyB4cOHI9/gw+B/ANF1BFWl0llcb5jLVxy44NGLwqovH
 Nxe4GbnMb6QYUhwibzGwMnImm8nrLHbocv0IMyX4zqBLap7BfQUJ9zs4kPUnd1rB
 /sB5dBKj5r7l8cuOyCZRkGo4gc1FyddjWlU5UpcqBuk92PXavsRM74C6H42CS3y1
 4pcEu0fAYBdfEaDAINUdbfT1eyMVjiNiFZT1NAgcz5JkTnuG3dThaDAKQEx7FCuq
 iP2n1nlpwSs9mzeifnZH6INyyEjoJzvBh3cIdBi3yEIJ4OVSLyrxi2djTEnRvAeh
 YXyMuJlscjelJpwsMCsWnkKp4vAuiXrefdQbKqZ9HwPD5oJmWi4PNvRndBiVj+Na
 CJfZROwAej3sAusUHbCB5pt5v+Fdb64bmTO5ph42ChnKIG9T2Piz2IQJPPTeVe9h
 cdtkd3tIV5xB6iXjy2xb4Y6y6Xqs1qEs91dyjQE1GszeBcairutwUmI0oYwHY5eJ
 tVyL8ptpNvfj9vKnDvE+
 =wLul
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kwolf/tags/for-anthony' into staging

Block patches

# gpg: Signature made Fri 24 Jan 2014 08:40:53 AM PST using RSA key ID C88F2FD6
# gpg: Can't check signature: public key not found

* kwolf/tags/for-anthony: (93 commits)
  block: Switch bdrv_io_limits_intercept() to byte granularity
  qemu-iotests: Test pwritev RMW logic
  qemu-io: New command 'sleep'
  blkdebug: Make required alignment configurable
  iscsi: Set bs->request_alignment
  block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
  block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  block: Change coroutine wrapper to byte granularity
  block: Assert serialisation assumptions in pwritev
  block: Align requests in bdrv_co_do_pwritev()
  block: Allow wait_serialising_requests() at any point
  block: Make overlap range for serialisation dynamic
  block: Generalise and optimise COR serialisation
  block: Make zero-after-EOF work with larger alignment
  block: Allow waiting for overlapping requests between begin/end
  block: Switch BdrvTrackedRequest to byte granularity
  block: Introduce bdrv_co_do_pwritev()
  block: write: Handle COR dependency after I/O throttling
  block: Introduce bdrv_aligned_pwritev()
  block: Introduce bdrv_co_do_preadv()
  ...

Message-id: 1390584136-24703-1-git-send-email-kwolf@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2014-01-24 15:43:30 -08:00
Kevin Wolf
d5103588aa block: Switch bdrv_io_limits_intercept() to byte granularity
Request sizes used to be rounded down to the next sector boundary,
allowing to bypass the I/O limit. Now all requests are accounted for
with their exact byte size.

Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:28 +01:00
Kevin Wolf
9e1cb96d9a qemu-iotests: Test pwritev RMW logic
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:25 +01:00
Kevin Wolf
cd33d02a10 qemu-io: New command 'sleep'
There is no easy way to check that a request correctly waits for a
different request. With a sleep command we can at least approximate it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
b35ee7fb23 blkdebug: Make required alignment configurable
The new 'align' option of blkdebug can be used in order to emulate
backends with a required 4k alignment on hosts which only really require
512 byte alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 17:40:03 +01:00
Paolo Bonzini
2c9880c45e iscsi: Set bs->request_alignment
The iSCSI backend already gets the block size from the READ CAPACITY
command it sends.  Save it so that the generic block layer gets it
too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
8407d5d7e2 block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
a3ef657185 block: Make bdrv_pread() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
775aa8b6e0 block: Change coroutine wrapper to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
28de2dcd88 block: Assert serialisation assumptions in pwritev
If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.

However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.

This is an important assumption that may not be broken or data
corruption will happen. Document it with some assertions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
3b8242e0ea block: Align requests in bdrv_co_do_pwritev()
This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
6460440f34 block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
7327145f63 block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
2dbafdc012 block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
ec746e10cb block: Make zero-after-EOF work with larger alignment
Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
65afd211c7 block: Allow waiting for overlapping requests between begin/end
Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
793ed47a7a block: Switch BdrvTrackedRequest to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
6601553e27 block: Introduce bdrv_co_do_pwritev()
This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
244eadef5c block: write: Handle COR dependency after I/O throttling
First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
b404f72036 block: Introduce bdrv_aligned_pwritev()
This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
1b0288ae7f block: Introduce bdrv_co_do_preadv()
Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
d0c7f642f5 block: Introduce bdrv_aligned_preadv()
This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
c25f53b06e raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
1b7fd72955 block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
339064d506 block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
1ff735bdc4 block: Detect unaligned length in bdrv_qiov_is_aligned()
For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
e5354657a6 qemu_memalign: Allow small alignments
The functions used by qemu_memalign() require an alignment that is at
least sizeof(void*). Adjust it if it is too small.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
355ef4ac95 block: Update BlockLimits when they might have changed
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
466ad822de block: Inherit opt_transfer_length
When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
d34682cd4a block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
dabfa6cc2e block: Fix bdrv_commit return value
bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.

Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.

Also clean up the handling of I/O errors to return the real error code
instead of -EIO.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 16:53:51 +01:00
Jeff Cody
3722290074 block: update block commit documentation regarding image truncation
This updates the documentation for commiting snapshot images.
Specifically, this highlights what happens when the base image
is either smaller or larger than the snapshot image being committed.

In the case of the base image being smaller, it is resized to the
larger size of the snapshot image.  In the case of the base image
being larger, it is not resized automatically, but once the commit
has completed it is safe for the user to truncate the base image.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:12:49 +01:00
Jeff Cody
4da8358596 block: resize backing image during active layer commit, if needed
If the top image to commit is the active layer, and also larger than
the base image, then an I/O error will likely be returned during
block-commit.

For instance, if we have a base image with a virtual size 10G, and a
active layer image of size 20G, then committing the snapshot via
'block-commit' will likely fail.

This will automatically attempt to resize the base image, if the
active layer image to be committed is larger.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:12:49 +01:00