Commit Graph

61091 Commits

Author SHA1 Message Date
Peter Xu
3a7804c306 migration: allow fault thread to pause
Allows the fault thread to stop handling page faults temporarily. When
network failure happened (and if we expect a recovery afterwards), we
should not allow the fault thread to continue sending things to source,
instead, it should halt for a while until the connection is rebuilt.

When the dest main thread noticed the failure, it kicks the fault thread
to switch to pause state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
14b1742eaa migration: allow src return path to pause
Let the thread pause for network issues.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
b411b844fb migration: allow dst vm pause on postcopy
When there is IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
new semaphore, until someone poke it for another attempt.

One note is that here on ram loading thread we cannot detect the
POSTCOPY_ACTIVE state, but we need to detect the more specific
POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all
the device states.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-5-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
b23c2ade25 migration: implement "postcopy-pause" src logic
Now when network down for postcopy, the source side will not fail the
migration. Instead we convert the status into this new paused state, and
we will try to wait for a rescue in the future.

If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-4-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
a688d2c1ab migration: new postcopy-pause state
Introducing a new state "postcopy-paused", which can be used when the
postcopy migration is paused. It is targeted for postcopy network
failure recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-3-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Peter Xu
e89f5ff2c3 migration: let incoming side use thread context
The old incoming migration is running in main thread and default
gcontext.  With the new qio_channel_add_watch_full() we can now let it
run in the thread's own gcontext (if there is one).

Currently this patch does nothing alone.  But when any of the incoming
migration is run in another iothread (e.g., the upcoming migrate-recover
command), this patch will bind the incoming logic to the iothread
instead of the main thread (which may already get page faulted and
hanged).

RDMA is not considered for now since it's not even using the QIO watch
framework at all.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-2-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
8c4598f2b1 migration: Define MultifdRecvParams sooner
Once there, we don't need the struct names anywhere, just the
typedefs.  And now also document all fields.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
af8b7d2b09 migration: Transmit initial package through the multifd channels
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

--

Be network agnostic.
Add error checking for all values.
2018-05-15 20:24:27 +02:00
Juan Quintela
36c2f8be2c migration: Delay start of migration main routines
We need to make sure that we have started all the multifd threads.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
60df2d4ae5 migration: Create multifd channels
In both sides.  We still don't transmit anything through them.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
3854956ad7 migration: Export functions to create send channels
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
62c1e0ca73 migration: Be sure all recv channels are created
We need them before we start migration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
667707078d migration: terminate_* can be called for other threads
Once there, make  count field to always be accessed with atomic
operations.  To make blocking operations, we need to know that the
thread is running, so create a bool to indicate that.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

--

Once here, s/terminate_multifd_*-threads/multifd_*_terminate_threads/
This is consistente with every other function
2018-05-15 20:24:27 +02:00
Juan Quintela
71bb07dbfc migration: Introduce multifd_recv_new_channel()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
7a169d745c migration: Set error state in case of error
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
cdf338152f tests: Migration ppc now inlines its program
No need to write it to a file.  Just need a proper firmware O:-)

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2018-05-15 20:24:27 +02:00
Juan Quintela
2884100cc6 tests: Add migration precopy test
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2018-05-15 20:24:00 +02:00
Xiao Guangrong
701b1876c0 migration: fix saving normal page even if it's been compressed
Fix the bug introduced by da3f56cb2e (migration: remove
ram_save_compressed_page()), It should be 'return' rather than
'res'

Sorry for this stupid mistake :(

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15 20:24:00 +02:00
Peter Maydell
c416eecea5 Block layer patches:
- Switch AIO/callback based block drivers to a byte-based interface
 - Block jobs: Expose error string via query-block-jobs
 - Block job cleanups and fixes
 - hmp: Allow using a qdev id in block_set_io_throttle
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa+v22AAoJEH8JsnLIjy/Wy5kP/21fPSvmyxMsSQ5lF5hlkodr
 qNFIV6EuQ46YXSr4KzK2Dw88YR18nI5SlMd6mzWF9qx7WhjTMeHhARG9G497cQty
 yDb3Y6dwiuhVndWLMzj/590miqk5TnJvFx5ii88oEnsrbcjKmTY78KMkl/q1bHSp
 qL7sBhI3zPol+y28mvILPXgKsqnabvS/cmsQJCISUfSdFsnsxXVABUPI/WKe1ecs
 UE3tl3cDA/0F8TYDerPUX1RZLJcr7yoXc91ieVWrzug4SStY3HZqWT8SShe2igN0
 w5eWshUBhccHKeiqKx8vdXN7MzNP4v5H2lstTdqV/zDQEXO9vqF1X91zaRBJNb+o
 A4M3lZU/U3xideBo7Hvp4euJ5f6ZKNswpeIC3Ppky788Q+HU/d+cQF+eZUxm+t8y
 vVtixTToSt52dIaAPMsssWCtVwkS4IFO9RLXJeRs94XR3ocrsSdJNOQPTodWafjF
 BXIF4wyKlPvVMzisvpj6jsjVR4Oq7J7P+EXq+hAMbio9WlsXssWu0ZedH8oV+mKl
 UkEB5TgRAaz8MpYNCCEhUwdmHLwlc/PaPN+BAlvQCnDWMtfiFTys6Sh9N9vv/c16
 HISDkR8sf7gPm3VLnt2fPoVOQMU3Gjtany+RySXfn1AzCRG3tPzdS/Ay+s1WpoYv
 7F2Pm9QLepP7xRB48yD3
 =JFBS
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Switch AIO/callback based block drivers to a byte-based interface
- Block jobs: Expose error string via query-block-jobs
- Block job cleanups and fixes
- hmp: Allow using a qdev id in block_set_io_throttle

# gpg: Signature made Tue 15 May 2018 16:33:10 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (37 commits)
  iotests: Add test for -U/force-share conflicts
  qemu-img: Use only string options in img_open_opts
  qemu-io: Use purely string blockdev options
  block: Document BDRV_REQ_WRITE_UNCHANGED support
  qemu-img: Check post-truncation size
  iotests: Add test for COR across nodes
  iotests: Copy 197 for COR filter driver
  iotests: Clean up wrap image in 197
  block: Support BDRV_REQ_WRITE_UNCHANGED in filters
  block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
  block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
  block: Add BDRV_REQ_WRITE_UNCHANGED flag
  block: BLK_PERM_WRITE includes ..._UNCHANGED
  block: Add COR filter driver
  iotests: Skip 181 and 201 without userfaultfd
  iotests: Add failure matching to common.qemu
  docs: Document the new default sizes of the qcow2 caches
  qcow2: Give the refcount cache the minimum possible size by default
  specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
  Fix error message about compressed clusters with OFLAG_COPIED
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-15 17:02:00 +01:00
Babu Moger
ab8f992e3e i386: Add new property to control cache info
The property legacy-cache will be used to control the cache information.
If user passes "-cpu legacy-cache" then older information will
be displayed even if the hardware supports new information. Otherwise
use the statically loaded cache definitions if available.

Renamed the previous cache structures to legacy_*. If there is any change in
the cache information, then it needs to be initialized in builtin_x86_defs.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <20180514164156.27034-3-babu.moger@amd.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Babu Moger
968ee4ad25 pc: add 2.13 machine types
Add pc-q35-2.13 and pc-i440fx-2.13 machine types

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <20180514164156.27034-2-babu.moger@amd.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Babu Moger
fe52acd2a0 i386: Initialize cache information for EPYC family processors
Initialize pre-determined cache information for EPYC processors.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <20180510204148.11687-5-babu.moger@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Babu Moger
6aaeb05492 i386: Add cache information in X86CPUDefinition
Add cache information in X86CPUDefinition and CPUX86State.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180510204148.11687-3-babu.moger@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Eduardo Habkost
7e3482f824 i386: Helpers to encode cache information consistently
Instead of having a collection of macros that need to be used in
complex expressions to build CPUID data, define a CPUCacheInfo
struct that can hold information about a given cache.  Helper
functions will take a CPUCacheInfo struct as input to encode
CPUID leaves for a cache.

This will help us ensure consistency between cache information
CPUID leaves, and make the existing inconsistencies in CPUID info
more visible.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <20180510204148.11687-2-babu.moger@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Jingqi Liu
0da0fb0628 x86/cpu: Enable CLDEMOTE(Demote Cache Line) cpu feature
The CLDEMOTE instruction hints to hardware that the cache line that
contains the linear address should be moved("demoted") from
the cache(s) closest to the processor core to a level more distant
from the processor core. This may accelerate subsequent accesses
to the line by other cores in the same coherence domain,
especially if the line was written by the core that demotes the line.

Intel Snow Ridge has added new cpu feature, CLDEMOTE.
The new cpu feature needs to be exposed to guest VM.

The bit definition:
CPUID.(EAX=7,ECX=0):ECX[bit 25] CLDEMOTE

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Message-Id: <1525406253-54846-1-git-send-email-jingqi.liu@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Boqun Feng
a18495159a i386: add KnightsMill cpu model
A new cpu model called "KnightsMill" is added to model Knights Mill
processors.  Compared to "Skylake-Server" cpu model, the following
features are added:

	avx512_4vnniw avx512_4fmaps avx512pf avx512er avx512_vpopcntdq

and the following features are removed:

	pcid invpcid clflushopt avx512dq avx512bw clwb smap rtm mpx
	xsavec xgetbv1 hle

Signed-off-by: Boqun Feng <boqun.feng@intel.com>
Message-Id: <20180320000821.8337-1-boqun.feng@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-15 11:33:33 -03:00
Kevin Wolf
1fce860ea5 - Copy-on-read block driver
- The qcow2 default refcount cache size has been decreased
 - Various bug fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa+uwxAAoJEPQH2wBh1c9AHLAH/24gj+Tg35wP9dssHcpfZgLJ
 Yx0FeTvIvPUvjXTfdATei+MgomwPHcMJIIldH/rwXINzj1gvf+I8Kfb4iLm23Pby
 60D02MwrQWPnYt9mypVtLB+yVnzTRjeFn0v84r1w2OXm+PUw79xlQdNa1Yi0pl29
 y8F6Xk3on8d4CVgYSS8oRAzhUm+AP4wwilYVXtnCx4Btghs6REh5we2xcYSaz3PF
 EM83tZiiXCAHvWiwiTTLf/Qt7IK5mN8QsR3NaaB4qGBDrpM+nkYkJTNfqMvvi4OY
 VqhD+6etTVTaieJwFsAY1Q379t/ov5V0pya1LOu9LN4xGxMMGkrBmggVErAU8Gk=
 =omzC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'mreitz/tags/pull-block-2018-05-15' into queue-block

- Copy-on-read block driver
- The qcow2 default refcount cache size has been decreased
- Various bug fixes

# gpg: Signature made Tue May 15 16:18:25 2018 CEST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2018-05-15: (21 commits)
  iotests: Add test for -U/force-share conflicts
  qemu-img: Use only string options in img_open_opts
  qemu-io: Use purely string blockdev options
  block: Document BDRV_REQ_WRITE_UNCHANGED support
  qemu-img: Check post-truncation size
  iotests: Add test for COR across nodes
  iotests: Copy 197 for COR filter driver
  iotests: Clean up wrap image in 197
  block: Support BDRV_REQ_WRITE_UNCHANGED in filters
  block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
  block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
  block: Add BDRV_REQ_WRITE_UNCHANGED flag
  block: BLK_PERM_WRITE includes ..._UNCHANGED
  block: Add COR filter driver
  iotests: Skip 181 and 201 without userfaultfd
  iotests: Add failure matching to common.qemu
  docs: Document the new default sizes of the qcow2 caches
  qcow2: Give the refcount cache the minimum possible size by default
  specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
  Fix error message about compressed clusters with OFLAG_COPIED
  ...

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:19:53 +02:00
Max Reitz
4e7d73c5fb iotests: Add test for -U/force-share conflicts
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
4615f87832 qemu-img: Use only string options in img_open_opts
img_open_opts() takes a QemuOpts and converts them to a QDict, so all
values therein are strings.  Then it may try to call qdict_get_bool(),
however, which will fail with a segmentation fault every time:

$ ./qemu-img info -U --image-opts \
    driver=file,filename=/dev/null,force-share=off
[1]    27869 segmentation fault (core dumped)  ./qemu-img info -U
--image-opts driver=file,filename=/dev/null,force-share=off

Fix this by using qdict_get_str() and comparing the value as a string.
Also, when adding a force-share value to the QDict, add it as a string
so it fits the rest of the dict.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
2a01c01f9e qemu-io: Use purely string blockdev options
Currently, qemu-io only uses string-valued blockdev options (as all are
converted directly from QemuOpts) -- with one exception: -U adds the
force-share option as a boolean.  This in itself is already a bit
questionable, but a real issue is that it also assumes the value already
existing in the options QDict would be a boolean, which is wrong.

That has the following effect:

$ ./qemu-io -r -U --image-opts \
    driver=file,filename=/dev/null,force-share=off
[1]    15200 segmentation fault (core dumped)  ./qemu-io -r -U
--image-opts driver=file,filename=/dev/null,force-share=off

Since @opts is converted from QemuOpts, the value must be a string, and
we have to compare it as such.  Consequently, it makes sense to also set
it as a string instead of a boolean.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
c1e3489dfa block: Document BDRV_REQ_WRITE_UNCHANGED support
Add BDRV_REQ_WRITE_UNCHANGED to the list of flags honored during pwrite
and pwrite_zeroes, and also add a note on when you absolutely need to
support it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502140359.18222-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
5279b30392 qemu-img: Check post-truncation size
Some block drivers (iscsi and file-posix when dealing with device files)
do not actually support truncation, even though they provide a
.bdrv_truncate() method and will happily return success when providing a
new size that does not exceed the current size.  This is because these
drivers expect the user to resize the image outside of qemu and then
provide qemu with that information through the block_resize command
(compare cb1b83e740).

Of course, anyone using qemu-img resize will find that behavior useless.
So we should check the actual size of the image after the supposedly
successful truncation took place, emit an error if nothing changed and
emit a warning if the target size was not met.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1523065
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421163957.29872-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
3e7a95feb9 iotests: Add test for COR across nodes
COR across nodes (that is, you have some filter node between the
actually COR target and the node that performs the COR) cannot reliably
work together with the permission system when there is no explicit COR
node that can request the WRITE_UNCHANGED permission for its child.
This is because COR (currently) sneaks its requests by the usual
permission checks, so it can work without a WRITE* permission; but if
there is a filter node in between, that will re-issue the request, which
then passes through the usual check -- and if nobody has requested a
WRITE_UNCHANGED permission, that check will fail.

There is no real direct fix apart from hoping that there is someone who
has requested that permission; in case of just the qemu-io HMP command
(and no guest device), however, that is not the case.  The real real fix
is to implement the copy-on-read flag through an implicitly added COR
node.  Such a node can request the necessary permissions as shown in
this test.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-10-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
a62cbac4ce iotests: Copy 197 for COR filter driver
iotest 197 tests copy-on-read using the (now old) copy-on-read flag.
Copy it to 215 and modify it to use the COR filter driver instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-9-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
5fdc0b73eb iotests: Clean up wrap image in 197
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-8-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
228345bf5d block: Support BDRV_REQ_WRITE_UNCHANGED in filters
Update the rest of the filter drivers to support
BDRV_REQ_WRITE_UNCHANGED.  They already forward write request flags to
their children, so we just have to announce support for it.

This patch does not cover the replication driver because that currently
does not support flags at all, and because it just grabs the WRITE
permission for its children when it can, so we should be fine just
submitting the incoming WRITE_UNCHANGED requests as normal writes.

It also does not cover format drivers for similar reasons.  They all use
bdrv_format_default_perms() as their .bdrv_child_perm() implementation
so they just always grab the WRITE permission for their file children
whenever possible.  In addition, it often would be difficult to
ascertain whether incoming unchanging writes end up as unchanging writes
in their files.  So we just leave them as normal potentially changing
writes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-7-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
1b1a920b71 block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
We just need to forward it to quorum's children (except in case of a
rewrite because of corruption), but for that we first have to support
flags in child requests at all.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-6-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
7adcf59fec block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-5-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
c6035964f8 block: Add BDRV_REQ_WRITE_UNCHANGED flag
This flag signifies that a write request will not change the visible
disk content.  With this flag set, it is sufficient to have the
BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-4-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
24b7c538fe block: BLK_PERM_WRITE includes ..._UNCHANGED
Currently we never actually check whether the WRITE_UNCHANGED
permission has been taken for unchanging writes.  But the one check that
is commented out checks both WRITE and WRITE_UNCHANGED; and considering
that WRITE_UNCHANGED is already documented as being weaker than WRITE,
we should probably explicitly document WRITE to include WRITE_UNCHANGED.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
6c6f24fd84 block: Add COR filter driver
This adds a simple copy-on-read filter driver.  It relies on the already
existing COR functionality in the central block layer code, which may be
moved here once we no longer need it there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
b05a2225d2 iotests: Skip 181 and 201 without userfaultfd
userfaultfd support depends on the host kernel, so it may not be
available.  If so, 181 and 201 should be skipped.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180406151731.4285-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
81c6ddf49a iotests: Add failure matching to common.qemu
Currently, common.qemu only allows to match for results indicating
success.  The only way to fail is by provoking a timeout.  However,
sometimes we do have a defined failure output and can match for that,
which saves us from having to wait for the timeout in case of failure.
Because failure can sometimes just result in a _notrun in the test, it
is actually important to care about being able to fail quickly.

Also, sometimes we simply do not get any specific output in case of
success.  The only way to handle this currently would be to define an
error message as the string to look for, which means that actual success
results in a timeout.  This is really bad because it unnecessarily slows
down a succeeding test.

Therefore, this patch adds a new parameter $success_or_failure to
_timed_wait_for and _send_qemu_cmd.  Setting this to a non-empty string
makes both commands expect two match parameters: If the first matches,
the function succeeds.  If the second matches, the function fails.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180406151731.4285-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
603790ef3a docs: Document the new default sizes of the qcow2 caches
We have just reduced the refcount cache size to the minimum unless
the user explicitly requests a larger one, so we have to update the
documentation to reflect this change.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c5f0bde23558dd9d33b21fffc76ac9953cc19c56.1523968389.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
52253998ec qcow2: Give the refcount cache the minimum possible size by default
The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size (an additional
parameter named cache-size sets the combined size of both caches).

Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.

This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:

 a) The amount of disk covered by an L2 table depends solely on the
    cluster size, but in the case of a refcount block it depends on
    the cluster size *and* the width of each refcount entry.
    The 4/1 ratio is only valid with 16-bit entries (the default).

 b) When we talk about disk space and L2 tables we are talking about
    guest space (L2 tables map guest clusters to host clusters),
    whereas refcount blocks are used for host clusters (including
    L1/L2 tables and the refcount blocks themselves). On a fully
    populated (and uncompressed) qcow2 file, image size > virtual size
    so there are more refcount entries than L2 entries.

Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
increasing a bit the refcount cache size to account for the clusters
used for qcow2 metadata.

However this patch takes a completely different approach and instead
of keeping a ratio between both cache sizes it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.

The reason is that L2 tables are used for every single I/O request
from the guest and the effect of increasing the cache is significant
and clearly measurable. Refcount blocks are however only used for
cluster allocation and internal snapshots and in practice are accessed
sequentially in most cases, so the effect of increasing the cache is
negligible (even when doing random writes from the guest).

So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
3c7d14b201 specs/qcow2: Clarify that compressed clusters have the COPIED bit reset
Compressed clusters are not supposed to have the COPIED bit set, but
this is not made explicit in the specs, so let's document it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 74552e1d6e858d3159cb0c0e188e80bc9248e337.1523376013.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
74c44a5934 Fix error message about compressed clusters with OFLAG_COPIED
Compressed clusters are not supposed to have the COPIED bit set.
"qemu-img check" detects that and prints an error message reporting
the number of the affected host cluster. This doesn't make much sense
because compressed clusters are not aligned to host clusters, so it
would be better to report the offset instead. Plus, the calculation is
wrong and it uses the raw L2 entry as if it was simply an offset.

This patch fixes the error message and reports the offset of the
compressed cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0f687957feb72e80c740403191a47e607c2463fe.1523376013.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
6cba5377f5 iotests: Split 214 off of 122
Commit abd3622cc0 added a case to 122
regarding how the qcow2 driver handles an incorrect compressed data
length value.  This does not really fit into 122, as that file is
supposed to contain qemu-img convert test cases, which this case is not.
So this patch splits it off into its own file; maybe we will even get
more qcow2-only compression tests in the future.

Also, that test case does not work with refcount_bits=1, so mark that
option as unsupported.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180406164108.26118-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Kevin Wolf
bd21935b50 blockjob: Add block_job_driver()
The backup block job directly accesses the driver field in BlockJob. Add
a wrapper for getting it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf
dee81d5111 blockjob: Introduce block_job_ratelimit_get_delay()
This gets us rid of more direct accesses to BlockJob fields from the
job drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00