mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Vladimir Sementsov-Ogievskiy	319bd5edb9	block/backup: deal with zero detection We have detect_zeroes option, so at least for blockdev-backup user should define it if zero-detection is needed. For drive-backup leave detection enabled by default but do it through existing option instead of open-coding. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730163251.755248-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 18:29:43 -04:00
Vladimir Sementsov-Ogievskiy	590a63d598	qapi: add dirty-bitmaps to query-named-block-nodes result Let's add a possibility to query dirty-bitmaps not only on root nodes. It is useful when dealing both with snapshots and incremental backups. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20190717173937.18747-1-jsnow@redhat.com [Added deprecation information. --js] Signed-off-by: John Snow <jsnow@redhat.com> [Fixed spelling --js]	2019-08-16 18:29:43 -04:00
John Snow	1a2b8b406b	block/backup: support bitmap sync modes for non-bitmap backups Accept bitmaps and sync policies for the other backup modes. This allows us to do things like create a bitmap synced to a full backup without a transaction, or start a resumable backup process. Some combinations don't make sense, though: - NEVER policy combined with any non-BITMAP mode doesn't do anything, because the bitmap isn't used for input or output. It's harmless, but is almost certainly never what the user wanted. - sync=NONE is more questionable. It can't use on-success because this job never completes with success anyway, and the resulting artifact of 'always' is suspect: because we start with a full bitmap and only copy out segments that get written to, the final output bitmap will always be ... a fully set bitmap. Maybe there's contexts in which bitmaps make sense for sync=none, but not without more severe changes to the current job, and omitting it here doesn't prevent us from adding it later. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 18:29:43 -04:00
John Snow	7e30dd618e	block/backup: teach TOP to never copy unallocated regions Presently, If sync=TOP is selected, we mark the entire bitmap as dirty. In the write notifier handler, we dutifully copy out such regions. Fix this in three parts: 1. Mark the bitmap as being initialized before the first yield. 2. After the first yield but before the backup loop, interrogate the allocation status asynchronously and initialize the bitmap. 3. Teach the write notifier to interrogate allocation status if it is invoked during bitmap initialization. As an effect of this patch, the job progress for TOP backups now behaves like this: - total progress starts at bdrv_length. - As allocation status is interrogated, total progress decreases. - As blocks are copied, current progress increases. Taken together, the floor and ceiling move to meet each other. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20190716000117.25219-10-jsnow@redhat.com [Remove ret = -ECANCELED change. --js] [Squash in conflict resolution based on Max's patch --js] Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c848b@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	dba8700f16	block/backup: add backup_is_cluster_allocated Modify bdrv_is_unallocated_range to utilize the pnum return from bdrv_is_allocated, and in the process change the semantics from "is unallocated" to "is allocated." Optionally returns a full number of clusters that share the same allocation status. This will be used to carefully toggle bits in the bitmap for sync=top initialization in the following commits. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-9-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	141cdcdf84	block/backup: centralize copy_bitmap initialization Just a few housekeeping changes that keeps the following commit easier to read; perform the initial copy_bitmap initialization in one place. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	0fff1f1371	block/backup: improve sync=bitmap work estimates When making backups based on bitmaps, the work estimate can be more accurate. Update iotests to reflect the new strategy. TOP work estimates are broken, but do not get worse with this commit. That issue is addressed in the following commits instead. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	a6c9365ad4	block/backup: hoist bitmap check into QMP interface This is nicer to do in the unified QMP interface that we have now, because it lets us use the right terminology back at the user. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	c4e4b0fa59	qapi: implement block-dirty-bitmap-remove transaction action It is used to do transactional movement of the bitmap (which is possible in conjunction with merge command). Transactional bitmap movement is needed in scenarios with external snapshot, when we don't want to leave copy of the bitmap in the base image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190708220502.12977-3-jsnow@redhat.com [Edited "since" version to 4.2 --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	b30ffbef53	block/backup: loosen restriction on readonly bitmaps With the "never" sync policy, we actually can utilize readonly bitmaps now. Loosen the check at the QMP level, and tighten it based on provided arguments down at the job creation level instead. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-19-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	c23909e530	block/backup: add 'always' bitmap sync policy This adds an "always" policy for bitmap synchronization. Regardless of if the job succeeds or fails, the bitmap is always synchronized. This means that for backups that fail part-way through, the bitmap retains a record of which sectors need to be copied out to accomplish a new backup using the old, partial result. In effect, this allows us to "resume" a failed backup; however the new backup will be from the new point in time, so it isn't a "resume" as much as it is an "incremental retry." This can be useful in the case of extremely large backups that fail considerably through the operation and we'd like to not waste the work that was already performed. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-13-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	62aa1fbeac	block/backup: upgrade copy_bitmap to BdrvDirtyBitmap This simplifies some interface matters; namely the initialization and (later) merging the manifest back into the sync_bitmap if it was provided. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-12-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	28636b8211	block/dirty-bitmap: add bdrv_dirty_bitmap_get Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	b7661ca5d8	block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal I'm surprised it didn't come up sooner, but sometimes we have a +busy bitmap as a source. This is dangerous from the QMP API, but if we are the owner that marked the bitmap busy, it's safe to merge it using it as a read only source. It is not safe in the general case to allow users to read from in-use bitmaps, so create an internal variant that foregoes the safety checking. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-10-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	cf0cd293c6	block/backup: add 'never' policy to bitmap sync mode This adds a "never" policy for bitmap synchronization. Regardless of if the job succeeds or fails, we never update the bitmap. This can be used to perform differential backups, or simply to avoid the job modifying a bitmap. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	c8b5650178	block/backup: Add mirror sync mode 'bitmap' We don't need or want a new sync mode for simple differences in semantics. Create a new mode simply named "BITMAP" that is designed to make use of the new Bitmap Sync Mode field. Because the only bitmap sync mode is 'on-success', this adds no new functionality to the backup job (yet). The old incremental backup mode is maintained as a syntactic sugar for sync=bitmap, mode=on-success. Add all of the plumbing necessary to support this new instruction. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
Peter Maydell	e018ccb3fb	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Fix O_DIRECT alignment detection - Fixes for concurrent block jobs - block-backend: Queue requests while drained (fix IDE vs. job crashes) - qemu-img convert: Deprecate using -n and -o together - iotests: Migration tests with filter nodes - iotests: More media change tests # gpg: Signature made Fri 16 Aug 2019 10:29:18 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: file-posix: Handle undetectable alignment qemu-img convert: Deprecate using -n and -o together block-backend: Queue requests while drained mirror: Keep mirror_top_bs drained after dropping permissions block: Remove blk_pread_unthrottled() iotests: Add test for concurrent stream/commit tests: Test mid-drain bdrv_replace_child_noperm() tests: Test polling in bdrv_drop_intermediate() block: Reduce (un)drains when replacing a child block: Keep subtree drained in drop_intermediate block: Simplify bdrv_filter_default_perms() iotests: Test migration with all kinds of filter nodes iotests: Move migration helpers to iotests.py iotests/118: Add -blockdev based tests iotests/118: Create test classes dynamically iotests/118: Test media change for scsi-cd Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 16:43:46 +01:00
Peter Maydell	c6a2225a5a	Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-08-15' into staging nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect # gpg: Signature made Thu 15 Aug 2019 19:28:41 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-08-15: block/nbd: refactor nbd connection parameters block/nbd: add cmdline and qapi parameter reconnect-delay block/nbd: move from quit to state block/nbd: use non-blocking io channel for nbd negotiation block/nbd: split connection_co start out of nbd_client_connect nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH block/stream: use BDRV_REQ_PREFETCH block: implement BDRV_REQ_PREFETCH qapi: Add InetSocketAddress member keep-alive Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 15:53:37 +01:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	d5938f29fe	Clean up inclusion of sysemu/sysemu.h In my "build everything" tree, changing sysemu/sysemu.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Almost a third of its inclusions are actually superfluous. Delete them. Downgrade two more to qapi/qapi-types-run-state.h, and move one from char/serial.h to char/serial.c. hw/semihosting/config.c, monitor/monitor.c, qdev-monitor.c, and stubs/semihost.c define variables declared in sysemu/sysemu.h without including it. The compiler is cool with that, but include it anyway. This doesn't reduce actual use much, as it's still included into widely included headers. The next commit will tackle that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-27-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>	2019-08-16 13:31:53 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	13d4ff07e8	trace: Do not include qom/cpu.h into generated trace.h docs/devel/tracing.txt explains "since many source files include trace.h, [the generated trace.h use] a minimum of types and other header files included to keep the namespace clean and compile times and dependencies down." Commit `4815185902` "trace: Add per-vCPU tracing states for events with the 'vcpu' property" made them all include qom/cpu.h via control-internal.h. qom/cpu.h in turn includes about thirty headers. Ouch. Per-vCPU tracing is currently not supported in sub-directories' trace-events. In other words, qom/cpu.h can only be used in trace-root.h, not in any trace.h. Split trace/control-vcpu.h off trace/control.h and trace/control-internal.h. Have the generated trace.h include trace/control.h (which no longer includes qom/cpu.h), and trace-root.h include trace/control-vcpu.h (which includes it). The resulting improvement is a bit disappointing: in my "build everything" tree, some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h) depend on a trace.h, and about 600 of them no longer depend on qom/cpu.h. But more than 1300 others depend on trace-root.h. More work is clearly needed. Left for another day. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-8-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Nir Soffer	a6b257a08e	file-posix: Handle undetectable alignment In some cases buf_align or request_alignment cannot be detected: 1. With Gluster, buf_align cannot be detected since the actual I/O is done on Gluster server, and qemu buffer alignment does not matter. Since we don't have alignment requirement, buf_align=1 is the best value. 2. With local XFS filesystem, buf_align cannot be detected if reading from unallocated area. In this we must align the buffer, but we don't know what is the correct size. Using the wrong alignment results in I/O error. 3. With Gluster backed by XFS, request_alignment cannot be detected if reading from unallocated area. In this case we need to use the correct alignment, and failing to do so results in I/O errors. 4. With NFS, the server does not use direct I/O, so both buf_align cannot be detected. In this case we don't need any alignment so we can use buf_align=1 and request_alignment=1. These cases seems to work when storage sector size is 512 bytes, because the current code starts checking align=512. If the check succeeds because alignment cannot be detected we use 512. But this does not work for storage with 4k sector size. To determine if we can detect the alignment, we probe first with align=1. If probing succeeds, maybe there are no alignment requirement (cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we don't have any way to tell, we treat this as undetectable alignment. If probing with align=1 fails with EINVAL, but probing with one of the expected alignments succeeds, we know that we found a working alignment. Practically the alignment requirements are the same for buffer alignment, buffer length, and offset in file. So in case we cannot detect buf_align, we can use request alignment. If we cannot detect request alignment, we can fallback to a safe value. To use this logic, we probe first request alignment instead of buf_align. 
Here is a table showing the behaviour with current code (the value in parenthesis is the optimal value).

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)         512   (512)                 OK
1       4096      512   (1)         4096  (4096)                FAIL
----------------------------------------------------------------------
2       512       512   (512)       512   (512)                 OK
2       4096      512   (4096)      4096  (4096)                FAIL
----------------------------------------------------------------------
3       512       512   (1)         512   (512)                 OK
3       4096      512   (1)         512   (4096)                FAIL
----------------------------------------------------------------------
4       512       512   (1)         512   (1)                   OK
4       4096      512   (1)         512   (1)                   OK

Same cases with this change:

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)         512   (512)                 OK
1       4096      4096  (1)         4096  (4096)                OK
----------------------------------------------------------------------
2       512       512   (512)       512   (512)                 OK
2       4096      4096  (4096)      4096  (4096)                OK
----------------------------------------------------------------------
3       512       4096  (1)         4096  (512)                 OK
3       4096      4096  (1)         4096  (4096)                OK
----------------------------------------------------------------------
4       512       4096  (1)         4096  (1)                   OK
4       4096      4096  (1)         4096  (1)                   OK

I tested that provisioning VMs and copying disks on local XFS and Gluster with 4k bytes sector size work now, resolving bugs [1],[2]. I tested also on XFS, NFS, Gluster with 512 bytes sector size. [1] https://bugzilla.redhat.com/1737256 [2] https://bugzilla.redhat.com/1738657 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-08-16 11:29:11 +02:00
Kevin Wolf	cf3129323f	block-backend: Queue requests while drained This fixes devices like IDE that can still start new requests from I/O handlers in the CPU thread while the block backend is drained. The basic assumption is that in a drain section, no new requests should be allowed through a BlockBackend (blk_drained_begin/end don't exist, we get drain sections only on the node level). However, there are two special cases where requests should not be queued: 1. Block jobs: We already make sure that block jobs are paused in a drain section, so they won't start new requests. However, if the drain_begin is called on the job's BlockBackend first, it can happen that we deadlock because the job stays busy until it reaches a pause point - which it can't if its requests aren't processed any more. The proper solution here would be to make all requests through the job's filter node instead of using a BlockBackend. For now, just disabling request queuing on the job BlockBackend is simpler. 2. In test cases where making requests through bdrv_* would be cumbersome because we'd need a BdrvChild. As we already got the functionality to disable request queuing from 1., use it in tests, too, for convenience. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Kevin Wolf	d2da5e288a	mirror: Keep mirror_top_bs drained after dropping permissions mirror_top_bs is currently implicitly drained through its connection to the source or the target node. However, the drain section for target_bs ends early after moving mirror_top_bs from src to target_bs, so that requests can already be restarted while mirror_top_bs is still present in the chain, but has dropped all permissions and therefore runs into an assertion failure like this: qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed. Keep mirror_top_bs drained until all graph changes have completed. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Kevin Wolf	421919d76b	block: Remove blk_pread_unthrottled() The functionality offered by blk_pread_unthrottled() goes back to commit `498e386c58`. Then, we couldn't perform I/O throttling with synchronous requests because timers wouldn't be executed in polling loops. So the commit automatically disabled I/O throttling as soon as a synchronous request was issued. However, for geometry detection during disk initialisation, we always used (and still use) synchronous requests even if guest requests use AIO later. Geometry detection was not wanted to disable I/O throttling, so bdrv_pread_unthrottled() was introduced which disabled throttling only temporarily. All of this isn't necessary any more because we do run timers in polling loop and even synchronous requests are now using coroutine infrastructure internally. For this reason, commit `90c78624f` already removed the automatic disabling of I/O throttling. It's time to get rid of the workaround for the removed code, and its abuse of blk_root_drained_begin()/end(), as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Vladimir Sementsov-Ogievskiy	8f071c9db5	block/nbd: refactor nbd connection parameters We'll need some connection parameters to be available all the time to implement nbd reconnect. So, let's refactor them: define additional parameters in BDRVNBDState, drop them from function parameters, drop nbd_client_init and separate options parsing instead from nbd_open. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190618114328.55249-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: Drop useless 'if' before object_unref] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy	b172ae2e0e	block/nbd: add cmdline and qapi parameter reconnect-delay Reconnect will be implemented in the following commit, so for now, in semantics below, disconnect itself is a "serious error". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-5-vsementsov@virtuozzo.com> [eblake: slipped from 4.1 to 4.2] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy	a34b1e5e06	block/nbd: move from quit to state To implement reconnect we need several states for the client: CONNECTED, QUIT and two different CONNECTING states. CONNECTING states will be added in the following patches. This patch implements CONNECTED and QUIT. QUIT means, that we should close the connection and fail all current and further requests (like old quit = true). CONNECTED means that connection is ok, we can send requests (like old quit = false). For receiving loop we use a comparison of the current state with QUIT, because reconnect will be in the same loop, so it should be looping until the end. Opposite, for requests we use a comparison of the current state with CONNECTED, as we don't want to send requests in future CONNECTING states. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy	a8e2bb6a76	block/nbd: use non-blocking io channel for nbd negotiation No reason to use blocking channel for negotiation and we'll benefit in further reconnect feature, as qio_channel reads and writes will do qemu_coroutine_yield while waiting for io completion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy	962b7b3d4c	block/nbd: split connection_co start out of nbd_client_connect nbd_client_connect is going to be used from connection_co, so, let's refactor nbd_client_connect in advance, leaving io channel configuration all in nbd_client_connect. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy	99136607b1	block/stream: use BDRV_REQ_PREFETCH This helps to avoid extra io, allocations and memory copying. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190725100550.33801-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: fix comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy	3299e5ecf7	block: implement BDRV_REQ_PREFETCH Do effective copy-on-read request when we don't need data actually. It will be used for block-stream and NBD_CMD_CACHE. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190725100550.33801-2-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: comment grammar fix] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:13 -05:00
Vladimir Sementsov-Ogievskiy	110571be4e	block/backup: disable copy_range for compressed backup Enabled by default copy_range ignores compress option. It's definitely unexpected for user. It's broken since introduction of copy_range usage in backup in `9ded4a0114`. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190730163251.755248-3-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-06 13:17:27 +02:00
Max Reitz	9adc1cb49a	mirror: Only mirror granularity-aligned chunks In write-blocking mode, all writes to the top node directly go to the target. We must only mirror chunks of data that are aligned to the job's granularity, because that is how the dirty bitmap works. Therefore, the request alignment for writes must be the job's granularity (in write-blocking mode). Unfortunately, this forces all reads and writes to have the same granularity (we only need this alignment for writes to the target, not the source), but that is something to be fixed another time. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190805153308.2657-1-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fixes: `d06107ade0` Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-06 13:17:25 +02:00
Max Reitz	4a5b91ca02	backup: Copy only dirty areas The backup job must only copy areas that the copy_bitmap reports as dirty. This is always the case when using traditional non-offloading backup, because it copies each cluster separately. When offloading the copy operation, we sometimes copy more than one cluster at a time, but we only check whether the first one is dirty. Therefore, whenever copy offloading is possible, the backup job currently produces wrong output when the guest writes to an area of which an inner part has already been backed up, because that inner part will be re-copied. Fixes: `9ded4a0114` Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190801173900.23851-2-mreitz@redhat.com Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-06 13:17:01 +02:00
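
A minimal sketch of the idea in the commit above, assuming a plain boolean array stands in for the copy_bitmap: before copying several clusters in one operation, every cluster in the run is checked, not just the first. The function name and signature are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return how many consecutive clusters, starting at `start`, are dirty,
 * capped at `max_run`. A multi-cluster copy must not extend past this
 * length, otherwise it would re-copy clusters that were already backed up.
 */
static size_t toy_dirty_run_length(const bool *dirty, size_t nb_clusters,
                                   size_t start, size_t max_run)
{
    size_t n = 0;

    while (start + n < nb_clusters && n < max_run && dirty[start + n]) {
        n++;
    }
    return n;
}
```

A caller would copy exactly the returned number of clusters and then search for the next dirty cluster.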
Max Reitz	1120407bdf	nvme: Limit blkshift to 12 (for 4 kB blocks) Linux does not support blocks greater than 4 kB anyway, so we might as well limit blkshift to 12 and thus save us from some potential trouble. Reported-by: Peter Maydell <peter.maydell@linaro.org> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730114812.10493-1-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Coverity: CID 1403771 Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-30 14:49:24 +02:00
Kevin Wolf	2b23f28639	block/copy-on-read: Fix permissions for inactive node The copy-on-read drive must not request the WRITE_UNCHANGED permission for its child if the node is inactive, otherwise starting a migration destination with -incoming will fail because the child cannot provide write access yet: qemu-system-x86_64: -blockdev copy-on-read,file=img,node-name=cor: Block node is read-only Earlier QEMU versions additionally ran into an abort() on the migration source side: bdrv_inactivate_recurse() failed to update permissions. This is silently ignored today because it was only supposed to loosen restrictions. This is the symptom that was originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1733022 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-07-30 12:25:43 +02:00
Max Reitz	65181d6381	block: Dec. drained_end_counter before bdrv_wakeup Decrementing drained_end_counter after bdrv_dec_in_flight() (which in turn invokes bdrv_wakeup() and thus aio_wait_kick()) is not very clever. We should decrement it beforehand, so that any waiting aio_poll() that is woken by bdrv_dec_in_flight() sees the decremented drained_end_counter. Because the time window between decrementing drained_end_counter and aio_wait_kick() is very small, I cannot supply a reliable regression test. However, running e.g. the /bdrv-drain/blockjob/iothread/drain_all test in test-bdrv-drain has a small chance of hanging without this patch (about 1/200 or so; it gets to nearly 100 % if you add e.g. an fputc(' ', stderr); after the bdrv_dec_in_flight()). Fixes: `e037c09c78` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190722133054.21781-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:41:35 +02:00
Maxim Levitsky	258867d1dc	block/nvme: don't touch the completion entries Completion entries are meant to be only read by the host and written by the device. The driver is supposed to scan the completions from the last point where it left, and until it sees a completion with non flipped phase bit. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-4-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
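
The scanning rule described above can be sketched as follows. This is a simplified model, not the driver's code: the `ToyCqe` and `ToyCq` types are hypothetical, and only bit 0 of the status field (the phase tag) is modelled. The host reads entries through a const pointer and never writes to them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t status;            /* bit 0 is the phase tag */
} ToyCqe;

typedef struct {
    const ToyCqe *entries;      /* written by the device, read-only here */
    size_t size;                /* number of entries in the ring */
    size_t head;                /* where the last scan stopped */
    unsigned phase;             /* phase value that marks a new entry */
} ToyCq;

/*
 * Consume completions from the last stopping point until an entry whose
 * phase bit does not match the expected phase. The expected phase flips
 * each time the head wraps around the ring.
 */
static size_t toy_cq_process(ToyCq *q)
{
    size_t n = 0;

    assert(q->head < q->size);
    while (n < q->size && (q->entries[q->head].status & 1u) == q->phase) {
        n++;
        if (++q->head == q->size) {
            q->head = 0;
            q->phase ^= 1u;
        }
    }
    return n;
}
```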
Maxim Levitsky	118d1b6a81	block/nvme: support larger than 512 bytes sector devices Currently the driver hardcodes the sector size to 512, and doesn't check the underlying device. Fix that. Also fail if underlying nvme device is formatted with metadata as this needs special support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190716163020.13383-3-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
Maxim Levitsky	461bba04bf	block/nvme: fix doorbell stride Fix the math involving non standard doorbell stride Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
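
For reference, the doorbell layout defined by the NVMe specification places the submission queue tail doorbell of queue y at 0x1000 + (2y) * (4 << CAP.DSTRD) and the completion queue head doorbell at 0x1000 + (2y + 1) * (4 << CAP.DSTRD). The sketch below computes that offset; the function name is hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of a queue doorbell register from the start of the
 * controller registers. `dstrd` is the 4-bit CAP.DSTRD field, so the
 * stride between doorbells is (4 << dstrd) bytes.
 */
static uint32_t toy_nvme_doorbell_offset(unsigned qid, int is_cq,
                                         unsigned dstrd)
{
    assert(dstrd <= 15);
    return 0x1000u + (2u * qid + (is_cq ? 1u : 0u)) * (4u << dstrd);
}
```

With the standard stride (DSTRD = 0) the doorbells are packed 4 bytes apart; with DSTRD = 1 they are 8 bytes apart, which is the non-standard case the commit message refers to.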
Peter Maydell	b9e02bb3f9	nbd patches for 2019-07-19 - silence harmless compiler/valgrind warning Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-07-19' into staging # gpg: Signature made Fri 19 Jul 2019 21:17:12 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-07-19: nbd: Initialize reply on failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-07-22 10:11:28 +01:00
Eric Blake	5cf42b1c1f	nbd: Initialize reply on failure We've had two separate reports of different callers running into use of uninitialized data if s->quit is set (one detected by gcc -O3, another by valgrind), due to checking 'nbd_reply_is_simple(reply) \|\| s->quit' in the wrong order. Rather than chasing down which callers need to pre-initialize reply, and whether there are any other uninitialized uses, it's easier to guarantee that reply will always be set by nbd_co_receive_one_chunk() even on failure. The uninitialized use happens to be harmless (the only time the variable is uninitialized is if s->quit is set, so the conditional results in the same action regardless of what was read from reply), and was introduced in commit `65e01d47`. In fixing the problem, it can also be seen that all (one) callers pass in a non-NULL reply, so there is a dead conditional to also be cleaned up. Reported-by: Thomas Huth <thuth@redhat.com> Reported-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190719172001.19770-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-07-19 13:19:18 -05:00
Max Reitz	61ad631cee	block: Loop unsafely in bdrv*drained_end() The graph must not change in these loops (or a QLIST_FOREACH_SAFE would not even be enough). We now ensure this by only polling once in the root bdrv_drained_end() call, so we can drop the _SAFE suffix. Doing so makes it clear that the graph must not change. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:17 +02:00
Max Reitz	e037c09c78	block: Do not poll in bdrv_do_drained_end() We should never poll anywhere in bdrv_do_drained_end() (including its recursive callees like bdrv_drain_invoke()), because it does not cope well with graph changes. In fact, it has been written based on the postulation that no graph changes will happen in it. Instead, the callers that want to poll must poll, i.e. all currently globally available wrappers: bdrv_drained_end(), bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and bdrv_drain_all_end(). Graph changes there do not matter. They can poll simply by passing a pointer to a drained_end_counter and wait until it reaches 0. This patch also adds a non-polling global wrapper for bdrv_do_drained_end() that takes a drained_end_counter pointer. We need such a variant because now no function called anywhere from bdrv_do_drained_end() must poll. This includes BdrvChildRole.drained_end(), which already must not poll according to its interface documentation, but bdrv_child_cb_drained_end() just violates that by invoking bdrv_drained_end() (which does poll). Therefore, BdrvChildRole.drained_end() must take a drained_end_counter parameter, which bdrv_child_cb_drained_end() can pass on to the new bdrv_drained_end_no_poll() function. Note that we now have a pattern of all drained_end-related functions either polling or receiving a drained_end_counter to let the caller poll based on that. A problem with a single poll loop is that when the drained section in bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in the old contexts, while others are in the new context already. To let the collective poll in bdrv_drained_end() work correctly, we must not hold a lock to the old context, so that the old context can make progress in case it is different from the current context. (In the process, remove the comment saying that the current context is always the old context, because it is wrong.) 
In all other places, all nodes in a subtree must be in the same context, so we can just poll that. The exception of course is bdrv_drain_all_end(), but that always runs in the main context, so we can just poll NULL (like bdrv_drain_all_begin() does). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	f4c8a43be0	block: Make bdrv_parent_drained_[^_]*() static These functions are not used outside of block/io.c, there is no reason why they should be globally available. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	8e1da77e6e	block: Add @drained_end_counter Callers can now pass a pointer to an integer that bdrv_drain_invoke() (and its recursive callees) will increment for every bdrv_drain_invoke_entry() operation they schedule. bdrv_drain_invoke_entry() in turn will decrement it once it has invoked BlockDriver.bdrv_co_drain_end(). We use atomic operations to access the pointee, because the bdrv_do_drained_end() caller may wish to end drained sections for multiple nodes in different AioContexts (bdrv_drain_all_end() does, for example). This is the first step to moving the polling for BdrvCoDrainData.done to become true out of bdrv_drain_invoke() and into the root drained_end function. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	804db8ea00	block: Introduce BdrvChild.parent_quiesce_counter Commit `5cb2737e92` laid out why bdrv_do_drained_end() must decrement the quiesce_counter after bdrv_drain_invoke(). It did not give a very good reason why it has to happen after bdrv_parent_drained_end(), instead only claiming symmetry to bdrv_do_drained_begin(). It turns out that delaying it for so long is wrong. Situation: We have an active commit job (i.e. a mirror job) from top to base for the following graph: filter \| [file] \| v top --[backing]--> base Now the VM is closed, which results in the job being cancelled and a bdrv_drain_all() happening pretty much simultaneously. Beginning the drain means the job is paused once whenever one of its nodes is quiesced. This is reversed when the drain ends. With how the code currently is, after base's drain ends (which means that it will have unpaused the job once), its quiesce_counter remains at 1 while it goes to undrain its parents (bdrv_parent_drained_end()). For some reason or another, undraining filter causes the job to be kicked and enter mirror_exit_common(), where it proceeds to invoke block_job_remove_all_bdrv(). Now base will be detached from the job. Because its quiesce_counter is still 1, it will unpause the job once more. So in total, undraining base will unpause the job twice. Eventually, this will lead to the job's pause_count going negative -- well, it would, were there not an assertion against this, which crashes qemu. The general problem is that if in bdrv_parent_drained_end() we undrain parent A, and then undrain parent B, which then leads to A detaching the child, bdrv_replace_child_noperm() will undrain A as if we had not done so yet; that is, one time too many. It follows that we cannot decrement the quiesce_counter after invoking bdrv_parent_drained_end(). Unfortunately, decrementing it before bdrv_parent_drained_end() would be wrong, too. Imagine the above situation in reverse: Undraining A leads to B detaching the child. 
If we had already decremented the quiesce_counter by that point, bdrv_replace_child_noperm() would undrain B one time too little; because it expects bdrv_parent_drained_end() to issue this undrain. But bdrv_parent_drained_end() won't do that, because B is no longer a parent. Therefore, we have to do something else. This patch opts for introducing a second quiesce_counter that counts how many times a child's parent has been quiesced (though c->role->drained_*). With that, bdrv_replace_child_noperm() just has to undrain the parent exactly that many times when removing a child, and it will always be right. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
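
The bookkeeping introduced by this commit can be modelled in a few lines. This is a toy model under stated assumptions, not QEMU code: `ToyParent`, `ToyChild` and the `toy_` functions are hypothetical, and draining a parent is reduced to bumping a counter. The point it shows is that the child edge remembers how often it has quiesced its parent, so detaching can undo exactly that many drains.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int drain_count;              /* how often this parent is quiesced */
} ToyParent;

typedef struct {
    ToyParent *parent;
    int parent_quiesce_counter;   /* drains issued through this edge */
} ToyChild;

/* Quiesce the parent through this child edge and record it. */
static void toy_parent_drained_begin(ToyChild *c)
{
    c->parent_quiesce_counter++;
    c->parent->drain_count++;
}

/* Undo one drain that was issued through this child edge. */
static void toy_parent_drained_end(ToyChild *c)
{
    assert(c->parent_quiesce_counter > 0);
    c->parent_quiesce_counter--;
    c->parent->drain_count--;
}

/* Detach the child: undrain the parent exactly as often as still owed. */
static void toy_detach(ToyChild *c)
{
    while (c->parent_quiesce_counter > 0) {
        toy_parent_drained_end(c);
    }
    c->parent = NULL;
}
```

Whether the detach happens before or after the regular undrain, the parent ends up with a balanced count, which is the property neither ordering of the single quiesce_counter decrement could provide.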
Peter Maydell	697f59243f	* VFIO bugfix for AMD SEV (Alex) * Kconfig improvements (Julio, Philippe) * MemoryRegion reference counting bugfix (King Wang) * Build system cleanups (Marc-André, myself) * rdmacm-mux off-by-one (Marc-André) * ZBC passthrough fixes (Shinichiro, myself) * WHPX build fix (Stefan) * char-pty fix (Wei Yang) Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging # gpg: Signature made Tue 16 Jul 2019 08:31:27 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: vl: make sure char-pty message displayed by moving setbuf to the beginning create_config: remove $(CONFIG_SOFTMMU) hack Makefile: do not repeat $(CONFIG_SOFTMMU) in hw/Makefile.objs hw/usb/Kconfig: USB_XHCI_NEC requires USB_XHCI hw/usb/Kconfig: Add CONFIG_USB_EHCI_PCI target/i386: sev: Do not unpin ram device memory region checkpatch: detect doubly-encoded UTF-8 hw/lm32/Kconfig: Milkymist One 
provides a USB 1.1 Controller util: merge main-loop.c and iohandler.c Fix broken build with WHPX enabled memory: unref the memory region in simplify flatview hw/i386: turn off vmport if CONFIG_VMPORT is disabled rdmacm-mux: fix strcpy string warning build-sys: remove slirp cflags from main-loop.o iscsi: base all handling of check condition on scsi_sense_to_errno iscsi: fix busy/timeout/task set full scsi: add guest-recoverable ZBC errors scsi: explicitly list guest-recoverable sense codes scsi-disk: pass sense correctly for guest-recoverable errors Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-07-16 15:08:29 +01:00
Stefano Garzarella	0b1847bbc2	gluster: fix .bdrv_reopen_prepare when backing file is a JSON object When the backing_file is specified as a JSON object, the qemu_gluster_reopen_prepare() fails with this message: invalid URI json:{"server.0.host": ...} In this case, we should call qemu_gluster_init() using the QDict 'state->options' that contains the JSON parameters already parsed. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1542445 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190715132844.506584-1-sgarzare@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:41 +02:00
Max Reitz	8441d82d51	block/stream: Swap backing file change order bdrv_change_backing_file() can result in yields. Therefore, @base may no longer be the backing_bs() of s->bottom afterwards. Just swap the order of the two calls to fix this. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190703172813.6868-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Max Reitz	17a7c39248	block/stream: Fix error path As of commit `c624b015bf`, the stream job only freezes the chain until the overlay of the base node. The error path must consider this. Fixes: `c624b015bf` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190703172813.6868-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Max Reitz	e5182c1c57	block: Add BDS.never_freeze The commit and the mirror block job must be able to drop their filter node at any point. However, this will not be possible if any of the BdrvChild links to them is frozen. Therefore, we need to prevent them from ever becoming frozen. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190703172813.6868-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Michal Privoznik	95667c3be0	nvme: Set number of queues later in nvme_init() When creating the admin queue in nvme_init() the variable that holds the number of queues created is modified before actual queue creation. This is a problem because if creating the queue fails then the variable is left in an inconsistent state. This was actually observed when I tried to hotplug an nvme disk. The control got to nvme_file_open() which called nvme_init() which failed and thus nvme_close() was called which in turn called nvme_free_queue_pair() with queue being NULL. This led to an instant crash: #0 0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164 #1 0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729 #2 0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781 #3 0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291 #4 0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551 #5 0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063 #6 0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712 #7 0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011 #8 0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156 #9 0x000055d9507cb635 in 
blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389 #10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Paolo Bonzini	8c460269aa	iscsi: base all handling of check condition on scsi_sense_to_errno Now that scsi-disk is not using scsi_sense_to_errno to separate guest-recoverable sense codes, we can modify it to simplify iscsi's own sense handling. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-15 11:20:42 +02:00
Paolo Bonzini	00e3cccdf4	iscsi: fix busy/timeout/task set full In this case, do_retry was set without calling aio_co_wake, thus never waking up the coroutine. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-07-15 11:20:42 +02:00
Maxim Levitsky	867eccfed8	file-posix: Use max transfer length/segment count only for SCSI passthrough Regular kernel block devices (/dev/sda, /dev/nvme, etc) don't have max segment size/max segment count hardware requirements exposed to the userspace, but rather the kernel block layer takes care to split the incoming requests that violate these requirements. Allowing the kernel to do the splitting allows qemu to avoid various overheads that arise otherwise from this. This is especially visible in nbd server, exposing as a raw file, a mostly empty qcow2 image over the net. In this case most of the reads by the remote user won't even hit the underlying kernel block device, and therefore most of the overhead will be in the nbd traffic which increases significantly with lower max transfer size. In addition to that even for local block device access the performance improves a bit due to less traffic between qemu and the kernel when large transfer sizes are used (e.g. for image conversion). More info can be found at: https://bugzilla.redhat.com/show_bug.cgi?id=1647104 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-12 15:42:23 +02:00
Eric Blake	f7077c9860	qcow2: Allow -o compat=v3 during qemu-img amend Commit `b76b4f60` allowed '-o compat=v3' as an alias for the less-appealing '-o compat=1.1' for 'qemu-img create' since we want to use the QMP form as much as possible, but forgot to do likewise for qemu-img amend. Also, it doesn't help that '-o help' doesn't list our new preferred spellings. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-08 16:00:31 +02:00
John Snow	197bfa7da7	block/qcow: Improve error when opening qcow2 files as qcow Reported-by: radmehrsaeed7@gmail.com Fixes: https://bugs.launchpad.net/bugs/1832914 Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-08 16:00:26 +02:00
Andrey Shinkevich	c624b015bf	block/stream: introduce a bottom node The bottom node is the intermediate block device that has the base as its backing image. It is used instead of the base node while a block stream job is running to avoid dependency on the base that may change due to the parallel jobs. The change may take place due to a filter node as well that is inserted between the base and the intermediate bottom node. It occurs when the base node is the top one for another commit or stream job. After the introduction of the bottom node, don't freeze its backing child, that's the base, anymore. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1559152576-281803-4-git-send-email-andrey.shinkevich@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:05 +02:00
Andrey Shinkevich	96a07d5bf4	block/stream: refactor stream_run: drop goto The goto is unnecessary in the stream_run() since the common exit code was removed in the commit `eb23654dbe`: "jobs: utilize job_exit shim". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1559152576-281803-3-git-send-email-andrey.shinkevich@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:04 +02:00
Andrey Shinkevich	170d3bd341	block: include base when checking image chain for block allocation This patch is used in the 'block/stream: introduce a bottom node' that is following. Instead of the base node, the caller may pass the node that has the base as its backing image to the function bdrv_is_allocated_above() with a new parameter include_base = true and get rid of the dependency on the base that may change during commit/stream parallel jobs. Now, if the specified base is not found in the backing image chain, the QEMU will abort. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1559152576-281803-2-git-send-email-andrey.shinkevich@virtuozzo.com [mreitz: Squashed in the following as a rebase on conflicting patches:] Message-id: e3cf99ae-62e9-8b6e-5a06-d3c8b9363b85@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:04 +02:00
Stefano Garzarella	d24f80234b	block/rbd: increase dynamically the image size RBD APIs don't allow us to write more than the size set with rbd_create() or rbd_resize(). In order to support growing images (eg. qcow2), we resize the image before write operations that exceed the current size. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190509145927.293369-1-sgarzare@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:04 +02:00
Pino Toscano	b10d49d761	ssh: switch from libssh2 to libssh Rewrite the implementation of the ssh block driver to use libssh instead of libssh2. The libssh library has various advantages over libssh2: - easier API for authentication (for example for using ssh-agent) - easier API for known_hosts handling - supports newer types of keys in known_hosts Use APIs/features available in libssh 0.8 conditionally, to support older versions (which are not recommended though). Adjust the iotest 207 according to the different error message, and to find the default key type for localhost (to properly compare the fingerprint with). Contributed-by: Max Reitz <mreitz@redhat.com> Adjust the various Docker/Travis scripts to use libssh when available instead of libssh2. The mingw/mxe testing is dropped for now, as there are no packages for it. Signed-off-by: Pino Toscano <ptoscano@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190620200840.17655-1-ptoscano@redhat.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 16:01:04 +02:00
Sam Eiderman	98eb9733f4	vmdk: Add read-only support for seSparse snapshots Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in QEMU). This format was lacking in the following: * Grain directory (L1) and grain table (L2) entries were 32-bit, allowing access to only 2TB (slightly less) of data. * The grain size (default) was 512 bytes - leading to data fragmentation and many grain tables. * For space reclamation purposes, it was necessary to find all the grains which are not pointed to by any grain table - so a reverse mapping of "offset of grain in vmdk" to "grain table" must be constructed - which takes large amounts of CPU/RAM. The format specification can be found in VMware's documentation: https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf In ESXi 6.5, to support snapshot files larger than 2TB, a new format was introduced: SESparse (Space Efficient). This format fixes the above issues: * All entries are now 64-bit. * The grain size (default) is 4KB. * Grain directory and grain tables are now located at the beginning of the file. + seSparse format reserves space for all grain tables. + Grain tables can be addressed using an index. + Grains are located in the end of the file and can also be addressed with an index. - seSparse vmdks of large disks (64TB) have huge preallocated headers - mainly due to L2 tables, even for empty snapshots. * The header contains a reverse mapping ("backmap") of "offset of grain in vmdk" to "grain table" and a bitmap ("free bitmap") which specifies for each grain - whether it is allocated or not. Using these data structures we can implement space reclamation efficiently. * Due to the fact that the header now maintains two mappings: * The regular one (grain directory & grain tables) * A reverse one (backmap and free bitmap) These data structures can lose consistency upon crash and result in a corrupted VMDK. 
Therefore, a journal is also added to the VMDK and is replayed when VMware reopens the file after a crash. Since ESXi 6.7 - SESparse is the only snapshot format available. Unfortunately, VMware does not provide documentation regarding the new seSparse format. This commit is based on black-box research of the seSparse format. Various in-guest block operations and their effect on the snapshot file were tested. The only VMware provided source of information (regarding the underlying implementation) was a log file on the ESXi: /var/log/hostd.log Whenever an seSparse snapshot is created - the log is being populated with seSparse records. Relevant log records are of the form: [...] Const Header: [...] constMagic = 0xcafebabe [...] version = 2.1 [...] capacity = 204800 [...] grainSize = 8 [...] grainTableSize = 64 [...] flags = 0 [...] Extents: [...] Header : <1 : 1> [...] JournalHdr : <2 : 2> [...] Journal : <2048 : 2048> [...] GrainDirectory : <4096 : 2048> [...] GrainTables : <6144 : 2048> [...] FreeBitmap : <8192 : 2048> [...] BackMap : <10240 : 2048> [...] Grain : <12288 : 204800> [...] Volatile Header: [...] volatileMagic = 0xcafecafe [...] FreeGTNumber = 0 [...] nextTxnSeqNumber = 0 [...] replayJournal = 0 The sizes that are seen in the log file are in sectors. Extents are of the following format: <offset : size> This commit is a strict implementation which enforces: * magics * version number 2.1 * grain size of 8 sectors (4KB) * grain table size of 64 sectors * zero flags * extent locations Additionally, this commit provides only a subset of the functionality offered by seSparse's format: * Read-only * No journal replay * No space reclamation * No unmap support Hence, journal header, journal, free bitmap and backmap extents are unused, only the "classic" (L1 -> L2 -> data) grain access is implemented. However there are several differences in the grain access itself. Grain directory (L1): * Grain directory entries are indexes (not offsets) to grain tables. 
* Valid grain directory entries have their highest nibble set to 0x1. * Since grain tables are always located in the beginning of the file - the index can fit into 32 bits - so we can use its low part if it's valid. Grain table (L2): * Grain table entries are indexes (not offsets) to grains. * If the highest nibble of the entry is: 0x0: The grain is not allocated. The rest of the bytes are 0. 0x1: The grain is unmapped - guest sees a zero grain. The rest of the bits point to the previously mapped grain, see 0x3 case. 0x2: The grain is zero. 0x3: The grain is allocated - to get the index calculate: ((entry & 0x0fff000000000000) >> 48) \| ((entry & 0x0000ffffffffffff) << 12) * The difference between 0x1 and 0x2 is that 0x1 is an unallocated grain which results from the guest using sg_unmap to unmap the grain - but the grain itself still exists in the grain extent - a space reclamation procedure should delete it. Unmapping a zero grain has no effect (0x2 will not change to 0x1) but unmapping an unallocated grain will (0x0 to 0x1) - naturally. In order to implement seSparse some fields had to be changed to support both 32-bit and 64-bit entry sizes. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
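
The entry layout described in the commit message above can be decoded as in the following sketch. The masks and shifts are taken from the message; the enum and function names are hypothetical and are not the identifiers used in block/vmdk.c.

```c
#include <stdint.h>

/* Meaning of the highest nibble of a grain table (L2) entry. */
enum {
    TOY_GRAIN_UNALLOCATED = 0x0,
    TOY_GRAIN_UNMAPPED    = 0x1,
    TOY_GRAIN_ZERO        = 0x2,
    TOY_GRAIN_ALLOCATED   = 0x3
};

/* Highest nibble of a 64-bit L1 or L2 entry. */
static unsigned toy_entry_nibble(uint64_t entry)
{
    return (unsigned)(entry >> 60);
}

/* Grain index of an allocated (0x3) L2 entry, per the formula above. */
static uint64_t toy_l2_grain_index(uint64_t entry)
{
    return ((entry & 0x0fff000000000000ULL) >> 48) |
           ((entry & 0x0000ffffffffffffULL) << 12);
}

/* A grain directory (L1) entry is valid if its highest nibble is 0x1. */
static int toy_l1_is_valid(uint64_t entry)
{
    return toy_entry_nibble(entry) == 0x1;
}

/* Grain table index of a valid L1 entry: the low 32 bits. */
static uint32_t toy_l1_table_index(uint64_t entry)
{
    return (uint32_t)(entry & 0xffffffffULL);
}
```

For example, the L2 entry 0x3001000000000002 has type 0x3 (allocated) and decodes to grain index 0x2001.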
Sam Eiderman	59d6ee4850	vmdk: Reduce the max bound for L1 table size 512M of L1 entries is a very loose bound, only 32M are required to store the maximal supported VMDK file size of 2TB. Fixed qemu-iotest 59 - the failure now occurs earlier, on the impossible L1 table size. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
Sam Eiderman	940a2cd5d2	vmdk: Fix comment regarding max l1_size coverage Commit `b0651b8c24` ("vmdk: Move l1_size check into vmdk_add_extent") extended the l1_size check from VMDK4 to VMDK3 but did not update the default coverage in the moved comment. The previous vmdk4 calculation: (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB The added vmdk3 calculation: (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB Adding the calculation of vmdk3 to the comment. In any case, VMware does not offer virtual disks more than 2TB for vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is not implemented yet in qemu. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com Reviewed-by: yuchenlin <yuchenlin@synology.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
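The two coverage figures quoted in the l1_size comment fix above can be checked with a few lines of arithmetic. This is a sketch that assumes the grain sizes in the message (65536 and 512) are in bytes and that "PB" means 2^50 bytes; the helper name is hypothetical.

```python
L1_ENTRIES_BOUND = 512 * 1024 * 1024  # the old 512M bound on L1 entries
PIB = 1 << 50                         # "PB" in the message, read as 2**50 bytes


def coverage_bytes(l1_entries, l2_entries, grain_bytes):
    """Bytes addressable by a two-level table of the given geometry."""
    return l1_entries * l2_entries * grain_bytes


vmdk4_coverage = coverage_bytes(L1_ENTRIES_BOUND, 512, 65536)
vmdk3_coverage = coverage_bytes(L1_ENTRIES_BOUND, 4096, 512)

print(vmdk4_coverage // PIB, "PiB for vmdk4")
print(vmdk3_coverage // PIB, "PiB for vmdk3")
```

Both results are far above the 2TB limit mentioned in the message, which is why the bound could be tightened.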
Max Reitz	a193ad3b3b	block/commit: Drop bdrv_child_try_set_perm() commit_top_bs never requests or unshares any permissions. There is no reason to make this so explicit here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Max Reitz	f94dc3b414	block/mirror: Fix child permissions We cannot use bdrv_child_try_set_perm() to give up all restrictions on the child edge, and still have bdrv_mirror_top_child_perm() request BLK_PERM_WRITE. Fix this by making bdrv_mirror_top_child_perm() return 0/BLK_PERM_ALL when we want to give up all permissions, and replacing bdrv_child_try_set_perm() by bdrv_child_refresh_perms(). The bdrv_child_try_set_perm() before removing the node with bdrv_replace_node() is then unnecessary. No permissions have changed since the previous invocation of bdrv_child_try_set_perm(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Max Reitz	094e363944	file-posix: Update open_flags in raw_set_perm() raw_check_perm() + raw_set_perm() can change the flags associated with the current FD. If so, we have to update BDRVRawState.open_flags accordingly. Otherwise, we may keep reopening the FD even though the current one already has the correct flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	b23c580c94	block: drop bs->job Drop the remaining users of bs->job: 1. The assertions are already duplicated by assert(!bs->refcnt). 2. A trace point does not seem reason enough to change stream_start to return a BlockJob pointer. 3. Restricting the creation of two jobs based on the same bs is a bad idea, because: 3.1 Some jobs create filters to be their main node, so this check doesn't actually prevent creating a second job on the same real node (which will create another filter node), though I hope that is restricted by other mechanisms. 3.2 Even without bs->job we have two systems of permissions: op-blockers and BLK_PERM. 3.3 We may want to run several jobs on one node one day. And finally, drop the bs->job pointer itself. Hurrah! Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	68d00e4293	block/block-backend: blk_iostatus_reset: drop usage of bs->job We are going to remove the bs->job pointer. Drop its usage in blk_iostatus_reset. blk_iostatus_reset() has only two callers: 1. blk_attach_dev(). This doesn't have anything to do with jobs and attaching a new guest device won't solve any problem the job encountered, so no reason to reset the iostatus for the job. 2. qmp_cont(). This resets the iostatus for everything. We can just call block_job_iostatus_reset() for all block jobs instead of going through BlockBackend. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	cc19f1773d	block/replication: drop usage of bs->job We are going to remove the bs->job pointer. Drop its usage in the replication code. Additionally, we have to return the job pointer from some mirror APIs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:09 +02:00
Max Reitz	1adb0b5e0f	blkdebug: Inject errors on .bdrv_co_block_status() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190507203508.18026-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-14 14:16:57 +02:00
Max Reitz	f8cec157cb	blkdebug: Add "none" event Together with @iotypes and @sector, this can be used to trap e.g. the first read or write access to a certain sector without having to know what happens internally in the block layer, i.e. which "real" events happen right before such an access. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190507203508.18026-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-14 14:16:57 +02:00
Max Reitz	16789db3de	blkdebug: Add @iotype error option This new error option allows users of blkdebug to inject errors only on certain kinds of I/O operations. Users usually want to make a very specific operation fail, not just any; but right now they simply hope that the event that triggers the error injection is followed up with that very operation. That may not be true, however, because the block layer is changing (including blkdebug, which may increase the number of types of I/O operations on which to inject errors). The new option's default has been chosen to keep backwards compatibility. Note that similar to the internal representation, we could choose to expose this option as a list of I/O types. But there is no practical use for this, because as described above, users usually know exactly which kind of operation they want to make fail, so there is no need to specify multiple I/O types at once. In addition, exposing this option as a list would require non-trivial changes to qemu_opts_absorb_qdict(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190507203508.18026-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-14 14:16:57 +02:00
Vladimir Sementsov-Ogievskiy	611ae1d716	block/nbd: merge NBDClientSession struct back to BDRVNBDState There is no reason to keep it separate: it differs from the behavior of other block drivers and is therefore confusing. Instead of the generic 'state = (State*)bs->opaque' we have to use a special helper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190611102720.86114-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-06-13 10:00:42 -05:00
Vladimir Sementsov-Ogievskiy	86f8cdf3db	block/nbd: merge nbd-client.* to nbd.c There is no reason to keep the implementation of the driver handlers separate from the driver structure. We can get rid of the extra header file. While here, fix the comment style, restore forgotten comments for NBD_FOREACH_REPLY_CHUNK and nbd_reply_chunk_iter_receive, and remove extra includes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190611102720.86114-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-06-13 09:55:09 -05:00
Vladimir Sementsov-Ogievskiy	0a93b359db	block/nbd-client: drop stale logout Drop one on failure path (we have errp) and turn two others into trace points. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190611102720.86114-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-06-13 09:35:53 -05:00
Stefano Garzarella	2ea8e96da2	block/gluster: update .help of BLOCK_OPT_PREALLOC option Add missing 'falloc' among the allowed values of 'preallocation' option; show it and 'full' only when they are supported. ('falloc' is supported if defined CONFIG_GLUSTERFS_FALLOCATE, 'full' is supported if defined CONFIG_GLUSTERFS_ZEROFILL) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190524075848.23781-4-sgarzare@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-06-12 18:32:32 +02:00
Stefano Garzarella	abea00533f	block/file-posix: update .help of BLOCK_OPT_PREALLOC option Show 'falloc' among the allowed values of 'preallocation' option, only when it is supported (if defined CONFIG_POSIX_FALLOCATE) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190524075848.23781-3-sgarzare@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-06-12 18:31:46 +02:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Markus Armbruster	856dfd8a03	qemu-common: Move qemu_isalnum() etc. to qemu/ctype.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-3-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2019-06-11 20:22:09 +02:00
Vladimir Sementsov-Ogievskiy	d93e572688	block/io: bdrv_pdiscard: support int64_t bytes parameter This fixes at least one overflow in qcow2_process_discards, which passes 64bit region length to bdrv_pdiscard where bytes (or sectors in the past) parameter is int since its introduction in `0b919fae`. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 16:55:58 +02:00
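The overflow fixed by the bdrv_pdiscard commit above comes from passing a 64-bit length through a 32-bit int parameter. A minimal sketch of that truncation, assuming the two's-complement wrap-around that typical C implementations apply (the C standard leaves this conversion implementation-defined); the helper name is hypothetical:

```python
def to_c_int(value):
    """Mimic storing a wide integer in a 32-bit signed C int by wrapping."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


GIB = 1024 ** 3

# A 4 GiB + 64 KiB discard request silently shrinks to 64 KiB.
truncated_length = to_c_int(4 * GIB + 65536)

# A 3 GiB request even becomes negative.
negative_length = to_c_int(3 * GIB)

print(truncated_length, negative_length)
```

Widening the parameter to int64_t, as the commit does, avoids both effects for any length that fits in 63 bits.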
Vladimir Sementsov-Ogievskiy	1477b6c803	block/qcow2-refcount: add trace-point to qcow2_process_discards Let's at least trace ignored failure. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 16:55:58 +02:00
Kevin Wolf	d0ee0204f4	block: Remove wrong bdrv_set_aio_context() calls The mirror and commit block jobs use bdrv_set_aio_context() to move their filter node into the right AioContext before hooking it up in the graph. Similarly, bdrv_open_backing_file() explicitly moves the backing file node into the right AioContext first. This isn't necessary any more, they get automatically moved into the right context now when attaching them. However, in the case of bdrv_open_backing_file() with a node reference, it's actually not only unnecessary, but even wrong: The unchecked bdrv_set_aio_context() changes the AioContext of the child node even if other parents require it to retain the old context. So this is not only a simplification, but a bug fix, too. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	132ada80c4	block: Adjust AioContexts when attaching nodes So far, we only made sure that updating the AioContext of a node affected the whole subtree. However, if a node is newly attached to a new parent, we also need to make sure that both the subtree of the node and the parent are in the same AioContext. This tries to move the new child node to the parent AioContext and returns an error if this isn't possible. BlockBackends now actually apply their AioContext to their root node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	97896a4887	block: Add Error to blk_set_aio_context() Add an Error parameter to blk_set_aio_context() and use bdrv_child_try_set_aio_context() internally to check whether all involved nodes can actually support the AioContext switch. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Julia Suvorova	2b02fd81de	block/linux-aio: Drop unused BlockAIOCB submission method Callback-based laio_submit() and laio_cancel() were left after rewriting Linux AIO backend to coroutines in hope that they would be used in other code that could bypass coroutines. They can be safely removed because they have not been used since that time. Signed-off-by: Julia Suvorova <jusual@mail.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Max Reitz	5cb2737e92	block/io: Delay decrementing the quiesce_counter When ending a drained section, bdrv_do_drained_end() currently first decrements the quiesce_counter, and only then actually ends the drain. The bdrv_drain_invoke(bs, false) call may cause graph changes. Say the graph change involves replacing an existing BB's ("blk") BDS (blk_bs(blk)) by @bs. Let us introduce the following values: - bs_oqc = old_quiesce_counter (so bs->quiesce_counter == bs_oqc - 1) - obs_qc = blk_bs(blk)->quiesce_counter (before bdrv_drain_invoke()) Let us assume there is no blk_pread_unthrottled() involved, so blk->quiesce_counter == obs_qc (before bdrv_drain_invoke()). Now replacing blk_bs(blk) by @bs will reduce blk->quiesce_counter by obs_qc (making it 0) and increase it by bs_oqc-1 (making it bs_oqc-1). bdrv_drain_invoke() returns and we invoke bdrv_parent_drained_end(). This will decrement blk->quiesce_counter by one, so it would be -1 -- were there not an assertion against that in blk_root_drained_end(). We therefore have to keep the quiesce_counter up at least until bdrv_drain_invoke() returns, so that bdrv_parent_drained_end() does the right thing for the parents @bs got during bdrv_drain_invoke(). But let us delay it even further, namely until bdrv_parent_drained_end() returns, because then it mirrors bdrv_do_drained_begin(): There, we first increment the quiesce_counter, then begin draining the parents, and then call bdrv_drain_invoke(). It makes sense to let bdrv_do_drained_end() unravel this exactly in reverse. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
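The counter arithmetic in the quiesce_counter commit above can be replayed with plain integers. This is a simplified model, not QEMU code: the function name and the decrement_first flag are hypothetical, and the graph change is reduced to the two adjustments the message describes.

```python
def counters_after_drained_end(bs_oqc, obs_qc, decrement_first):
    """Return (blk counter, bs counter) after the drained section ends.

    bs_oqc: bs's quiesce counter before bdrv_do_drained_end().
    obs_qc: quiesce counter of blk's old node, equal to blk's own counter.
    decrement_first: True models the old order, False the order after the fix.
    """
    bs_qc = bs_oqc
    blk_qc = obs_qc
    if decrement_first:
        bs_qc -= 1  # old code: decrement before ending the drain
    # Graph change during bdrv_drain_invoke(): blk's node is replaced by bs,
    # so blk drops the old node's count and takes over bs's current count.
    blk_qc = blk_qc - obs_qc + bs_qc
    # bdrv_parent_drained_end(): blk is now a parent of bs.
    blk_qc -= 1
    if not decrement_first:
        bs_qc -= 1  # fixed code: decrement only after the parents are done
    return blk_qc, bs_qc


print(counters_after_drained_end(1, 0, decrement_first=True))
print(counters_after_drained_end(1, 0, decrement_first=False))
```

With bs_oqc = 1, the old order drives the BlockBackend's counter to -1, the case the assertion in blk_root_drained_end() catches, while the delayed decrement leaves both counters equal.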
Vladimir Sementsov-Ogievskiy	69f47505ee	block: avoid recursive block_status call if possible drv_co_block_status digs into bs->file for an additional, more accurate search for holes inside a region reported as DATA by bs since `5daa74a6eb`. This accuracy is not free: assume we have a qcow2 disk. qcow2 actually knows where the holes are and where the data is, but every block_status request additionally calls lseek. Assume a big disk, full of data: in any iterative copying block job (or img convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks will have to iterate through all metadata up to the end of the file. This is obviously inefficient, and for many scenarios we don't need this lseek at all. However, lseek is needed when we have a metadata-preallocated image. So, let's detect the metadata-preallocation case and not dig into qcow2's protocol file in other cases. The idea is to compare the allocation size from the filesystem's point of view with the allocation size from qcow2's point of view (by refcounts). If the allocation in the filesystem is significantly lower, consider it a metadata-preallocation case. iotest 102 changed, as our detector can't detect a shrunk file as metadata-preallocated, which doesn't seem wrong, as with metadata preallocation we always have a valid file length. Two other iotests have a slight change in their QMP output sequence: Active 'block-commit' returns earlier because the job coroutine yields earlier on a blocking operation. This operation is loading the refcount blocks in qcow2_detect_metadata_preallocation(). Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
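The detection idea in the block_status commit above (compare what the filesystem has allocated with what qcow2's refcounts say is allocated) might look roughly like this. The function name and the 0.9 threshold are assumptions made for illustration; the message only says "significantly lower" and does not give the value used in qcow2_detect_metadata_preallocation().

```python
def looks_metadata_preallocated(fs_allocated_bytes, qcow2_allocated_bytes,
                                threshold=0.9):
    """Guess whether an image was created with metadata preallocation.

    fs_allocated_bytes: space the filesystem actually allocated for the file.
    qcow2_allocated_bytes: space qcow2 considers allocated (by refcounts).
    threshold: assumed ratio below which the filesystem allocation counts
               as "significantly lower"; not taken from the commit.
    """
    if qcow2_allocated_bytes <= 0:
        return False
    return fs_allocated_bytes < threshold * qcow2_allocated_bytes


MIB = 1024 ** 2
GIB = 1024 ** 3

# Refcounts claim 10 GiB, but only metadata was really written: preallocated.
print(looks_metadata_preallocated(2 * MIB, 10 * GIB))
# A fully written image: both views agree, so no extra lseek is needed.
print(looks_metadata_preallocated(10 * GIB, 10 * GIB))
```

Only in the first case would the extra lseek on the protocol file be worth its cost, since qcow2's own metadata cannot tell holes from data there.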
Peter Maydell	62f6849e7a	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlztyykACgkQfe+BBqr8 OQ4EaA//QQVDpIBRcMN+LeKWeEs8VSLziPUrZuFvuhMEEnjnaU6gbKq8G8xbFQ62 JIHg0DBGhTt8ymE9Ay6O/cooR8F0z+XyfDr7UlpI7JL/Uwl7JguGKQrWUYBRMqCv Q2cLaWStLkfdkuW7Y3WRc16VEnIlizDxjRzfjE2ESYpuzD2fFsBY3KZbgbJwYwZw SujWUQ3MdsNdw5kDmerlrDUy7r/eyl2cLXyIt6ClHNoqq392oGMoUn4XbsaLnCWE H5s46qm33eXtvBHqxVGoOMAli5FwCnhwF+H3xg93jIG6vC/RXQYCIhlEmEwKyrU2 g2DWWe/8+9b0iX+zTIcAPTcn1pmjVivGRorOurP0AtMtjV/8PvV+hAQQeSg2ARB3 rLpXaEphD4WTwu7mYlZ5kX0qvX2SftaMU08k1IgR3mfo8Z3X9znVoFIv8HLlHuy+ OhCmwT5OWYw4mNABTXeBMH/Dcs9EcU4+T/KhAGLReHo18CSyjeT2xsT+XCsETagF KlAP88dP0EdJ9Oiccyb8as22u7ygKWIiDYPplBdb4SkKg/koQnYGDjeDAzB2vXS3 cGVhGJD2DBbcePA8iaCfWzsSCDOTBFQLa45uhPD3DnkAJylhecSsiDQP+IrLslK3 h/8v9e8MAlHMgrueSnS7foMDI9rdrTNsChuNCJWOOaUI/ZWnXFg= =kCrN -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Wed 29 May 2019 00:58:33 BST # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: iotests: test external snapshot with bitmap copying qapi: support external bitmaps in block-dirty-bitmap-merge migration/dirty-bitmaps: change bitmap enumeration method Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-05-30 12:10:27 +01:00
Vladimir Sementsov-Ogievskiy	eff0829b07	qapi: support external bitmaps in block-dirty-bitmap-merge Add a new optional parameter that makes it possible to merge bitmaps from different nodes. It is needed to maintain external snapshots during incremental backup chain history. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190517152111.206494-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-05-28 19:33:31 -04:00
Andrey Shinkevich	6388903e7c	qcow2-bitmap: initialize bitmap directory alignment Valgrind detects multiple issues in QEMU iotests when memory is used without being initialized. Valgrind may dump lots of unnecessary reports, which makes the memory issue analysis harder. In particular, that is true for the aligned bitmap directory and can be seen while running iotest #169. Padding the aligned space with zeros eases the pain. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-id: 1558961521-131620-1-git-send-email-andrey.shinkevich@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Anton Nefedov	c8bb23cbdb	qcow2: skip writing zero buffers to empty COW areas If COW areas of the newly allocated clusters are zeroes on the backing image, efficient bdrv_write_zeroes(flags=BDRV_REQ_NO_FALLBACK) can be used on the whole cluster instead of writing explicit zero buffers later in perform_cow(). iotest 060: write to the discarded cluster does not trigger COW anymore. Use a backing image instead. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Message-id: 20190516142749.81019-2-anton.nefedov@virtuozzo.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Alberto Garcia	b441dc71c0	block: Make bdrv_root_attach_child() unref child_bs on failure A consequence of the previous patch is that bdrv_attach_child() transfers the reference to child_bs from the caller to parent_bs, which will drop it on bdrv_close() or when someone calls bdrv_unref_child(). But this only happens when bdrv_attach_child() succeeds. If it fails then the caller is responsible for dropping the reference to child_bs. This patch makes bdrv_attach_child() take the reference also when there is an error, freeing the caller from having to do it. A similar situation happens with bdrv_root_attach_child(), so the changes in this patch affect both functions. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com [mreitz: Removed now superfluous BdrvChild * variable in bdrv_open_child()] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	ae6b12fa4c	block/backup: refactor: split out backup_calculate_cluster_size Split out cluster_size calculation. Move copy-bitmap creation above block-job creation, as we are going to share it with upcoming backup-top filter, which also should be created before actual block job creation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190429090842.57910-6-vsementsov@virtuozzo.com [mreitz: Dropped a paragraph from the commit message that was left over from a previous version] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00

4447 Commits