Commit Graph

267 Commits

Author SHA1 Message Date
Gerd Hoffmann
1c82392a15 xhci: add tracepoint for endpoint state changes
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-09-02 11:06:19 +02:00
Paolo Bonzini
55d5d04884 memory: add tracepoints for MMIO reads/writes
This is quite handy to debug softmmu targets.

Reviewed-by: Andreas Faerber <afaerber@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1375016242-32651-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:10 -05:00
Paul Durrant
8fbab3b62a Xen PV Device
Introduces a new Xen PV PCI device which will act as a binding point for
PV drivers for Xen.
The device has parameterized vendor-id, device-id and revision to allow to
be configured as a binding point for any vendor's PV drivers.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
2013-07-29 11:13:44 +00:00
Markus Armbruster
3ba00637d0 trace-events: Fix up source file comments
They're all wrong since (at least) Paolo's big source tree
reorganization.  Need to shuffle some event declarations around to
keep them under the correct source file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-18 11:44:42 +08:00
Markus Armbruster
3ae76d23d2 trace-events: Drop unused events
Dropped event                           Unused since
mirror_cow                              884fea4
paio_complete                           47e6b25
paio_cancel                             47e6b25
usb_ehci_data                           0ce668b
megasas_qf_dequeue                      never used
megasas_handle_frame                    never used
megasas_io_continue                     never used
megasas_iovec_map_failed                never used
megasas_dcmd_map_failed                 never used
milkymist_softusb_mouse_event           4c15ba9
xen_map_block                           6506e4f
xen_unmap_block                         6506e4f
qemu_spice_start                        67be672
qemu_spice_stop                         67be672

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-18 11:44:42 +08:00
Chegu Vinod
7ca1dfad95 Force auto-convegence of live migration
If a user chooses to turn on the auto-converge migration capability
these changes detect the lack of convergence and throttle down the
guest. i.e. force the VCPUs out of the guest for some duration
and let the migration thread catchup and help converge.

Verified the convergence using the following :
 - Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
 - OLTP like workload running on a 80VCPU/512G guest (~80% busy)

Sample results with Java warehouse workload : (migrate speed set to 20Gb and
migrate downtime set to 4seconds).

 (qemu) info migrate
 capabilities: xbzrle: off auto-converge: off  <----
 Migration status: active
 total time: 1487503 milliseconds
 expected downtime: 519 milliseconds
 transferred ram: 383749347 kbytes
 remaining ram: 2753372 kbytes
 total ram: 268444224 kbytes
 duplicate: 65461532 pages
 skipped: 64901568 pages
 normal: 95750218 pages
 normal bytes: 383000872 kbytes
 dirty pages rate: 67551 pages

 ---

 (qemu) info migrate
 capabilities: xbzrle: off auto-converge: on   <----
 Migration status: completed
 total time: 241161 milliseconds
 downtime: 6373 milliseconds
 transferred ram: 28235307 kbytes
 remaining ram: 0 kbytes
 total ram: 268444224 kbytes
 duplicate: 64946416 pages
 skipped: 64903523 pages
 normal: 7044971 pages
 normal bytes: 28179884 kbytes

Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2013-07-12 20:35:08 +02:00
Dietmar Maurer
98d2c6f2cd block: add basic backup support to block driver
backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb
 * function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors
 * directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c
   tracked requests.  This means a little code duplication but is much
   simpler than trying to share the tracked requests list and use the
   backup block size.
 * Add on_source_error and on_target_error error handling.
 * Use trace events instead of DPRINTF()

-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:26 +02:00
Anthony Liguori
fd469df97a Merge remote-tracking branch 'bonzini/iommu-for-anthony' into staging
# By Paolo Bonzini (11) and others
# Via Paolo Bonzini
* bonzini/iommu-for-anthony:
  memory: clean up phys_page_find
  memory: populate FlatView for new address spaces
  memory: limit sections in the radix tree to the actual address space size
  s390x: reduce TARGET_PHYS_ADDR_SPACE_BITS to 62
  memory: fix address space initialization/destruction
  memory: make memory_global_sync_dirty_bitmap take an AddressSpace
  memory: do not duplicate memory_region_destructor_none
  memory: Rename readable flag to romd_mode
  memory: Replace open-coded memory_region_is_romd
  memory: allow memory_region_find() to run on non-root memory regions
  memory: assert that PhysPageEntry's ptr does not overflow
  exec: eliminate stq_phys_notdirty
  exec: make qemu_get_ram_ptr private
  exec: eliminate qemu_put_ram_ptr
  exec: remove obsolete comment

Message-id: 1369414987-8839-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-24 13:47:42 -05:00
Paolo Bonzini
4f39178b3a exec: eliminate qemu_put_ram_ptr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-05-24 18:42:19 +02:00
Stefan Hajnoczi
02ffb50448 coroutine: stop using AioContext in CoQueue
qemu_co_queue_next(&queue) arranges that the next queued coroutine is
run at a later point in time.  This deferred restart is useful because
the caller may not want to transfer control yet.

This behavior was implemented using QEMUBH in the past, which meant that
CoQueue (and hence CoMutex and CoRwlock) had a dependency on the
AioContext event loop.  This hidden dependency causes trouble when we
move to a world with multiple event loops - now qemu_co_queue_next()
needs to know which event loop to schedule the QEMUBH in.

After pondering how to stash AioContext I realized the best solution is
to not use AioContext at all.  This patch implements the deferred
restart behavior purely in terms of coroutines and no longer uses
QEMUBH.

Here is how it works:

Each Coroutine has a wakeup queue that starts out empty.  When
qemu_co_queue_next() is called, the next coroutine is added to our
wakeup queue.  The wakeup queue is processed when we yield or terminate.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-05-24 16:17:56 +02:00
Hervé Poussineau
da4c1a7a85 osdep: fix qemu_anon_ram_free trace (+ fix compilation on 32 bit hosts)
Commit e7a09b92b7 added a trace at each
memory freeing, but unfortunately inverted size and pointer when printing
them. Fix trace.

This also led to a compilation error on 32 bit hosts:
In file included from include/trace.h:4:0,
                 from trace/generated-events.c:3:
./trace/generated-tracers.h: In function ‘trace_qemu_anon_ram_free’:
./trace/generated-tracers.h:64:9: error: format ‘%zu’ expects argument of type
‘size_t’, but argument 3 has type ‘void *’ [-Werror=format]
./trace/generated-tracers.h:64:9: error: format ‘%p’ expects argument of type
‘void *’, but argument 4 has type ‘size_t’ [-Werror=format]

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Message-id: 1369045989-14016-1-git-send-email-hpoussin@reactos.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-20 08:20:08 -05:00
Paolo Bonzini
e7a09b92b7 osdep: introduce qemu_anon_ram_free to free qemu_anon_ram_alloc-ed memory
We switched from qemu_memalign to mmap() but then we don't modify
qemu_vfree() to do a munmap() over free().  Which we cannot do
because qemu_vfree() frees memory allocated by qemu_{mem,block}align.

Introduce a new function that does the munmap(), luckily the size is
available in the RAMBlock.

Reported-by: Amos Kong <akong@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Message-id: 1368454796-14989-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-14 08:53:31 -05:00
Paolo Bonzini
6eebf958ab osdep, kvm: rename low-level RAM allocation functions
This is preparatory to the introduction of a separate freeing API.

Reported-by: Amos Kong <akong@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Message-id: 1368454796-14989-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-14 08:53:31 -05:00
Paolo Bonzini
fa131d94a5 qom: trace asserting casts
This provides a way to detect the cast that leads to a (reproducible)
crash even when QOM cast debugging is disabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1368188203-3407-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-13 09:52:06 -05:00
Kazuya Saito
b76ac80a5c kvm-all: add kvm_run_exit tracepoint
This patch enable us to know exit reason of KVM_RUN. It will help us
know where the trouble is caused.

Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-05-03 13:58:09 +02:00
Kazuya Saito
9c7757290c kvm-all: add kvm_ioctl, kvm_vm_ioctl, kvm_vcpu_ioctl tracepoints
This patch adds tracepoints at ioctl to kvm. Tracing these ioctl is
useful for clarification whether the cause of troubles is qemu or kvm.

Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-05-03 13:58:08 +02:00
Hervé Poussineau
6e860b5db4 pvscsi: fix compilation on 32 bit hosts
This fixes the following error:
In file included from qemu/include/trace.h:4:0,
                 from trace/generated-events.c:3:
./trace/generated-tracers.h: In function ‘trace_pvscsi_get_sg_list’:
./trace/generated-tracers.h:4271:9: error: format ‘%lu’ expects argument of
type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Werror=format]

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-05-01 21:00:20 +04:00
Gerd Hoffmann
3b7e759a41 usb: better speed mismatch error reporting
Report the supported speeds for device and port in the error message.
Also add the speeds to the tracepoint.  And while being at it drop
the redundant error message in usb_desc_attach, usb_device_attach will
report the error anyway.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-23 08:43:10 +02:00
Dmitry Fleytman
881d588a98 scsi: VMWare PVSCSI paravirtual device implementation
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
[ Rename files to vmw_pvscsi, fix setting of hostStatus in
  pvscsi_request_cancelled - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-04-19 10:44:17 +02:00
Anthony Liguori
86c7dba0d0 Merge remote-tracking branch 'kraxel/usb.80' into staging
# By Gerd Hoffmann (6) and Hans de Goede (1)
# Via Gerd Hoffmann
* kraxel/usb.80:
  use libusb for usb-host
  xhci: fix address device
  xhci: use slotid as device address
  xhci: fix portsc writes
  xhci: add xhci_cap_write
  xhci: remove leftover debug printf
  usb-serial: Remove double call to qemu_chr_add_handlers( NULL )

Message-id: 1366107190-30853-1-git-send-email-kraxel@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-04-16 10:28:58 -05:00
Gerd Hoffmann
2b2325ff64 use libusb for usb-host
Reimplement usb-host on top of libusb.
Reasons to do this:

 (1) Largely rewritten from scratch, nice opportunity to kill historical
     cruft.
 (2) Offload usbfs handling to libusb.
 (3) Have a single portable code base instead of bsd + linux variants.
 (4) Bring usb-host support to any platform supported by libusbx.

For now this goes side-by-side to the existing code.  That is only to
simplify regression testing though, at the end of the day I want remove
the old code and support libusb exclusively.  Merge early in 1.5 cycle,
remove the old code after 1.5 release or something like this.

Thanks to qdev the old and new code can coexist nicely on linux.  Just
use "-device usb-host-linux" to use the old linux driver instead of the
libusb one (which takes over the "usb-host" name).

The bsd driver isn't qdev'ified so it isn't that easy for bsd.
I didn't bother making it runtime switchable, so you have to rebuild
qemu with --disable-libusb to get back the old code.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 12:04:09 +02:00
Gerd Hoffmann
bdfce20df1 xhci: fix portsc writes
Check for port reset first and skip everything else then.
Add sanity checks for PLS updates.
Add PLC notification when entering PLS_U0 state.

This gets host-initiated port resume going on win8.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 11:59:08 +02:00
Gerd Hoffmann
0f7b2864d0 console: gui timer fixes
Make gui update rate adaption code in gui_update() actually work.
Sprinkle in a tracepoint so you can see the code at work.  Remove
the update rate adaption code in vnc and make vnc simply use the
generic bits instead.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 09:03:49 +02:00
Gerd Hoffmann
437fe1061b console: add trace events
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 09:03:47 +02:00
Gerd Hoffmann
eb2f9b024d hw/vmware_vga.c: various vmware vga fixes.
Hardcode depth to 32 bpp.  It effectively was that way before because
that is the default surface depth, this just makes it explicit in the
code.

Rename depth to new_depth to make it consistent with the new_width +
new_height names.  In theory we can make new_depth changeable (i.e.
allow the guest to fill in -- say -- 16 there).  In practice the guests
don't try, the X-Server refuses to start if you ask it to use 16bpp
depth (via DefaultDepth in the Screen section).

Always return the correct rmask+gmask+bmask values for the given
new_depth.

Fix mode setting to also verify at new_depth to make sure we have a
correct DisplaySurface, even if the current video mode happes to be
16bpp (set by vgabios via bochs vbe interface).  While being at it
switch over to use qemu_create_displaysurface_from, so the surface is
backed by guest-visible video memory and we save a memcpy.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 09:03:47 +02:00
Gerd Hoffmann
7a6404cd8b hw/vmware_vga.c: add tracepoints for mmio reads+writes
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-16 09:03:47 +02:00
Gerd Hoffmann
0ab966cfcc xhci: remove unimplemented printfs
Replace them with a tracepoint, so they don't spam stderr by default.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-04-03 09:55:49 +02:00
Anthony Liguori
fde245ca7e Merge remote-tracking branch 'stefanha/block' into staging
# By Kevin Wolf (22) and Peter Lieven (1)
# Via Stefan Hajnoczi
* stefanha/block: (23 commits)
  block: Fix direct use of protocols as driver for bdrv_open()
  qcow2: Gather clusters in a looping loop
  qcow2: Move cluster gathering to a non-looping loop
  qcow2: Allow requests with multiple l2metas
  qcow2: Use byte granularity in qcow2_alloc_cluster_offset()
  qcow2: Prepare handle_alloc/copied() for byte granularity
  qcow2: handle_copied(): Implement non-zero host_offset
  qcow2: handle_copied(): Get rid of keep_clusters parameter
  qcow2: handle_copied(): Get rid of nb_clusters parameter
  qcow2: Factor out handle_copied()
  qcow2: Clean up handle_alloc()
  qcow2: Finalise interface of handle_alloc()
  qcow2: handle_alloc(): Get rid of keep_clusters parameter
  qcow2: handle_alloc(): Get rid of nb_clusters parameter
  qcow2: Factor out handle_alloc()
  qcow2: Decouple cluster allocation from cluster reuse code
  qcow2: Change handle_dependency to byte granularity
  qcow2: Improve check for overlapping allocations
  qcow2: Handle dependencies earlier
  qcow2: Remove bogus unlock of s->lock
  ...
2013-03-28 12:57:37 -05:00
Kazuya Saito
7e8660032c vl: add runstate_set tracepoint
This patch enables us to know RunState transition. It will be userful
for investigation when the trouble occured in special event such like
live migration, shutdown, suspend, and so on.

Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 14:20:58 +01:00
Kevin Wolf
0af729ec00 qcow2: Factor out handle_copied()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
10f0ed8b2f qcow2: Factor out handle_alloc()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Gerd Hoffmann
da229ef3b3 console: rework DisplaySurface handling [vga emu side]
Decouple DisplaySurface allocation & deallocation from DisplayState.
Replace dpy_gfx_resize + dpy_gfx_setdata with a dpy_gfx_replace_surface
function.

This handles the graphic hardware emulation.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-03-18 10:21:58 +01:00
Gerd Hoffmann
7c20b4a374 console: fix displaychangelisteners interface
Split callbacks into separate Ops struct.  Pass DisplayChangeListener
pointer as first argument to all callbacks.  Uninline a bunch of
display functions and move them from console.h to console.c

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-03-18 10:21:58 +01:00
Stefan Hajnoczi
b811203cf2 threadpool: move globals into struct ThreadPool
Move global variables into a struct so multiple thread pools can be
supported in the future.

This patch does not change thread-pool.h interfaces.  There is still a
global thread pool and it is not yet possible to create/destroy
individual thread pools.  Moving the variables into a struct first makes
later patches easier to review.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15 16:07:50 +01:00
Gerd Hoffmann
4524051c32 Add search path support for qemu data files.
This patch allows to specify multiple directories where qemu should look
for data files.  To implement that the behavior of the -L switch is
slightly different now:  Instead of replacing the data directory the
path specified will be appended to the data directory list.  So when
specifiying -L multiple times all directories specified will be checked,
in the order they are specified on the command line, instead of just the
last one.

Additionally the default paths are always appended to the directory
data list.  This allows to specify a incomplete directory (such as the
seabios out/ directory) via -L.  Anything not found there will be loaded
from the default paths, so you don't have to create a symlink farm for
all the rom blobs.

For trouble-shooting a tracepoint has been added, logging which blob
has been loaded from which location.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1362739344-8068-1-git-send-email-kraxel@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-03-12 13:42:28 -05:00
Kazuya Saito
c09e5bb1d8 migration: add migrate_set_state tracepoint
Signed-off-by: Kazuya Saito <saito.kazuya@jp.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2013-03-11 13:32:01 +01:00
Paolo Bonzini
6f6710aa99 scsi: do not call scsi_read_data/scsi_write_data for a canceled request
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-05 17:51:51 +01:00
Gerd Hoffmann
024426acc0 usb-xhci: usb3 streams
Add streams support to the xhci emulation.  No secondary streams yet,
only linear stream arays are supported for now.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-02-19 13:17:48 +01:00
Cornelia Huck
a5cf2bb4e3 s390: Add new channel I/O based virtio transport.
Add a new virtio transport that uses channel commands to perform
virtio operations.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-29 21:50:04 +01:00
Cornelia Huck
df1fe5bb49 s390: Virtual channel subsystem support.
Provide a mechanism for qemu to provide fully virtual subchannels to
the guest.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-29 21:50:04 +01:00
Cornelia Huck
7b18aad543 s390: Add channel I/O instructions.
Provide handlers for (most) channel I/O instructions.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-29 21:50:04 +01:00
Paolo Bonzini
884fea4e87 mirror: support arbitrarily-sized iterations
Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks.  This limits the number of I/O operations and makes
mirroring efficient even with a small granularity.  Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Paolo Bonzini
402a47411b mirror: support more than one in-flight AIO operation
With AIO support in place, we can start copying more than one chunk
in parallel.  This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.

Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.

In addition, two different iterations on the HBitmap may want to
copy the same cluster.  We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Paolo Bonzini
bd48bde8f0 mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target.  However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
b812f6719c mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready.  However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros.  Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying).  So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
8f0720ecbc block: implement dirty bitmap using HBitmap
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
e7c033c3fa add hierarchical bitmap data type and test cases
HBitmaps provides an array of bits.  The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.

In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level.  When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).

Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):

     bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
     bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
     bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word

So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits.  To move down, you shift the index left
similarly, and add the word index within the group.  Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.

Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.

When iterating on a bitmap, each bit (on any level) is only visited
once.  Hence, The total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps.  Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:32 +01:00
Alon Levy
e0ac6097b6 qxl: stop using non revision 4 rom fields for revision < 4
Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2013-01-22 11:01:06 +01:00
Markus Armbruster
089da572b9 fw_cfg: Use void *, size_t instead of uint8_t *, uint32_t for blobs
Many callers pass size_t, which gets silently truncated to uint32_t.
Harmless, because all practical sizes are well below 4GiB.  Clean it
up anyway.  Size overflow now fails assertions.

Bonus: saves a whole bunch of silly casts.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:22:44 +00:00
Markus Armbruster
f6e3534327 fw_cfg: Replace debug prints by tracepoints
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:22:39 +00:00