The unmap command can reuse the same infrastructure as MODE SELECT
for reading the descriptor list into memory. The descriptors are
processed sequentially.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Leaving the aiocb to a non-NULL value leads to an assertion failure when
rerror/werror are set to stop or enospc, and the operation is retried.
scsi-disk checks that the aiocb member is NULL before filling it.
This patch correctly resets the aiocb to NULL values everywhere,
and adds the dual assertion that the aiocb was non-NULL before
calling bdrv_acct_done.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we've cleared out the architecture-independent uses of
kvm_irqchip_in_kernel(), we can add a doc comment describing
what it means.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Don't assume having an in-kernel irqchip means that GSI
routing is enabled.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Decouple another x86-specific assumption about what irqchips imply.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of assuming that we can use irqfds if and only if
kvm_irqchip_in_kernel(), add a bool to the KVMState which
indicates this, and is set only on x86 and only if the
irqchip is in the kernel.
The kernel documentation implies that the only thing
you need to use KVM_IRQFD is that KVM_CAP_IRQFD is
advertised, but this seems to be untrue. In particular
the kernel does not (alas) return a sensible error if you
try to set up an irqfd when you haven't created an irqchip.
If it did we could remove all this nonsense and let the
kernel return the error code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_allows_irq0_override() is a totally x86 specific concept:
move it to the target-specific source file where it belongs.
This means we need a new header file for the prototype:
kvm_i386.h, in line with the existing kvm_ppc.h.
While we are moving it, fix the return type to be 'bool' rather
than 'int'.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Rename the function kvm_irqchip_set_irq() to kvm_set_irq(),
since it can be used for sending (asynchronous) interrupts whether
there is a full irqchip model in the kernel or not. (We don't
include 'async' in the function name since asynchronous is the
normal case.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
On x86 userspace delivers interrupts to the kernel asynchronously
(and therefore VCPU idle management is done in the kernel) if and
only if there is an in-kernel irqchip. On other architectures this
isn't necessarily true (they may always send interrupts
asynchronously), so define a new kvm_async_interrupts_enabled()
function instead of misusing kvm_irqchip_in_kernel().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
The code creating the symlink from linux-headers/asm to the
architecture specific linux-headers/asm-$arch directory was
implicitly hardcoding a list of KVM supporting architectures.
Add a default case for the common "Linux architecture name and
QEMU CPU name match" case, so future architectures will only
need to add code if they've managed to get mismatched names.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a helper function for fetching max cpus supported by kvm.
Make QEMU exit with an error message if smp_cpus exceeds limit
of VCPU count retrieved by invoking this helper function.
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch fixes a problem in handling task management functions
in virtio-scsi. The cause of the problem is a mismatch between
the size of the tag in QEMU (32-bit) and virtio-scsi (64-bit).
Changing the QEMU size is hard because the migration format
uses 32 bits to store the tag; so just don't use the QEMU tag
(virtio-scsi only uses the tag for task management functions
anyway) and look up the full 64-bit tag in the hba_private field.
The reproducer is a bit obscure. If you cause an I/O timeout
(for example with rerror=stop and doing 'cont' on the monitor
continuously without fixing the error), sooner or later the
guest will try to abort the command and reissue it. At this
point, QEMU will report _two_ errors instead of one when you
hit 'c', because the first error has not been canceled correctly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch updates the iscsi layer to automatically pick a 'unique'
initiator-name based on the name of the vm in case the user has not set
an explicit iqn-name to use.
Create a new function qemu_get_vm_name() that returns the name of the VM,
if specified.
This way we can thus create default names to use as the initiator name
based on the guest session.
If the VM is not named via the '-name' command line argument, the iscsi
initiator-name used wiull simply be
iqn.2008-11.org.linux-kvm
If a name for the VM was specified with the '-name' option, iscsi will
use a default initiatorname of
iqn.2008-11.org.linux-kvm:<name>
These names are just the default iscsi initiator name that qemu will
generate/use only when the user has not set an explicit initiator name
via the commandlines or config files.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
The argument of iscsi_create_context is never freed by libiscsi,
which in fact calls strdup on it. Avoid a leak.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change XBZRLE cache size in bytes (the size should be a power of 2, it will be
rounded down to the nearest power of 2).
If XBZRLE cache size is too small there will be many cache miss.
New query-migrate-cache-size QMP command and 'info migrate_cache_size' HMP
command to query cache value.
Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In the outgoing migration check to see if the page is cached and
changed, then send compressed page by using save_xbrle_page function.
In the incoming migration check to see if RAM_SAVE_FLAG_XBZRLE is set
and decompress the page (by using load_xbrle function).
Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
For performance we are encoding long word at a time.
For nzrun we use long-word-at-a-time NULL-detection tricks from strcmp():
using ((lword - 0x0101010101010101) & (~lword) & 0x8080808080808080) test
to find out if any byte in the long word is zero.
Signed-off-by: Benoit Hudzia <benoit.hudzia@sap.com>
Signed-off-by: Petter Svard <petters@cs.umu.se>
Signed-off-by: Aidan Shribman <aidan.shribman@sap.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The management can enable/disable a capability for the next migration by using
migrate-set-capabilities QMP command.
The user can use migrate_set_capability HMP command.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The management can query the current migration capabilities using
query-migrate-capabilities QMP command.
The user can use 'info migrate_capabilities' HMP command.
Currently only XBZRLE capability is available.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Upstream seabios commit 5a023065388287e261ae9212452ff541f9fa9cd3
Major changes since 1.7.0:
- Usual share of bugfixes and cleanups ;)
- Support for 64bit PCI bars and mapping those above 4G.
- Stack switching for real mode irq handlers to reduce
seabios stack footprint.
- Support for booting from lsi scsi hba.
- Support for booting from usb attached scsi.
- Support for non-linear apic ids.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* kwolf/for-anthony:
qemu-img: use QemuOpts instead of QEMUOptionParameter in resize function
qemu-iotests: Be more flexible with image creation options
qemu-iotests: add 039 qcow2 lazy refcounts test
qemu-io: add "abort" command to simulate program crash
qcow2: implement lazy refcounts
qemu-iotests: ignore qemu-img create lazy_refcounts output
docs: add lazy refcounts bit to qcow2 specification
qcow2: introduce dirty bit
docs: add dirty bit to qcow2 specification
qemu-iotests: add qed.py image manipulation utility
qapi: generalize documentation of streaming commands
ide scsi: Mess with geometry only for hard disk devices
Commit 5931065907 is incomplete,
we'll arrive in the scsi command complete callback in CSW state
and must handle that case correctly.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
in_addr_t isn't available on mingw32. Just use an unsigned long instead. I
considered typedef'ing in_addr_t on mingw32 but this would potentially be
brittle if mingw32 did introduce the type.
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-iotests already filters out image creation options that may be
present or not in order to get the same output in both cases. However,
often it only considers the default value of the option. Cover all valid
values instead so that ./check -o name=value can be used successfull for
all of them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This tests establishes the basic post-conditions of the qcow2 lazy
refcounts features:
1. If the image was closed normally, it is marked clean.
2. If an allocating write was performed and the image was not closed
normally, then it is marked dirty.
a. Written data can be read back successfully.
b. The image file can be repaired and will be marked clean again.
c. The image file is automatically repaired when opened read/write.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Avoiding data loss and corruption is the top requirement for image file
formats. The qemu-io "abort" command makes it possible to simulate
program crashes and does not give the image format a chance to cleanly
shut down. This command is useful for data integrity test cases.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty. In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.
Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM). Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.
The lazy refcount optimization must be enabled at image creation time:
qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hide the default lazy_refcounts=off output from qemu-img like we do with
other image creation options. This ensures that existing golden outputs
continue to pass despite the new option that has been added.
Note that this patch applies before the one that actually introduces the
lazy_refcounts=on|off option. This ensures git-bisect(1) continues to
work.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The lazy refcounts bit indicates that this image can take advantage of
the dirty bit and that refcount updates can be postponed.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds an incompatible feature bit to mark images that have not
been closed cleanly. When a dirty image file is opened a consistency
check and repair is performed.
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The dirty bit will make it possible to perform lazy refcount updates,
where the image file is not kept consistent all the time. Upon opening
a dirty image file, it is necessary to perform a consistency check and
repair any incorrect refcounts.
Therefore the dirty bit must be an incompatible feature bit. We don't
want old programs accessing a file with stale refcounts.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qed.py utility can inspect and manipulate QED image files. It can
be used for testing to see the state of image metadata and also to
inject corruptions into the image file. It also has a scrubbing feature
to copy just the metadata out of an image file, allowing users to share
broken image files without revealing data in bug reports.
This has lived in my local repo for a long time but could be useful
to others.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Talk about background operations in general, rather than specifically
about streaming.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Legacy -drive cyls=... are now ignored completely when the drive
doesn't back a hard disk device. Before, they were first checked
against a hard disk's limits, then ignored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit b1f416aa8d breaks vhost_net
because it always registers the virtio_pci_host_notifier_read() handler
function on the ioeventfd, even when vhost_net.ko is using the ioeventfd.
The result is both QEMU and vhost_net.ko polling on the same eventfd
and the virtio_net.ko guest driver seeing inconsistent results:
# ifconfig eth0 192.168.0.1 netmask 255.255.255.0
virtio_net virtio0: output:id 0 is not a head!
To fix this, proceed the same as we do for irqfd: add a parameter to
virtio_queue_set_host_notifier_fd_handler and in that case only set
the notifier, not the handler.
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Tested-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* 'axp-next' of git://repo.or.cz/qemu/rth:
alpha-linux-user: Fix the getpriority syscall
alpha-linux-user: Properly handle the non-rt sigprocmask syscall.
alpha-linux-user: Fix a3 error return with v0 error bypass.
linux-user: Translate pipe2 flags; add to strace
linux-user: Allocate the right amount of space for non-fixed file maps
linux-user: Handle O_SYNC, O_NOATIME, O_CLOEXEC, O_PATH
linux-user: Sync fcntl.h bits with the kernel
alpha-linux-user: Handle TARGET_SSI_IEEE_RAISE_EXCEPTION properly
alpha-linux-user: Work around hosted mmap allocation problems
alpha-linux-user: Fix signal handling