When AHCI executes an asynchronous IDE command, it checked DRDY without
checking either DRQ or BSY. This sometimes caused interrupt to be sent
before command is actually completed.
This resulted in a race condition: if guest then managed to access the
device before command has completed, it would hang waiting for an
interrupt.
This was observed with windows 7 guests.
To fix, check for DRQ or BSY in additiona to DRDY, if set,
the command is asynchronous so delay the interrupt until
asynchronous done callback is invoked.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently copy_policy isn't used. Recent sheepdog supports erasure coding, which
make use of copy_policy internally, but require client explicitly passing
copy_policy from base inode to newly creately inode for snapshot related
operations.
If connected sheep daemon doesn't utilize copy_policy, passing it to sheep
daemon is just one extra null effect operation. So no compatibility problem.
With this patch, sheepdog can provide erasure coded volume for QEMU VM.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
'copies' is actually uint8_t since day one, but request headers and some helper
functions parameterize it as uint32_t for unknown reasons and effectively
reserve 24 bytes for possible future use. This patch explicitly set the correct
for copies and reserve the left bytes.
This is a preparation patch that allow passing copy_policy in request header.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open_backing_file() tries to copy the backing file name using
pstrcpy directly after calling bdrv_open() to open the backing file
without checking whether that was actually successful. If it was not,
ps->backing_hd->file will probably be NULL and qemu will crash.
Fix this by moving pstrcpy after checking whether bdrv_open() succeeded.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a test case for Multiboot memory map in the tests/multiboot
directory, where future i386 test kernels can be dropped. Because this
requires an x86 build host and an installed 32 bit libgcc, the test is
not part of a regular 'make check'.
The reference output for the test is verified against test runs of the
same multiboot kernel booted by some GRUB 0.97.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes a regression introduced by commit e3127ae0c, which kept the
allocation size of the bounce buffer limited to one page in order to
avoid unbounded allocations (as explained in the commit message of
6d16c2f88), but broke the reporting of the shortened bounce buffer to
the caller. The caller therefore assumes that the full requested size
was provided and causes memory corruption when writing beyond the end of
the actually allocated buffer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Opening the qcow2 image with BDRV_O_NO_FLUSH prevents any flushes during
the image creation. This means that the image has not yet been flushed
to disk when qemu-img create exits. This flush is delayed until the next
operation on the image involving opening it without BDRV_O_NO_FLUSH and
closing (or directly flushing) it. For large images and/or images with a
small cluster size and preallocated metadata, this flush may take a
significant amount of time and may occur unexpectedly.
Reopening the image without BDRV_O_NO_FLUSH right before the end of
qcow2_create2() results in hoisting the potentially costly flush into
the image creation, which is expected to take some time (whereas
successive image operations may be not).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a test for saving a VM state from a qcow2 image and loading it back
(with having restarted qemu in between); this should work without any
problems.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
this adds a check that a dynamic VHD file has not been
accidently truncated (e.g. during transfer or upload).
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
currently it is not possible to distinguish by exitcode if there
has been an error or if bdrv_check is not supported by the image
format. Change the exitcode from 1 to 63 for the latter case.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Saving the VM state is done using bdrv_pwrite. This function may perform
a read-modify-write, which in this case results in data being read from
beyond the end of the virtual disk. Since we are actually trying to
access an area which is not a part of the virtual disk, zero_beyond_eof
has to be set to false before performing the partial write, otherwise
the VM state may become corrupted.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since df2a6f29a5, bdrv_co_do_writev increases the total_sectors value of
a growable block devices on writes after the current end. This leads to
the virtual disk apparently growing in qcow2_save_vmstate, which in turn
affects the disk size captured by the internal snapshot taken directly
afterwards through e.g. the HMP savevm command. Such a "grown" snapshot
cannot be loaded after reopening the qcow2 image, since its disk size
differs from the actual virtual disk size (writing a VM state does not
actually increase the virtual disk size).
Fix this by restoring total_sectors at the end of qcow2_save_vmstate.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QMP wire format uses "", not '', around strings.
* docs/qapi-code-gen.txt: Fix typo.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
# By Paolo Bonzini (2) and Jan Kiszka (1)
# Via Gleb Natapov
* qemu-kvm/uq/master:
kvmvapic: Prevent reading beyond the end of guest RAM
x86: cpuid: reconstruct leaf 0Dh data
x86: fix migration from pre-version 12
Message-id: 1382108641-4862-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
# By Amos Kong
# Via Stefan Hajnoczi
* stefanha/net:
net/rtl8139: update network information when macaddr is changed in guest
net/e1000: update network information when macaddr is changed in guest
net: update nic info during device reset
Message-id: 1382103314-21608-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
# By Fam Zheng (3) and others
# Via Stefan Hajnoczi
* stefanha/block:
vmdk: fix VMFS extent parsing
vmdk: Only read cid from image file when opening
virtio: Remove unneeded memcpy
block/raw-win32: Always use -errno in hdev_open
blockdev: fix cdrom read_only flag
sd: Avoid access to NULL BlockDriverState
hmp: drop bogus "[not inserted]"
Message-id: 1382105915-27735-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
# By Paolo Bonzini (10) and others
# Via Paolo Bonzini
* bonzini/iommu-for-anthony:
exec: remove qemu_safe_ram_ptr
icount: make it thread-safe
icount: document (future) locking rules for icount
icount: prepare the code for future races in calling qemu_clock_warp
icount: reorganize icount_warp_rt
icount: use cpu_get_icount() directly
timer: add timer_mod_anticipate and timer_mod_anticipate_ns
timer: extract timer_mod_ns_locked and timerlist_rearm
timer: make qemu_clock_enable sync between disable and timer's cb
qemu-thread: add QemuEvent
timer: protect timers_state's clock with seqlock
seqlock: introduce read-write seqlock
vga: Mark relevant portio lists regions as coalesced MMIO flushing
cirrus: Mark vga io region as coalesced MMIO flushing
portio: Allow to mark portio lists as coalesced MMIO flushing
compatfd: switch to QemuThread
memory: fix 128 arithmetic in info mtree
Message-id: 1382024935-28297-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
# By Peter Maydell (3) and Ákos Kovács (2)
# Via Paolo Bonzini
* bonzini/configure:
ui/Makefile.objs: delete unnecessary cocoa.o dependency
default-configs/: CONFIG_GDBSTUB_XML removed
Makefile.target: CONFIG_NO_* variables removed
rules.mak: New string testing functions
rules.mak: New logical functions for handling y/n values
# By Gerd Hoffmann (2) and others
# Via Gerd Hoffmann
* spice/spice.v75:
spice: fix multihead support
spice-display: add display channel id to the debug messages.
Fix VNC SASL authentication when using a QXL device
spice: replace use of deprecated API
Message-id: 1382006760-19388-1-git-send-email-kraxel@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
The VMFS extent line in description file doesn't have start offset as
FLAT lines does, and it should be defaulted to 0. The flat_offset
variable is initialized to -1, so we need to set it in this case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Previously cid of parent is parsed from image file for every IO request.
We already have L1/L2 cache and don't have assumption that parent image
can be updated behind us, so remove this to get more efficiency.
The parent CID is checked only for once after opening.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
rtl8139 has same problem as e1000, nic info isn't updated when macaddr
is changed in guest.
This patch updates the nic info when the last bit of macaddr is written.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If we change macaddr in guest by 'ifconfig eth0 hw ether 12:12:12:34:35:36',
the mac register of e1000 is already updated, but we don't update
network information in qemu. Therefor, the information in monitor
is wrong.
This patch updates nic info when the second part of macaddr is written.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
macaddr is reset during device reset, but nic info
isn't updated, this problem exists in e1000 & rtl8139
Signed-off-by: Amos Kong <akong@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Report from valgrind:
==19521== Source and destination overlap in memcpy(0x31d38938, 0x31d38938, 64)
==19521== at 0x4A0A343: memcpy@@GLIBC_2.14 (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19521== by 0x42774E: virtio_blk_device_init (virtio-blk.c:686)
==19521== by 0x46EE9E: virtio_device_init (virtio.c:1158)
==19521== by 0x25405E: device_realize (qdev.c:178)
==19521== by 0x2559B5: device_set_realized (qdev.c:699)
==19521== by 0x3A819B: property_set_bool (object.c:1315)
==19521== by 0x3A6CE0: object_property_set (object.c:803)
Valgrind is right: blk == &s->blks, so it is a memcpy of 64 byte with
source == destination which can be removed.
Reported-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is not needed since the RAM list is not modified anymore by
qemu_get_ram_ptr. Replace it with qemu_get_ram_block.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Computing the deadline of all vm_clocks is somewhat expensive and calls
out to qemu-timer.c; two reasons not to do it in the seqlock's write-side
critical section. This however opens the door for races in setting and
reading vm_clock_warp_start.
To plug them, we need to cover the case where a new deadline slips in
between the call to qemu_clock_deadline_ns_all and the actual modification
of the icount_warp_timer. Restrict changes to vm_clock_warp_start and
the icount_warp_timer's expiration time, to only move them back (which
would simply cause an early wakeup).
If a vm_clock timer is cancelled while CPUs are idle, this might cause the
icount_warp_timer to fire unnecessarily. This is not a problem, after it
fires the timer becomes inactive and the next call to timer_mod_anticipate
will be precise.
In addition to this, we must deactivate the icount_warp_timer _before_
checking whether CPUs are idle. This way, if the "last" CPU becomes idle
during the call to timer_del we will still set up the icount_warp_timer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To prepare for future code changes, move the increment of qemu_icount_bias
outside the "if" statement.
Also, hoist outside the if the check for timers that expired due to the
"warping". The check is redundant when !runstate_is_running(), but
doing it this way helps because the code that increments qemu_icount_bias
will be a critical section.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will help later when we will have to place these calls in
a critical section, and thus call a version of cpu_get_icount()
that does not take the lock.
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These let a user anticipate the deadline of a timer, atomically with
other sites that call the function. This helps avoiding complicated
lock hierarchies.
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After disabling the QemuClock, we should make sure that no QemuTimers
are still in flight. To implement that with light overhead, we resort
to QemuEvent. The caller of disabling will wait on QemuEvent of each
timerlist.
Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb.
Also, the callers of qemu_clock_enable() should be protected by the BQL.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This emulates Win32 manual-reset events using futexes or conditional
variables. Typical ways to use them are with multi-producer,
single-consumer data structures, to test for a complex condition whose
elements come from different threads:
for (;;) {
qemu_event_reset(ev);
... test complex condition ...
if (condition is true) {
break;
}
qemu_event_wait(ev);
}
Or more efficiently (but with some duplication):
... evaluate condition ...
while (!condition) {
qemu_event_reset(ev);
... evaluate condition ...
if (!condition) {
qemu_event_wait(ev);
... evaluate condition ...
}
}
QemuEvent provides a very fast userspace path in the common case when
no other thread is waiting, or the event is not changing state.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU_CLOCK_VIRTUAL may be read outside BQL. This will make its
foundation, i.e. cpu_clock_offset exposed to race condition.
Using private lock to protect it.
After this patch, reading QEMU_CLOCK_VIRTUAL is thread safe
unless use_icount is true, in which case the existing callers
still rely on the BQL.
Lock rule: private lock innermost, ie BQL->"this lock"
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Seqlock implementation for QEMU. Usage idiom
reader:
do {
start = seqlock_read_begin(&sl);
...
} while (seqlock_read_retry(&sl, start));
writer:
seqlock_write_lock(&sl);
...
seqlock_write_unlock(&sl);
initialization:
seqlock_init(QemuSeqLock *sl, QemuMutex *mutex)
mutex could be NULL if the caller will provide its own protection
for concurrent write sides (typically using the BQL).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows to remove the explicit qemu_flush_coalesced_mmio_buffer
calls.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows to remove the explicit qemu_flush_coalesced_mmio_buffer
calls - the memory core will invoke them now.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will enable us to remove all remaining explicit calls of
qemu_flush_coalesced_mmio_buffer in IO handlers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu_thread_create already does signal blocking and detaching for us.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
mtree_print_mr() calls int128_get64() in 3 places but only 2 places
handle 2^64 correctly.
This fixes the third call of int128_get64().
Cc: qemu-stable@nongnu.org
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On one occasion, hdev_open() returned -1 in case of an unknown error
instead of a proper -errno value. Adjust this to match the behavior of
raw_open() (in raw-win32), which is to return -EINVAL in this case.
Also, change the call to error_setg*() to match the one in raw_open() as
well.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch fixes spice display initialization to handle
multihead properly.
spice-core now keeps track of which QemuConsole has a spice
display channel attached to it and which has not. It also
manages display channel ids.
spice-display looks at all QemuConsoles and will pick up any
graphic console not yet bound to a spice channel (which in practice
are all non-qxl graphic devices).
Result is that
(a) you'll get a spice client window for each graphical device
now (first only without this patch), and
(b) mixing qxl and non-qxl vga cards works properly.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
ui/vnc.c:vnc_display_open() and spice-server/server/reds.c:do_spice_init()
are both calling sasl_server_init(). If spice_server_set_sasl_appname()
hasn't been called, spice-server will call it with "spice" as an appname,
causing cyrus-sasl to try to use a /etc/sasl2/spice.conf config file rather
than the /etc/sasl2/qemu.conf file that QEMU uses.
When using -spice sasl on the command line, QEMU properly calls
spice_server_set_sasl_appname() to set the SASL appname as "qemu",
but when using a QXL device without using SPICE, spice_server_init()
is called from qemu_spice_add_interface() without setting the appname
to "qemu", which then causes the VNC code to try to use spice.conf
instead of qemu.conf.
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Since 0ebd24e0, cdrom doesn't have read-only on by default, which will
error out when using an read only image. Fix it by setting the default
value when parsing opts.
Reported-by: Edivaldo de Araujo Pereira <edivaldoapereira@yahoo.com.br>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>