If an error occurs in l2_allocate, the allocated (but unused) L2 cluster
should be freed.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since this is only read in cpu_copy() and linux-user has a global
cpu_model, drop the field from generic code.
Signed-off-by: Andreas Färber <afaerber@suse.de>
It is only used there and is deemed very fragile if not incorrect in its
current memcpy() form. Moving it into linux-user will allow to move
parts into target_cpu.h headers and only copy what the ABI mandates.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Local variable CPUClass *cc needs to be reloaded after return from longjmp,
too. (This fixes a mips-softmmu crash observed on FreeBSD when QEMU is
built with clang.)
Reported-by: Dimitry Andric <dim@FreeBSD.org>
Signed-off-by: Juergen Lock <nox@jelal.kn-bremen.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
While dirent->d_type is 8 bit for most systems, it is 32 bit for MinGW.
Reducing it to 8 bit results in a compiler warning because the macro
is_dir_maybe compares that 8 bit value with 32 bit constants.
Using 'unsigned' instead of 'unsigned char' matches the declaration for
MinGW and does not harm the other systems.
MinGW-w64 is not affected: it does not declare d_type.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If there is no operation driver for the xattr type the
functions return '-1' and set errno to '-EOPNOTSUPP'.
When the calling code sets 'ret = -errno' this turns
into a large positive number.
In Linux 3.11, the kernel has switched to using 9p
version 9p2000.L, instead of 9p2000.u, which enables
support for xattr operations. This on its own is harmless,
but for another change which makes it request the xattr
with a name 'security.capability'.
The result is that the guest sees a succesful return
of 95 bytes of data, instead of a failure with errno
set to 95. Since the kernel expects a maximum of 20
bytes for an xattr return this gets translated to the
unexpected errno ERANGE.
This all means that when running a binary off a 9p fs
in 3.11 kernels you get a fun result of:
# ./date
sh: ./date: Numerical result out of range
The only workaround is to pass 'version=9p2000.u' when
mounting the 9p fs in the guest, to disable all use of
xattrs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Latest gcc-4.8 supports a new option -fsanitize=address which activates
an AddressSanitizer. This AddressSanitizer stops the QEMU system emulation
very early because two character arrays of size 8 are potentially written
with 9 bytes.
Commit 6ea314d914 added the code.
There is no obvious reason why width or height could need 8 characters,
so reduce it to 7 characters which together with the terminating '\0'
fit into the arrays.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Alex Bennée <alex@bennee.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
VFIO is always little endian so do byte swapping of our mask on the
way in and byte swapping of the size on the way out.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Just to be sure we don't jump off any NULL pointer cliffs.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
rom_state_paddr is guest provided (caller address of outw(VAPIC_PORT) +
writen 16-bit value) and can be influenced to point beyond the end of
the host memory backing the guest's RAM. Make sure we do not use this
pointer to actually read beyond the limits.
Reading arbitrary guest bytes is harmless, the guest kernel has to
manage access to this I/O port anyway.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Memory regions can easily be 2^64 byte long and therefore overflow
for just a bit but that is enough for int128_get64() to assert.
This takes care of debug printing of huge section sizes.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Where *software* leaves 0x0000 - 0x2000 unmapped, the hardware should
still allow for this area to be mapped.
Signed-off-by: Sebastian Macke <sebastian@macke.de>
Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Reviewed-by: Jia Liu <proljc@gmail.com>
The result of (rw & 0) is always zero and therefore a logic false.
The whole comparison will therefore never be executed, it is a obvious bug,
we should use !(rw & 1) here.
Signed-off-by: Sebastian Macke <sebastian@macke.de>
Reviewed-by: Jia Liu <proljc@gmail.com>
Now that VFIO has a PCI hot reset interface, take advantage of it.
There are two modes that we need to consider. The first is when only
one device within the set of devices affected is actually assigned to
the guest. In this case the other devices are are just held by VFIO
for isolation and we can pretend they're not there, doing an entire
bus reset whenever the device reset callback is triggered. Supporting
this case separately allows us to do the best reset we can do of the
device even if the device is hotplugged.
The second mode is when multiple affected devices are all exposed to
the guest. In this case we can only do a hot reset when the entire
system is being reset. However, this also allows us to track which
individual devices are affected by a reset and only do them once.
We split our reset function into pre- and post-reset helper functions
prioritize the types of device resets available to us, and create
separate _one vs _multi reset interfaces to handle the distinct cases
above.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Device communication errors need to be reported to driver.
Add a debug message while at it.
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Acked-by: Gerd Hoffmann <kraxel@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
With Linux kernel version 3.3 or later, qemu fails with the following message:
sh_serial: unsupported read from 0x18
Aborted
Reported-and-analyzed-by: Rob Landley <rob@landley.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
From buildbot default_i386_rhel61:
CC alpha-softmmu/hw/alpha/typhoon.o
hw/alpha/typhoon.c: In function 'typhoon_translate_iommu':
hw/alpha/typhoon.c:703: warning: integer constant is too large for 'long' type
hw/alpha/typhoon.c:703: warning: integer constant is too large for 'long' type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
From buildbot default_i386_rhel61:
CC i386-softmmu/target-i386/arch_memory_mapping.o
target-i386/arch_memory_mapping.c: In function 'walk_pde':
target-i386/arch_memory_mapping.c:110: warning:
integer constant is too large for 'long' type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
blockdev.c:1929:13: warning: Value stored to 'ret' is never read
ret = 0;
^ ~
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Touched some error after enabling DEBUG_SUBPAGE.
Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit 4f193e3 added the test, but screwed up in-tree builds
(SRCDIR=.): the tests's output overwrites the expected output, and is
thus compared to itself.
Cc: qemu-stable@nongnu.org
Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
During vfio-pci initfn, the device is not always in a state where the
option ROM can be read. In the case of graphics cards, there's often
no per function reset, which means we have host driver state affecting
whether the option ROM is usable. Ideally we want to move reading the
option ROM past any co-assigned device resets to the point where the
guest first tries to read the ROM itself.
To accomplish this, we switch the memory region for the option rom to
an I/O region rather than a memory mapped region. This has the side
benefit that we don't waste KVM memory slots for a BAR where we don't
care about performance. This also allows us to delay loading the ROM
from the device until the first read by the guest. We then use the
PCI config space size of the ROM BAR when setting up the BAR through
QEMU PCI.
Another benefit of this approach is that previously when a user set
the ROM to a file using the romfile= option, we still probed VFIO for
the parameters of the ROM, which can result in dmesg errors about an
invalid ROM. We now only probe VFIO to get the ROM contents if the
guest actually tries to read the ROM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Not all resets are created equal. PM reset is not very reliable,
especially for GPUs, so we might want to opt for a bus reset if a
standard reset will only do a D3hot->D0 transition. We can also
use this to tell if the standard reset will do a bus reset (if
neither has_pm_reset or has_flr is probed, but the device still
supports reset).
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When MSI is accelerated through KVM the vectors are only programmed
when the guest first enables MSI support. Subsequent writes to the
vector address or data fields are ignored. Unfortunately that means
we're ignore updates done to adjust SMP affinity of the vectors.
MSI SMP affinity already works in non-KVM mode because the address
and data fields are read from their backing store on each interrupt.
This patch stores the MSIMessage programmed into KVM so that we can
determine when changes are made and update the routes.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
SO_REUSEADDR should be avoided on Windows but is desired on other operating
systems. So instead of setting it we call socket_set_fast_reuse that will result
in the appropriate behaviour on all operating systems.
Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
SO_REUSEADDR should be avoided on Windows but is desired on other operating
systems. So instead of setting it we call socket_set_fast_reuse that will result
in the appropriate behaviour on all operating systems.
Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
SO_REUSEADDR should be avoided on Windows but is desired on other operating
systems. So instead of setting it we call socket_set_fast_reuse that will result
in the appropriate behaviour on all operating systems.
An exception to this rule are multicast sockets where it is sensible to have
multiple sockets listen on the same ip and port and we should set SO_REUSEADDR
on windows.
Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
SO_REUSEADDR should be avoided on Windows but is desired on other operating
systems. So instead of setting it we call socket_set_fast_reuse that will result
in the appropriate behaviour on all operating systems.
Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
If a socket is closed it remains in TIME_WAIT state for some time. On operating
systems using BSD sockets the endpoint of the socket may not be reused while in
this state unless SO_REUSEADDR was set on the socket. On windows on the other
hand the default behaviour is to allow reuse (i.e. identical to SO_REUSEADDR on
other operating systems) and setting SO_REUSEADDR on a socket allows it to be
bound to a endpoint even if the endpoint is already used by another socket
independently of the other sockets state. This can even result in undefined
behaviour.
Many sockets used by QEMU should not block the use of their endpoint after being
closed while they are still in TIME_WAIT state. Currently QEMU sets SO_REUSEADDR
for such sockets, which can lead to problems on Windows. This patch introduces
the function socket_set_fast_reuse that should be used instead of setting
SO_REUSEADDR when fast socket reuse is desired and behaves correctly on all
operating systems.
As a failure of this function can only be caused by bad QEMU internal errors, an
assertion handles these situations. The return value is still passed on, to
minimize changes in client code and prevent unused variable warnings if NDEBUG
is defined.
Signed-off-by: Sebastian Ottlik <ottlik@fzi.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The data in leaf 0Dh depends on information from other feature bits.
Instead of passing it blindly from the host, compute it based on
whether these feature bits are enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
and not restore anything.
Since FP and SSE data are always valid, set them in xstate_bv at reset
time. In fact, that value is the same that KVM_GET_XSAVE returns on
pre-XSAVE hosts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
There's no Intel CPU with family=6,model=2, and Linux and Windows guests
disable SEP when seeing that combination due to Pentium Pro erratum #82.
In addition to just having SEP ignored by guests, Skype (and maybe other
applications) runs sysenter directly without passing through ntdll on
Windows, and crashes because Windows ignored the SEP CPUID bit.
So, having model > 2 is a better default on qemu64 and qemu32 for two
reasons: making SEP really available for guests, and avoiding crashing
applications that work on bare metal.
model=3 would fix the problem, but it causes CPU enumeration problems
for Windows guests[1]. So let's set model=6, that matches "Athlon
(PM core)" on AMD and "P2 with on-die L2 cache" on Intel and it allows
Windows to use all CPUs as well as fixing sysenter.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=508623
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Switching the L1 table in memory should be an atomic operation, as far
as possible. Calling qcow2_free_clusters on the old L1 table on disk is
not a good idea when the old L1 table is no longer valid and the address
to the new one hasn't yet been written into the corresponding
BDRVQcowState field. To be more specific, this can lead to segfaults due
to qcow2_check_metadata_overlap trying to access the L1 table during the
free operation.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This blocks migration for VHDX image files, until the
functionality can be supported.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The content filename point to will be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()
So it's better to use bs->filename.
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
CHECK_OFLAG_COPIED as a parameter to check_refcounts_l1 and
check_refcounts_l2 is obselete now, since the OFLAG_COPIED consistency
check is actually no longer performed by these functions (but by
check_oflag_copied).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If an inactive L1 table is loaded from disk, its entries are in big
endian and have to be converted to host byte order before using them.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There are free scheduling slots between the sequence of
comparison instructions. This requires changing the
register in use to avoid conflict with those compares.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The main intent of the patch is to allow the tlb addend register
to be changed, without tying that change to the constraint. But
the most common side-effect seems to be to enable usage of ldrd
with the r0,r1 pair.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This allows us to make more intelligent decisions about the relative
offsets of the tlb comparator and the addend, avoiding any need of
writeback addressing.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One of the two constraints we already checked via #if, but
the tlb offset distance was only checked at runtime.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use the new helper_ret_*_mmu routines. Use a conditional call
to arrange for a tail-call from the store path, and to load the
return address for the helper for the load path.
Signed-off-by: Richard Henderson <rth@twiddle.net>