Properly register reset functions via the device class.
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Call msi_reset on device reset as still required by the core.
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Call msi_reset on device reset as still required by the core.
CC: Alexander Graf <agraf@suse.de>
CC: qemu-stable@nongnu.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some drivers (Linux' 8139too among them) rely on the NIC
injecting an interrupt in the event of a receive buffer overflow
and, accordingly, set the RxOverflow bit in the interrupt
mask. Unfortunately rtl8139's can_receive method ignores the
RxOverflow flag, which may lead to a situation where rtl8139
stops receiving packets (can_receive returns 0) when the receive
buffer becomes full.
If the driver eventually read from the receive buffer or reset
the card the emulator could recover from this situation. However
some implementations only do this upon receiving an interrupt
with either RxOK or RxOverflow set in the ISR; interrupt that
will never come because QEMU's flow control mechanisms would
prevent rtl8139 from receiving any packet.
Letting packets go through when the overflow interrupt is enabled
makes the QEMU emulator compliant to the spec and solves the
problem.
This patch should fix a relatively common (in our experience)
network stall observed when running enterprise distros with
rtl8139 as the NIC; in some cases the 8139too device driver gets
loaded and when under heavy load the network eventually stops
working.
Reported-by: Hayato Kakuta <kakuta.hayato@oss.ntt.co.jp>
Tested-by: Hayato Kakuta <kakuta.hayato@oss.ntt.co.jp>
Acked-by: Igor Kovalenko <igor.v.kovalenko@gmail.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit ff71f2e8ca. This is because
the linux 8139cp driver would leave the card in "Config Register Write Enable"
mode after the eeprom were read or write ( which is unexpected in the spec
). Also a physical 8139 card can still DMA into host memory in modes other than
Normal mode, so we need revert this commit to align with the behavior of
physical card.
The issue of 8139cp driver should be fixed in linux seperately.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* qemu-kvm/uq/master:
virtio/vhost: Add support for KVM in-kernel MSI injection
msix: Add msix_nr_vectors_allocated
kvm: Enable use of kvm_irqchip_in_kernel in hwlib code
kvm: Introduce kvm_irqchip_add/remove_irqfd
kvm: Make kvm_irqchip_commit_routes an internal service
kvm: Publicize kvm_irqchip_release_virq
kvm: Introduce kvm_irqchip_add_msi_route
kvm: Rename kvm_irqchip_add_route to kvm_irqchip_add_irq_route
msix: Introduce vector notifiers
msix: Invoke msix_handle_mask_update on msix_mask_all
msix: Factor out msix_get_message
kvm: update vmxcap for EPT A/D, INVPCID, RDRAND, VMFUNC
kvm: Enable in-kernel irqchip support by default
kvm: Add support for direct MSI injections
kvm: Update kernel headers
kvm: x86: Wire up MSI support for in-kernel irqchip
pc: Enable MSI support at APIC level
kvm: Introduce basic MSI support for in-kernel irqchips
Introduce MSIMessage structure
kvm: Refactor KVMState::max_gsi to gsi_count
* kwolf/for-anthony:
ahci: SATA FIS is 20 bytes, not 0x20
virtio-blk: Fix geometry sector calculation
block: prevent snapshot mode $TMPDIR symlink attack
sheepdog: fix return value of do_load_save_vm_state
virtio: Fix compiler warning for non Linux hosts
As in the SATA and AHCI specifications, a FIS is 5 Dwords of 4 bytes
each, which comes to 20 bytes (decimal), not 0x20.
Signed-off-by: Daniel Verkamp <daniel@drv.nu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently the sector value for the geometry is masked, even if the
user usesa command line parameter that explicitely gives a number.
This breaks dasd devices on s390. A dasd device can have
a physical block size of 4096 (== same for logical block size)
and a typcial geometry of 15 heads and 12 sectors per cyl.
The ibm partition detection relies on a correct geometry
reported by the device. Unfortunately the current code changes
12 to 8. This would be necessary if the total size is
not a multiple of logical sector size, but for dasd this
is not the case.
This patch checks the device size and only applies sector
mask if necessary.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.
get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.
This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652
Signed-off-by: Jim Meyering <meyering@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_save_vmstate and bdrv_load_vmstate should return the vmstate size
on success, and -errno on error.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The local variables ret, i are only used if __linux__ is defined.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.
get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.
This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
NULL pointer dereference in case no vnc server is configured.
Catch this and return -EINVAL like vnc_display_password() does.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Changes v2 -> v3;
- Check for kvm_enabled() before setting cpuid_7_0_ebx_features
Changes v1 -> v2:
- Use kvm_arch_get_supported_cpuid() instead of host_cpuid() on
cpu_x86_fill_host().
We should use GET_SUPPORTED_CPUID for all bits on "-cpu host"
eventually, but I am not changing all the other CPUID leaves because
we may not be able to test such an intrusive change in time for 1.1.
Description of the bug:
Since QEMU 0.15, the CPUID information on CPUID[EAX=7,ECX=0] is being
returned unfiltered to the guest, directly from the GET_SUPPORTED_CPUID
return value.
The problem is that this makes the resulting CPU feature flags
unpredictable and dependent on the host CPU and kernel version. This
breaks live-migration badly if migrating from a host CPU that supports
some features on that CPUID leaf (running a recent kernel) to a kernel
or host CPU that doesn't support it.
Migration also is incorrect (the virtual CPU changes under the guest's
feet) if you migrate in the opposite direction (from an old CPU/kernel
to a new CPU/kernel), but with less serious consequences (guests
normally query CPUID information only once on boot).
Fortunately, the bug affects only users using cpudefs with level >= 7.
The right behavior should be to explicitly enable those features on
[cpudef] config sections or on the "-cpu" command-line arguments. Right
now there is no predefined CPU model on QEMU that has those features:
the latest Intel model we have is Sandy Bridge.
I would like to get this fixed on 1.1, so I am submitting this patch,
that enables those features only if "-cpu host" is being used (as we
don't have any pre-defined CPU model that actually have those features).
After 1.1 is released, we can make those features properly configurable
on [cpudef] and -cpu configuration.
One problem is: with this patch, users with the following setup:
- Running QEMU 1.0;
- Using a cpudef having level >= 7;
- Running a kernel that supports the features on CPUID leaf 7; and
- Running on a CPU that supports some features on CPUID leaf 7
won't be able to live-migrate to QEMU 1.1. But for these users
live-migration is already broken (they can't live-migrate to hosts with
older CPUs or older kernels, already), I don't see how to avoid this
problem.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently we re-read/re-process /etc/mtab to get an updated list of
mounts when guest-fsfreeze-thaw is called. This can cause an atime
update on /etc/mtab, which will block if we're in a frozen state.
Instead, use /proc's version of mtab, which may not be up-to-date with
options passed via -o remount, but is compatible for our use cases since
we only care about the filesystem type.
Reported-by: Matsuda, Daiki <matsudadik@intellilink.co.jp>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Use _NSGetEnviron() helper to access the environment.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Charlie Somerville <charlie@charliesomerville.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Start VM with 8 multiple-function block devs, hot-removing
those block devs by 'device_del ...' would cause qemu abort.
| (qemu) device_del virti0-0-0
| (qemu) **
|ERROR:qom/object.c:389:object_delete: assertion failed: (obj->ref == 0)
It's a regression introduced by commit 57c9fafe
The whole PCI slot should be removed once. Currently only one func
is cleaned in pci_unplug_device(), if you try to remove a single
func by monitor cmd.
free_qdev() are called for all functions in slot,
but unparent_delete() is only called for one
function.
Signed-off-by: XXXX
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The previous multiboot load code did not treat the case where
load_end_addr was 0 specially. The multiboot specification says the
following:
* load_end_addr
Contains the physical address of the end of the data segment.
(load_end_addr - load_addr) specifies how much data to load. This
implies that the text and data segments must be consecutive in the
OS image; this is true for existing a.out executable formats. If
this field is zero, the boot loader assumes that the text and data
segments occupy the whole OS image file.
Signed-off-by: Scott Moser <smoser@ubuntu.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
With pc-0.12, we map the video RAM both through the PCI BAR (the guest does
this) and through a fixed mapping at 0xe0000000. The memory API doesn't allow
this double map, and aborts.
Fix by using an alias.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit f29a56147b (implement
-no-user-config command-line option (v3)) introduced uses of bool
in arch_init.c. Shortly before that usage is support code for
AltiVec (conditional to __ALTIVEC__).
GCC's altivec.h may in a !__APPLE_ALTIVEC__ code path redefine bool,
leading to type mismatches. altivec.h recommends to #undef for C++
compatibility, but doing so in C leads to bool remaining undefined.
Fix by redefining bool to _Bool as mandated for stdbool.h by POSIX.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* sstabellini/for_1.1_rc3:
Call xc_domain_shutdown with the reboot flag when the guest requests a reboot.
xen: Fix PV-on-HVM
xen_disk: properly update stats in ioreq_release()
xen_disk: use bdrv_aio_flush instead of bdrv_flush
xen_disk: remove syncwrite option
xen: disable rtc_clock
xen: do not initialize the interval timer and PCSPK emulator
* kwolf/for-anthony:
fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy
fdc: fix media detection
fdc: floppy drive should be visible after start without media
qemu-iotests: mark 035 qcow2-only
qcow2: Check qcow2_alloc_clusters_at() return value
sheepdog: use heap instead of stack for BDRVSheepdogState
sheepdog: return -errno on error
sheepdog: mark image as snapshot when tag is specified
qemu-img: Explain how rebase operation can be used to perform a 'diff' operation.
qcow2: don't leak buffer for unexpected qcow_version in header
* kiszka/queues/slirp:
slirp: Avoid redefining MAX_TCPOPTLEN
slirp: Avoid statements without effect on Big Endian host
slirp: Untangle TCPOLEN_* from TCPOPT_*
* bonzini/scsi-next:
ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN
ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMC
ISCSI: get device type at connection time
ISCSI: change num_blocks to 64-bit
ISCSI: redo how we set up the events
scsi: declare vmstate_info_scsi_requests to be static
MAX_TCPOPTLEN is being defined as 32. Darwin already has it as 40,
causing a warning. The value is only used to declare an array,
into which currently 4 bytes are written at most.
Therefore always override MAX_TCPOPTLEN for now.
Suggested-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Darwin has HTON*/NTOH* macros that on BE simply return the argument.
This is incompatible with SLIRP's use of these macros as a statement.
Undefine the macros in the HOST_WORDS_BIGENDIAN code path to redefine
these macros as no-op, as already done when they were undefined.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
This allows using LUNs bigger than 2TB. Keep using READ10 for other
device types such as MMC.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
This is needed to avoid READ CAPACITY(16) for MMC devices.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Call qemu_notify_event() after updating events. Otherwise, If we add
an event for -is-writeable but the socket is already writeable there
may be a delay before the event callback is actually triggered.
Those delays would in particular hurt performance during BIOS boot and
when the GRUB bootloader reads the kernel and initrd.
But first call out to the socket write functions directly, and only set up
the write event if the socket is full. This will happen very rarely and
this improves performance.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Commit b72210568e (slirp: clean up
conflicts with system headers) enclosed TCPOLEN_MAXSEG with an #ifdef
TCPOPT_EOL. This broke the build on illumos, which has TCPOPT_*
but not TCPOLEN_*.
Move them to their own #ifdef TCPOLEN_MAXSEG section to remedy this.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
does not define _CALL_DARWIN, leading to unexpected behavior w.r.t.
register clobbering and stack frame layout.
Since _CALL_DARWIN is a reserved identifier, define a custom
TCG_TARGET_CALL_DARWIN based on either _CALL_DARWIN or __APPLE__.
Signed-off-by: Andreas F?rber <andreas.faerber@web.de>
Signed-off-by: malc <av1474@comtv.ru>
As default a guest has always one floppy drive so 0x10 byte in CMOS
has to have 0x40 value. Higher 4 bits means that the first floppy drive
is 1.44 Mb 3"5 drive and lower 4 bits means the second drive is not present.
After the guest starts DSKCHG bit in DIR register should be set. If there
is no media in drive, this bit should be set all the time.
Because we start the guest without media in drive, we have to swap
'eject' and 'change' in 'test_media_change'.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have to set up 'media_changed' after guest start so floppy driver
could detect that there is no media in drive. For this purpose we call
'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'.
'fd_revalidate' is called inside 'fdctrl_change_cb'.
We still have to set default drive geometry in 'fd_revalidate' even
if there is no media in drive. When you try to open (windows) or mount (linux)
floppy the driver tries to seek on track 1. Linux guest stuck in loop then
kernel crashes and windows guest prints error message.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If you start guest with floppy drive but without media inserted, guest
still should see floppy drive pressent.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The 035 parallel aio write test relies on knowledge of qcow2 metadata
layout to stress parallel L2 table accesses. This only works for qcow2
unless we add additional calculations for qed or other formats.
Mark this test as qcow2-only.
Note that the test is strictly speaking non-deterministic although the
output produced is reliable with qcow2. This is because the aio_write
command returns before the aio write request has completed. Completions
can occur at any time afterwards and cause a message to be printed.
Therefore the exact output of this test is not deterministic but we seem
to get away with it for qcow2 (maybe due to coroutine and main loop
scheduling).
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>