Commit Graph

20304 Commits

Author SHA1 Message Date
Jim Meyering
eba25057b9 block: prevent snapshot mode $TMPDIR symlink attack
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.

get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.

This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 14:48:40 +08:00
Gerd Hoffmann
e78bd5ab07 xhci: add usage info to docs
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 10:28:44 +08:00
Gerd Hoffmann
1643f2b232 vnc: fix segfault in vnc_display_pw_expire()
NULL pointer dereference in case no vnc server is configured.
Catch this and return -EINVAL like vnc_display_password() does.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 10:28:44 +08:00
Eduardo Habkost
1352672860 Expose CPUID leaf 7 only for -cpu host
Changes v2 -> v3;
  - Check for kvm_enabled() before setting cpuid_7_0_ebx_features

Changes v1 -> v2:
  - Use kvm_arch_get_supported_cpuid() instead of host_cpuid() on
    cpu_x86_fill_host().

  We should use GET_SUPPORTED_CPUID for all bits on "-cpu host"
  eventually, but I am not changing all the other CPUID leaves because
  we may not be able to test such an intrusive change in time for 1.1.

Description of the bug:

Since QEMU 0.15, the CPUID information on CPUID[EAX=7,ECX=0] is being
returned unfiltered to the guest, directly from the GET_SUPPORTED_CPUID
return value.

The problem is that this makes the resulting CPU feature flags
unpredictable and dependent on the host CPU and kernel version. This
breaks live-migration badly if migrating from a host CPU that supports
some features on that CPUID leaf (running a recent kernel) to a kernel
or host CPU that doesn't support it.

Migration also is incorrect (the virtual CPU changes under the guest's
feet) if you migrate in the opposite direction (from an old CPU/kernel
to a new CPU/kernel), but with less serious consequences (guests
normally query CPUID information only once on boot).

Fortunately, the bug affects only users using cpudefs with level >= 7.

The right behavior should be to explicitly enable those features on
[cpudef] config sections or on the "-cpu" command-line arguments. Right
now there is no predefined CPU model on QEMU that has those features:
the latest Intel model we have is Sandy Bridge.

I would like to get this fixed on 1.1, so I am submitting this patch,
that enables those features only if "-cpu host" is being used (as we
don't have any pre-defined CPU model that actually have those features).
After 1.1 is released, we can make those features properly configurable
on [cpudef] and -cpu configuration.

One problem is: with this patch, users with the following setup:
- Running QEMU 1.0;
- Using a cpudef having level >= 7;
- Running a kernel that supports the features on CPUID leaf 7; and
- Running on a CPU that supports some features on CPUID leaf 7
won't be able to live-migrate to QEMU 1.1. But for these users
live-migration is already broken (they can't live-migrate to hosts with
older CPUs or older kernels, already), I don't see how to avoid this
problem.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 10:28:44 +08:00
Michael Roth
9e2fa418fb qemu-ga: avoid blocking on atime update when reading /etc/mtab
Currently we re-read/re-process /etc/mtab to get an updated list of
mounts when guest-fsfreeze-thaw is called. This can cause an atime
update on /etc/mtab, which will block if we're in a frozen state.

Instead, use /proc's version of mtab, which may not be up-to-date with
options passed via -o remount, but is compatible for our use cases since
we only care about the filesystem type.

Reported-by: Matsuda, Daiki <matsudadik@intellilink.co.jp>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2012-05-29 21:00:42 -05:00
Andreas Färber
eecae14724 qemu-ga: Fix use of environ on Darwin
Use _NSGetEnviron() helper to access the environment.

Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Charlie Somerville <charlie@charliesomerville.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2012-05-29 21:00:40 -05:00
Amos Kong
a6de8ed80e pci: call object_unparent() before free_qdev()
Start VM with 8 multiple-function block devs, hot-removing
those block devs by 'device_del ...' would cause qemu abort.

| (qemu) device_del virti0-0-0
| (qemu) **
|ERROR:qom/object.c:389:object_delete: assertion failed: (obj->ref == 0)

It's a regression introduced by commit 57c9fafe

The whole PCI slot should be removed once. Currently only one func
is cleaned in pci_unplug_device(), if you try to remove a single
func by monitor cmd.

free_qdev() are called for all functions in slot,
but unparent_delete() is only called for one
function.

Signed-off-by: XXXX
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-29 20:19:24 -05:00
Scott Moser
9c3a596a03 fix multiboot loading if load_end_addr == 0
The previous multiboot load code did not treat the case where
load_end_addr was 0 specially.  The multiboot specification says the
following:
 * load_end_addr
   Contains the physical address of the end of the data segment.
   (load_end_addr - load_addr) specifies how much data to load. This
   implies that the text and data segments must be consecutive in the
   OS image; this is true for existing a.out executable formats. If
   this field is zero, the boot loader assumes that the text and data
   segments occupy the whole OS image file.

Signed-off-by: Scott Moser <smoser@ubuntu.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-29 20:19:24 -05:00
Avi Kivity
8294a64d7f vga: fix vram double-mapping with -vga std and -M pc-0.12
With pc-0.12, we map the video RAM both through the PCI BAR (the guest does
this) and through a fixed mapping at 0xe0000000.  The memory API doesn't allow
this double map, and aborts.

Fix by using an alias.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-29 20:19:24 -05:00
Anthony Liguori
1c4ad9d2b4 Merge remote-tracking branch 'afaerber-or/cocoa-for-upstream' into staging
* afaerber-or/cocoa-for-upstream:
  cocoa: Suppress Cocoa frontend for -qtest
  arch_init: Fix AltiVec build on Darwin/ppc
2012-05-29 06:54:16 -05:00
Andreas Färber
60b46aa2f3 cocoa: Suppress Cocoa frontend for -qtest
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
2012-05-29 11:40:27 +02:00
Andreas Färber
f283edc482 arch_init: Fix AltiVec build on Darwin/ppc
Commit f29a56147b (implement
-no-user-config command-line option (v3)) introduced uses of bool
in arch_init.c. Shortly before that usage is support code for
AltiVec (conditional to __ALTIVEC__).

GCC's altivec.h may in a !__APPLE_ALTIVEC__ code path redefine bool,
leading to type mismatches. altivec.h recommends to #undef for C++
compatibility, but doing so in C leads to bool remaining undefined.

Fix by redefining bool to _Bool as mandated for stdbool.h by POSIX.

Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-29 11:38:07 +02:00
Anthony Liguori
dd86df756e Merge remote-tracking branch 'sstabellini/for_1.1_rc3' into staging
* sstabellini/for_1.1_rc3:
  Call xc_domain_shutdown with the reboot flag when the guest requests a reboot.
  xen: Fix PV-on-HVM
  xen_disk: properly update stats in ioreq_release()
  xen_disk: use bdrv_aio_flush instead of bdrv_flush
  xen_disk: remove syncwrite option
  xen: disable rtc_clock
  xen: do not initialize the interval timer and PCSPK emulator
2012-05-29 04:32:13 -05:00
Anthony Liguori
422831fc81 Merge remote-tracking branch 'mdroth/qga-pull-5-24-12' into staging
* mdroth/qga-pull-5-24-12:
  qemu-ga: Fix missing environ declaration
  configure: check if environ is declared
2012-05-29 04:31:29 -05:00
Anthony Liguori
306761537f Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy
  fdc: fix media detection
  fdc: floppy drive should be visible after start without media
  qemu-iotests: mark 035 qcow2-only
  qcow2: Check qcow2_alloc_clusters_at() return value
  sheepdog: use heap instead of stack for BDRVSheepdogState
  sheepdog: return -errno on error
  sheepdog: mark image as snapshot when tag is specified
  qemu-img: Explain how rebase operation can be used to perform a 'diff' operation.
  qcow2: don't leak buffer for unexpected qcow_version in header
2012-05-29 04:30:49 -05:00
Anthony Liguori
7943df571a Merge remote-tracking branch 'kiszka/queues/slirp' into staging
* kiszka/queues/slirp:
  slirp: Avoid redefining MAX_TCPOPTLEN
  slirp: Avoid statements without effect on Big Endian host
  slirp: Untangle TCPOLEN_* from TCPOPT_*
2012-05-29 04:30:00 -05:00
Anthony Liguori
d501f8478a Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
  ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN
  ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMC
  ISCSI: get device type at connection time
  ISCSI: change num_blocks to 64-bit
  ISCSI: redo how we set up the events
  scsi: declare vmstate_info_scsi_requests to be static
2012-05-29 04:28:59 -05:00
Andreas Färber
917cfc1f26 slirp: Avoid redefining MAX_TCPOPTLEN
MAX_TCPOPTLEN is being defined as 32. Darwin already has it as 40,
causing a warning. The value is only used to declare an array,
into which currently 4 bytes are written at most.

Therefore always override MAX_TCPOPTLEN for now.

Suggested-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2012-05-28 22:44:27 +02:00
Andreas Färber
9b24d8e987 slirp: Avoid statements without effect on Big Endian host
Darwin has HTON*/NTOH* macros that on BE simply return the argument.
This is incompatible with SLIRP's use of these macros as a statement.

Undefine the macros in the HOST_WORDS_BIGENDIAN code path to redefine
these macros as no-op, as already done when they were undefined.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2012-05-28 22:31:07 +02:00
Ronnie Sahlberg
f4dfa67f04 ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN
This allows using LUNs bigger than 2TB.  Keep using READ10 for other
device types such as MMC.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:16 +02:00
Ronnie Sahlberg
6bcd1346bb ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMC
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:15 +02:00
Ronnie Sahlberg
dbfff6d776 ISCSI: get device type at connection time
This is needed to avoid READ CAPACITY(16) for MMC devices.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28 14:04:14 +02:00
Paolo Bonzini
c7b4a95202 ISCSI: change num_blocks to 64-bit
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28 14:04:14 +02:00
Ronnie Sahlberg
c9b9f6824f ISCSI: redo how we set up the events
Call qemu_notify_event() after updating events.  Otherwise, If we add
an event for -is-writeable but the socket is already writeable there
may be a delay before the event callback is actually triggered.

Those delays would in particular hurt performance during BIOS boot and
when the GRUB bootloader reads the kernel and initrd.

But first call out to the socket write functions directly, and only set up
the write event if the socket is full.  This will happen very rarely and
this improves performance.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:06 +02:00
Andreas Färber
e20e48a802 slirp: Untangle TCPOLEN_* from TCPOPT_*
Commit b72210568e (slirp: clean up
conflicts with system headers) enclosed TCPOLEN_MAXSEG with an #ifdef
TCPOPT_EOL. This broke the build on illumos, which has TCPOPT_*
but not TCPOLEN_*.

Move them to their own #ifdef TCPOLEN_MAXSEG section to remedy this.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2012-05-28 13:45:33 +02:00
Andreas Färber
24f50d7ea5 tcg/ppc: Handle _CALL_DARWIN being undefined on Darwin
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
does not define _CALL_DARWIN, leading to unexpected behavior w.r.t.
register clobbering and stack frame layout.

Since _CALL_DARWIN is a reserved identifier, define a custom
TCG_TARGET_CALL_DARWIN based on either _CALL_DARWIN or __APPLE__.

Signed-off-by: Andreas F?rber <andreas.faerber@web.de>
Signed-off-by: malc <av1474@comtv.ru>
2012-05-27 21:52:56 +04:00
Pavel Hrdina
7cd331617a fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy
As default a guest has always one floppy drive so 0x10 byte in CMOS
has to have 0x40 value. Higher 4 bits means that the first floppy drive
is 1.44 Mb 3"5 drive and lower 4 bits means the second drive is not present.

After the guest starts DSKCHG bit in DIR register should be set. If there
is no media in drive, this bit should be set all the time.

Because we start the guest without media in drive, we have to swap
'eject' and 'change' in 'test_media_change'.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:23:47 +02:00
Pavel Hrdina
cfb08fbafc fdc: fix media detection
We have to set up 'media_changed' after guest start so floppy driver
could detect that there is no media in drive. For this purpose we call
'fdctrl_change_cb' instead of 'fd_revalidate' in 'fdctrl_connect_drives'.
'fd_revalidate' is called inside 'fdctrl_change_cb'.

We still have to set default drive geometry in 'fd_revalidate' even
if there is no media in drive. When you try to open (windows) or mount (linux)
floppy the driver tries to seek on track 1. Linux guest stuck in loop then
kernel crashes and windows guest prints error message.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:21:12 +02:00
Pavel Hrdina
9ecd394753 fdc: floppy drive should be visible after start without media
If you start guest with floppy drive but without media inserted, guest
still should see floppy drive pressent.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:18:53 +02:00
Stefan Hajnoczi
b84762e245 qemu-iotests: mark 035 qcow2-only
The 035 parallel aio write test relies on knowledge of qcow2 metadata
layout to stress parallel L2 table accesses.  This only works for qcow2
unless we add additional calculations for qed or other formats.

Mark this test as qcow2-only.

Note that the test is strictly speaking non-deterministic although the
output produced is reliable with qcow2.  This is because the aio_write
command returns before the aio write request has completed.  Completions
can occur at any time afterwards and cause a message to be printed.
Therefore the exact output of this test is not deterministic but we seem
to get away with it for qcow2 (maybe due to coroutine and main loop
scheduling).

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:13:44 +02:00
Kevin Wolf
df02179189 qcow2: Check qcow2_alloc_clusters_at() return value
When using qcow2_alloc_clusters_at(), the cluster allocation code
checked the wrong variable for an error code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka
b6fc8245e9 sheepdog: use heap instead of stack for BDRVSheepdogState
bdrv_create() is called in coroutine context now, so we cannot use
more stack than 1 MB in the function if we use ucontext coroutine.
This patch allocates BDRVSheepdogState, whose size is 4 MB, on the
heap in sd_create().

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka
cb595887cc sheepdog: return -errno on error
On error, BlockDriver APIs should return -errno instead of -1.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka
622b6057be sheepdog: mark image as snapshot when tag is specified
When a snapshot tag is specified in the filename, the opened image is
a snapshot.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Richard W.M. Jones
9fda6ab1d9 qemu-img: Explain how rebase operation can be used to perform a 'diff' operation.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Jim Meyering
b6c147622d qcow2: don't leak buffer for unexpected qcow_version in header
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Jim Meyering
12badfc238 scsi: declare vmstate_info_scsi_requests to be static
Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-25 13:00:27 +02:00
Luiz Capitulino
2c02cbf6e9 qemu-ga: Fix missing environ declaration
Commit 3674838cd0 uses the environ global
variable, but is relying on environ to be declared somewhere else.

This worked for me because on F16 environ is declared in <unistd.h>, but
that doesn't happen in OpenBSD for example, causing a build failure.

This commit fixes the build error by declaring environ if it hasn't
being declared yet.

Also fixes a build warning due to a missing <sys/wait.h> include.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2012-05-24 13:06:33 -05:00
Luiz Capitulino
8ab1bf120d configure: check if environ is declared
Some systems may declare environ automatically, others don't. Check for it.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2012-05-24 13:06:31 -05:00
Jan Kiszka
aeb29b6459 audio: Always call fini on exit
Not only clean up enabled voices but any registered one. Backends like
pulsaudio rely on unconditional fini handler invocations.

This fixes "Memory pool destroyed but not all memory blocks freed!"
warnings on VM shutdowns when pa is used and lockups of QEMU on shutdown
as it got stuck on some pa-internal synchronization point.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: malc <av1474@comtv.ru>
2012-05-24 19:35:27 +04:00
Stefan Weil
f8687bab91 es1370: Fix debug code
When DEBUG_ES1370 is defined, the compiler shows these warnings:

hw/es1370.c: In function ?es1370_update_voices?:
hw/es1370.c:414: warning: format ?%d? expects type ?int?, but argument 3 has type ?size_t?
hw/es1370.c: In function ?es1370_writel?:
hw/es1370.c:582: warning: format ?%d? expects type ?int?, but argument 3 has type ?long int?
hw/es1370.c:592: warning: format ?%d? expects type ?int?, but argument 3 has type ?long int?
hw/es1370.c:609: warning: format ?%d? expects type ?int?, but argument 3 has type ?long int?
hw/es1370.c: In function ?es1370_readl?:
hw/es1370.c:751: warning: suggest braces around empty body in an ?if? statement

Fix the format strings and add the missing braces.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: malc <av1474@comtv.ru>
2012-05-24 02:03:30 +04:00
Anthony Liguori
c48b0c80fc Update version for 1.1.0-rc3
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-22 09:21:01 -05:00
Anthony PERARD
4accd107d0 xen: Fix PV-on-HVM
In the context of PV-on-HVM under Xen, the emulated nics are supposed to be
unplug before the guest drivers are initialized, when the guest write to a
specific IO port.

Without this patch, the guest end up with two nics with the same MAC, the
emulated nic and the PV nic.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:51 -05:00
dunrong huang
a340046614 qdev: Fix memory leak
The str allocated in visit_type_str was not freed.

The visit_type_str function is an input visitor(<QMP/String/etc>-to-native)
here, it will allocate memory for caller, so the caller is responsible for
freeing the memory.

Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: dunrong huang <riegamaths@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:51 -05:00
Orit Wassermann
2a633c461e virtio: check virtio_load return code
Otherwise we crash on error.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Orit Wassermann <owasserm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00
Paolo Bonzini
a6c5c84ae2 virtio-blk: always enable VIRTIO_BLK_F_SCSI
VIRTIO_BLK_F_SCSI is supposed to mean whether the host can *parse*
SCSI requests, not *execute* them.  You could run QEMU with scsi=on
and a file-backed disk, and QEMU would fail all SCSI requests even
though it advertises VIRTIO_BLK_F_SCSI.

Because we need to do this to fix a migration compatibility problem
related to how QEMU is invoked by management, we must do this
unconditionally even on older machine types.  This more or less assumes
that no one ever invoked QEMU with scsi=off.

Here is how testing goes:

- old QEMU, scsi=on -> new QEMU, scsi=on
- new QEMU, scsi=on -> old QEMU, scsi=on
- old QEMU, scsi=off -> new QEMU, scsi=on
- new QEMU, scsi=off -> old QEMU, scsi=on
        ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)

- old QEMU, scsi=off -> new QEMU, scsi=off
        ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)

- old QEMU, scsi=on -> new QEMU, scsi=off
        ok, bug fixed

- new QEMU, scsi=on -> old QEMU, scsi=off
        doesn't work (same as: old QEMU, scsi=on -> old QEMU, scsi=off)

- new QEMU, scsi=off -> old QEMU, scsi=off
        broken by the patch

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00
Paolo Bonzini
12c5674b84 virtio-blk: define VirtIOBlkConf
We will have to add another field to the virtio-blk configuration in
the next patch.  Avoid a proliferation of arguments to virtio_blk_init.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00
Paolo Bonzini
0e47931b88 virtio-blk: blockdev_mark_auto_del is transport-independent
Move it from virtio_blk_exit_pci to virtio_blk_exit.

This is included here because the next patch removes proxy->block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00
Paolo Bonzini
f34e73cd69 virtio-blk: report non-zero status when failing SG_IO requests
Linux really looks only at scsi->errors for SG_IO requests; it does
not look at the virtio request status at all.  Because of this, when
a SG_IO request is failed early with virtio_blk_req_complete(req,
VIRTIO_BLK_S_UNSUPP), without writing hdr.status, it will look like
a success to the guest.

This is their bug, but we can make it safe for older guests now by
forcing scsi->errors to have a non-zero value whenever a request
has to be failed.

But if we fix the bug in the guest driver, we will have another problem
because QEMU returns VIRTIO_BLK_S_IOERR if the status is non-zero, and
Linux translates that to -EIO.  Rather, the guest should succeed the
request and pass the non-zero status via the userspace-provided SG_IO
structure.  So, remove the case where virtio_blk_handle_scsi can
return VIRTIO_BLK_S_IOERR.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00
Mark Langsdorf
80a2ba3d3c use an uint64_t for the max_sz parameter in load_image_targphys
Allow load_image_targphys to load files on systems with more than 2G of
emulated memory by changing the max_sz parameter from an int to an
uint64_t.

Reviewed-by: Andreas F=E4rber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-21 15:40:50 -05:00