When a new cluster was allocated, we only need a flush after the write to the
L2 table if it was a COW and we need to decrease the refcounts of the old
clusters.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BDRV_O_CACHE_MASK should have been extended when cache=unsafe introduced a new
flag BDRV_O_NO_FLUSH. There are currently no users that would change their
behaviour because of this, but let's clean it up before things break.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If qemu-img crashes during the conversion, the user will throw away the broken
output file anyway and start over. So no need to be too cautious.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow symbolic links which point to /dev/sgX devices.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I use a legacy OS which depends on some optional SCSI commands.
In fact this implementation does nothing special, but provides minimum
support for the following commands:
REZERO UNIT
WRITE AND VERIFY(10)
WRITE AND VERIFY(12)
WRITE AND VERIFY(16)
MODE SELECT(6)
MODE SELECT(10)
SEEK(6)
SEEK(10)
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fill in word 64 of IDENTIFY data to indicate support for PIO modes 3 and 4.
This allows NetBSD guests to use UltraDMA modes instead of just PIO mode 0.
Signed-off-by: Jonathan A. Kollasch <jakllsch@kollasch.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On Linux, we have code to detect CD-ROMs using an ioctl. We shouldn't lose
anything but false positives by removing the check for a /dev/cd* path.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no indication whether or not the sector is allocated when
nb_sectors=1:
sector allocated at offset 64 KiB
This message is produced whether or not the sector is allocated.
Simply use the same message as the plural case, I don't think the
English is so broken that we need special case output here:
0/1 sectors allocated at offset 64 KiB
This change does not affect qemu-iotests since nb_sectors=1 is not used
there.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The DBD bit does not work as expected.
SCSI-Spec:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.2.10
"A disable block descriptors (DBD) bit of zero indicates that the target
may return zero or more block descriptors in the returned MODE SENSE
data (see 8.3.3), at the target's discretion. A DBD bit of one
specifies that the target shall not return any block descriptors in the
returned MODE SENSE data."
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
SCSI-Spec:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.2.10
"An initiator may request any one or all of the supported mode pages
from a target. If an initiator issues a MODE SENSE command with a
page code value not implemented by the target, the target shall return
CHECK CONDITION status and shall set the sense key to ILLEGAL REQUEST
and the additional sense code to INVALID FIELD IN CDB."
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block descriptor contains the number of blocks, not the highest LBA.
Real hard disks return 0 if the number of blocks exceed the maximum 0xFFFFFF.
SCSI-Spec:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.3.3
"The number of blocks field specifies the number of logical blocks on the
medium to which the density code and block length fields apply. A value
of zero indicates that all of the remaining logical blocks of the logical
unit shall have the medium characteristics specified."
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The page control (PC) field defines the type of mode parameter values
to be returned in the mode pages:
PC=0 : Current values
PC=1 : Changeable values
PC=2 : Default values
PC=3 : Saved values
The current implementation always returns the same type of parameters.
This is OK for Current and Default values as we don't support changes
to be done by the MODE SELECT command.
For Saved values the following applies (implemented by this patch):
"A PC field value of 3h requests that the target return the saved
values of the mode parameters. Implementation of saved page parameters
is optional. Mode parameters not supported by the target shall be set
to zero. If saved values are not implemented, the command shall be
terminated with CHECK CONDITION status, the sense key set to
ILLEGAL REQUEST and the additional sense code set to
SAVING PARAMETERS NOT SUPPORTED."
For Changeable values the following applies (implemented by this patch):
"A PC field value of 1h requests that the target return a mask denoting
those mode parameters that are changeable. In the mask, the fields of
the mode parameters that are changeable shall be set to all one bits and
the fields of the mode parameters that are non-changeable (i.e. defined
by the target) shall be set to all zero bits."
In newer versions of the SCSI-2 spec the following clause was added.
"If the logical unit does not implement changeable parameters mode pages
and the device server receives a MODE SENSE command with 01b in the PC
field, then the command shall be terminated with CHECK CONDITION status,
with the sense key set to ILLEGAL REQUEST, and the additional sense code
set to INVALID FIELD IN CDB."
This was not yet included in the SCSI-2 Working Drafts from 1986-1993.
I assume that the variant to return CHECK CONDITION for PC=1 is not
widely implemented by real devices. I have a legacy OS which fails,
if MODE_SENSE returns non GOOD for PC=1. So for highest compatibility I
implemented the former variant with this patch.
The last Working Draft X3T9.2 Rev. 10L 7-SEP-93 can be found here:
http://ldkelley.com/SCSI2/SCSI2/SCSI2-08.html#8.2.10
In mode_sense_page() this patch also avoids multiple hard coded
definitions of the same mode page length. Instead I use the varable
p[1]. In fact the returned length of the mode pages 4 and 5 were wrong
(2 bytes less).
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The header for the MODE SENSE(10) command is 8 bytes long.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The MODE DATA LENGTH field indicates the length in bytes of the following
data that is available to be transferred. The mode data length does not include
the number of bytes in the MODE DATA LENGTH field.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Based on a patch from Mark McLoughlin, this patch introduces a new
bottom half packet transmitter that avoids the latency imposed by
the tx_timer approach. Rather than scheduling a timer when a TX
packet comes in, schedule a bottom half to be run from the iothread.
The bottom half handler first attempts to flush the queue with
notification disabled (this is where we could race with a guest
without txburst). If we flush a full burst, reschedule immediately.
If we send short of a full burst, try to re-enable notification.
To avoid a race with TXs that may have occurred, we must then
flush again. If we find some packets to send, the guest it probably
active, so we can reschedule again.
tx_timer and tx_bh are mutually exclusive, so we can re-use the
tx_waiting flag to indicate one or the other needs to be setup.
This allows us to seamlessly migrate between timer and bh TX
handling.
The bottom half handler becomes the new default and we add a new
tx= option to virtio-net-pci. Usage:
-device virtio-net-pci,tx=timer # select timer mitigation vs "bh"
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
De-couple this from the timer since we might want to use
different backends to send the packet.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced. The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call. An upcoming
patch changes this.
Also add an option to set this value on the command line as different
workloads may wish to use different values. We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:
-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush
One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput. We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value value from virtio_net_flush_tx() to indicate a back-off
or error condition.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add an option to make the TX mitigation timer adjustable as a device
option. The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads, this gives us a way to
adjust it using a single binary. We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option. Usage:
-device virtio-net-pci,x-txtimer=500000,... # .5ms timeout
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make host vnet header length a structure field in
preparation for using this support in linux kernel.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since nobody else seems interested in maintaining PPC, let's change the
maintainer to myself. I keep a staging tree anyways and am probably the
person touching most of that code these days.
This changes the maintainer entry for working ppc targets to myself.
Signed-off-by: Alexander Graf <agraf@suse.de>
Patch b0b900070c made
TOR valuer incorrect: the spec says it should always
include the CRC field.
No one seems to use this field, but better to stick to spec.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The config data field on the e500 pci controller is in little endian, so we need
to enable byte swap there.
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500 PCI controller isn't qdev'ified yet. This leads to severe issues
when running with -drive.
To be able to use a virtio disk with an e500 VM, let's convert the PCI
controller over to qdev.
Signed-off-by: Alexander Graf <agraf@suse.de>
KVM on PowerPC used to have completely broken interrupt logic. Usually,
interrupts work by having a PIC that pulls a line up/down, so the CPU knows
that an interrupt is active. This line stays active until some action is
done to the PIC to release the line.
On KVM for PPC, we just checked if there was an interrupt pending and pulled
a line in the kernel module. We never released it though, hoping that kernel
space would just declare an interrupt as released when injected - which is
wrong.
To fix this, we need to completely redesign the interrupt injection logic.
Whenever an interrupt line gets triggered, we need to notify kernel space
that the line is up. Whenever it gets released, we do the same. This way
we can assure that the interrupt state is always known to kernel space.
This fixes random stalls in KVM guests on PowerPC that were waiting for
an interrupt while everyone else thought they received it already.
Signed-off-by: Alexander Graf <agraf@suse.de>
bswap_NN() variants are not always available in CONFIG_MACHINE_BSWAP_H case
and bswapNN() are public APIs in "bswap.h".
Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
stat() fields can be more or less anything depending on configuration, cast
explicitly to uint64_t to avoid printf() format mismatches.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When making copy of arguments we were doing partial copy
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There is no need to check for dest < 0 or vector >= 0 as both are
uint16_t.
This should fix problems with broken build with aggressive compiler
flags. Reported by Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Acked-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Do not store return of get_image_size() in a uint32_t as it makes it
impossible to detect error returns from get_image_size.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix a warning from OpenBSD linker:
../libhw32/vl.o(.text+0x5c3c): In function `main':
/src/qemu/vl.c:2335: warning: sprintf() is often misused, please use snprintf()
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
acpi table file can be modified during load so file size check
should be more strict.
pointer calculation should be after qemu_realloc(). not before realloc().
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When savevm is run without a name, the name stays blank and the snapshot is
saved anyway.
The new behavior is when savevm is run without parameters a name will be
created automaticaly, so the snapshot is accessible to the user without needing
the id when loadvm is run.
(qemu) savevm
(qemu) info snapshots
ID TAG VM SIZE DATE VM CLOCK
1 vm-20100728134640 978K 2010-07-28 13:46:40 00:00:08.603
We use a name with the format 'vm-YYYYMMDDHHMMSS'.
This is a first step to hide the internal id, because I don't see a reason to
expose this kind of internals to the user.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The output generated by 'info snapshots' shows only snapshots that exist on the
block device that saves the VM state. This output can cause an user to
erroneously try to load an snapshot that is not available on all block devices.
$ qemu-img snapshot -l xxtest.qcow2
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
1 1.5M 2010-07-26 16:51:52 00:00:08.599
2 1.5M 2010-07-26 16:51:53 00:00:09.719
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
$ qemu-img snapshot -l xxtest2.qcow2
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
3 0 2010-07-26 17:26:49 00:00:13.245
4 0 2010-07-26 19:01:00 00:00:46.763
Current output:
$ qemu -hda xxtest.qcow2 -hdb xxtest2.qcow2 -monitor stdio -vnc :0
QEMU 0.12.4 monitor - type 'help' for more information
(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID TAG VM SIZE DATE VM CLOCK
1 1.5M 2010-07-26 16:51:52 00:00:08.599
2 1.5M 2010-07-26 16:51:53 00:00:09.719
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
Snapshots 1 and 2 do not exist on xxtest2.qcow, but they are displayed anyway.
This patch sumarizes the output to only show fully available snapshots.
New output:
(qemu) info snapshots
ID TAG VM SIZE DATE VM CLOCK
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set the async_context_id field when queuing an async ioctl call
Signed-off-by: Andrew de Quincey <adq@lidskialf.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch allows to connect Qemu using NBD protocol to an nbd-server
using named exports.
For instance, if on the host "isoserver", in /etc/nbd-server/config, you have:
[generic]
[debian-500-ppc-netinst]
exportname = /ISO/debian-500-powerpc-netinst.iso
[Fedora-10-ppc-netinst]
exportname = /ISO/Fedora-10-ppc-netinst.iso
You can connect to it, using:
qemu -cdrom nbd:isoserver:exportname=debian-500-ppc-netinst
qemu -cdrom nbd:isoserver:exportname=Fedora-10-ppc-netinst
NOTE: you need at least nbd-server 2.9.18
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We never write to a backing file, so opening rw is useless. It just means that
you can't rebase on top of a file for which you don't have write permissions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
"qemu_socket.h" includes all necessary files and
including <netinet/tcp.h> without <netinet/in.h>
could cause errors on some systems.
Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Arguably we should re-open the backing file with the backing file format and
not with the format of the snapshot image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
in_sg[].iovec and out_sg[].ioved are pointer to (source) host memory and
therefore invalid after migration. When loading the device state we must
create a new mapping on the destination host.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Separate the mapping of requests to host memory from the descriptor iteration.
The next patch will make use of it in a different context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>