this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.
This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.
I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.
a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
QCOW2 [off] [on] [unmap]
-----
runtime: 14secs 1.1secs 1.1secs
filesize: 937M 18M 18M
iSCSI [off] [on] [unmap]
----
runtime: 9.3s 0.9s 0.9s
b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
QCOW2 [off] [on] [unmap]
-----
runtime: 246secs 18secs 18secs
filesize: 51G 192K 192K
throughput: 203M/s 2.3G/s 2.3G/s
iSCSI* [off] [on] [unmap]
----
runtime: 8mins 45secs 33secs
throughput: 106M/s 1.2G/s 1.6G/s
allocated: 100% 100% 0%
* The storage was connected via an 1Gbit interface.
It seems to internally handle writing zeroes
via WRITESAME16 very fast.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* remotes/bonzini/scsi-next:
[PATCH] block/iscsi: bump year in copyright notice
block/iscsi: allow cluster_size of 4K and greater
block/iscsi: clarify the meaning of ISCSI_CHECKALLOC_THRES
block/iscsi: speed up read for unallocated sectors
block/iscsi: allow fall back to WRITE SAME without UNMAP
MAINTAINERS: mark megasas as maintained
megasas: Add MSI support
megasas: Enable MSI-X support
megasas: Implement LD_LIST_QUERY
scsi: Improve error messages more
scsi-disk: Improve error messager if can't get version number
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
this adds a generic function to recover the enum id of a parameter
given as a string.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Like qcow2 since commit 6d33e8e7, error out on invalid lengths instead
of silently truncating them to 1023.
Also don't rely on bdrv_pread() catching integer overflows that make len
negative, but use unsigned variables in the first place.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
A huge image size could cause s->l1_size to overflow. Make sure that
images never require a L1 table larger than what fits in s->l1_size.
This cannot only cause unbounded allocations, but also the allocation of
a too small L1 table, resulting in out-of-bounds array accesses (both
reads and writes).
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Too large L2 table sizes cause unbounded allocations. Images actually
created by qemu-img only have 512 byte or 4k L2 tables.
To keep things consistent with cluster sizes, allow ranges between 512
bytes and 64k (in fact, down to 1 entry = 8 bytes is technically
working, but L2 table sizes smaller than a cluster don't make a lot of
sense).
This also means that the number of bytes on the virtual disk that are
described by the same L2 table is limited to at most 8k * 64k or 2^29,
preventively avoiding any integer overflows.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Huge values for header.cluster_bits cause unbounded allocations (e.g.
for s->cluster_cache) and crash qemu this way. Less huge values may
survive those allocations, but can cause integer overflows later on.
The only cluster sizes that qemu can create are 4k (for standalone
images) and 512 (for images with backing files), so we can limit it
to 64k.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
We were relying on all compilers inserting the same padding in the
header struct that is used for the on-disk format. Let's not do that.
Mark the struct as packed and insert an explicit padding field for
compatibility.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
This allows qemu to use images over https with a self-signed certificate. It
defaults to verifying the certificate.
Signed-off-by: Matthew Booth <mbooth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer now supports a generic json syntax for passing option parameters
explicitly, making parsing of options from the url redundant.
Signed-off-by: Matthew Booth <mbooth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The test test_stream_pause in this class uses vm.pause_drive, which
requires a blkdebug driver on top of image, otherwise it's no-op and the
test running is undeterministic.
So add it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The shell script attempts to suppress core dumps like this:
old_ulimit=$(ulimit -c)
ulimit -c 0
$QEMU_IO arg...
ulimit -c "$old_ulimit"
This breaks the test hard unless the limit was zero to begin with!
ulimit sets both hard and soft limit by default, and (re-)raising the
hard limit requires privileges. Broken since it was added in commit
dc68afe.
Could be fixed by adding -S to set only the soft limit, but I'm not
sure how portable that is in practice. Simply do it in a subshell
instead, like this:
(ulimit -c 0; exec $QEMU_IO arg...)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a test for the JSON protocol driver.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the filename given to bdrv_open() is prefixed with "json:", parse the
rest as a JSON object and merge the result into the options QDict. If
there are conflicts, the options QDict takes precedence.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add some test cases for qdict_join().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function joins two QDicts by absorbing one into the other.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a test for VHDX images created by Microsoft's tool, Disk2VHD.
VHDX images created by this tool have 2 identical header sections, with
identical sequence numbers. This makes sure we detect VHDX images with
identical headers, and do not flag them as corrupt.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The VHDX spec v1.00 declares that "a header is current if it is the only
valid header or if it is valid and its SequenceNumber field is greater
than the other header’s SequenceNumber field. The parser must only use
data from the current header. If there is no current header, then the
VHDX file is corrupt."
However, the Disk2VHD tool from Microsoft creates a VHDX image file that
has 2 identical headers, including matching checksums and matching
sequence numbers. Likely, as a shortcut the tool is just writing the
header twice, for the active and inactive headers, during the image
creation. Technically, this should be considered a corrupt VHDX file
(at least per the 1.00 spec, and that is how we currently treat it).
But in order to accomodate images created with Disk2VHD, we can safely
create an exception for this case. If we find identical sequence
numbers, then we check the VHDXHeader-sized chunks of each 64KB header
sections (we won't rely just on the crc32c to indicate the headers are
the same). If they are identical, then we go ahead and use the first
one.
Reported-by: Nerijus Baliūnas <nerijus@users.sourceforge.net>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_sequence_lookup is not supported by glib < 2.28. The usage
of g_sequence_lookup is not essential in this context (it's a
safeguard against duplicate values in the help message).
Removing the call enables the build on all platforms and
does not change the operation of the help function.
Signed-off-by: Mike Day <ncmike@ncultra.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_is_allocated() shouldn't return true for sectors that are
unallocated, but after the end of a short backing file, even though
such sectors are (correctly) marked as containing zeros.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Spotted by Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The purpose of this change is to help create a json file containing
common definitions; each bit of generated C code must be emitted
only one time.
A second history global to all QAPISchema instances has been added
to detect when a file is included more than one time and skip these
includes.
It does not act as a stack and the changes made to it by the
__init__ function are propagated back to the caller so it's really
a global state.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
With sparc keyboard going directly from QKeyValue to sparc keycodes
this should not be needed any more.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Nasty 0xe0 logic is gone. We map through QKeyCode now, giving us a
nice, readable mapping table.
Quick smoke test in OpenFirmware looks ok. Careful check from arch
maintainers would be very nice, especially on the capslock and numlock
logic. I'm not fully sure whenever I got it translated correctly and
also what it is supposed to do in the first place ...
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
According to the PS/2 Mouse/Keyboard Protocol, the keyboard outupt buffer size
is 16 bytes. And the PS2_QUEUE_SIZE 256 was introduced in Qemu from the very
beginning.
When I started a redhat5.6 32bit guest, meanwhile tapped the keyboard as quickly as
possible, the screen would show me "i8042.c: No controller found". As a result,
I couldn't use the keyboard in the VNC client.
Previous discussion about the issue in maillist:
http://thread.gmane.org/gmane.comp.emulators.qemu/43294/focus=47180
This patch has been tested on redhat5.6 32-bit/suse11sp3 64-bit guests.
More easy meathod to reproduce:
1.boot a guest with libvirt.
2.connect to VNC client.
3.as you see the BIOS, bootloader, Linux booting, run the follow simply shell script:
for((i=0;i<10000000;i++)) do virsh send-key redhat5.6 KEY_A; done
Actual results:
dmesg show "i8042.c: No controller found." And the keyboard is out of work.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Make it possible to query all net clients without specifying an ID when calling
qemu_find_net_clients_except().
This also adds the add_completion_option() function which is to be used for
other commands completions as well.
Signed-off-by: Hani Benhabiles <hani@linux.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
While there, pare down the shell prompts.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
We commonly use the error API like this:
err = NULL;
foo(..., &err);
if (err) {
goto out;
}
bar(..., &err);
Every error source is checked separately. The second function is only
called when the first one succeeds. Both functions are free to pass
their argument to error_set(). Because error_set() asserts no error
has been set, this effectively means they must not be called with an
error set.
The qapi-generated code uses the error API differently:
// *errp was initialized to NULL somewhere up the call chain
frob(..., errp);
gnat(..., errp);
Errors accumulate in *errp: first error wins, subsequent errors get
dropped. To make this work, the second function does nothing when
called with an error set. Requires non-null errp, or else the second
function can't see the first one fail.
This usage has also bled into visitor tests, and two device model
object property getters rtc_get_date() and balloon_stats_get_all().
With the "accumulate" technique, you need fewer error checks in
callers, and buy that with an error check in every callee. Can be
nice.
However, mixing the two techniques is confusing. You can't use the
"accumulate" technique with functions designed for the "check
separately" technique. You can use the "check separately" technique
with functions designed for the "accumulate" technique, but then
error_set() can't catch you setting an error more than once.
Standardize on the "check separately" technique for now, because it's
overwhelmingly prevalent.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
When visit_start_struct() fails, visit_end_struct() must not be
called. Three out of four visit_type_TestStruct() call it anyway. As
far as I can tell, visit_start_struct() doesn't actually fail there.
Fix them anyway.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
When visit_start_struct() fails, visit_end_struct() must not be
called. rtc_get_date() and balloon_stats_all() call it anyway. As
far as I can tell, they're only used with the string output visitor,
which doesn't care. Fix them anyway.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
When visit_start_struct() succeeds, visit_end_struct() must be called.
hmp_object_add() doesn't when a member visit fails. As far as I can
tell, the opts visitor copes okay with the misuse. Fix it anyway.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
In preparation of error handling changes. Bonus: generates less
duplicated code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
generate_visit_struct_fields() generates the base type's struct member
name both with and without the field prefix. Harmless, because the
field prefix is always empty there: only unboxed complex members have
a prefix, and those can't have a base type.
Clean it up anyway.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
By un-inlining the visit of nested complex types.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>