Extract the handling of throttling from bdrv_flush_io_queue. These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do not call bdrv_drain_recurse twice in bdrv_co_drain. A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway. But better get it right.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to remove throttled_reqs from block/io.c. This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The return value is unused and I am not sure why it would be useful.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.
The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Call vbe_update_vgaregs() when the guest touches GFX, SEQ or CRT
registers, to make sure the vga registers will always have the
values needed by vbe mode. This makes sure the sanity checks
applied by vbe_fixup_regs() are effective.
Without this guests can muck with shift_control, can turn on planar
vga modes or text mode emulation while VBE is active, making qemu
take code paths meant for CGA compatibility, but with the very
large display widths and heigts settable using VBE registers.
Which is good for one or another buffer overflow. Not that
critical as they typically read overflows happening somewhere
in the display code. So guests can DoS by crashing qemu with a
segfault, but it is probably not possible to break out of the VM.
Fixes: CVE-2016-3712
Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Call the new vbe_update_vgaregs() function on vbe configuration
changes, to make sure vga registers are up-to-date.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When enabling vbe mode qemu will setup a bunch of vga registers to make
sure the vga emulation operates in correct mode for a linear
framebuffer. Move that code to a separate function so we can call it
from other places too.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
vga allows banked access to video memory using the window at 0xa00000
and it supports a different access modes with different address
calculations.
The VBE bochs extentions support banked access too, using the
VBE_DISPI_INDEX_BANK register. The code tries to take the different
address calculations into account and applies different limits to
VBE_DISPI_INDEX_BANK depending on the current access mode.
Which is probably effective in stopping misprogramming by accident.
But from a security point of view completely useless as an attacker
can easily change access modes after setting the bank register.
Drop the bogus check, add range checks to vga_mem_{readb,writeb}
instead.
Fixes: CVE-2016-3710
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes build failure with --enable-xfsctl and
new linux headers (>=4.5) and older xfsprogs(<4.5):
In file included from /usr/include/xfs/xfs.h:38:0,
from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
/usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
struct fsxattr {
^
In file included from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
struct fsxattr {
This is really a bug in the system headers, but we can work around it
by defining HAVE_FSXATTR in the QEMU headers if linux/fs.h provides
the struct, so that xfs_fs.h doesn't try to define it as well.
CC: qemu-trivial@nongnu.org
CC: Markus Armbruster <armbru@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Stefan Weil <sw@weilnetz.de>
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
[PMM: adjusted commit message, comments]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Minor, obvious fix only affecting BE hosts.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXJfnwAAoJECgfDbjSjVRpUqgH/jzhrMSe4r6JizzltOPoQ1ET
5kLv8tsnmJS6VbtEboz2e/4UJZgOKAAkqZ1LV6C9mzxiSmffJWe5gXwCmMrrxdcD
PFyAJwvnEY6chLF7B4/GxzdP8w6qxBhdSh6QGbGrSPodl5VkuceC4ks3E7Te2p+r
6SBSBFeVPusaGz/n0F7AktrPgjS9Yyr2fFVpydC52mamwQXIYrSVHmPR6Ch6GTNt
Z+NbaLq1cId+F8w3CDcDRox8we6UcTMnngg49UZ8XJbv9T19ea3+L1V3fyb7tJ4T
15M4lOxwNsgKvoa66VUNLAitbiar1bioKNP48i/kdG2GP8P30KKXyMazWAJYJ1o=
=Jsvf
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi: last minute fix for 2.6
Minor, obvious fix only affecting BE hosts.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 01 May 2016 13:43:28 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream:
acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
'make check' fails with:
ERROR:tests/bios-tables-test.c:493:load_expected_aml:
assertion failed: (g_file_test(aml_file, G_FILE_TEST_EXISTS))
since commit:
caf50c7166
tests: pc: acpi: drop not needed 'expected SSDT' blobs
Assert happens because qemu-system-x86_64 generates
SSDT table and test looks for a corresponding expected
table to compare with.
However there is no expected SSDT blob anymore, since
QEMU souldn't generate one. As it happens BIOS is not
able to read ACPI tables from QEMU and fallbacks to
embeded legacy ACPI codepath, which generates SSDT.
That happens due to wrongly sized endiannes conversion
which makes
uint8_t BiosLinkerLoaderEntry.alloc.zone
end up with 0 due to truncation of 32 bit integer
which on host is 1 or 2.
Fix it by dropping invalid cpu_to_le32() as uint8_t
doesn't require any conversion.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1330174
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Commit d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.
This fixes the implementation to apply the advertised default.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.
This fixes the assertion to compare with the configured volume name
instead of a literal string.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Make sure the error message for visit_type_uint64() gracefully
handles a NULL 'name' when called from the top level or a list
context, as not all the world behaves like glibc in allowing
NULL through a printf-family %s.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-21-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
On Darwin, connect, sendto and friends want the exact size of the sockaddr,
not more (and in particular, not sizeof(struct sockaddr_storaget))
This commit adds the sockaddr_size helper to be used when passing a sockaddr
size to such function, and makes use of it int sendto and connect calls.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This just has one, last-minute, fix for a serious regression of memory
hotplug.
Patch author's comment:
Really sorry for the way last-minute fix, but without this memory
hotplug is totally broken :( Hoping to get this in for Wednesday's
RC4, which I think will be the final before release.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXHtfkAAoJEGw4ysog2bOSt/AP/jfZJvt4zjGAZ4BqVpyISLhM
n/hPZNglmcvg+YmjOrH/5G7PeX+r6Bf2gyrBjL06sWDSff1FdiBySiJQec6RlEwo
hq+EBaUWMeWS7Fg/C3evNvg2WGvSDs8KjRKWA76pm8pu+110cxnrRUybIRbbHqaI
3vxR5Gdewmbvp5VYQgr5e/BSMac7wTT+Us6/Cjr2n5q56e2jjLryM8zMiyqMtP+S
7XdCqcowXv4dQsaqwLxS0Ata8f2ykrjtp9qADErbULZtEf8cAf8zvTR3ElXtnywq
pEgxbKJeYQwGScudWb9lzwGRRGTMpM2zLbIuBKjQE2E77yQK0+hsq3FkH6OdipgT
T8K5gU3p3AFuksZ5/+COVACC364Hry4OrYjTvGDXvsXXfWablrlETqoNtaj5A7Uz
cvjcNEpja88B+dDxT/bt8Ab8p5pTuo58uUUgT4gLIduzSJlgaJ2qzMvwF4cfIGcB
5wsKr36GY4jyw3UD0MbXAwDhwvKP9t1FUmhST4qLF0WvNjfM62jGgh1E2DaXKuwF
+ndVkoaus9yxYgMKyuuI/hXH/M4qk6yuUeNj+E1qM1823XpXQgFW1hnvnPZkbf0f
nqT1as59kltGhFrv7DVLd3ovYLSjK5XtLAq8NfhTjjEe8nQlzHhql9RRkXedPpLO
GVwp6DA608M2krlS5TxK
=zSL+
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160426' into staging
ppc patch queue for 2016-04-26 (last minute qemu-2.6 fix)
This just has one, last-minute, fix for a serious regression of memory
hotplug.
Patch author's comment:
Really sorry for the way last-minute fix, but without this memory
hotplug is totally broken :( Hoping to get this in for Wednesday's
RC4, which I think will be the final before release.
# gpg: Signature made Tue 26 Apr 2016 03:52:20 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.6-20160426:
spapr_drc: fix aborts during DRC-count based hotplug
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit b00c72180c ("target-mips: add PC, XNP reg numbers to RDHWR")
changed the rdhwr helpers to use check_hwrena() to check the register
being accessed is enabled in CP0_HWREna when used from user mode. If
that check fails an EXCP_RI exception is raised at the host PC
calculated with GETPC().
However check_hwrena() may not be fully inlined as the
do_raise_exception() part of it is common regardless of the arguments.
This causes GETPC() to calculate the address in the call in the helper
instead of the generated code calling the helper. No TB will be found
and the EPC reported with the resulting guest RI exception points to the
beginning of the TB instead of the RDHWR instruction.
We can't reliably force check_hwrena() to be inlined, and converting it
to a macro would be ugly, so instead pass the host PC in as an argument,
with each rdhwr helper passing GETPC(). This should avoid any dependence
on compiler behaviour, and in practice seems to ensure the full inlining
of check_hwrena() on x86_64.
This issue causes failures when running a MIPS KVM (trap & emulate)
guest in a MIPS QEMU TCG guest, as the inner guest kernel will do a
RDHWR of counter, which is disabled in the outer guest's CP0_HWREna by
KVM so it can emulate the inner guest's counter. The emulation fails and
the RI exception is passed to the inner guest.
Fixes: b00c72180c ("target-mips: add PC, XNP reg numbers to RDHWR")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
qemu_opts_foreach() runs its callback with the error location set to
the option's location. Any errors the callback reports use the
option's location automatically.
Commit 90998d5 moved the actual error reporting from "inside"
qemu_opts_foreach() to after it. Here's a typical hunk:
if (qemu_opts_foreach(qemu_find_opts("object"),
- object_create,
- object_create_initial, NULL)) {
+ user_creatable_add_opts_foreach,
+ object_create_initial, &err)) {
+ error_report_err(err);
exit(1);
}
Before, object_create() reports from within qemu_opts_foreach(), using
the option's location. Afterwards, we do it after
qemu_opts_foreach(), using whatever location happens to be current
there. Commonly a "none" location.
This is because Error objects don't have location information.
Problematic.
Reproducer:
$ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
qemu-system-x86_64: Property '.foo' not found
Note no location. This commit restores it:
qemu-system-x86_64: -object secret,id=foo,foo=bar: Property '.foo' not found
Note that the qemu_opts_foreach() bug just fixed could mask the bug
here: if the location it leaves dangling hasn't been clobbered, yet,
it's the correct one.
Reported-by: Eric Blake <eblake@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-4-git-send-email-armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Paragraph on Error added to commit message]
replay_configure() pushes and pops a Location with automatic storage
duration. Except it fails to pop when -icount parameter "rr" isn't
given. cur_loc then points to unused stack space, and will most
likely get clobbered in short order.
Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.
Broken in commit 890ad55.
I didn't take the time to find a reproducer.
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
qemu_opts_foreach() pushes and pops a Location with automatic storage
duration. Except it fails to pop when @func() returns non-zero.
cur_loc then points to unused stack space, and will most likely get
clobbered in short order.
Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.
Affects several qemu command line options as well as qemu-img,
qemu-io, qemu-nbd -object, and blkdebug's configuration file.
Broken in commit a4c7367, v2.4.0.
Reproducer:
$ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
main() reports "Property '.foo' not found" like this:
if (qemu_opts_foreach(qemu_find_opts("object"),
user_creatable_add_opts_foreach,
object_create_delayed, &err)) {
error_report_err(err);
exit(1);
}
cur_loc then points to where qemu_opts_foreach()'s Location used to
be, i.e. unused stack space. With optimization, this Location doesn't
get clobbered for me, and also happens to be the correct location.
Without optimization, it does get clobbered in a way that makes
error_report_err() report no location.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
CPU/memory resources can be signalled en-masse via
spapr_hotplug_req_add_by_count(), and when doing so, actually change
the meaning of the 'drc' parameter passed to
spapr_hotplug_req_event() to be a count rather than an index.
f40eb92 added a hook in spapr_hotplug_req_event() to record when a
device had been 'signalled' to the guest, but that code assumes that
drc is always an index. In cases where it's a count, such as memory
hotplug, the DRC lookup will fail, leading to an assert.
Fix this by only explicitly setting the signalled state for cases where
we are doing PCI hotplug.
For other resources types, since we cannot selectively track whether a
resource has been signalled in cases where we signal attach as a count,
set the 'signalled' state to true immediately upon making the
resource available via drck->attach().
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: david@gibson.dropbear.id.au
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
commit "5f77e06 usb: add pid check at the first of uhci_handle_td()"
moved the pid verification to the start of the uhci_handle_td function,
to simplify the error handling (we don't have to free stuff which we
didn't allocate in the first place ...).
Problem is now the check fires too often, it raises error IRQs even for
TDs which we are not going to process because they are not set active.
So, lets move down the check a bit, so it is done only for active TDs,
but still before we are going to allocate stuff to process the requested
transfer.
Reported-by: Joe Clifford <joe@thunderbug.co.uk>
Tested-by: Joe Clifford <joe@thunderbug.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1461321893-15811-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A single fix for a bug in parameter handling for the spapr PCI host
bridge.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXGxxhAAoJEGw4ysog2bOSw3gQAK5hAFuvnqE0Liu40kHISq2B
jraL4o/pRFekNN/2EP0miaSk5/s21e/fP1N+MR4CFQKC+mF5n9HdqjHE4hUVWlbg
wI1n1xElvd9oafxTjeSbtxX/2t7tZ+60nu1GRcfaQwZcyK5bN4bTDOOWMu9MdOD9
Z1iw6aBpouIDS2OTCzvsUfyGl34flRgs1HPShBEqOkBUK0szqTj+16sngBkyWSiU
9Rlz7dqu6ziMYIju1EYr6gGbP8o0taV+h4B0H3sPKMQ1axqAWIs2quXza9dOH+zn
fD8pcv6ubfmIeoTnaO8HWV/a3ai4LLLtEOnlKuy1ec5CAz2o01dVAvKTOQ1ngvD0
HZ/s1j8o0UpsJ2KnpctXZQjG1udB7NwV8CouuRVv/JNvt6b7V+TDQaP0dwW6X/ZA
QB3VAgu0BFq67YAnKIMdxWMljeCTrmXbYGgUnKNFzLxFO20fJpaCmwACCVmZi856
nA+LmFcO2NTH8o2On6G0jffnvcwO5Vj3zAhhslXjatcDcfctxui4+D38j+pbPxLb
UQjJXCd1m2RKZ9IQ/rukXwmBO9Eis+8ClNZcGeDIRV2zBKrH4c6Xmd80xXeqIH9f
erj7NX6aMbnqYRFSc9lCnRD6PkIbXqhDDjkzFboK896QLrQsnsmeEs3D6ujShRsN
2Au36PbuVfbFFbnQDYc0
=MxYq
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160423' into staging
ppc patch queue for 2016-03-23
A single fix for a bug in parameter handling for the spapr PCI host
bridge.
# gpg: Signature made Sat 23 Apr 2016 07:55:29 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.6-20160423:
hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:
$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault
The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch
aio_dispatch
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
qemu_iohandler_poll
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
<snip>
aio_dispatch
aio_bh_poll
(c) mirror_exit
At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.
Commit f3926945c8 made iohandler to use aio API. The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:
- Bug case 1:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
...
aio_dispatch
aio_bh_poll
(c) mirror_exit
- Bug case 2:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
main_loop_wait # 2
...
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
aio_dispatch
aio_bh_poll
(c) mirror_exit
In both cases, (b) breaks the invariant wanted by (a) and (c).
Until then, the request loss has been silent. Later, 3f09bfbc7b added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.
2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.
As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.
Launchpad Bug: 1570134
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
aio_poll doesn't poll the external nodes so this should never be true,
but aio_ctx_dispatch may get notified by the events from GSource. To
make bdrv_drained_begin effective in main loop, we should check the
is_external flag here too.
Also do the check in aio_pending so aio_dispatch is not called
superfluously, when there is no events other than external ones.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The effect of this change is the block layer drained section can work,
for example when mirror job is being completed.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers pass "false" keeping the old semantics. The windows
implementation doesn't distinguish the flag yet. On posix, it is passed
down to the underlying aio context.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.
One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.
Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simple condition.
This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:
9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)
We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.
Cc: qemu-stable@nongnu.org
Reported-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: wrapped long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The NBD protocol does not (yet) force any alignment constraints
on clients. Even though qemu NBD clients always send requests
that are aligned to 512 bytes, we must be prepared for non-qemu
clients that don't care about alignment (even if it means they
are less efficient). Our use of blk_read() and blk_write() was
silently operating on the wrong file offsets when the client
made an unaligned request, corrupting the client's data (but
as the client already has control over the file we are serving,
I don't think it is a security hole, per se, just a data
corruption bug).
Note that in the case of NBD_CMD_READ, an unaligned length could
cause us to return up to 511 bytes of uninitialized trailing
garbage from blk_try_blockalign() - hopefully nothing sensitive
from the heap's prior usage is ever leaked in that manner.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1461249750-31928-1-git-send-email-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-2-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.
This used to work the following way:
| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>
Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.
tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 32-bit ARM Linux kernel booting ABI requires that r0 is 0
when calling the kernel image. A bug in commit 10b8ec73e6
meant that for boards which use the write_board_setup hook (which
means "highbank", "midway", "raspi2" and "xilinx-zynq-a9") we
were incorrectly skipping the "clear r0" instruction in the
mini-bootloader. Use the right offset in the "add lr, pc, #n"
instruction so that we return from the board-setup code to the
correct place.
Signed-off-by: Sylvain Garrigues <sylvain@sylvaingarrigues.com>
[PMM: Expanded commit message]
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When using K: in MAINTAINERS, false positives makes
get_maintainer.pl not use git history to find contributors. As
those patterns cause lots of false positives they are causing
more harm than good, so remove them.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 1461164130-3847-1-git-send-email-ehabkost@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>