Submission of requests on linux aio is a bit tricky and can lead to
requests completions on submission path:
44713c9e85 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d84ed ("linux-aio: process completions from ioq_submit()")
That means that any coroutine which has been yielded in order to wait
for completion can be resumed from submission path and be eventually
terminated (freed).
The following use-after-free crash was observed when IO throttling
was enabled:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5813dff700 (LWP 56417)]
virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
(gdb) bt
#0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
^^^^^^^^^^^^^^
remember the address
#1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
#2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
#3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
#4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
#5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
#6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78
(gdb) p * elem
$8 = {index = 77, out_num = 2, in_num = 1,
in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
in_sg = 0x0, out_sg = 0x7f5804009a50}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
'in_sg' and 'out_sg' are invalid.
e.g. it is impossible that 'in_sg' is zero,
instead its value must be equal to:
(gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
$26 = 0x7f5804009af0
Seems 'elem' was corrupted. Meanwhile another thread raised an abort:
Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
#0 raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
#3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
#4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
^^^^^^^^^^^^^^^^^^
WTF?? this is equal to elem from crashed thread
#5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
#6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
#7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
#8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
#9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
#10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
#11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
#12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
#13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
#14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419
Crash can be explained by access of 'co' object from the loop inside
qemu_co_queue_run_restart():
while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
^^^^^^^^^^^^^^^^^^^^
on each iteration 'co' is accessed,
but 'co' can be already freed
qemu_coroutine_enter(next);
}
When 'next' coroutine is resumed (entered) it can in its turn resume
'co', and eventually free it. That's why we see 'co' (which was freed)
has the same address as 'elem' from the first backtrace.
The fix is obvious: use temporary queue and do not touch coroutine after
first qemu_coroutine_enter() is invoked.
The issue is quite rare and happens every ~12 hours on very high IO
and CPU load (building linux kernel with -j512 inside guest) when IO
throttling is enabled. With the fix applied guest is running ~35 hours
and is still alive so far.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The scripts/qemu-gdb.py file is not easily discoverable. Add a .gdbinit
file so GDB either loads qemu-gdb.py automatically or prints a message
informing the user how to enable them (some systems disable ./.gdbinit
loading for security reasons).
Symlink .gdbinit and the scripts directory in order to make out-of-tree
builds work. The scripts directory is used to find the qemu-gdb.py file
specified by a relative path in .gdbinit.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Eric Blake <eblake@redhat.com>
Message-id: 20170517124042.1430-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Version: GnuPG v1
iQEcBAABAgAGBQJZN3MQAAoJEO8Ells5jWIRhfkH/1iV+DDT0caXqdxEbHktVpiY
ZuFxjKId63PhpyJXurmevJ3oiTYdUe5glX/GtN/0q0FRD+16kAD5SzxiIBqfWr3z
uR8PQ73rdf9ts8jZHth1ZKgP00MC9RDS3YikYyQSBW0+TkvlulrRlD0vJfogCEYj
EQO1OElrllXFmTqlHFHXR7mT4Cbcfw0xXARBJ+PfrWixhIuVuPwEKjZM4jsSoF62
BuK33cXrX8ovSg7cER7gjhet89TbZejXtEhqVQndub7byARIJi+Hvi6oE4fBslzK
1693WOvNz6T1Emv98XrdiBeLBfBDGEP+DLHz6Ih5ysuGdNYWvUVGNpgOzV+9vLc=
=OjxD
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Wed 07 Jun 2017 04:29:20 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
Revert "Change net/socket.c to use socket_*() functions" again
net/rocker: Cleanup the useless return value check
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add "Return and Deallocate" (rtd) instruction.
RTD #d
(SP) -> PC
SP + 4 + d -> SP
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Message-Id: <20170605100014.22981-1-laurent@vivier.eu>
This reverts commit 883e4f7624.
This code changed net/socket.c from using socket()+connect(),
to using socket_connect(). In theory this is great, but in
practice this has completely broken the ability to connect
the frontend and backend:
$ ./x86_64-softmmu/qemu-system-x86_64 \
-device e1000,id=e0,netdev=hn0,mac=DE:AD:BE:EF:AF:05 \
-netdev socket,id=hn0,connect=localhost:1234
qemu-system-x86_64: -device e1000,id=e0,netdev=hn0,mac=DE:AD:BE:EF:AF:05: Property 'e1000.netdev' can't find value 'hn0'
The old code would call net_socket_fd_init() synchronously,
while letting the connect() complete in the backgorund. The
new code moved net_socket_fd_init() so that it is only called
after connect() completes in the background.
Thus at the time we initialize the NIC frontend, the backend
does not exist.
The socket_connect() conversion as done is a bad fit for the
current code, since it did not try to change the way it deals
with async connection completion. Rather than try to fix this,
just revert the socket_connect() conversion entirely.
The code is about to be converted to use QIOChannel which
will let the problem be solved in a cleaner manner. This
revert is more suitable for stable branches in the meantime.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
None of pci_dma_read()'s callers check the return value except
rocker. There is no need to check it because it always return
0. So the check work is useless. Remove it entirely.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We have to make the address in the old PSW point at the next
instruction, as addressing exceptions are suppressing and not
nullifying.
I assume that there are a lot of other broken cases (as most instructions
we care about are suppressing) - all trigger_pgm_exception() specifying
and explicit number or ILEN_LATER look suspicious, however this is another
story that might require bigger changes (and I have to understand when
the address might already have been incremented first).
This is needed to make an upcoming kvm-unit-test work.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170529121228.2789-1-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The CDSG instruction requires a 16-byte alignement, as expressed in
the MO_ALIGN_16 passed to helper_atomic_cmpxchgo_be_mmu. In the non
parallel case, use check_alignment to enforce this.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170604202034.16615-4-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use a common helper with PACK ASCII as the differences are limited to
the stride of the source operand.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-25-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For that we need to make program_interrupt available to qemu-user.
Fortunately there is almost nothing to change as both kvm_enabled and
CONFIG_KVM evaluate to false in that case.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-22-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
As MVCL and MVCLE only differ by their operands, use a common
do_mvcl helper. Optimize it calling fast_memmove and fast_memset.
Correctly write back addresses. Check that r1 and r2/r3 registers
are even.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-21-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
adj_len_to_page doesn't return the correct result when the address
is already page aligned and the length is bigger than a page. Fix that.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-20-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
As CLCL and CLCLE mostly differ by their operands, use a common do_clcl
helper. Another difference is that CLCL is not interruptible.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-19-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are multiple issues with the COMPARE LOGICAL LONG EXTENDED
instruction:
- The test between the two operands is inverted, leading to an inversion
of the cc values 1 and 2.
- The address and length of an operand continue to be decreased after
reaching the end of this operand. These values are then wrong write
back to the registers.
- We should limit the amount of bytes to process, so that interrupts can
be served correctly.
At the same time rename dest into src1 and src into src3 to match the
operand names and make the code less confusing.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-18-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Improve fix_address to also handle the 24-bit mode. Rename fix_address
to wrap_address to better explain what is changed.
Replace the calls to get_address with x2 = 0 and b2 = 0 by
call to wrap_address, leading to the removal of this function. Rename
get_address_31fix into get_address.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-15-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
These functions differ from COMPARE by generating an exception for a
QNaN input. Use the non quiet version of floatXX_compare.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-10-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And at the same time make IPTE SMP aware.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170531220129.27724-4-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Currently we only present the plain z900 feature bits to the guest,
but QEMU already emulates some additional features (but not all of
the next CPU generation, so we can not use the next CPU level as
default yet). Since newer Linux kernels are checking the feature bits
and refuse to work if a required feature is missing, it would be nice
to have a way to present more of the supported features when we are
running with the "qemu" CPU.
This patch now adds the supported features to the "full_feat" bitmap,
so that additional features can be enabled on the command line now,
for example with:
qemu-system-s390x -cpu qemu,stfle=true,ldisp=true,eimm=true,stckf=true
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1495704132-5675-1-git-send-email-thuth@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
While the previous patch is required for proper conformance,
the vast majority of target insns are MVC and XC for implementing
memmove and memset respectively. The next most common are CLC,
TR, and SVC.
Implementing these (and a few others for which we already have
an implementation) directly is faster than going through full
translation to a TB.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Previously, helper_ex would construct the insn and then implement
the insn via direct calls other helpers. This was sufficient to
boot Linux but that is all.
It is easy enough to go the whole nine yards by stashing state for
EXECUTE within the cpu, and then rely on a new TB to be created
that properly and completely interprets the insn.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This split will be required for implementing EXECUTE properly.
Do this now as a separate step to aid comparison of before and
after TB listings.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use this saved value instead of recomputing from next_pc difference.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Also provide the cross-cpu tlb flushing required by the PoO.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The PoO specifies that when R1==0, no ORing into the insn
loaded from storage takes place. Load a zero for this case.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
(1) The OR of the low bits or R1 into INSN were not being done
consistently; it was forgotten along all but the SVC path.
(2) The setting of ILEN was wrong on SVC path for EXRL.
(3) The data load for ICM read too much.
Fix these by consolidating data load at the beginning, using
get_ilen to control the number of bytes loaded, and ORing in
the byte from R1. Use extract64 from the full aligned insn
to extract arguments.
Pass in ILEN rather than RET as the more natural way to give
the required data along the SVC path.
Modify ENV->CC_OP directly rather than include it in the
functional interface.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>