The NBD spec is clear that any string sent from the client must not
contain embedded NUL characters. If the client passes "a\0", we
should reject that option request rather than act on "a".
Testing this is not possible with a compliant client, but I was able
to use gdb to coerce libnbd into temporarily behaving as such a
client.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200930121105.667049-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Honoring just SIGTERM on Linux is too weak; we also want to handle
other common signals, and do so even on BSD. Why? Because at least
'qemu-nbd -B bitmap' needs a chance to clean up the in-use bit on
bitmaps when the server is shut down via a signal.
See also: http://bugzilla.redhat.com/1883608
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200930121105.667049-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: apply comment tweak suggested by Vladimir; fix ifdef around
termsig_handler]
Signed-off-by: Eric Blake <eblake@redhat.com>
In a recent commit 12c75e20a2 we've improved
nbd_co_reconnect_loop() to not make drain wait for additional sleep.
Similarly, we shouldn't try to connect, if previous sleep was
interrupted by drain begin, otherwise drain_begin will have to wait for
the whole connection attempt.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
reconnect-delay has a design flaw: we handle it in the same loop where
we do connection attempt. So, reconnect-delay may be exceeded by
unpredictable time of connection attempt.
Let's instead use separate timer.
How to reproduce the bug:
1. Create an image on node1:
qemu-img create -f qcow2 xx 100M
2. Start NBD server on node1:
qemu-nbd xx
3. On node2 start qemu-io:
./build/qemu-io --image-opts \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15
4. Type 'read 0 512' in qemu-io interface to check that connection
works
Be careful: you should make steps 5-7 in a short time, less than 15
seconds.
5. Kill nbd server on node1
6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd
client goes to reconnect loop.
7. On node1 run the following command
sudo iptables -A INPUT -p tcp --dport 10809 -j DROP
This will make the connect() call of qemu-io at node2 take a long time.
And you'll see that read command in qemu-io will hang for a long time,
more than 15 seconds specified by reconnect-delay parameter. It's the
bug.
8. Don't forget to drop iptables rule on node1:
sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
Important note: Step [5] is necessary to reproduce _this_ bug. If we
miss step [5], the read command (step 6) will hang for a long time and
this commit doesn't help, because there will be not long connect() to
unreachable host, but long sendmsg() to unreachable host, which should
be fixed by enabling and adjusting keep-alive on the socket, which is a
thing for further patch set.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Don't use nbd_client_detach_aio_context() driver handler where we want
to finalize the connection. We should directly use
qio_channel_detach_aio_context() in such cases. Driver handler may (and
will) contain another things, unrelated to the qio channel.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We pause reconnect process during drained section. So, if we have some
requests, waiting for reconnect we should cancel them, otherwise they
deadlock the drained section.
How to reproduce:
1. Create an image:
qemu-img create -f qcow2 xx 100M
2. Start NBD server:
qemu-nbd xx
3. Start vm with second nbd disk on node2, like this:
./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
file=/work/images/cent7.qcow2 -drive \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
-vnc :0 -m 2G -enable-kvm -vga std
4. Access the vm through vnc (or some other way?), and check that NBD
drive works:
dd if=/dev/sdb of=/dev/null bs=1M count=10
- the command should succeed.
5. Now, kill the nbd server, and run dd in the guest again:
dd if=/dev/sdb of=/dev/null bs=1M count=10
Now Qemu is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see
reconnect-delay option above) and than fail). But suddenly, vm may
totally hang in the deadlock. You may need to increase reconnect-delay
period to catch the dead-lock.
VM doesn't respond because drain dead-lock happens in cpu thread with
global mutex taken. That's not good thing by itself and is not fixed
by this commit (true way is using iothreads). Still this commit fixes
drain dead-lock itself.
Note: probably, we can instead continue to reconnect during drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of bs aio context. But expanding drained section
doesn't seem good anyway. So, let's now fix the bug the simplest way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
gcc 10 from Fedora 32 gives me:
Compiling C object libblock.fa.p/nbd_server.c.o
../nbd/server.c: In function ‘nbd_co_client_start’:
../nbd/server.c:625:14: error: ‘namelen’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
625 | rc = nbd_negotiate_send_info(client, NBD_INFO_NAME, namelen, name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
626 | errp);
| ~~~~~
../nbd/server.c:564:14: note: ‘namelen’ was declared here
564 | uint32_t namelen;
| ^~~~~~~
cc1: all warnings being treated as errors
As I cannot see how this can happen, let uns silence the warning.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20200930155859.303148-3-borntraeger@de.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Here's the next set of ppc related patches for qemu-5.2. There are
two main things here:
* Cleanups to error handling in spapr from Greg Kurz
* Improvements to NUMA handling for spapr from Daniel Barboza
There are also a handful of other bugfixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl9//PUACgkQbDjKyiDZ
s5KdQQ/9EKl8GRuNw1CaoMRZFnD5YCDnr6Piy24HpcINHm8khvC4SWEaMm2ESOLU
J5e9rQn2vXlHLWDA0qQ8pTTEMqfgAOuYllGQXTnTKF3tjePEZzsYzdg49v8O3dsb
EHOAvixsHocH+8KMsiQkbV5BZYEEJukX6RoGGm6vte+MTXdRlpyxmp9Xf52tGmEB
pU/Q2Y9oLR6OW7POWv3kfpmCfxklkOXstguEMTP42+ZGP17PBvpKXAXfW13gCl8t
yGvvcjWr64m9uTyqTxYWK/jFxxYa8hraKPk4BY/001UCypd+T/DrD7E/xlBMZwPh
eDRX7fV+YPcRqv66x47Gu40afEVm3mlQXzr0QaK5qm772f+v6C/xyLUznLNxYdLy
s9lKSi7wSxjBS8M8jztRoCJEx+zVe6BclJbwdzGQMYODiY13HKVENFUzPxrC9bfN
IxYAU3uAN3VL/agslEYV+aBrX0qj96c1Ek6CcFG2XXdR3k9QnYvUcQuPKcfuCBSX
TVS2mYger8Ba4E47tapH++TKj5jHoVKgTciSN663+gUCGzNTw+5UEZBxEHTQaPOX
a5SKh5t06PEkxpBK4ITnQfeRwvkMg4ERjJoKPXWzcqvHUWK+xaI8XbBlqCDMiC3T
mBAVHMIrKEe6J9tTqeURyct3ItUioneueLWNSplBUN3BPkE+7AQ=
=dbvK
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20201009' into staging
ppc patch queue 2020-10-09
Here's the next set of ppc related patches for qemu-5.2. There are
two main things here:
* Cleanups to error handling in spapr from Greg Kurz
* Improvements to NUMA handling for spapr from Daniel Barboza
There are also a handful of other bugfixes.
# gpg: Signature made Fri 09 Oct 2020 07:02:29 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-5.2-20201009:
specs/ppc-spapr-numa: update with new NUMA support
spapr_numa: consider user input when defining associativity
spapr_numa: change reference-points and maxdomain settings
spapr_numa: forbid asymmetrical NUMA setups
spapr: add spapr_machine_using_legacy_numa() helper
ppc/pnv: Increase max firmware size
spapr: Add a return value to spapr_check_pagesize()
spapr: Add a return value to spapr_nvdimm_validate()
spapr: Simplify error handling in spapr_cpu_core_realize()
spapr: Add a return value to spapr_set_vcpu_id()
spapr: Simplify error handling in prop_get_fdt()
spapr: Add a return value to spapr_drc_attach()
spapr: Simplify error handling in spapr_vio_busdev_realize()
spapr: Simplify error handling in do_client_architecture_support()
spapr: Get rid of cas_check_pvr() error reporting
spapr: Simplify error handling in callers of ppc_set_compat()
ppc: Fix return value in cpu_post_load() error path
ppc: Add a return value to ppc_set_compat() and ppc_set_compat_all()
spapr: Fix error leak in spapr_realize_vcpu()
spapr: Handle HPT allocation failure in nested guest
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
block_resize performs some I/O that could potentially take quite some
time, so use it as an example for the new 'coroutine': true annotation
in the QAPI schema.
bdrv_truncate() requires that we're already in the right AioContext for
the BlockDriverState if called in coroutine context. So instead of just
taking the AioContext lock, move the QMP handler coroutine to the
context.
Call blk_unref() only after switching back because blk_unref() may only
be called in the main thread.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-15-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Inside of coroutine context, we can't directly use aio_context_acquire()
for the AioContext of a block node because we already own the lock of
the current AioContext and we need to avoid double locking to prevent
deadlocks.
This provides helper functions to lock the AioContext of a node only if
it's not the same as the current AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a pair of functions to temporarily move the current coroutine to the
AioContext of a given BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a function that can be used to move the currently running coroutine
to a different AioContext (and therefore potentially a different
thread).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201005155855.256490-12-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Often, QMP command handlers are not only called to handle QMP commands,
but also from a corresponding HMP command handler. In order to give them
a consistent environment, optionally run HMP command handlers in a
coroutine, too.
The implementation is a lot simpler than in QMP because for HMP, we
still block the VM while the coroutine is running.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201005155855.256490-11-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This moves the QMP dispatcher to a coroutine and runs all QMP command
handlers that declare 'coroutine': true in coroutine context so they
can avoid blocking the main loop while doing I/O or waiting for other
events.
For commands that are not declared safe to run in a coroutine, the
dispatcher drops out of coroutine context by calling the QMP command
handler from a bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201005155855.256490-10-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This patch adds a new 'coroutine' flag to QMP command definitions that
tells the QMP dispatcher that the command handler is safe to be run in a
coroutine.
The documentation of the new flag pretends that this flag is already
used as intended, which it isn't yet after this patch. We'll implement
this in another patch in this series.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-9-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This way, a monitor command handler will still be able to access the
current monitor, but when it yields, all other code code will correctly
get NULL from monitor_cur().
This uses a hash table to map the coroutine pointer to the current
monitor of that coroutine. Outside of coroutine context, we associate
the current monitor with the leader coroutine of the current thread.
Approaches to implement some form of coroutine local storage directly in
the coroutine core code have been considered and discarded because they
didn't end up being much more generic than the hash table and their
performance impact on coroutines not using coroutine local storage was
unclear. As the block layer uses a coroutine per I/O request, this is a
fast path and we have to be careful. It's safest to just stay out of
this path with code only used by the monitor.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-8-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The correct way to set the current monitor for a coroutine handler will
be different than for a blocking handler, so monitor_set_cur() needs to
be called in qmp_dispatch().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-7-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
monitor_qmp_dispatch() is never supposed to be called in the context of
another monitor, so assert that monitor_cur() is NULL instead of saving
and restoring it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-6-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The current monitor is updated relatively early in the command handling
code even though only the command handler actually needs it.
The current monitor will become coroutine-local later, so we can only
update it when we know in which coroutine the command will be exectued.
Move it to handle_hmp_command() where this information will be
available.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-5-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
cur_mon really needs to be coroutine-local as soon as we move monitor
command handlers to coroutines and let them yield. As a first step, just
remove all direct accesses to cur_mon so that we can implement this in
the getter function later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-4-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Most callers actually don't have to rely on cur_mon, but already know
for which monitor they call monitor_get_cpu_index().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-3-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Most callers actually don't have to rely on cur_mon, but already know
for which monitor they call monitor_set_cpu().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201005155855.256490-2-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This update provides more in depth information about the
choices and drawbacks of the new NUMA support for the
spapr machine.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A new function called spapr_numa_define_associativity_domains()
is created to calculate the associativity domains and change
the associativity arrays considering user input. This is how
the associativity domain between two NUMA nodes A and B is
calculated:
- get the distance D between them
- get the correspondent NUMA level 'n_level' for D. This is done
via a helper called spapr_numa_get_numa_level()
- all associativity arrays were initialized with their own
numa_ids, and we're calculating the distance in node_id ascending
order, starting from node id 0 (the first node retrieved by
numa_state). This will have a cascade effect in the algorithm because
the associativity domains that node 0 defines will be carried over to
other nodes, and node 1 associativities will be carried over after
taking node 0 associativities into account, and so on. This
happens because we'll assign assoc_src as the associativity domain
of dst as well, for all NUMA levels beyond and including n_level.
The PPC kernel expects the associativity domains of the first node
(node id 0) to be always 0 [1], and this algorithm will grant that
by default.
Ultimately, all of this results in a best effort approximation for
the actual NUMA distances the user input in the command line. Given
the nature of how PAPR itself interprets NUMA distances versus the
expectations risen by how ACPI SLIT works, there might be better
algorithms but, in the end, it'll also result in another way to
approximate what the user really wanted.
To keep this commit message no longer than it already is, the next
patch will update the existing documentation in ppc-spapr-numa.rst
with more in depth details and design considerations/drawbacks.
[1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdomains were too restrictive, but enough for the
existing associativity we're setting in the resources.
We'll change that in the following patches, populating the
associativity arrays based on user input. For those changes
to be effective, reference-points and maxdomains must be
more flexible. After this patch, we'll have 4 distinct
levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will
allow for any type of configuration the user intends to
do - under the scope and limitations of PAPR itself, of
course.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a different
since we're not using user input for pSeries NUMA setup,
but this will change in the next patches.
To avoid breaking existing setups, gate this change by
checking for legacy NUMA support.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The changes to come to NUMA support are all guest visible. In
theory we could just create a new 5_1 class option flag to
avoid the changes to cascade to 5.1 and under. The reality is that
these changes are only relevant if the machine has more than one
NUMA node. There is no need to change guest behavior that has
been around for years needlesly.
This new helper will be used by the next patches to determine
whether we should retain the (soon to be) legacy NUMA behavior
in the pSeries machine. The new behavior will only be exposed
if:
- machine is pseries-5.2 and newer;
- more than one NUMA node is declared in NUMA state.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Builds enabling GCOV can be bigger than 4MB and the limit on FSP
systems is 16MB.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20201002091440.1349326-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-14-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-13-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", add a bool return value to
spapr_realize_vcpu() and use it in spapr_cpu_core_realize()
in order to get rid of the error propagation overhead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-12-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-11-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the return value of visit_check_struct() and visit_check_list()
for error checking instead of local_err. This allows to get rid of
the error propagation overhead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-10-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-9-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the return value of spapr_irq_findone() and spapr_irq_claim()
to detect failures. This allows to reduce the error propagation
overhead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-8-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the return value of ppc_set_compat_all() to check failures,
which is preferred over hijacking local_err.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-7-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cas_check_pvr() function has two purposes:
- finding the "best" logical PVR, ie. the most recent one supported by
the guest for this CPU type
- checking if the guest supports the real PVR of this CPU type, which
is just an optional extra information to workaround the lack of
support for "compat" mode in PR KVM
This logic doesn't need error reporting, really. If we don't find a
suitable logical PVR, we return the special value 0 which is definitely
not a valid PVR. Let the caller decide on whether it should error out
or not.
This doesn't change the behavior.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-6-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that ppc_set_compat() indicates success/failure with a return
value, use it and reduce error propagation overhead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-5-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VMState handlers are supposed to return negative errno values on failure.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-4-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As recommended in "qapi/error.h", indicate success / failure with a
return value. Since ppc_set_compat() is called from a VMState handler,
let's make it an int so that it propagates any negative errno returned
by kvmppc_set_compat(). Do the same for ppc_set_compat_all() for
consistency, even if it isn't called in a context where a negative errno
is required on failure.
This will allow to simplify error handling in the callers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-3-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The nested KVM code does not yet support HPT guests. Calling the
KVM_CAP_PPC_ALLOC_HTAB ioctl currently leads to KVM setting the guest
as HPT and erroneously executing code in L1 that should only run in
hypervisor mode, leading to an exception in the L1 vcpu thread when it
enters the nested guest.
This can be reproduced with -machine max-cpu-compat=power8 in the L2
guest command line.
The KVM code has since been modified to fail the ioctl when running in
a nested environment so QEMU needs to be able to handle that. This
patch provides an error message informing the user about the lack of
support for HPT in nested guests.
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20200911043123.204162-1-farosas@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU supports a 48-bit physical address range, but we don't currently
expose it in the '-cpu max' ID registers (you get the same range as
Cortex-A57, which is 44 bits).
Set the ID_AA64MMFR0.PARange field to indicate 48 bits.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201001160116.18095-1-peter.maydell@linaro.org
We add the kvm-steal-time CPU property and implement it for machvirt.
A tiny bit of refactoring was also done to allow pmu and pvtime to
use the same vcpu device helper functions.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 20201001061718.101915-7-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>