Check that devices on the spapr vio bus aren't given duplicate
addresses. Currently we will not run with duplicate devices, the
fdt code will spot it, but the error reporting is not great. With
this patch we can report the error nicely in terms of the device
names given by the user.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
There is a device tree property "/chosen/linux,stdout-path" which indicates
which device should be used as stdout - ie. "the console".
Currently we don't specify anything, which means both firmware and Linux
choose something arbitrarily. Use the routine we added in the last patch
to pick a default vty and specify it as stdout.
Currently SLOF doesn't use the property, but we are hoping to update it
to do so.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
In vty_lookup() we have a special case for supporting early debug in
the kernel. This accepts reg == 0 as a special case to mean "any vty".
We implement this by searching the vtys on the bus and returning the
first we find. This means that the vty we chose depends on the order
the vtys are specified on the QEMU command line - because that determines
the order of the vtys on the bus.
We'd rather the command line order was irrelevant, so instead return
the vty with the lowest reg value. This is still a guess as to what the
user really means, but it is at least stable WRT command line ordering.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf] fix braces
Although in theory the device tree has no inherent ordering, in practice
the order of nodes in the device tree does effect the order that devices
are detected by software.
Currently the ordering is determined by the order the devices appear on
the QEMU command line. Although that does give the user control over the
ordering, it is fragile, especially when the user does not generate the
command line manually - eg. when using libvirt etc.
So order the device tree based on the reg value, ie. the address of on
the VIO bus of the devices. This gives us a sane and stable ordering.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf] add braces
Add NUMA specific properties to guest's device tree to boot a multi-node
guests. This patch adds the following properties:
ibm,associativity
ibm,architecture-vec-5
ibm,associativity-reference-points
With this, it becomes possible to use -numa option on pseries targets.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
For forgotten historical reasons, PAPR hypercalls for specific virtual IO
devices (oh which there are quite a number) are registered via a callback
in the VIOsPAPRDeviceInfo structure.
This is kind of ugly, so this patch instead registers hypercalls from
device_init() functions for each device type. This works just as well,
and is cleaner.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When guest reset, we need to halt secondary cpus until guest kick them.
This already works for tcg. The patch add the support for kvm.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf: remove in-kernel irqchip code]
When trying to create a screen dump without having any VGA adapter
inside the guest, QEMU segfaults.
This is because it's trying to switch back to the "previous" screen
it was on before dumping the VGA screen. Unfortunately, in my case
there simply is no previous screen so it accesses a NULL pointer.
Fix it by checking if previous_active_console is actually available.
This is 1.0 material.
Signed-off-by: Alexander Graf <agraf@suse.de>
When run with a PPC Book3S (server) CPU Currently 'info tlb' in the
qemu monitor reports "dump_mmu: unimplemented". However, during
bringup work, it can be quite handy to have the SLB entries, which are
available in the CPUPPCState. This patch adds an implementation of
info tlb for book3s, which dumps the SLB.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
when I tried qemu with -virtio-console pty the guest hangs and attaching
on /dev/pts/<x> does not return anything if the attachment is too late.
This results in pty_chr_write() returning 0, which causes the port to
get throttled. This results in the guest getting frozen as the
guest->host virtio_console writes don't return until the host releases
the vq element back to the guest.
For the virtio-serial use case we don't want to lose data but for the
console case we better drop data instead of "killing" the guest
console. If we get chardev->frontend notification and a better behaving
virtio-console we can revert this fix.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Make's multiple output syntax
x.c x.h: x.template
gen < x.template
actually invokes the command once for x.c and once for x.h (with differing $@
in each invocation). During a parallel build, the two commands may be invoked
in parallel; this opens up a race, where the second invocation trashes a file
supposedly produced during the first, and now in use by a dependent command.
The various qapi code generators are susceptible to this; fix by making them
generate just one file per invocation.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* aneesh/for-upstream:
scripts/analyse-9p-simpletrace.py: Add symbolic names for 9p operations.
hw/9pfs: iattr_valid flags are kernel internal flags map them to 9p values.
hw/9pfs: Use the correct signed type for different variables
hw/9pfs: replace iovec manipulation with QEMUIOVector
* bonzini/nbd-for-anthony: (26 commits)
nbd: add myself as maintainer
qemu-nbd: throttle requests
qemu-nbd: asynchronous operation
qemu-nbd: add client pointer to NBDRequest
qemu-nbd: move client handling to nbd.c
qemu-nbd: use common main loop
link the main loop and its dependencies into the tools
qemu-nbd: introduce NBDRequest
qemu-nbd: introduce NBDExport
qemu-nbd: introduce nbd_do_receive_request
qemu-nbd: more robust handling of invalid requests
qemu-nbd: introduce nbd_do_send_reply
qemu-nbd: simplify nbd_trip
move corking functions to osdep.c
qemu-nbd: remove data_size argument to nbd_trip
qemu-nbd: remove offset argument to nbd_trip
Update ioctl order in nbd_init() to detect EBUSY
nbd: add support for NBD_CMD_TRIM
nbd: add support for NBD_CMD_FLUSH
nbd: add support for NBD_CMD_FLAG_FUA
...
qemu-kvm passes numa/SRAT topology information for smp_cpus to SeaBIOS. However
SeaBIOS always expects to setup max_cpus number of SRAT cpu entries
(MaxCountCPUs variable in build_srat function of Seabios). When qemu-kvm runs
with smp_cpus != max_cpus (e.g. -smp 2,maxcpus=4), Seabios will mistakenly use
memory SRAT info for setting up CPU SRAT entries for the offline CPUs. Wrong
SRAT memory entries are also created. This breaks NUMA in a guest.
Fix by setting up SRAT info for max_cpus in qemu-kvm.
Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The latter was already commented out, the former is redundant as well.
We always get the latest changes after return from the guest via
kvm_arch_post_run.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Keep a per-VCPU xsave buffer for kvm_put/get_xsave instead of
continuously allocating and freeing it on state sync.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Field 0 (FCW+FSW) and 1 (FTW+FOP) were hard-coded so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Limiting the number of in-flight requests is implemented very simply
with a can_read callback. It does not require a semaphore, unlike the
client side in block/nbd.c, because we can throttle directly the creation
of coroutines. The client side can have a coroutine created at any time
when an I/O request is made.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using coroutines enable asynchronous operation on both the network and
the block side. Network can be owned by two coroutines at the same time,
one writing and one reading. On the send side, mutual exclusion is
guaranteed by a CoMutex. On the receive side, mutual exclusion is
guaranteed because new coroutines immediately start receiving data,
and no new coroutines are created as long as the previous one is receiving.
Between receive and send, qemu-nbd can have an arbitrary number of
in-flight block transfers. Throttling is implemented by the next
patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
By attaching a client to an NBDRequest, we can avoid passing around the
socket descriptor and data buffer.
Also, we can now manage the reference count for the client in
nbd_request_get/put request instead of having to do it ourselved in
nbd_read. This simplifies things when coroutines are used.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch sets up the fd handler in nbd.c instead of qemu-nbd.c. It
introduces NBDClient, which wraps the arguments to nbd_trip in a single
structure, so that we can add a notifier to it. This way, qemu-nbd can
know about disconnections.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using a single main loop for sockets will help yielding from the socket
coroutine back to the main loop, and later reentering it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using the main loop code from QEMU enables tools to operate fully
asynchronously. Advantages include better Windows portability (for some
definition of portability) over glib's.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the buffer from NBDExport to a new structure, so that it will be
possible to have multiple in-flight requests for the same export
(and for the same client too---we get that for free).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Group the sending of a reply and the associated data into a new function.
Without corking, the caller would be forced to leave 12 free bytes at the
beginning of the data pointer. Not too ugly, but still ugly. :)
Using nbd_do_send_reply everywhere will help when the routine will set up
the write handler that re-enters the send coroutine.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use TCP_CORK to remove a violation of encapsulation, that would later
require nbd_trip to know too much about an NBD reply.
We could also switch to sendmsg (qemu_co_sendv) later, it is even
easier once coroutines are in.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update ioctl(s) in nbd_init() to detect device busy early.
Current nbd_init() issues NBD_CLEAR_SOCKET before NBD_SET_SOCKET, if issuing
"qemu-nbd -c /dev/nbd0 disk.img" twice, the second time won't detect EBUSY in
nbd_init(), but in nbd_client will report EBUSY and do clear socket (the 1st
time command will be affacted too because of no socket any more.)
No change to previous version.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow sending up to 16 requests, and drive the replies to the coroutine
that did the request. The code is written to be exactly the same as
before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex
and state).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-nbd has a limit of slightly less than 1M per request. Work
around this in the nbd block driver.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.
The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines. It
returns the number of bytes written before encountering an
EAGAIN. The specificity of yielding on EAGAIN is entirely in
qemu-coroutine.c.
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There's no need to check if ports can accept any incoming data from the
guest each time the guest sends data. Check if the port implements such
functionality during port initialisation.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The earlier code really was a hack: initialising class methods in an
object init function as noted by Anthony.
The motivation for that was to not have the virtio-serial-bus call into
the callback functions if there was no chardev backend registered.
However, that really wasn't a worthwhile optimisation, and definitely
not one that was well-implemented. Get rid of it.
Reported-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For the callback functions invoked by the virtio-serial-bus code, check
if we have chardev backends registered before we call into the chardev
functions.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reported-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently, we just print the numerical value of 9p operation identifier in
case of RERROR which is less meaningful for readability. Mapping 9p
operation ids to symbolic names provides a better tracelog:
RERROR (tag = 1 , id = TWALK , err = " No such file or directory ")
RERROR (tag = 1 , id = TUNLINKAT , err = " Directory not empty ")
This patch provides a dictionary of all possible 9p operation symbols mapped
to their numerical identifiers which are likely to be used in future at
various places in this script.
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Kernel internal values can change, add protocol values for these constant and
use them.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>