Add RCU-enabled variants on the existing bsd DQ facility. Each
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro.
Using the RCU-enabled QLIST, existing QLIST users will be able to
convert to RCU without using a different list interface.
Signed-off-by: Mike Day <ncmike@ncultra.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Note that even after this patch, most callers of address_space_*
functions must still be under the big QEMU lock, otherwise the memory
region returned by address_space_translate can disappear as soon as
address_space_translate returns. This will be fixed in the next part
of this series.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the previous patch, TLBs will be flushed on every change to
the memory mapping. This patch augments that with synchronization
of the MemoryRegionSections referred to in the iotlb array.
With this change, it is guaranteed that iotlb_to_region will access
the correct memory map, even once the TLB will be accessed outside
the BQL.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This for now is a simple TLB flush. This can change later for two
reasons:
1) an AddressSpaceDispatch will be cached in the CPUState object
2) it will not be possible to do tlb_flush once the TCG-generated code
runs outside the BQL.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that objects actually obey the rules, document them.
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
object_unparent should not be called until the parent device is going to be
destroyed. Only remove the capability and do memory_region_del_subregion
at unrealize time. Freeing the data structures is left in shpc_free, to
be called from the instance_finalize callback.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This memory leak was introduced inadvertently by omitting object_unparent.
A better fix is to use the new memory_region_set_size instead of destroying
and recreating the MMIO region on the fly.
Also, ensure that unmapping and remapping the region is done atomically.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes a use-after-free if do_address_space_destroy is executed
too late.
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This needs to go away sooner or later, but one complication is the
complex VFIO data structures that are modified in instance_finalize.
Take a shortcut for now.
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Always process them within a short time. Even though waiting a little
is useful, it is not okay to delay e.g. qemu_opts_del forever.
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
At present, the target is valued boot_tpgt, In addition,
channel and lun both are 0 for bootable vhost-scsi device.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Bo Su <subo7@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because Qemu only accept an wwpn argument for vhost-scsi, we
cannot assign a tpgt. That's say tpg is transparent for Qemu, Qemu
doesn't know which tpg can boot, but vhost-scsi driver module
doesn't know too for one assigned wwpn.
At present, we assume that the first tpg can boot only, and add
a boot_tpgt property that defaults to 0. Of course, people can
pass a valid value by qemu command line.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the way, we can make the bootindex property take effect.
At the meanwhile, the firmware path name of vhost-scsi is
"channel@channel/vhost-scsi@target,lun".
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 6b1566c (qdev: Introduce FWPathProvider interface) did a
good job for supproting to get firmware path on some different
architectures.
Moreover further more, we can use the interface to get firmware
path name for a device which isn't attached a specific bus,
such as virtio-bus, scsi-bus etc.
When the device (such as vhost-scsi) realize the TYPE_FW_PATH_PROVIDER
interface, we should introduce a new function to get the correct firmware
path name for it.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch defines the list of kvm_exit reasons for aarch64. This list is
based on the Exception Class (EC) field of HSR register. With this patch
users can trace the execution of guest VMs better. A sample output from
command "kvm_stat -1 -t" is shown as the following:
<...>
kvm_exit(WATCHPT_HYP) 0 0
kvm_exit(WFI) 9422 9361
NOTE: This patch requires TRACE_EVENT(kvm_exit) to include exit_reason
field in TP_ARGS. A patch to upstream kernel has been submitted.
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes a compiler error which occurs if DEBUG_VFIO is defined.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The difference between v1 and v2 is fairly subtle, simply more
deterministic behavior for unmaps. The v1 interface allows the user
to attempt to unmap sub-regions of previous mappings, returning
success with zero size if unable to comply. This was a reflection of
the underlying IOMMU API. The v2 interface requires that the user
may only unmap fully contained mappings, ie. an unmap cannot intersect
or bisect a previous mapping, but may cover multiple mappings. QEMU
never made use of the sub-region v1 support anyway, so we can support
either v1 or v2. We'll favor v2 since it's newer.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In the case of VFIO, the unrealize callback is too early to munmap the
BARs. The munmap must be delayed until memory accesses are complete.
To do this, split vfio_unmap_bars in two. The removal step, now called
vfio_unregister_bars, remains in vfio_exitfn. The reclamation step
is vfio_unmap_bars and is moved to the instance_finalize callback.
Similarly, quirk MemoryRegions have to be removed during
vfio_unregister_bars, but freeing the data structure must be delayed
to vfio_unmap_bars.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In order to enable out-of-BQL address space lookup, destruction of
devices needs to be split in two phases.
Unrealize is the first phase; once it complete no new accesses will
be started, but there may still be pending memory accesses can still
be completed.
The second part is freeing the device, which only happens once all memory
accesses are complete. At this point the reference count has dropped to
zero, an RCU grace period must have completed (because the RCU-protected
FlatViews hold a reference to the device via memory_region_ref). This is
when instance_finalize is called.
Freeing data belongs in an instance_finalize callback, because the
dynamically allocated memory can still be used after unrealize by the
pending memory accesses.
This starts the process by creating an instance_finalize callback and
freeing most of the dynamically-allocated data in instance_finalize.
Because instance_finalize is also called on error paths or also when
the device is actually not realized, the common code needs some changes
to be ready for this. The error path in vfio_initfn can be simplified too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that vfio_put_base_device is called unconditionally at instance_finalize
time, it can be called twice if vfio_populate_device fails. This works
but it is slightly harder to follow.
Change vfio_get_device to not touch the vbasedev struct until it will
definitely succeed, moving the vfio_populate_device call back to vfio-pci.
This way, vfio_put_base_device will only be called once.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
address_space_destroy_dispatch is called from an RCU callback and hence
outside the iothread mutex (BQL). However, after address_space_destroy
no new accesses can hit the destroyed AddressSpace so it is not necessary
to observe changes to the memory map. Move the memory_listener_unregister
call earlier, to make it thread-safe again.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Fixes: 374f2981d1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Expand out STATUS_PARAM wherever it is used and delete the definition.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Version: GnuPG v1
iQEcBAABAgAGBQJU1MtgAAoJEJykq7OBq3PId6IH/2p7BZSEal1CqmxgmcAyRxrB
IZ3RkDKyCF3ELBozvJ9RLHEakARVBNBSc4YSiQTFIcE6QYe8rRWXthbo6k6MiCnC
5w3Yh1EdocKLNOU0jCl0yN0cqJyWp6ax//66K4iFn7Q1+LCRVs74JO7z9U7tEXuW
cz3fRzb2OsP2tjUDTsnaIQNs7zewn1w9DgSnhtt9KS6rF9V9qDHeX4pjIcdEM45w
S+YMUaLtTmyTJ55ldq7YCMjBU+3KxFQi8LuEPjCwBMLyLaF35Uy2N99NIHGa0696
P8WAL67SV4YR9KpKIjL3w82Fjx22cpe1cUuxVTkEzCTFKHgq2yzHTdy0I02nhkc=
=9OUs
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging
# gpg: Signature made Fri 06 Feb 2015 14:10:40 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/net-pull-request:
monitor: more accurate completion for host_net_remove()
net: del hub port when peer is deleted
net: remove the wrong comment in net_init_hubport()
monitor: print hub port name during info network
rtl8139: simplify timer logic
MAINTAINERS: add Jason Wang as net subsystem maintainer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Current completion for host_net_remove will show hub ports and clients
that were not peered with hub ports. Fix this.
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-id: 1422860798-17495-4-git-send-email-jasowang@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We should del hub port when peer is deleted since it will not be reused
and will only be freed during exit.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-id: 1422860798-17495-3-git-send-email-jasowang@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pavel Dovgalyuk reports that TimerExpire and the timer are not restored
correctly on the receiving end of migration.
It is not clear to me whether this is really the case, but we can take
the occasion to get rid of the complicated code that computes PCSTimeout
on the fly upon changes to IntrStatus/IntrMask. Just always keep a
timer running, it will fire every ~130 seconds at most if the interrupt
is masked with TimerInt != 0.
This makes rtl8139_set_next_tctr_time idempotent (when the virtual clock
is stopped between two calls, as is the case during migration).
Tested with Frediano's qtest.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1421765099-26190-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When debugging migration it's useful to know the PID of
each trace message so you can figure out if it came from the source
or the destination.
Printing the time makes it easy to do latency measurements or timings
between trace points.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 1421746875-9962-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
fix mc146818rtc wrong subsection name to avoid vmstate_subsection_load() fail
during incoming migration or loadvm.
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com.cn>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Turn all the D/DD/DDDPRINTFs into trace events
Turn most of the fprintf(stderr, into error_report
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This patch adds a python tool to the scripts directory that can read
a dumped migration stream if it contains the JSON description of the
device states. I constructs a human readable JSON stream out of it.
It's very simple to use:
$ qemu-system-x86_64
(qemu) migrate "exec:cat > mig"
$ ./scripts/analyze_migration.py -f mig
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
One of the annoyances of the current migration format is the fact that
it's not self-describing. In fact, it's not properly describing at all.
Some code randomly scattered throughout QEMU elaborates roughly how to
read and write a stream of bytes.
We discussed an idea during KVM Forum 2013 to add a JSON description of
the migration protocol itself to the migration stream. This patch
adds a section after the VM_END migration end marker that contains
description data on what the device sections of the stream are composed of.
This approach is backwards compatible with any QEMU version reading the
stream, because QEMU just stops reading after the VM_END marker and ignores
any data following it.
With an additional external program this allows us to decipher the
contents of any migration stream and hopefully make migration bugs easier
to track down.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
For ftell we flush the output buffer to ensure that we don't have anything
lingering in our internal buffers. This is a very safe thing to do.
However, with the dynamic size measurement that the dynamic vmstate
description will bring this would turn out quite slow.
Instead, we can fast path this specific measurement and just take the
internal buffers into account when telling the kernel our position.
I'm sure I overlooked some corner cases where this doesn't work, so
instead of tuning the safe, existing version, this patch adds a fast
variant of ftell that gets used by the dynamic vmstate description code
which isn't critical when it fails.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To support programmatic JSON assembly while keeping the code that generates it
readable, this patch introduces a simple JSON writer. It emits JSON serially
into a buffer in memory.
The nice thing about this writer is its simplicity and low memory overhead.
Unlike the QMP JSON writer, this one does not need to spawn QObjects for every
element it wants to represent.
This is a prerequisite for the migration stream format description generator.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Mostly on the load side, so that when we get a complaint about
a migration failure we can figure out what it didn't like.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Convert a bunch of fprintfs to error_reports
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Commit 22382bb96c renamed the
'hw_cursor_x' and 'hw_cursor_y' fields in cirrus_vga. Update the static
checker's whitelist to allow matching against the old and new names.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Memory allocated with GLib needs to be freed with GLib. Freeing it
with free() instead of g_free() is a common error. Harmless when
g_free() is a trivial wrapper around free(), which is commonly the
case. But model the difference anyway.
In a local scan, this flags four ALLOC_FREE_MISMATCH. Requires
--enable ALLOC_FREE_MISMATCH, because the checker is still preview.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Without a model, Coverity can't know that the result of g_strdup()
needs to be fed to g_free().
One way to get such a model is to scan GLib, build a derived model
file with cov-collect-models, and use that when scanning QEMU.
Unfortunately, the Coverity Scan service we use doesn't support that.
Thus, we're stuck with the other way: write a user model. Doing that
for all of GLib is hardly practical. I'm doing it for the "String
Utility Functions" we actually use that return dynamically allocated
strings.
In a local scan, this flags 20 additional RESOURCE_LEAKs. The ones I
checked look genuine.
It also loses a NULL_RETURNS about ppce500_init() using
qemu_find_file() without error checking. I don't understand why.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
In current versions of GLib, g_new() may expand into g_malloc_n().
When it does, Coverity can't see the memory allocation, because we
don't model g_malloc_n(). Similarly for g_new0(), g_renew(),
g_try_new(), g_try_new0(), g_try_renew().
Model g_malloc_n(), g_malloc0_n(), g_realloc_n(). Model
g_try_malloc_n(), g_try_malloc0_n(), g_try_realloc_n() by adding
indeterminate out of memory conditions on top.
To avoid undue duplication, replace the existing models for g_malloc()
& friends by trivial wrappers around g_malloc_n() & friends.
In a local scan, this flags four additional RESOURCE_LEAKs and one
NULL_RETURNS.
The NULL_RETURNS is a false positive: Coverity can now see that
g_try_malloc(l1_sz * sizeof(uint64_t)) in
qcow2_check_metadata_overlap() may return NULL, but is too stupid to
recognize that a loop executing l1_sz times won't be entered then.
Three out of the four RESOURCE_LEAKs appear genuine. The false
positive is in ppce500_prep_device_tree(): the pointer dies, but a
pointer to a struct member escapes, and we get the pointer back for
freeing with container_of(). Too funky for Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>