The schib parameter of css_do_msch() can be declared as const to
make it clear that it does not get modified by this function.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
According to the POP specification, the parameter blocks of various
functions like the IO instructions are accessed with logical addresses.
Thus we need a function that can read or write a buffer from/to the
guest's logical address space.
This patch now provides a function that can be used to access virtual
guest memory by using the mmu_translate function of QEMU to convert
the virtual addresses to physical.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Program access exceptions are defined to deliver a translation exception
code in the low-core. Add a function trigger_access_exception() that
generates the proper program interrupt on both KVM and non-KVM systems
and switch the existing code to use it.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We can get rid of the switch(asc) in mmu_translate_asc() by simply
selecting the right control register ASCE in the mmu_translate()
function already.
This patch is based on an original patch/idea by Ralf Hoppe.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Bit 52 in a page table entry has always to be zero, or a translation
specification exception is to be recognized.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
An Address Space Control Element (ASCE) is only the very first unit of
an s390 address translation (normally residing in one of the control
registers). The entries in the page tables are called differently.
So let's call the relevant variable pt_entry instead of asce in
mmu_translate_pte() to avoid future confusion (thus there is no
functional change in this patch, just renaming).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the "DAT-protection" bit is set in the region table entry and EDAT is
enabled, only read accesses are allowed in the corresponding memory area.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Each different level of region/segment table has a dedicated
exception type for illegal entries.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If an ASCE has illegal bits set, an ASCE-type exception should be
generated instead of a translation specification exception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The address space bits in the translation exception code were wrong.
In fact, we can simply copy the bits from the PSW, so there's no need
for the trans_bits() function anymore.
Additionally, we now also set the fetch/store bits in the translation
exception code, so a guest can determine whether the exception occured
during a write or during a read.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When a fault occurs during the MMU lookup in s390_cpu_get_phys_page_debug(),
the trigger_page_fault() function writes the translation exception code
into the lowcore - something you would not expect during a memory access
by the debugger. Ease this problem by adding an additional parameter to
mmu_translate() which can be used to specify whether a program check and
the translation exception code should be injected or not.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The ACSEs have a table length field and the region entries have
table length and offset fields which must be checked during
translation to see whether the given virtual address is really
covered by the translation table.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The current code used a wrong and very confusing way of dealing with
the table levels by introducing a "fake level above current". However,
the real problem was simply that the checks for the region/segment
invalid bit and for the matching region/segment level was done at the
wrong spot in the code - it has to be done after the first table entry
has been looked up instead (e.g. there is also no "invalid" bit in the
ASCE itself and the current "level" has to be the same as the level in
the entry that we just looked up).
Also the entries for the segment table are quite a bit different compared
to the region table entries. So this patch moves the related code into the
function mmu_translate_segment() to make it clear at which table level we
currently are and to get rid of the ugly switch-statement in the function
mmu_translate_region().
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The real-space designation bits live in the ASCEs, not in the table entries,
so the check must be done before we start walking the MMU table.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
helper.c is quite overcrowded already, so let's move the MMU
translation to a separate file instead (like it has been done
with the other targets already).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have to migrate the reipl parameters, so a reboot on the migrated machine
will behave just like on the origin. Otherwise, the reipl parameters configured
by the guest would be lost.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Whenever a reboot initiated by the guest is done, the reipl parameters should
remain valid. The disk configured by the guest is to be used for
ipl'ing. External reboot/reset request (e.g. via virsh reset guest) should
completely reset the guest to the initial state, and therefore also reset the
reipl parameters, resulting in an ipl behaviour of the initially configured
guest. This could be an external kernel or a disk.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
To support dynamically updating the IPL device from inside the KVM
guest on the s390 platform, DIAG 308 instruction is intercepted
in QEMU to handle the request.
Subcode 5 allows to specify a new boot device, which is saved for
later in the s390_ipl device. This also allows to switch from an
external kernel to a boot device.
Subcode 6 retrieves boot device configuration that has been previously
set.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We will need bios support in order to be able to support selecting a
different boot device via diagnose 308 in the ccw machine, so let's
make the bios mandatory for the ccw machine.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU3Y5ZAAoJEK0ScMxN0CebuvYH/iQa+ZA1yavKsOa2xtB4tq0K
gwS0hwCmAtS7CV6iM8Wrte/3HJr4xtnQGeBQgatmM7PrDx6o6cjnxwga2mBx6EQv
fIhWlNPlctAgd11ocuhXXUdGv2yL8TWf+WLTDcO6CS6zbdYp+42sOvrWGKL1UQbz
RAJhiK5BZsDHv2DodqZBk7JZr9l4hAohARs627xG7uq/j6vggEHo6y/9Z6mETcRX
fkIwx/qjUmDJbBlFgtgEPBYMX/AWDu7XbhU1kT+EjcrFwEjEd0mxi52NN8uEyGQ9
xgAekmPvcNR0UpkVl5pIr64/wTkd+m2mBG8CVENG3ryDKePJk64jVth+I/8Z/Qo=
=+mGW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150212' into staging
Convert to linked list.
# gpg: Signature made Fri 13 Feb 2015 05:40:41 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg: aka "Richard Henderson <rth@redhat.com>"
# gpg: aka "Richard Henderson <rth@twiddle.net>"
* remotes/rth/tags/pull-tcg-20150212:
tcg: Remove unused opcodes
tcg: Implement insert_op_before
tcg: Remove opcodes instead of noping them out
tcg: Put opcodes in a linked list
tcg: Introduce tcg_op_buf_count and tcg_op_buf_full
tcg: Move emit of INDEX_op_end into gen_tb_end
tcg: Reduce ifdefs in tcg-op.c
tcg: Move some opcode generation functions out of line
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* PCIe support in virt board
* Support 32-bit guests on 64-bit KVM hosts in virt board
* Fixes to avoid C undefined behaviour
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJU3ZFDAAoJEDwlJe0UNgzeigkP/3YaRaADAHuwFsdOt3h9k7eZ
H/yGimwUURsVmEJKTVQLFtq0GOsUgixpbMwjrOe37wJHGY8/kUlw/8HzEBcfTtnF
+e6BjtjOOV+TQNBpnls+DV1exJ1MeJt5onl/sKEHR0H/WYNQ2z+M/gqSw5nasjqb
DjVWuvjgtnGR5oD/1pv0g2pld6BBcrpoxu9LNHgftTyWlJk2KuE4r7xgNu/pYROB
uyWRwQHxQ3LG/aYXLYKWQrm+AWOx2gZ7Zz3GG+RGPU8sTkTqyqcZ/VNWpZ3ocWRO
So2wqYT09piuEaEFTHYgqy/aaE1TXitHBpuI8korM9yow0lCQ5ZisYssf5xALEHq
Q1ab74QzQHisb2GaqG8XQsZF5Rvzx/xrTvcbNaf+DxprEbaXZjKLEO2iZnK24A+1
XUVgfUY4YHdSPY18T3UCiGOjB+V4fJc2aZqu7GQvTKD/F7TmfFXkM6mzUQamPdIi
Gow5PZtFBZYc2FoN9n06XaixuLOHaQ3s78vubOIv60U0vmI5ygvXXBKz+rfP4Xqu
YiiH9VKoi+5fujTsdFrT4JCwqE70+oIsrchP5h4q0zVukUFsYx24lj+PIRGX8oa5
7Mj0s28yn8/2+cpZXYxT/0tQpJjb5p+5kMDa3ee8EL22mx9fee0J8b12EJGLIU+t
jkXVJOODQYXeTbxranAZ
=U6iA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150213' into staging
target-arm queue:
* PCIe support in virt board
* Support 32-bit guests on 64-bit KVM hosts in virt board
* Fixes to avoid C undefined behaviour
# gpg: Signature made Fri 13 Feb 2015 05:53:07 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
* remotes/pmaydell/tags/pull-target-arm-20150213:
target-arm: A64: Avoid signed shifts in disas_ldst_pair()
target-arm: A64: Avoid left shifting negative integers in disas_pc_rel_addr
target-arm: A64: Fix handling of rotate in logic_imm_decode_wmask
target-arm: A64: Fix shifts into sign bit
target-arm: Add AArch32 guest support to KVM64
target-arm: Add 32/64-bit register sync
target-arm: Add feature parsing to virt
target-arm: Add CPU property to disable AArch64
pci: Move PCI VGA to pci.mak
arm: Add PCIe host bridge in virt machine
pci: Add generic PCIe host bridge
pci: Allocate PCIe host bridge PCI ID
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Avoid shifting potentially negative signed offset values in
disas_ldst_pair() by keeping the offset in a uint64_t rather
than an int64_t.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423233250-15853-5-git-send-email-peter.maydell@linaro.org
Shifting a negative integer left is undefined behaviour in C.
Avoid it by assembling and shifting the offset fields as
unsigned values and then sign extending as the final action.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423233250-15853-4-git-send-email-peter.maydell@linaro.org
The code in logic_imm_decode_wmask attempts to rotate a mask
value within the bottom 'e' bits of the value with
mask = (mask >> r) | (mask << (e - r));
This has two issues:
* if the element size is 64 then a rotate by zero results
in a shift left by 64, which is undefined behaviour
* if the element size is smaller than 64 then this will
leave junk in the value at bit 'e' and above, which is
not valid input to bitfield_replicate(). As it happens,
the bits at bit 'e' to '2e - r' are exactly the ones
which bitfield_replicate is going to copy in there,
so this isn't a "wrong code generated" bug, but it's
confusing and if we ever put an assert in
bitfield_replicate it would fire on valid guest code.
Fix the former by not doing anything if r is zero, and
the latter by masking with bitmask64(e).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423233250-15853-3-git-send-email-peter.maydell@linaro.org
Fix attempts to shift into the sign bit of an int, which is undefined
behaviour in C and warned about by the clang sanitizer.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423233250-15853-2-git-send-email-peter.maydell@linaro.org
Add 32-bit to/from 64-bit register synchronization on register gets and puts.
Set EL1_32BIT feature flag passed to KVM
Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
Message-id: 1423736974-14254-5-git-send-email-greg.bellows@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add AArch32 to AArch64 register sychronization functions.
Replace manual register synchronization with new functions in
aarch64_cpu_do_interrupt() and HELPER(exception_return)().
Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423736974-14254-4-git-send-email-greg.bellows@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Added machvirt parsing of feature keywords added to the -cpu command line
option. Parsing occurs during machine initialization.
Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423736974-14254-3-git-send-email-greg.bellows@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Adds registration and get/set functions for enabling/disabling the AArch64
execution state on AArch64 CPUs. By default AArch64 execution state is enabled
on AArch64 CPUs, setting the property to off, will disable the execution state.
The below QEMU invocation would have AArch64 execution state disabled.
$ ./qemu-system-aarch64 -machine virt -cpu cortex-a57,aarch64=off
Also adds stripping of features from CPU model string in acquiring the ARM CPU
by name.
Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1423736974-14254-2-git-send-email-greg.bellows@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Every platform that supports PCI can also spawn the Bochs VGA PCI adapter. Move
it to pci.mak to enable it for everyone.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that we have a working "generic" PCIe host bridge driver, we can plug
it into ARM's virt machine to always have PCIe available to normal ARM VMs.
I've successfully managed to expose a Bochs VGA device, XHCI and an e1000
into an AArch64 VM with this and they all lived happily ever after.
Signed-off-by: Alexander Graf <agraf@suse.de>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
[PMM: Squashed in fix for off-by-one error in bus-range DT property
from Laszlo Ersek <lersek@redhat.com>]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With simple exposure of MMFG, ioport window, mmio window and an IRQ line we
can successfully create a workable PCIe host bridge that can be mapped anywhere
and only needs to get described to the OS using whatever means it likes.
This patch implements such a "generic" host bridge. It handles 4 legacy IRQ
lines. MSIs need to be handled external to the host bridge.
This device is particularly useful for the "pci-host-ecam-generic" driver in
Linux.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We are going to introduce a PCIe host controller that doesn't exist that
way in real hardware, but still needs to expose some PCIe root device which
has PCI IDs.
Allocate a PCI ID in the Red Hat space that we use for other devices of this
kind.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We no longer need INDEX_op_end to terminate the list, nor do we
need 5 forms of nop, since we just remove the TCGOp instead.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather reserving space in the op stream for optimization,
let the optimizer add ops as necessary.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With the linked list scheme we need not leave nops in the stream
that we need to process later.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The method by which we count the number of ops emitted
is going to change. Abstract that away into some inlines.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Almost completely eliminates the ifdefs in this file, improving
confidence in the lesser used 32-bit builds.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Some of these functions are really quite large. We have a number of
things that ought to be circularly dependent, but we duplicated code
to break that chain for the inlines.
This saved 25% of the code size of one of the translators I examined.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This fixes a compiler error which occurs if DEBUG_VFIO is defined.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The difference between v1 and v2 is fairly subtle, simply more
deterministic behavior for unmaps. The v1 interface allows the user
to attempt to unmap sub-regions of previous mappings, returning
success with zero size if unable to comply. This was a reflection of
the underlying IOMMU API. The v2 interface requires that the user
may only unmap fully contained mappings, ie. an unmap cannot intersect
or bisect a previous mapping, but may cover multiple mappings. QEMU
never made use of the sub-region v1 support anyway, so we can support
either v1 or v2. We'll favor v2 since it's newer.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In the case of VFIO, the unrealize callback is too early to munmap the
BARs. The munmap must be delayed until memory accesses are complete.
To do this, split vfio_unmap_bars in two. The removal step, now called
vfio_unregister_bars, remains in vfio_exitfn. The reclamation step
is vfio_unmap_bars and is moved to the instance_finalize callback.
Similarly, quirk MemoryRegions have to be removed during
vfio_unregister_bars, but freeing the data structure must be delayed
to vfio_unmap_bars.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In order to enable out-of-BQL address space lookup, destruction of
devices needs to be split in two phases.
Unrealize is the first phase; once it complete no new accesses will
be started, but there may still be pending memory accesses can still
be completed.
The second part is freeing the device, which only happens once all memory
accesses are complete. At this point the reference count has dropped to
zero, an RCU grace period must have completed (because the RCU-protected
FlatViews hold a reference to the device via memory_region_ref). This is
when instance_finalize is called.
Freeing data belongs in an instance_finalize callback, because the
dynamically allocated memory can still be used after unrealize by the
pending memory accesses.
This starts the process by creating an instance_finalize callback and
freeing most of the dynamically-allocated data in instance_finalize.
Because instance_finalize is also called on error paths or also when
the device is actually not realized, the common code needs some changes
to be ready for this. The error path in vfio_initfn can be simplified too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that vfio_put_base_device is called unconditionally at instance_finalize
time, it can be called twice if vfio_populate_device fails. This works
but it is slightly harder to follow.
Change vfio_get_device to not touch the vbasedev struct until it will
definitely succeed, moving the vfio_populate_device call back to vfio-pci.
This way, vfio_put_base_device will only be called once.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
address_space_destroy_dispatch is called from an RCU callback and hence
outside the iothread mutex (BQL). However, after address_space_destroy
no new accesses can hit the destroyed AddressSpace so it is not necessary
to observe changes to the memory map. Move the memory_listener_unregister
call earlier, to make it thread-safe again.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Fixes: 374f2981d1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Warning from the Sparse static analysis tool:
hw/char/virtio-serial-bus.c:31:3:
warning: symbol 'vserdevices' was not declared. Should it be static?
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>