They are wrappers of POSIX fcntl "file private locking", with a
convenient "try lock" wrapper implemented with F_OFD_GETLK.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Opening the backing image for the second time is bad, especially here
when it is also in use as the active image as the source. The
drive-backup job itself doesn't read from target->backing for COW,
instead it gets data from the write notifier, so it's not a big problem.
However, exporting the target to NBD etc. won't work, because of the
likely stale metadata cache.
Use BDRV_O_NO_BACKING in this case and manually set up the backing
BdrvChild.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The COLO block replication architecture requires one disk to be shared
between primary and secondary, in the test both processes use posix file
protocol (instead of over NBD) so it is affected by image locking.
Disable the lock.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We share the same set of QAPI options with file-posix, but locking is
not supported here. So error out if it is specified as 'on' for now.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Making this option available even before implementing it will let
converting tests easier: in coming patches they can specify the option
already when necessary, before we actually write code to lock the
images.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The test scenario doesn't require the same image, instead it focuses on
the duplicated node-name, so use null-co to avoid locking conflict.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the case where we test the expected error when a blockdev-snapshot
target already has a backing image, the backing chain is opened multiple
times. This will be a problem when we use image locking, so use a
different backing file that is not already open.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Double attach is not a valid usage of the target image, drive-backup
will open the blockdev itself so skip the add_drive call in this case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qemu-img info command is executed while VM is running, add -U option
to avoid the image locking error.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-img and qemu-io commands when guest is running need "-U" option,
add it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add --force-share/-U to program options and -U to open subcommand.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will force the opened images to allow sharing all permissions with other
programs.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It can be used outside of block.c for making user friendly messages.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vga display update mis-calculated the region for the dirty bitmap
snapshot in case the scanlines are padded. This can triggere an
assert in cpu_physical_memory_snapshot_get_dirty().
Fixes: fec5e8c92b
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509104839.19415-1-kraxel@redhat.com
This patch adds support for absolute pointer events to the input-linux
subsystem. This support was omitted from the original input-linux patch,
however most of the code required for it is already in place.
Support for absolute events is especially useful for guests with vga
passthrough. Since they have a physical monitor, none of normal channels
for sending video output (vnc, etc) are used, meaning they also can't be
used to send absolute input events. This leaves QMP as the only option
to send absolute input into vga passthrough guests, which is not its
intended use and is not efficient.
This patch allows, for example, uinput to be used to create virtual
absolute input devices. This lets you build external systems which share
physical input devices between guests. Without absolute input
capability, such external systems can't seamlessly share pointer devices
between guests.
Signed-off-by: Philippe Voinov <philippevoinov@gmail.com>
Message-id: 20170505134231.30210-1-philippevoinov@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch refactors ui/input.c to support absolute axis
minimum values other than 0. All dependent calls to qemu_input_queue_abs
have been updated to explicitly supply 0 as the axis minimum value.
Signed-off-by: Philippe Voinov <philippevoinov@gmail.com>
Message-id: 20170505133952.29885-1-philippevoinov@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When running with KVM, we update the "family" CPU alias to point
to the right host CPU type, so that it for example possible to
use "-cpu POWER8" on a POWER8NVL host. However, the function for
printing the list of available CPU models is called earlier than
the KVM setup code, so the output of "-cpu help" is wrong in that
case. Since it would be somewhat ugly anyway to have different
help texts depending on whether "-enable-kvm" has been specified
or not, we should better always print the same text, so fix this
issue by printing "alias for preferred XXX CPU" instead.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes some changes to fix build failures on the 'min-glib' docker
image, and maybe other platforms with a buildchain that's less tolerant
about duplicated typedefs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
POWER9 DD1 silicon has some bugs which mean it a) isn't really compliant
with the ISA v3.00 and b) require a number of special workarounds in the
kernel.
At the moment, qemu isn't aware of DD1. For TCG we don't really want it to
be (why bother emulating buggy silicon). But with KVM, the guest does need
to be aware of DD1 so it can apply the necessary workarounds.
Meanwhile, the feature negotiation between qemu and the guest strongly
favours architected compatibility modes to "raw" CPU modes. In combination
with the above, this means the guest sees architected POWER9 mode, and
doesn't apply the DD1 workarounds. Well, unless it has yet another
workaround to partially ignore what qemu tells it.
This patch addresses this by disabling support for compatibility modes when
using KVM on a POWER9 DD1 host.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.
However, this assumes that the HTM bit is off in the base template used for
the device tree value. That is true for POWER8, but not for POWER9.
It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".
Fixes: 9fb4541f58
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The PowerPCCPU typedef is included twice if a file includes
both hw/ppc/xics.h and target/ppc/cpu-qom.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.
In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.
A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9
Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA V3.00 introduced a new radix mmu model. Implement the page fault
handler for this so we can run a tcg guest in radix mode and perform
address translation correctly.
In real mode (mmu turned off) addresses are masked to remove the top
4 bits and then are subject to partition scoped translation, since we only
support pseries at this stage it is only necessary to perform the masking
and then we're done.
In virtual mode (mmu turned on) address translation if performed as
follows:
1. Use the quadrant to determine the fully qualified address.
The fully qualified address is defined as the combination of the effective
address, the effective logical partition id (LPID) and the effective
process id (PID). Based on the quadrant (EA63:62) we set the pid and lpid
like so:
quadrant 0: lpid = LPIDR, pid = PIDR
quadrant 1: HV only (not allowed in pseries)
quadrant 2: HV only (not allowed in pseries)
quadrant 3: lpid = LPIDR, pid = 0
If we can't get the fully qualified address we raise a segment interrupt.
2. Find the guest radix tree
We ask the virtual hypervisor for the partition table which was registered
with H_REGISTER_PROC_TBL which points us to the process table in guest
memory. We then index this table by pid to get the process table entry
which points us to the appropriate radix tree to translate the address.
If the process table isn't big enough to contain an entry for the current
pid then we raise a storage interrupt.
3. Walk the radix tree
Next we walk the radix tree where each level is a table of page directory
entries indexed by some number of bits from the effective address, where
the number of bits is determined by the table size. We continue to walk
the tree (while entries are valid and the table is of minimum size) until
we reach a table of page table entries, indicated by having the leaf bit
set. The appropriate pte is then checked for sufficient access permissions,
the reference and change bits are updated and the real address is
calculated from the real page number bits of the pte and the low bits of
the effective address.
If we can't find an entry or can't access the entry bacause of permissions
then we raise a storage interrupt.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Add missing parentheses to macro]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tlbie[l] instructions are used to invalidate TLB entries used to cache
address translations.
In ISAv3.00 (POWER9) more fields were added to the tblie[l] instructions
which were previously invalid. We don't care about any of these new fields
since we just invalidate the whole world anyway but we need to not
cause an illegal instruction exception when the instructions are called.
We also don't want to allow an older processor to have these fields set
since that would be invalid.
Add a new GEN_HANDLER for the ISAv3 instructions with the correct invalid
mask. These will only be generated to a POWER9 processor for now based on
the instruction flag. Also remove the PPC_MEM_TLBIE instruction flag from
the POWER9 processor definition to ensure the old tlbie isn't generated.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Guest Translation Shootdown Enable (GTSE) bit in the Logical Partition
Control Register (LPCR) can be set to enable a guest to use the tlbie
instruction directly to invalidate translations.
When the GTSE bit is set then the tlbie instruction is supervisor
privileged, otherwise it is hypervisor privileged.
Add a guest translation shootdown enable (gtse) field to the diassembly
context and use this to check the correct privilege level at code
generation time.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE
to choose determine how address translation is performed. Currently these
bits in the LPCR are only set for the cpu which handles the H_CALL, however
they need to be set for all cpus for that guest as address translation
cannot be performed differently on a per cpu basis.
Update the H_CALL handler to set these bits in the LPCR correctly for all
cpus of the guest.
Note it is the reponsibility of the guest to ensure that any secondary cpus
are suspended when the H_CALL is made and thus we can safely update these
values here.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The QemuMacDrivers project provides virtualisation drivers for PPC MacOS
guests.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS
P/Q states") added new bits to the state used by KVM IRQs. Currently,
QEMU does not preserve these bits, so migrating (or otherwise saving
and restoring) the guest state causes the P and Q bits to be cleared.
Clearing the P bit has no effect, because the kernel will set it based
on other data, but the loss of a set Q bit will cause a lost
interrupt.
This patch preserves the P and Q bits, correcting the problem.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ics_get_kvm_state() "or"s set bits into irq->status but does not mask
out clear bits.
Correct this by initializing the IRQ status to zero before adding bits
to it.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In case when atomic operation is not supported, exit_atomic is called
and we stop the world and execute the atomic operation. This results
in a following call chain:
tcg_gen_atomic_cmpxchg_tl()
-> gen_helper_exit_atomic()
-> HELPER(exit_atomic)
-> cpu_loop_exit_atomic() -> EXCP_ATOMIC
-> qemu_tcg_cpu_thread_fn() => case EXCP_ATOMIC
-> cpu_exec_step_atomic()
-> cpu_step_atomic()
-> cc->cpu_exec_enter() = ppc_cpu_exec_enter()
Sets env->reserve_addr = -1;
But by the time it return back, the reservation is erased and the code
fails, this continues forever and the lock is never taken.
Instead set this in powerpc_excp()
Now that ppc_cpu_exec_enter() doesn't have anything meaningful to do,
let us get rid of the function.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables the multi-threaded system emulation by default for PPC64
guests using the x86_64 TCG back-end.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Ensure that the unplugged CPU thread is destroyed and the waiting
thread is notified about it. This is needed for CPU unplug to work
correctly in MTTCG mode.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In case where the conditional write is the first write to the page,
TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as
a special case and set the dirty bit. After that fall through to the
actual atomic instruction below.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Emulating LL/SC with cmpxchg is not correct, since it can suffer from
the ABA problem. However, portable parallel code is written assuming
only cmpxchg which means that in practice this is a viable alternative.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a PowerNV guest runs, it uses the sensor definitions of
the BMC simulator to populate the device tree. But an external IPMI
BMC could also be used and, in that case, it is not (yet) possible to
retrieve the sensor list. Generating the OEM SEL event for shutdown or
reboot also does not make sense as it should be generated on the BMC
side.
This change allows a guest to use an 'ipmi-bmc-extern' backend to the
'isa-ipmi-bt' device and a 'chardev' for transport such as :
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
-device isa-ipmi-bt,bmc=bmc0,irq=10
and connect to a BMC simulator, the OpenIPMI ipmi_sim simulator for
instance.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>