The VMDescription section maybe after the EOF mark, the current code
does a 'qemu_get_byte' and either gets the header byte identifying the
description or an error (which it ignores). Doing the 'get' upsets
RDMA which hangs on old machine types without the VMDescription.
Just avoid reading the VMDescription if we wouldn't send it.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Variable "r" going out of scope leaks the storage
it points to in line 3268.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
4.1 Linux kernel doesn't require specifying "master" or "self" when setting
vlans on a port, so clean these up from the tests that use vlans.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Message-id: 1435746792-41278-6-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For pkts copied to the CPU (to be processed by guest driver), mark the Rx
descriptor with flag "OFFLOAD_FWD" to indicate device has already forwarded
pkt. The guest driver will use this indicator to avoid duplicate
forwarding in the guest OS.
Examples include bcast/mcast/unknown ucast pkts flooded to bridged ports.
We want to avoid both the device and the guest bridge driver flooding these
pkts, which would result in duplicates pkts on the wire. Packet sampling,
such as sFlow, can also use this technique to mark pkts for the guest OS to
record but otherwise drop.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Message-id: 1435746792-41278-5-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1435746792-41278-3-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Premature break in switch case block. This particular case (group L2 rewrite)
will be used for L2 LAG and L3 ECMP support, neither of which are enabled in
the guest driver at this time, but are under development.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1435746792-41278-2-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit 6e99c63 ("net/socket: Drop net_socket_can_send") changed the
semantics around .can_receive for sockets to now require the device to
flush queued pkts when transitioning to a .can_receive=true state. Rocker
device was not flushing the queue on .can_receive=true transition, so the
receiver was stuck.
But, turns out we really don't want any queuing at all on the port when the
port is disabled, otherwise when the port transitions to enabled, we'd
receive and forward stale pkts that really should have been dropped. So,
let's remove .can_receive so avoid queuing and drop the pkt in .receive if
the port is disabled.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435717553-36187-1-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When running ESXi under qemu there is an issue with the ESXi guest
discarding packets that are too short. The guest discards any packets
under the normal minimum length for an ethernet packet (60). This
results in odd behaviour where other hosts or VMs on other hosts can
communicate with the ESXi guest just fine (since there's a physical NIC
somewhere doing padding), but VMs on the host and the host itself cannot
because the ARP request packets are too small for the ESXi host to
accept.
Someone in the past thought this was worth fixing, and added code to the
vmxnet3 qemu emulation such that if it is receiving packets smaller than
60 bytes to pad the packet out to 60. Unfortunately this code is wrong
(or at least in the wrong place). It does so BEFORE before taking into
account the vnet_hdr at the front of the packet added by the tap device.
As a result, it might add padding, but it never adds enough.
Specifically it adds 10 less (the length of the vnet_hdr) than it needs
to.
The following (hopefully "obviously correct") patch simply swaps the
order of processing the vnet header and the padding. With this patch an
ESXi guest is able to communicate with the host or other local VMs.
Signed-off-by: Brian Kress <kressb@moose.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
e1000_can_receive() checks the link up status register bit. If the bit
is clear, packets will be queued and the peer may disable receive to
avoid wasting CPU reading packets that cannot be delivered. The queue
must be flushed once the link comes back up again.
This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests
and tap networking. Flushing the queue invokes the async send callback,
which re-enables tap fd read.
Reported-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435223885-12745-1-git-send-email-stefanha@redhat.com
This interface provides some registers within a 32-byte range and can be
acessed through PCI-to-LPC bridge interface (PMBASE + 0x60).
It's commonly used as a watchdog timer to detect system lockups through
SMIs that are generated -- if TCO_EN bit is set -- on every timeout. If
NO_REBOOT bit is not set in GCS (General Control and Status register),
the system will be resetted upon second timeout if TCO_RLD register
wasn't previously written to prevent timeout.
This patch adds support to TCO watchdog logic and few other features
like mapping NMIs to SMIs (NMI2SMI_EN bit), system intruder detection,
etc. are not implemented yet.
Signed-off-by: Paulo Alcantara <pcacjr@zytor.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
MIPS doesn't need it, and including it creates problem as we are adding
dependency on ISA LPC bridge.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To prepare for a generic internal cipher API, move the
built-in D3DES implementation into the crypto/ directory.
This is not in fact a normal D3DES implementation, it is
D3DES with double & triple length modes removed, and the
key bytes in reversed bit order. IOW it is crippled
specifically for the "benefit" of RFB, so call the new
files desrfb.c instead of d3des.c to make it clear that
it isn't a generally useful impl.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1435770638-25715-4-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To prepare for a generic internal cipher API, move the
built-in AES implementation into the crypto/ directory
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1435770638-25715-3-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a new crypto/ directory that will (eventually) contain
all the cryptographic related code. This initially defines a
wrapper for initializing gnutls and for computing hashes with
gnutls. The former ensures that gnutls is guaranteed to be
initialized exactly once in QEMU regardless of CLI args. The
block quorum code currently fails to initialize gnutls so it
only works by luck, if VNC server TLS is not requested. The
hash APIs avoids the need to litter the rest of the code with
preprocessor checks and simplifies callers by allocating the
correct amount of memory for the requested hash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1435770638-25715-2-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The doc comments for bdrv_drain_all() and bdrv_drain() are outdated:
* The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock()
which Fam Zheng is currently developing. Unfortunately this warning
was never really enough because devices keep submitting I/O and op
blockers don't prevent that.
* The bdrv_drain_all() comment is still partially correct but reflects
the nature of the implementation rather than API documentation.
Do make it clear that bdrv_drain() is only appropriate within an
AioContext. For anything spanning AioContexts you need
bdrv_drain_all().
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com
The value of 'i' is guaranteed to be >= 0
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1435824371-2660-1-git-send-email-berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Switch over to virtio_instance_init_common. Drop duplicate properties
in virtio-gpu-pci and virtio-vga as they are properly aliased now. Also
drop the indirection via DEFINE_VIRTIO_GPU_PROPERTIES, we don't need it
any more as the properties are defined in a single place now.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Update the device link of the QemuConsole, so it points to the
virtio-gpu-pci or virtio-vga device instead of virtio-gpu-device.
This is needed because we want to find the device by id, for
example for input routing, and the id specified on the command
line is attached to the pci proxy, not the virtio device.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
On ARM, commit ac9d32e396 postponed the
memory preparation for boot until the machine init done notifier. This
has for consequence to insert ROM at machine init done time.
However the rom_load_all function stayed called before the ROM are
inserted. As a consequence the rom_load_all function does not do
everything it is expected to do, on ARM.
It currently registers the ROM reset notifier but does not iterate through
the registered ROM list. the isrom field is not set properly. This latter
is used to report info in the monitor and also to decide whether the
rom->data can be freed on ROM reset notifier.
To fix that regression the patch moves the rom_load_all call after
machine init done. We also take the opportunity to rename the rom_load_all
function into rom_check_and_resgister_reset() and integrate the
rom_load_done in it.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reported-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-Id: <1434470874-22573-1-git-send-email-eric.auger@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit f5a5628cf0.
This was an old patch that had been already superseded by b0e5d90eb
("dataplane: endianness-aware accesses").
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Accesses to vring_avail_event and vring_used_event must honor the queue
endianness.
This patch allows cross-endian setups to use dataplane (tested with ppc64
on ppc64le, and vice-versa).
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
it through the "smm" property of x86 machine types.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJVmq+hAAoJEL/70l94x66DKvMH/0bI2foGOmsJq6UtsRXzYXu8
rX2BEVBNHIS0acfzmBAkM+VYCEiZMHCqYubgSDvlhl2sqzSO6s+2EoWmXuS+Sln6
4lFW/YaKsbY9eN8UL/51zLI1SYj7SEUsqRS+r+1oLUxv5/v90K2xW2cvMnJFIWxk
NZruZLXhHv3U4VqIR63yV97NFyXvgvlVmpA4btLcJnqRey9QKSSvCAdUIOxPr5Iz
rHIWn4xepqzewk86wAoascDEFI504K8Vj0lZclkHXTl6QAmdCOzmjMLvpsJYUSmZ
RcOMMlZNWWmZyRk0sSX2k26hLi8rZnwaKdvqemNQFTxiM7ijBD2DgJ5YM+VDxR8=
=uKqs
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-smm' into staging
This series implements KVM support for SMM, and lets you enable/disable
it through the "smm" property of x86 machine types.
# gpg: Signature made Mon Jul 6 17:41:05 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream-smm:
pc: add SMM property
ich9: add smm_enabled field and arguments
pc_piix: rename kvm_enabled to smm_enabled
target-i386: register a separate KVM address space including SMRAM regions
kvm-all: kvm_irqchip_create is not expected to fail
kvm-all: add support for multiple address spaces
kvm-all: make KVM's memory listener more generic
kvm-all: move internal types to kvm_int.h
kvm-all: remove useless typedef
kvm-all: put kvm_mem_flags to more work
target-i386: add support for SMBASE MSR and SMIs
piix4/ich9: do not raise SMI on ACPI enable/disable commands
linux-headers: Update to 4.2-rc1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fix pba_offset initialization value for Chelsio T5 Virtual Function
device. The T5 hardware has a bug in it where it reports a Pending Interrupt
Bit Array Offset of 0x8000 for its SR-IOV Virtual Functions instead
of the 0x1000 that the hardware actually uses internally. As the hardware
doesn't return the correct pba_offset value, add a quirk to instead
return a hardcoded value of 0x1000 when a Chelsio T5 VF device is
detected.
This bug has been fixed in the Chelsio's next chip series T6 but there are
no plans to respin the T5 ASIC for this bug. It is just documented in the
T5 Errata and left it at that.
Signed-off-by: Gabriel Laupre <glaupre@chelsio.com>
Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
On systems with guest visible IOMMU, adding a new memory region onto
PCI bus calls vfio_listener_region_add() for every DMA window. This
installs a notifier for IOMMU memory regions. The notifier is supposed
to be removed vfio_listener_region_del(), however in the case of mixed
PHB (emulated + VFIO devices) when last VFIO device is unplugged and
container gets destroyed, all existing DMA windows stay alive altogether
with the notifiers which are on the linked list which head was in
the destroyed container.
This unregisters IOMMU memory region notifier when a container is
destroyed.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch aims at optimizing IRQ handling using irqfd framework.
Instead of handling the eventfds on user-side they are handled on
kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.
the virtual IRQ completion is trapped at interrupt controller
This removes the need for fast/slow path swap.
Overall this brings significant performance improvements.
Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit f41389ae3c introduced kvm_resamplefds_enabled() and
associated kvm_resamplefds_allowed boolean. This patch adds
non-KVM version for kvm_resamplefds_enabled and also declares
kvm_resamplefds_allowed in kvm-stub as it is done for fellow
kvm_irqfds_allowed.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a new connect_irq_notifier notifier in the SysBusDeviceClass. This
notifier, if populated, is called after sysbus_connect_irq.
This mechanism is used to setup VFIO signaling once VFIO platform
devices get attached to their platform bus, on a machine init done
notifier.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The arm_gic_kvm now calls kvm_irqchip_set_qemuirq_gsi to build
the hash table storing qemu_irq/gsi mappings. From that point on
irqfd can be setup directly from the qemu_irq using
kvm_irqchip_add_irqfd_notifier.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO platform device needs to setup irqfd but it does not know the
gsi corresponding to the device qemu_irq. This patch proposes to
store a hash table in kvm_state using the qemu_irq as key and the gsi
as a value.
kvm_irqchip_set_qemuirq_gsi allows to insert such a pair. The interrupt
controller is supposed to use it.
kvm_irqchip_[add, remove]_irqfd_notifier allows to setup/tear down
irqfd directly from the qemu_irq.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Anticipating for the introduction of new add/remove functions taking
a qemu_irq parameter, let's rename existing ones with a gsi suffix.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Vikram Sethi <vikrams@codeaurora.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is system level code, and should only depend on the host page
size, not the target page size.
Note that HOST_PAGE_SIZE is misleadingly lead and is really aligning
to both host and target page size. Hence it's replacement with
REAL_HOST_PAGE_SIZE.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently the "host" page size alignment API is really aligning to both
host and target page sizes. There is the qemu_real_page_size which can
be used for the actual host page size but it's missing a mask and ALIGN
macro as provided for qemu_page_size. Complete the API. This allows
system level code that cares about the host page size to use a
consistent alignment interface without having to un-needingly align to
the target page size. This also reduces system level code dependency
on the cpu specific TARGET_PAGE_SIZE.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
size_t is an unsigned type, thus the error case is never reached in
the below call to pread. If bytes is negative, it will be seen as
a very high positive value.
Spotted by Coverity.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The property can take values on, off or auto. The default is "off"
for KVM and pre-2.4 machines, otherwise "auto" (which makes it
available on TCG or on new-enough kernels).
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Q35's ACPI device is hard-coding SMM availability to KVM. Place the
logic where the board is created instead, so that it will be possible
to override it.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will enable SMM even if KVM is in use. Rename the field and
arguments.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CREATE_IRQCHIP should never fail, and so should its userspace
wrapper kvm_irqchip_create. The function does not do anything
if the irqchip capability is not available, as is the case for PPC.
With this patch, kvm_arch_init can allocate memory and it will not
be leaked.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make kvm_memory_listener_register public, and assign a kernel
address space id to each KVMMemoryListener.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No semantic change, but s->slots moves into a new struct
KVMMemoryListener. KVM's memory listener becomes a member of struct
KVMState, and becomes of type KVMMemoryListener.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
i386 code will have to define a different KVMMemoryListener. Create
an internal header so that KVMSlot is not exposed outside.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently kvm_mem_flags just translates bools to bits, let's
make it also determine the bools first. This avoids its parameter
list growing each time we add a flag.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Apart from the MSR, the smi field of struct kvm_vcpu_events has to be
translated into the corresponding CPUX86State fields. Also,
memory transaction flags depend on SMM state, so pull it from struct
kvm_run on every exit from KVM to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These commands are handled entirely by QEMU. Do not raise an SMI
when they happen, because Windows (at least 2008r2) expects these
commands to work and (depending on the value of APMC_EN at
startup) the firmware might not have installed an SMI handler.
When this happens (e.g. the kernel supports SMIs, or you are using
TCG, but you have used "-machine smm=off") RIP is moved to 0x38000
where there is no code to execute.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>