Traditional PCI config space access is achieved by writing a 32 bit
value to io port 0xcf8 to identify the bus, device, function and config
register. Port 0xcfc then contains the register in question. But if you
write the appropriate pair of magic values to 0xcf9, the machine will
reboot. Spectacular! And not standardised in any way (certainly not part
of the PCI spec), so different chipsets may have different requirements.
Booo.
In the PIIX3 spec, IO port 0xcf9 is specified as the Reset Control
Register. Bit 1 (System Reset, SRST) would normally differentiate between
soft reset and hard reset, but we ignore the difference beyond allowing
the guest to read it back.
RHBZ reference: 890459
This patch introduces the following overlap between the preexistent
"pci-conf-idx" region and the "piix3-reset-control" region just being
added. Partial output from "info mtree":
I/O
0000000000000000-000000000000ffff (prio 0, RW): io
0000000000000cf8-0000000000000cfb (prio 0, RW): pci-conf-idx
0000000000000cf9-0000000000000cf9 (prio 1, RW): piix3-reset-control
I sanity-checked the patch by booting a RHEL-6.3 guest and found no
problems. I summoned gdb and set a breakpoint on rcr_write() in order to
gather a bit more confidence. Relevant frames of the stack:
kvm_handle_io (port=3321, data=0x7f3f5f3de000, direction=1, size=1,
count=1) [kvm-all.c:1422]
cpu_outb (addr=3321, val=6 '\006') [ioport.c:289]
ioport_write (index=0, address=3321, data=6) [ioport.c:83]
ioport_writeb_thunk (opaque=0x7f3f622c4680, addr=3321, data=6)
[ioport.c:212]
memory_region_iorange_write (iorange=0x7f3f622c4680, offset=0,
width=1, data=6) [memory.c:439]
access_with_adjusted_size (addr=0, value=0x7f3f531fbac0,
size=1, access_size_min=1,
access_size_max=4,
access=0x7f3f5f6e0f90
<memory_region_write_accessor>,
opaque=0x7f3f6227b668)
[memory.c:364]
memory_region_write_accessor (opaque=0x7f3f6227b668, addr=0,
value=0x7f3f531fbac0, size=1,
shift=0, mask=255)
[memory.c:334]
rcr_write (opaque=0x7f3f6227afb0, addr=0, val=6, len=1)
[hw/piix_pci.c:498]
The dispatch happens in ioport_write(); "index=0" means byte-wide access:
static void ioport_write(int index, uint32_t address, uint32_t data)
{
static IOPortWriteFunc * const default_func[3] = {
default_ioport_writeb,
default_ioport_writew,
default_ioport_writel
};
IOPortWriteFunc *func = ioport_write_table[index][address];
if (!func)
func = default_func[index];
func(ioport_opaque[address], address, data);
}
The "ioport_write_table" and "ioport_opaque" arrays describe the flattened
IO port space. The first array is less interesting (it selects a thunk
function). The "ioport_opaque" array is interesting because it decides how
writing to the port is implemented ultimately.
4-byte wide access to 0xcf8 (pci-conf-idx):
(gdb) print ioport_write_table[2][0xcf8]
$1 = (IOPortWriteFunc *) 0x7f3f5f6d99ba <ioport_writel_thunk>
(gdb) print \
((struct MemoryRegionIORange*)ioport_opaque[0xcf8])->mr->ops.write
$2 = (void (*)(void *, hwaddr, uint64_t, unsigned int))
0x7f3f5f5575cb <pci_host_config_write>
1-byte wide access to 0xcf9 (piix3-reset-control):
(gdb) print ioport_write_table[0][0xcf9]
$3 = (IOPortWriteFunc *) 0x7f3f5f6d98d0 <ioport_writeb_thunk>
(gdb) print \
((struct MemoryRegionIORange*)ioport_opaque[0xcf9])->mr->ops.write
$4 = (void (*)(void *, hwaddr, uint64_t, unsigned int))
0x7f3f5f6b42f1 <rcr_write>
The higher priority of "piix3-reset-control" ensures that the 0xcf9
entries in ioport_write_table / ioport_opaque will always belong to it,
independently of its relative registration order versus "pci-conf-idx".
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Factor out smram/pam logic for use by other chipsets, namely q35
at this point.
Note: Should be factored out into a generic North Bridge Class.
[jbaron@redhat.com: changes for updated memory API]
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Calling memory_region_destroy() within a transaction is illegal, since
the memory API is allowed to continue to dispatch to a region until the
transaction commits. 440fx does that however when managing PAM registers.
This bug is benign, since the regions are all aliases (which the memory
core tends to throw anyway), and since we don't do concurrent dispatch yet,
but instead of relying on that, tighten ship ahead of the coming concurrency
storm.
Fix by having a predefined set of regions, of which one will be enabled at
any time.
Signed-off-by: Avi Kivity <avi@redhat.com>
Adopt the QOM parent field name and enforce QOM-style access via casts.
Don't just typedef PCIHostState, either use it directly or embed it.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Use PCIHostState and PCI_HOST_BRIDGE() where appropriate.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
During the QOM migration they were amended with further info but this is
no longer the case. All static TypeInfos can be const these days.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This per-device notifier shall be triggered by any interrupt router
along the path of a device's legacy interrupt signal on routing changes.
For simplicity reasons and as this is a slow path anyway, no further
details on the routing changes are provided. Instead, the callback is
expected to use pci_device_route_intx_to_irq to check the effect of the
change.
Will be used by KVM PCI device assignment and VFIO.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Device assigned on KVM needs to know the mode
(enabled/inverted/disabled) and the IRQ number that a given device
triggers in the attached interrupt controller.
Add a PCI IRQ path discovery function that walks from a given device to
the host bridge, and gets this information. For
this purpose, a host bridge callback function is introduced:
route_intx_to_irq. It is so far only implemented by the PIIX3, other
host bridges can be added later on as required.
Will be used for KVM PCI device assignment and VFIO.
Based on patch by Jan Kiszka, with minor tweaks.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is a typo in i440FX init code. This is causing problems when
somebody wants to access the 64bit PCI range.
Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We want the composition tree to to be in order by the time we call
qdev_init, so that a single set of the toplevel realize property can
propagate all the way down the composition tree.
This is not the case so far. Unfortunately, this is incompatible
with calling qdev_init in the constructor wrappers for devices,
so for now we need to unattach some devices that are created through
those wrappers. This will be fixed by removing qdev_init and instead
setting the toplevel realize property after machine init.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Replace device_init() with generalized type_init().
While at it, unify naming convention: type_init([$prefix_]register_types)
Also, type_init() is a function, so add preceding blank line where
necessary and don't put a semicolon after the closing brace.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: malc <av1474@comtv.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is mostly code movement although not entirely. This makes properties part
of the Object base class which means that we can now start using Object in a
meaningful way outside of qdev.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This was done in a mostly automated fashion. I did it in three steps and then
rebased it into a single step which avoids repeatedly touching every file in
the tree.
The first step was a sed-based addition of the parent type to the subclass
registration functions.
The second step was another sed-based removal of subclass registration functions
while also adding virtual functions from the base class into a class_init
function as appropriate.
Finally, a python script was used to convert the DeviceInfo structures and
qdev_register_subclass functions to TypeInfo structures, class_init functions,
and type_register_static calls.
We are almost fully converted to QOM after this commit.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This converts three devices because apic and ioapic are subclasses of sysbus.
Converting subclasses independently of their base class is prohibitively hard.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
These are various small stylistic changes which help make things more
consistent such that the automated conversion script can be simpler.
It's not necessary to agree or disagree with these style changes because all
of this code is going to be rewritten by the patch monkey script anyway.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This recently added line in hw/pc_piix.c is causing a SEGV on a Xen
setup because the piix3 property is never created:
qdev_property_add_child(qdev_resolve_path("/i440fx/piix3", NULL),
"rtc", (DeviceState *)rtc_state, NULL);
Signed-off-by: Julian Pidancet <julian.pidancet@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Not used yet, but at least we're provided with the correct region.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
The code will remap all PAMs, even if just one is updated, resulting
in reduced performance. Wrap in a transaction to detect that those
other PAMs have not changed.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
This reverts commit 8ef9ea85a2, reversing
changes made to 444dc48298.
From Avi:
Please revert the entire pull (git revert 8ef9ea85a2) while I work this
out - it isn't trivial.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The code will remap all PAMs, even if just one is updated, resulting
in reduced performance. Wrap in a transaction to detect that those
other PAMs have not changed.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
The current implementation of PAM and the PCI holes is broken in several
ways:
- PCI BARs are not restricted to the PCI hole (a BAR may hide memory)
- PCI devices do not respect PAM (if a PCI device maps a region while
PAM maps the region to RAM, the request will be honored)
This patch fixes things by introducing a pci address space, and using
memory region aliases to represent PAM regions, SMRAM, and PCI holes.
The memory hierarchy looks something like
system_memory
|
+--- low memory alias (0-0xe0000000)
| |
| +-- ram@0
|
+--- high memory alias (0x100000000-EOM)
| |
| +-- ram@0xe0000000
|
+--- pci hole alias (end of low memory-0x100000000)
| |
| +-- pci@end-of-low-memory
|
|
+--- pam[n] (0xc0000-0xc3fff etc) (when set to pci, priority 1)
| |
| +-- pci@0xc4000 etc
|
+--- smram (0xa0000-0xbffff) (when set to pci/vga, priority 1)
|
+-- pci@0xa0000 etc
ram (simple ram region)
pci
|
+--- BARn
|
+--- VGA 0xa0000-0xbffff
|
+--- ROMs
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This lets us register BARs in the I/O address space.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is now done sloppily, via get_system_memory(). Eventually callers
will be converted to stop using that.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Compared to the last version I only added a comment to the code.
- remove i440FX-xen and i440fx_write_config_xen
we don't need to intercept pci config writes to i440FX anymore;
- introduce PIIX3-xen and piix3_write_config_xen
we do need to intercept pci config write to the PCI-ISA bridge to update
the PCI link routing;
- set the number of PIIX3-xen interrupts line to 128;
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
If pic_irq is greater than 7, the irq level is always 0 on 32bits.
Signed-off-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
This patch introduces Xen specific call in piix_pci.
The specific part for Xen is in write_config, set_irq and get_pirq.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The previous patch didn't change the behavior when load,
it resulted in ugly code. This patch cleans it up.
With this patch, pic irq lines are manipulated when loaded.
It is expected that it won't change the behaviour because
the interrupts are level: at the moment e.g. pci devices already
reassert interrupts on load.
Test:
- rung linux as guest and use flooding ping (ping -f) to host
in order to trigger interrupts for e1000 emulated.
- savevm/loadvm and see guest kept running after loadvm.
To be honest, I'm not sure that ping -f caused enough interrupts
because Linux e1000 driver supports NAPI.
TODO: test more OSes, stress test with save/load, live-migration
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
optimize irq routing in piix_pic.c which has been a TODO.
So far piix3 tracks each pirq level and checks whether a given pic pins is
asserted by seeing if each pirq is mapped into the pic pin.
This is independent on irq routing, but data path is on slow path.
Given that irq routing is rarely changed and asserting pic pins is on
data path, the path that asserts pic pins should be optimized and
chainging irq routing should be on slow path.
The new behavior with this patch series is to use bitmap which is addressed
by pirq and pic pins with a given irq routing.
When pirq is asserted, the bitmap is set and see if the pic pins is
asserted by checking the bitmaps.
When irq routing is changed, rebuild the bitmap and re-assert pic pins.
test:
- create VM with 4 e1000 nics in different pci slots
(i.e. fn=0 for each e1000)
Thus those e1000's INTA are connected to each PIRQ[A-D].
- run linux as guest and saw each devices triggers interrupt
by seeing /proc/interrupts. And then confirmed that each PIRQ[A-D]
surely asserted interrupts.
Because irq 10 and 11 are shared by 4 e1000's, it only one NIC is activated
with ifconfig ethN up/down when counting interrupts.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PIIX3State::pci_irq_levels are redundant which is already tracked by
PCIBus layer. So eliminate them.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch tags all pci devices which belong to the piix3/4 chipsets as
not hotpluggable (Host bridge, ISA bridge, IDE controller, ACPI bridge).
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add "fw_name" to DeviceInfo to use in device path building. In
contrast to "name" "fw_name" should refer to functionality device
provides instead of particular device model like "name" does.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Extract range functions from pci.h. These will be used by later patches
by non-PCI devices. Adjust current users.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Set PCI multi-function bit according to multifunction property.
PCI address, devfn ,is exported to users as addr property,
so users can populate pci function(PCIDevice in qemu)
at arbitrary devfn.
It means each function(PCIDevice) don't know whether pci device
(PCIDevice[8]) is multi function or not.
So this patch allows user to set multifunction bit via property
and checks whether multifunction bit is set correctly.
Cc: Juan Quintela <quintela@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
use pci_create_simple_multifunction() for normal device which sets
multifunction bit.
At the moment, only pc_piix.c and mips_malta.c uses multifunction
devices with piix3/4 pci-isa bridge.
And other boards don't populate those devices.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>