Fix problem where packets would be duplicated, possibly looping, when
a domU is doing IP routing.
Problem reported and fix tested by Mike M. Volokhov on port-xen
While there, add some __predict_false/true in conditionnals where
appropriate, remove a always-true test, and fix handling of
mbuf shortage.
- kernel (both dom0 and domU) boot, console is functionnal and it can starts
software from a ramdisk
- there is no driver front-end expect console for domU yet.
- dom0 can probe devices and ex(4) work when Xen3 is booted without acpi
and apic support. But the on-board IDE doens't get interrupts.
The PCI code still needs work (it's hardcoded to mode 1). Some of this
code should be shared with ../x86
The physical insterrupt code needs to get MPBIOS and ACPI support, and
do interrupt routing to properly interract with Xen.
To enable Xen-3.0 support, add
options XEN3
to your kernel config file (this will disable Xen2 support)
Changes affecting Xen-2.0 support (no functionnal changes intended):
- get more constants from genassym for assembly code
- remove some unneeded registers move from start()
- map the shared info page from start(), and remove the pte = 0xffffffff hack
- vector.S: in hypervisor_callback() make sure %esi points to
HYPERVISOR_shared_info before accessing the info page. Remplace some
hand-written assembly with the equivalent macro defined in frameasm.h
- more debug code, dissabled by default.
while here added my copyright on some files I worked on in 2005.
Always call HYPERVISOR_fpu_taskswitch() at the end of npxsave_lwp().
This fixes the FPU problems detected by paranoia on a NetBSD/Xen guest.
Based on patch sent by Paul Ripke on port-xen, but reworked by me.
Should fix port-xen/30977.
kernel by x86 platforms (instead of a simple char *). This way, the code
in, e.g., lookup_bootinfo, is a bit easier to understand.
While here, move the lookup_bootinfo function used in x86 platforms (amd64,
i386 and xen) to a common file (x86/x86_machdep.c), as it was exactly the
same in all of them.
- add a buch of PCI storage devices
- add firewire devices
- add some missing PCI network devices
- add serial and parallel PCI adapters
- add lpt0 at isa
- add com1 at isa
com0 not added for the benefit of serial console users (it will conflicts
with the Xen kernel).
XXX this means that setups with serial console on com1 will now break with the
default kernel.
Use userconf(4) (add -c to kernel command line) or change your setup to
com0 instead (most bios allows arbitrary mappings of com ports)
- move copyin and friends from locore.S to their own file, copy.S.
share it between i386 and xen.
- defparam KERNBASE and kill KERNBASE_LOCORE hack.
- add more symbols to assym.h and use it where appropriate.
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
xennetback instances. To support this, more fields from xni_page to xni_pkt.
This would also have the effect to move more code into the
while (!SLIST_EMPTY(&pkt_page->xni_pkt_head)) loop in xennetback_tx_free().
By passing xni_pkt instead of xni_page to xennetback_tx_free we can
avoid the loop, and the xni_pkt_head list completely. There is even a sligh
performance increase if the domU deals with xni_txring->event properly.
Based on comments from YAMAMOTO Takashi.
In theory mbufs can have an infinite life time and could block the transmit
ring (as slots are released when the mbuf external storage is freed). To
avoid this, when we're processing the last slot of the ring copy the buffer
and release the slot immediatly.
receive system:
- on the receive side, attach the mapped buffer as external storage
instead of copying it. As a mapped buffer may not live much longer, we
have to deal with the fact that one page of buffer may containt several
packets, and it's not possible to map them several times. Use a hashed list
to keep track of mapped pages, and use reference counters.
- on the transmit side, when MCLBYTES == PAGE_SIZE, give away the mbuf
cluster page when possible instead of copying it. Keep a pool of physical
pages to map in place of the page we give away. When copying, use a
pool_cache(9) to manage copy buffers (use mclpool_cache when
MCLBYTES == PAGE_SIZE, otherwise use a private pool/pool_cache) instead
of a local list. This should reduce the number of hypercalls and MMU
operations in the copy case as well.
send queue. This give upper layer an opportunity to queue up all available
packets before starting to process them. This reduce the number of interrupt
generated on the backend, and the time spent doing hypercalls in a significant
way.
the ressources (this can lead to a dom0 kernel crash when destroying a domain
with active I/O going). For this, add a refcount to struct xbdback_instance,
and defer CMSG_BLKIF_BE_DISCONNECT reply until refcount is back to 0 (which
means all queued buffers have completed).
Based on patch sent by Jed Davis on port-xen.
instead to always trying PG_RW and falling back to PG_RO if this fails.
Use uvm_map_checkprot() in IOCTL_PRIVCMD_MMAP and IOCTL_PRIVCMD_MMAP_BATCH
to compute the appropriate vm_prot_t for pmap_remap_pages().
Thanks to Jed Davis for pointing out uvm_map_checkprot().
In pmap_remap_pages() new mappings are created (PG_RW|PG_M). When saving
a domain, the hypervisor will refuse to map the foreing pages RW.
As a temporary measure, retry the mapping read-only if PG_RW fails, so that
domain save will work. Also fix the PTP's wire_count if the MMU update
fails (prevent a kernel panic).
which attach to hypervisor. This allows to use config_found_ia() instead of
config_found(), instead of relying on the order of which device are
written in ioconf.c.
From Quentin Garnier.
- Define _BUS_AVAIL_END to 0xffffffff, as we don't have an easy way to
find the upper bound for our machine address space (and this can change
when we swap pages with the hypervisor).
- implement _xen_bus_dmamem_alloc_range(), which will request a contigous
set of pages to the hypervisor if the pages returned by uvm_pglistalloc()
don't fit the constraints.
We can't deal with the low/high constraints yet, because Xen doesn't offer a
way to get pages in a specific ranges of addresses.
Based on patches from Dave Thompson (in private mail), with heavy hacking
by me.
the generic pci_enumerate_bus() in Xen because the hypervisor may
hide the function 0, but give access to other functions of the same device
and pci_enumerate_bus() will stop if function 0 isn't available.
Ceri Storey to port-xen, with some additionnal changes by me:
- include bus_dma.c, bus_space.c and pci_machdep.c if pci is defined
instead of dom0ops
- Make various initialisations, and probe/attach pci busses based on NPCI
instead of DOM0OPS
- in conf/files.xen, move xen-specific devices before non-xen specific devices
so that the xen-specific match function is called first, to avoid false
attachement from too liberal match function in non-xen code.
#define clockframe somethingelse
to:
struct clockframe {
struct somethingelse cf_se;
};
and change access macros accordingly.
That means that, at least for that very issue, things will not go
ka-boomy if you don't have the actual definition of struct clockframe
before including systm.h.
disks to other domains) from Jed Davis, <jld@panix.com>:
* Issue multiple requests when necessary rather than
assuming that arbitrary requests can be mapped into single
contiguous virtual address ranges.
* Don't assume that all data for a request is consecutive
in memory. With some client OSes, it's not.
The above two changes fix data corruption issues with Linux
clients with certain filesystem block sizes.
* Gracefully handle memory or pool allocation failures after
beginning to handle a request from the ring.
* Merge contiguous requests to avoid the "64K turns into 44K + 20K
and doubles the transactions per second at the disk" problem
caused by the 11-page limit caused by the structure of Xen
ring entries. This causes a very slight performance decrease
for sequential 64K I/O if the disk is not already saturated with
requests (about 1%) but halves the transactions per second we
hit the disk with -- or better. It even compensates for bizarre
Linux behaviour like breaking long requests up into 5.5K pieces.
* Probably some stuff I forgot to mention.
Disk throughput (though not latency) is now much, much closer to the
"raw hardware" case than it was before.
for xen_microtime(). This match more closely what is done on a real i386
(where we read the RTC), and seems to fix gettimeofday() sometime going
backward by several seconds for me.
initialise the microtime state again. it was initialised at the first
clock tick before time of day was set. Should fix port-xen/29846.
Also, try harder to call microset() once a second; the clock interrupt handler
can be called more often than HZ in some case.