disks to other domains) from Jed Davis, <jld@panix.com>:
* Issue multiple requests when necessary rather than
assuming that arbitrary requests can be mapped into single
contiguous virtual address ranges.
* Don't assume that all data for a request is consecutive
in memory. With some client OSes, it's not.
The above two changes fix data corruption issues with Linux
clients with certain filesystem block sizes.
* Gracefully handle memory or pool allocation failures after
beginning to handle a request from the ring.
* Merge contiguous requests to avoid the "64K turns into 44K + 20K
and doubles the transactions per second at the disk" problem
caused by the 11-page limit caused by the structure of Xen
ring entries. This causes a very slight performance decrease
for sequential 64K I/O if the disk is not already saturated with
requests (about 1%) but halves the transactions per second we
hit the disk with -- or better. It even compensates for bizarre
Linux behaviour like breaking long requests up into 5.5K pieces.
* Probably some stuff I forgot to mention.
Disk throughput (though not latency) is now much, much closer to the
"raw hardware" case than it was before.
for xen_microtime(). This match more closely what is done on a real i386
(where we read the RTC), and seems to fix gettimeofday() sometime going
backward by several seconds for me.
initialise the microtime state again. it was initialised at the first
clock tick before time of day was set. Should fix port-xen/29846.
Also, try harder to call microset() once a second; the clock interrupt handler
can be called more often than HZ in some case.
them from interrupt context.
xpq_flush_queue() is called from IPL_NET in if_xennet.c, and
other xpq_queue* functions may be called from interrupt context via
pmap_kenter*(). Should fix port-xen/30153.
Thanks to Jason Thorpe and YAMAMOTO Takashi for enlightments on this issue.
event not to be sent when one is needed. Fixing this would require
one hypercall per packet, instead of one per NB_XMIT_PAGES_BATCH pages.
It's not worth it, so always send an event at the end of xennetback_ifstart()
- there is no callback mechanism to notify us when a guest has handled
packets we sent. If we stop transmitting because the ring is full or we're
out of pages when the ifq is also full, nothing will call
xennetback_ifstart() again and transmit is stalled. Abuse the watchdog
to kick the transmit queue once second after an out of ressources
condition.
where the printing of `version' is already performed.
This has the benefit of allowing the copyright to be available
via dmesg(8) on platforms which need the `msgbuf' to be setup
in cpu_startup() before printed output is remembered.
of relying on the fact that the domain controller is still getting interrupts
when xenconscn_getc()/xenconscn_putc() is called.
Now that polling is fixed, move IPL_CTRL down between IPL_SOFTSERIAL and
IPL_TTY, fixing port-xen/29999 by YAMAMOTO Takashi.
Now that IPL_CTRL is low enouth, remove the softintr stuff from the
domain controller, and call wakeup() directly.
Thanks to YAMAMOTO Takashi and Christian Limpach for feedback and suggestions.
- sort the ih_evt_handler list by IPL, higher first. Otherwise some handlers
would have been delayed, event if they could run at the current IPL.
- As ih_evt_handler is sorted, remove IPLs that have been processed for
an event when calling hypervisor_set_ipending()
- In hypervisor_set_ipending(), enter the event in ipl_evt_mask only
for the lowest IPL. As deffered IPLs are processed high to low,
this ensure that hypervisor_enable_event() will be called only when all
callbacks have been called for an event. We don't need the evtch_maskcount[]
counters any more.
Thanks to YAMAMOTO Takashi for ideas and feedback.
cause an event to be both handled and marked as pending, or being
marked as pending twice (triggering the diagnostic check
evtch_maskcount[port] == 0 in hypervisor_set_ipending):
mask and clear event by word of 32bit in do_hypervisor_event() or stipending(),
instead of by indiviual bits in do_event() or xenevt_event().
In addition this is marginally more efficient.
(which may sleep). Fix port-xen/29851 by YAMAMOTO Takashi.
Use config_pending_incr()/config_pending_decr() in if_xennet instead of
busy-looping (which doesn't work any more).
Remove the kernel thread from xbd, which isn't needed any more.
for physical IRQ, or device name for xen device drivers). This makes
systat and vmstat output more usable, especially as the channel numbers
change each time a guest reboots.
including soft interrupt, and this is way too low in some use (lots of domains,
or domains with lots of xennet, or even hardware with lots of devices at
different interrupts).
Based on idea from YAMAMOTO Takashi, keep one list of handler per-event and
one per-IPL (so the same handler is now in 2 lists). In the common case were
an event is received at low IPL, we can call the handlers quickly (there
is usually only one handler per event, unless the event is mapped to a
physical interrupt and this interrupt is shared by different devices).
Deffered events and software interrupts are handled by a bitmask (as before)
with one bit per IPL. When one IPL has an event pending all handlers for
this IPL will be called.
With this change, it is now possible to have all the 1024 events active.
While here, handle debug event in a special way: the handler is always called,
regardless of the current IPL. Make the handler print usefull informations
about events and IPL states.
Also remove code not used on Xen in files inherited from the x86 port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
We can't write directly to a gdt slot, we need to go though
xen_update_descriptor().
From YAMAMOTO Takashi: this is not a problem in HEAD, because
as the kva for gdt is pageable, when gdt_put_slot1 attempts to modify gdt,
the fault handler allocates and maps a new page for you.
So the kernel doesn't panic, but could leak some memory.
the interrupt latch (which is probably done by the hypervisor, linux/xen
doesn't do it either). Now the "fputest" configure test from pkgsrc/math/yorick
works as expected.
Thanks to Christian Limpach for the hint.
if the corresponding bit needs to be set in evtchn_pending_sel, and eventually
force an upcall (if we got a second event when this one what being handled).
For now to this by calling hypervisor_enable_irq(), this could be rewritten
in inline assembly by someone knowing enouth about i386 assembly :)
Check the passed in address as well as determining the maximum length
using VM_MAXUSER_ADDRESS in copyinstr and copyoutstr.
Problem originally fixed in OpenBSD/i386.
This fix suggested by Charles Hannum (mycroft at netbsd dot org).
in connected state. Avoid a panic when the interface is configured
before being in connected state (e.g. when configured automatically by xend
when a domain is created).
Remove the padding bits from blkif_extent_t, so that the message size doesn't
change. You'll need xentools20-2.0.3nb1 if you upgrade your kernel
(the old tools didn't zero out the padding bits, and a new kernel will
interpret them as part of the device number).
previously-allocated pages may not have been loaned to the remote domain
yet. Let the called deal with this.
- clean up the TX and RX loops to use less MASK_NETIF_xX_IDX() calls
- When we're done with the queued TX requests, try again in case some new
ones are available
- avoid leaking a xmit page if we queued NB_XMIT_PAGES_BATCH requests.
- remove dead code.
Protect the callback queue with splvm()
XXX some debug printf about the callback stuff is left here. This is because
I've not been able to trigger this condition yet, so I've left them
until we sure the code works as intended.
- Ffs internal snapshots get compiled in unconditionally.
- File system snapshot device fss(4) added to all kernel configs that
have a disk. Device is commented out on all non-GENERIC kernels.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
All those kernels have a line for both tun and bridge, and if either is
commented out, tap is commented out also. With the exception of i386's
GENERIC_TINY.
XXX: we _need_ some way of making this more simple.
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
around a cngetc() that will never return while "halted", which is rude,
and which also requires domain 0 to not just restart us, but kill us
first. Suggestion from Michael Kukat (though this change is not the
same as the one he suggested).
Will cngetc() actually return something when running in domain 0 with a
VGA console? I don't think it will; if it actually does, we should make
this behaviour depend on whether we're in domain 0 or some other domain.
does different things depending what's in %ebx.
We weren't setting %ebx here *at all*; so we did not get SCHEDOP_yield,
which was what was wanted; so unpredictable things happened, notably
immediate return to the NetBSD idle loop, in other words spinning on CPU.
Definitely not cool!
Michael Kukat caught this one and suggested this fix on port-xen.
clockframe -> intrframe, but that is included too late, because this
file includes systm.h and it is in the path of including systm.h. Fix
it by not including <systm.h>; it was only needed for the panic() calls
which I have disabled, since they look more like debugging calls to me.
Also add forward struct declaration for trapframe.
because the x86/intr.h needs them and does not include xenfunc.h. Including
xenfunc.h in cpufunc.h is a clear lose because xenfunc.h needs a boatload
of include functions in order to compile.
it has to load hypervisor.h which has all sorts of other problems assuming
_KERNEL). Provide the defs before including hypervisor-if.h and then turn
them off afterwards (ala the same as hypervisor.h does
actually adjusting the time correctly (calling hardclock as needed, not
just blindly every time Xen schedules us) based on Xen's idea of the
time in the shared page.
Xen source repo change info:
ChangeSet
2004/09/22 13:47:22+01:00 cl349@freefall.cl.cam.ac.uk
Fix time.
netbsd-2.0-xen-sparse/sys/arch/xen/xen/clock.c
2004/09/22 13:47:21+01:00 cl349@freefall.cl.cam.ac.uk +28 -3
Don't call hardclock on spurious timer interrupt and call hardclock
for missed interrupts.
netbsd-2.0-xen-sparse/sys/arch/xen/conf/XEN
2004/09/22 13:47:21+01:00 cl349@freefall.cl.cam.ac.uk +0 -1
Don't need custom HZ value any longer.
: ----------------------------------------------------------------------
which bustype should be attached with a specific call to config_found()
(from a "mainbus" or a bus bridge).
Do it for isa/eisa/mca and pci/agp for now. These buses all attach to
an mi interface attribute "isabus", "eisabus" etc., and the autoconf
framework now allows to specify an interface attribute on config_found()
and config_search(), which limits the search of matching config data
to these which attach to that specific attribute.
So we basically have to call config_found_ia(..., "foobus", ...) where
such a bus is attached.
As a consequence, where a "mainbus" or alike also attaches other
devices (eg CPUs) which do not attach to a specific attribute yet,
we need at least pass an attribute name (different from "foobus") so
that the foo bus is not found at these places. This made some minor
changes necessary which are not obviously related to the mentioned buses.
This has a couple of beneficial effects:
1) The TOD time will be preserved across boots, as one would expect.
2) Newly-started VMs will get the correct time according to domain 0.
Previously, since we never set the time back to Xen, each VM would
get the uncorrected system clock time, never even seeing any
changes made since the current startup.
I don't understand how XenoLinux slaves the other domain's clocks to
the Xen "wall clock" time that we're setting here, so I haven't even
tried. But now we can run ntpd in each domain without a huge offset
at boot caused by never updating the wall clock time at all...
by 0x100000 (above the I/O Memory "hole") leaving all physical addresses
below unused, don't perform phys<->mach mapping for addresses below 0x100000
or beyond the real hardware's physical memory.
-> /dev/mem works now as expected and X works in domain0.