When a xenstore watch triggers, the event is processed on process_msg
and if a valid handle it's found the handler is queued for execution
on the pending xen watches queue (watch_events).
This may present a problem if we trigger a xenwatch several times and
then disconnect the device. If several xenwatch events are added to
the watch_events queue and the device is disconnected afterwards, the
first processed xenwatch event will disconnect the device, remove the
watch and free all resources. This triggers a panic if there are
pending xenwatch events for that device already queued in the local
queue of the function xenwatch_thread, since when the next watch that
has the same handler tries to execute we get a panic due to the fact
that the device is already disconnected and all resources had been
freed:
xenbus_watch: 0xffffa0000b7cd1d0
xbw_callback: 0xffffffff80755dd4
otherend_changed: backend/vif/1/0
backend/vif/1/0/state 6
backend/vif/1/0 -> Closed
backend/vif/1/0 -> backend_device, b_detach: 0xffffffff8075a2bf
xenbus_watch: 0xffffa0000b7cd1d0
xbw_callback: 0xfc5ec02183e547a8
fatal protection fault in supervisor mode
trap type 4 code 0 rip ffffffff80756596 cs e030 rflags 10246 cr2
7f7ff7b4c020 ilevel 0 rsp ffffa000e6d82c50
curlwp 0xffffa0000a72d580 pid 0 lid 36 lowest kstack
0xffffa000e6d7f000
kernel: protection fault trap, code=0
Stopped in pid 0.36 (system) at netbsd:xenwatch_thread+0xc7: call
*10(%rax
)
xenwatch_thread() at netbsd:xenwatch_thread+0xc7
ds f
es 5987
fs 2c40
gs 1460
rdi ffffa0000b7cd1d0
rsi ffffa0000a5477f0
rbp ffffa000e6d82c70
rbx ffffa0000b7c14c0
rdx 2
rcx f
rax ffffa0000b7cd1d0
r8 78
r9 ffffffef
r10 deadbeef
r11 1
r12 ffffa000e6d82c50
r13 ffffa0000a72d580
r14 ffffa0000a72d580
r15 0
rip ffffffff80756596 xenwatch_thread+0xc7
cs e030
rflags 10246
rsp ffffa000e6d82c50
ss e02b
netbsd:xenwatch_thread+0xc7: call *10(%rax)
on the order they are passed in through xenstore. While this works for
hand-crafted Xen configuration files, it does not work for XenServer, XCP or
EC2 instances. This means that adding an extra virtual disk can make the
domU unbootable.
ID is actually based on the Linux device major/minor so this approach isn't
entirely correct (for instance, you can specify devices to be non-contiguous
which doesn't fit too well with our autoconf approach), but it works as a
first approximation.
Tested by me on XenServer and riz@ on EC2. OK bouyer@
slightly modified by me to profit from runtime checks for dom0 privileges
instead of using compile time macros (DOM0OPS).
It should now be possible to use pkgsrc's sysutils/xentools inside
a domU to query XenStore entries (or even modify part of it if the domain
has enough rights).
Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.
Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands
The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.
XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.
XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.
Tested under i386 and amd64, bear in mind ring corruption though.
No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
- turns balloon into a driver that attaches to xenbus(4). This allows to
disable the functionality either at compile time or boot time via
userconf(4). Driver can implement detach or pmf(9) hooks if deemed
necessary.
- keeps Cherry's locking model, but simplify it a bit. There is now
only one target value serialized inside balloon, we do not feedback
alternative value to Xenstore (clients are not expected to see its value
evolve behind their back, and can't do much about that either)
- implements min threshold; this is an admin-settable value that tells
driver to "not balloon below this threshold." This can be used by domain
to keep memory reservations, useful if activity is expected in the near
future.
- in addition to min threshold, the driver implements internally a
safeguard value (uvmexp.freemin + 1MiB), so that admin cannot
inadvertently set min to a very low value forcing domain into heavy
memory pressure and swapping.
- create the sysctl(8) kern.xen.balloon tree. 4 nodes are actually present
(values are in KiB):
- min: (rw) an admin-settable value that prevents ballooning below this
mark
- max: (ro) the maximum size for reservation, as set by xm(1) mem-max.
- current: (ro) the current reservation for domain.
- target: (rw) the targetted reservation for domain.
- fix a few limitations here and there, most notably the max_reservation
hypercall, and KiB vs pages representations at interfaces.
The driver is still turned off by default. Enabling it would need more
approval, especially from bouyer@, cherry@ and cegger@.
FWIW: tested it two days long, from amd64 dom0 (with dom0 ballooning
enabled for xend), and bunch of domUs. Did not notice anything suspicious.
XXX it still has one big limitation: it cannot hotplug memory pages in
uvm(9) if they were not present beforehand. Example: ballooning above
physmem will give more pages to domain but it won't use it to serve
allocations, unless we teach uvm(9) how to handle the extra pages.
- Use free_otherend_details() instead of calling free() on xbusd_otherend.
- rename talk_to_otherend() to watch_otherend(). We register a watch for
changes in the otherend device "state"; we are not really talking to it.
- add missing prototypes.
attached earlier during boot, when initializing hypervisor.
This avoids the "unknown type console at xenbus0 id 0 not configured"
autoconf(9) messages, which are misleading during domU's boot.
See also http://mail-index.netbsd.org/port-xen/2009/01/05/msg004621.html
Ok by bouyer@ in private mail.
- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.
- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.
- make start_info aligned on a page boundary, as Xen expects it to be so.
- mask event channel during xbd detach before removing its handler (can
avoid spurious events).
- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).
- modifications to xs_init(), so that it can properly return an error.
Reviewed by Christoph (cegger@).
- fix and add comments
- make some panic/error messages more relevant
- remove last '\n' in DPRINTK() macros, not required as it is already part of format string.
No functional changes.
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.
TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.
NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
As we want some control on the name the backend driver will have we
can't use autoconf(9) here. Instead backend drivers registers to
xenbus, which will call a create callback when a new device is there.
Backend devices won't have a "struct device" in xenbus, use a void pointer
instead.