- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
the process page table, properly uvm_unmap1() the VA range and enter a new
uvm_map backed by an object. There is a know race between uvm_unmap1() and
uvm_map(), where another thread could get our VA range; in practice
I think none of the softwares using this interface are multithreaded (or at
last they are single-threaded when using it).
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
argument. Use this and replace the inline assembly (mul + div using the
64bit intermediate result) with normal 32bit multiplication and
division. The compiler can turn the division into a multiplication and
shift, making it even cheaper then the original assembly. For extreme
long delays, just use 64bit arithmetic.
- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.
TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.
NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
calling mutex_enter(). So call initgdt() ASAP, and consinit() right after
initgdt().
Fix Dom0 crash reported by Mikolaj Golub and others when using the VGA
console.
This removes a hard limit that would prevent a guest from running with more
than 7 virtual network interface.
If running on a hypervisor that doesn't support GNTTABOP_query_size, fall back
to a 4-pages grant table, so this change is backward-compatible.
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
from the command line (nfsroot=, ip=) in the same way domUs did.
While there convert a for (;;) loop to TAILQ_FOREACH().
From Christoph Egger in private mail.
Make all bootstatic callbacks return the new NFS_BOOTSTATIC_NOSTATIC flag
when no nfs boot information is found on command line.
link context instead of NULL. Otherwise, if we got a signal while the
lwp had a link context set, the link context would be set to NULL upon
return from signal delivery.
christos@tech-kern: "I think you are right."
reorganise the IPL vectors a bit so that they can be used from more
places than splllower and doreti.
Include patch from Kazushi Marukawa (fixed to handle pending interrupts),
which should fix the read_psl() == 0 assertion failure reported by
several users.
if we enter doreti_checkast with only soft interrupt pendings, we would
jump to one of the soft* vector with %esi uninitialized, and the vector
would do a jump *%esi at the end ...
I don't know why nobody ever went into this; I guess in the common case
there's no soft irq pending without a hard irq in this code path ...
config_handle_wedges() and read_disk_sectors(). On x86, handle_wedges()
is a thin wrapper for config_handle_wedges(). Share opendisk()
across architectures.
Add kernel code in support of specifying a root partition by wedge
name. E.g., root specifications "wedge:wd0a", "wedge:David's Root
Volume" are possible. (Patches for config(1) coming soon.)
In support of moving disks between architectures (esp. i386 <->
evbmips), I've written a routine convertdisklabel() that ensures
that the raw partition is at RAW_DISK by following these steps:
0 If we have read a disklabel that has a RAW_PART with
p_offset == 0 and p_size != 0, then use that raw partition.
1 If we have read a disklabel that has both partitions 'c'
and 'd', and RAW_PART has p_offset != 0 or p_size == 0,
but the other partition is suitable for a raw partition
(p_offset == 0, p_size != 0), then swap the two partitions
and use the new raw partition.
2 If the architecture's raw partition is 'd', and if there
is no partition 'd', but there is a partition 'c' that
is suitable for a raw partition, then copy partition 'c'
to partition 'd'.
3 Determine the drive's last sector, using either the
d_secperunit the drive reported, or by guessing (0x1fffffff).
If we cannot read the drive's last sector, then fail.
4 If we have read a disklabel that has no partition slot
RAW_PART, then create a partition RAW_PART. Make it span
the whole drive.
5 If there are fewer than MAXPARTITIONS partitions,
then "slide" the unsuitable raw partition RAW_PART, and
subsequent partitions, into partition slots RAW_PART+1
and subsequent slots. Create a raw partition at RAW_PART.
Make it span the whole drive.
The convertdisklabel() procedure can probably stand to be simplified,
but it ought to deal with all but an extraordinarily broken disklabel,
now.
i386: compiled and tested, sparc64: compiled, evbmips: compiled.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
even if a pmap_enter_ma() (the underlying hypercall, really) returns one.
The fact that the mapping failed is notified to the called by oring
0xF0000000 to the mfn. This makes Xen 3.0.4 HVM work.
Seperate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distingush cpus
and ioapics for autoconf. (And it makes that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
interface doesn't allow it), so that .e.g bus_dma_subregion() has a chance
to work. Unfortunably a stock Xen hypervisor won't allow a upper bound less
than 2^31 (2GB) so devices like bce(4) will need a hacked hypervisor to
work properly.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
Patch by Slava Semushin <slava.semushin@gmail.com>
Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
- In xencons_handler(), update in_cons inside the loop, otherwise,
we would trigger the xenconscn_getc() workaround wich reset cons and prod
to their original values, and this creates an infinite loop
Should fix the console hang reported by several users on port-xen@.
otherwise, xbdstart kicked via dk_iodone still sees RING_FULL condition
and just returns. it makes xbd stall when all requests are acked by
a single xbd_handler. PR/34005.
requests and centralizing them all. The result is that some of these
are not used on some architectures, but the documentation was updated
to reflect that.
- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
this device in userland (e.g. qemu-dm when running a HVM guest), we'll reuse
the existing vnode and we'll panic in VOP_UNLOCK(). We don't have this issue
most of the time because when xbdback is the only user, we get a specfs
vnode for which locking operation are NOPs.
Thanks to Antti Kantee for spotting the missing vn_lock() in sources and
giving details about vnode locking.
event should be raised every 10ms if we're runnable. Unfortunately,
there seems to be an intermittent bug in the hypervisor such that,
for about 1<<32 ns (~4.3s) after it manifests, every running domain
continues to run but not get its timer events and new timestamps (nor
is it preempted in favor of other domains on that CPU). This can cause
strange behavior from our timekeeping; for example, hardclock() is never
called during this interval.
So here's a workaround: if timestamp is allegedly up to date but is more
than 40ms old (this is semi-arbitrary), the domain sets its timer to a
time in the past, which causes it to become immediately pending, and also
results in the publication of a new timestamp.