two muls and a shift, which needs at most 2ms on a 25MHz i386 and should
end up as fast as delay(1) was before due to using a reminder of 2.
Discussed with ad@.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
online. It runs into issues in the pmap code and will handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.
xpq_update_foreign(). Note that this also affects native operations, but it
shoulnd't cause problems even for SMP system. Proposed on port-amd64@ and
port-i386@
Don't invalidate the recursive PTE entry in user pmap when switching from
kernel to userland on Xen/amd64. This effectively means that a userland
process can read its own page tables (no write, of course) on Xen/amd64, but
it shouldn't cause security issue (discussed on tech-kern@ some time ago).
This makes NetBSD Xen/amd64 more than 10x faster
building pkgsrc/pkgtools/digest
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explictly in a few more places.
than TSC, but doesn't suffer from SpeedStep as TSC does.
The default quality is higher than HPET for UP, but -100 for
MULTIPROCESSOR as it needs CPU local state which doesn't exist yet.
- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
1) If ODCM is disabled (ODCM_ENABLE not set), clockmod_getstate() should
return the maximum level (7), not the lowest (0), as the levels are
defined as duty cycles where the highest implies no ODCM. Now sysctl
machdep.clockmod.current doesn't lie upon init.
2) Make the sysctl handler ensure that no disabled levels are permitted.
Previously, a level disabled due to errata could be passed to
clockmod_setstate(), which would search through the state array,
skipping the unusable value. Consequently our index would be out of
range and badness could ensue.
Okay'd by xtraeme@.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
http://mail-index.netbsd.org/tech-kern/2007/11/09/0001.html
sysmon_envsys_create() and sysmon_envsys_destroy() were added to
create/destroy sysmon_envsys objects (and its TAILQ/LIST for sensors/events).
sysmon_envsys_sensor_attach() and sysmon_envsys_sensor_detach() were
added to attach/detach sensors to a specified sysmon_envsys device.
The events framework is now per device and configurable via the
ENVSYS_SETDICTIONARY ioctl or /etc/envsys.conf and envstat(8).
Update all users and documentation to reflect these changes.
Add some more defines from the spec. Remove some old ones not
existing in the current Intel Architecture Guide. Use some more
understandable names.
ANSIfy and use uintXX_t to hurt my eyes less.
Further improve readability by exploiting __HAVE_TIMECOUNTER as
invariance on x86 platforms.
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
voltage for different frequencies. The fake table can be computed as
driven by the frequency here as well, but don't recompute the voltage as
it would result in an underflow.
Fix argument order in a debug message to match the format string.
argument. Use this and replace the inline assembly (mul + div using the
64bit intermediate result) with normal 32bit multiplication and
division. The compiler can turn the division into a multiplication and
shift, making it even cheaper then the original assembly. For extreme
long delays, just use 64bit arithmetic.
"AMD64 Architecture Programmer's Manual"). Declare the variables
holding them as vaddr_t, otherwise the upper bits are lost.
(CR0 is actually 64-bit too, but the upper bits are unused, so I am
not changing it now.)
Should fix the reboot caused by X11. From Arto Huusko in
PR port-amd64/37043.
- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.
TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.
NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
This saves two levels of indentation for the main body, making it more
readable. Don't hide the error for disabling the RNG under DIAGNOSTIC,
verbose is enough. Use aprint_*_dev.
Always write entries to all IOAPIC pins. The first 16 pins are
threated as ISA IRQs by default, the others like PCI IRQs. This avoids
inconsistencies based on incomplete BIOS setups. This resulted in early
ACPI SCI notifications to be lost, effectively breaking the Embedded
Controller on cold start on many notebooks.
Don't special case the IOAPIC setup between ioapic_attach and
ioapic_enable, always setup the correct redirections. Depend on
splhigh/disable_intr to stop interrupts and don't keep them masked in
the IOAPIC. This avoids unacknowleged edge interrupts and fixing the problem
of broken PS/2 keyboard when hitting keys during early boot.
mpacpi.c 1.48.12.2 from jmcneill-pm:
Don't process the MADT and modify the interrupt config at one moment and
later trying to figure out if an entry was overriden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.
Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
When changing the redirection entry for an interrupt, write the
high 32bit first. The low 32bit contain the mask bit and removing
that before setting the destionation ID can lead to lost interrupts.
- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
pages immediately above and below the x86 interrupt stack so that
both an overgrown interrupt stack and other faults produce a page
fault trap. Condition this on the historical option NOREDZONE,
for now.
- Moved to x86/pci, so that EM64T systems running NetBSD/amd64 can use it.
- Added support for the TCO on ICH6 or newer chipsets, adapted from
FreeBSD.
- Added timecounter support for the power management timer, adapted from
OpenBSD.
- Plus some misc/cosmetic changes.
Thanks to yukonbob on irc@freenode for testing the TCO part on ICH4-M.
Tested by me with ICH7 too.
is found, try to make it unique by appending a count (1-99) to the sensor
description (truncating, if necessary). This takes my Dell PowerEdge 1800
from:
Temp: 40.000 degC
VRD 1 Temp: 35.000 degC
VRD 0 Temp: 39.000 degC
Planar Temp: 35.000 degC
Ambient Temp: 20.000 degC
Fan 2: 1500 RPM
Fan 1: 1425 RPM
CMOS Battery: 3.057 V
Intrusion: ON
Status : ON
to:
Temp3: 40.000 degC
Temp2: 40.000 degC
VRD 1 Temp: 35.000 degC
VRD 0 Temp: 39.000 degC
Planar Temp: 35.000 degC
Ambient Temp: 20.000 degC
Temp1: 41.000 degC
Temp: 43.000 degC
Fan 2: 1500 RPM
Fan 1: 1425 RPM
CMOS Battery: 3.057 V
Intrusion: ON
Status 1: ON
Status : ON
- On Power Supply sensors:
* if power supply is not installed, return a critical event.
* if power supply is installed but not powered on, return a
warnover event.
(Part 2: drivers)
* Support for detachable sensors.
* Cleaned up the API for simplicity and efficiency.
* Ability to send capacity/critical/warning events to powerd(8).
* Adapted all the code to the new locking order.
* Compatibility with the old envsys API: the ENVSYS_GTREINFO
and ENVSYS_GTREDATA ioctl(2)s are supported.
* Added support for a 'dictionary based communication channel' between
sysmon_power(9) and powerd(8), that means there is no 32 bytes event
size restriction anymore.
* Binary compatibility with old envstat(8) and powerd(8) via COMPAT_40.
* All drivers with the n^2 gtredata bug were fixed, PR kern/36226.
Tested by:
blymn: smsc(4).
bouyer: ipmi(4), mfi(4).
kefren: ug(4).
njoly: viaenv(4), adt7463.c.
riz: owtemp(4).
xtraeme: acpiacad(4), acpibat(4), acpitz(4), aiboost(4), it(4), lm(4).
config_handle_wedges() and read_disk_sectors(). On x86, handle_wedges()
is a thin wrapper for config_handle_wedges(). Share opendisk()
across architectures.
Add kernel code in support of specifying a root partition by wedge
name. E.g., root specifications "wedge:wd0a", "wedge:David's Root
Volume" are possible. (Patches for config(1) coming soon.)
In support of moving disks between architectures (esp. i386 <->
evbmips), I've written a routine convertdisklabel() that ensures
that the raw partition is at RAW_DISK by following these steps:
0 If we have read a disklabel that has a RAW_PART with
p_offset == 0 and p_size != 0, then use that raw partition.
1 If we have read a disklabel that has both partitions 'c'
and 'd', and RAW_PART has p_offset != 0 or p_size == 0,
but the other partition is suitable for a raw partition
(p_offset == 0, p_size != 0), then swap the two partitions
and use the new raw partition.
2 If the architecture's raw partition is 'd', and if there
is no partition 'd', but there is a partition 'c' that
is suitable for a raw partition, then copy partition 'c'
to partition 'd'.
3 Determine the drive's last sector, using either the
d_secperunit the drive reported, or by guessing (0x1fffffff).
If we cannot read the drive's last sector, then fail.
4 If we have read a disklabel that has no partition slot
RAW_PART, then create a partition RAW_PART. Make it span
the whole drive.
5 If there are fewer than MAXPARTITIONS partitions,
then "slide" the unsuitable raw partition RAW_PART, and
subsequent partitions, into partition slots RAW_PART+1
and subsequent slots. Create a raw partition at RAW_PART.
Make it span the whole drive.
The convertdisklabel() procedure can probably stand to be simplified,
but it ought to deal with all but an extraordinarily broken disklabel,
now.
i386: compiled and tested, sparc64: compiled, evbmips: compiled.
To use it on EM64T CPUs supporting the EST CPUID feature. Note that
some CPUs still don't work with this driver, like Xeon or Pentium 4.
Move the p[34]_get_bus_clock functions into its own file,
intel_busclock.c and remove this code from i386/identcpu.c.
Tested on i386 by myself and amd64 by Tonerre.
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
enable the rdmsr call in msr_write_ipi(), so that when it's not
defined we don't read it before writing; disabled in powernow_k8
and enabled in the others.
powernow_k8 driver (much better than undeffing and write it again).
* Fix the WRITE_FIDVID macro, I changed it to use the third argument
for the bitmask, but it's not correct.
Last change should fix the problem reported by FUKUMOTO Atsushi.
Modulation.
This works by changing the duty cycle of the clock modulation,
and saves power and helps to not increase the temperature by
software.
Adapted from OpenBSD/FreeBSD's p4tcc.
To enable it one must use "options INTEL_ONDEMAND_CLOCKMOD".
Tested by me in UP and SMP, ok'ed by Matthew R. Green.
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.
Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.
Ok'ed by Andrew Doran and Matthew R. Green.
Minor modifications by me:
-use an mi device major number
-(coarsly) divided into pci card specific and less specific parts, moved
the latter to dev/drm
-renamed autoconf attributes to reflect this
Todo:
-adapt all card frontends but i915 to drm include file location
-review the mtrr change
-make the change to agp_i810.c coexist with the fix for buggy VESA
BIOSes which is commented out temporarily
-RCS IDs etc style stuff
-LKM support (rescan support for vga)
-test
Just run it once (in the first cpu probed) with the RUN_ONCE(9)
framework.
Change the argument of est_init and k8_powernow_init to void, we don't
need cpu_info * anymore.
Suggested by tls@ and mrg@.
(Thermal Monitor).
This driver will throttle the CPU clock modulation, saving some
power, also known as ODMC (On Demand Modulation Clock).
The processor can change from 12.5% to 100% (there are two erratas,
so two levels might be skipped in the worst case).
If supported, you'll see the following sysctl sub-tree:
machdep.p4tcc.throttling.target: CPU Clock throttling state (0 = lowest, 7 highest)
machdep.p4tcc.throttling.current: current CPU throttling state
machdep.p4tcc.throttling.available: list of CPU Clock throttling states
machdep.p4tcc.throttling.target = 2
machdep.p4tcc.throttling.current = 2
machdep.p4tcc.throttling.available = 7 6 5 4 3 2
Adapted from OpenBSD/FreeBSD.
Seperate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distingush cpus
and ioapics for autoconf. (And it makes that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)
that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.
all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.
XXX: bus_dma(9) needs an update still.
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.
Reviewed on tech-crypto and port-i386, no objections to commiting this.
by Slava Semushin <slava.semushin@gmail.com>.
To verify that no nasty side effects of duplicate includes (or their
removal) have an effect here, I've compiled an i386/ALL kernel with
and without the patch, and the only difference in the resulting .o
files was in shifted line numbers in some assert() calls.
The comparison of the .o files was based on the output of "objdump -D".
Thanks to martin@ for the input on testing.
requests and centralizing them all. The result is that some of these
are not used on some architectures, but the documentation was updated
to reflect that.
- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
I didn't know what header to put the prototype in, so it's both in
i386/mem.c and amd64/mem.c; probably can be moved later.
Tested on amd64, assumed working on i386. :)
yamt@ okay
displayed, to make the code much simpler and easier to follow. Also, use
bitmask_printf() to make output consistent with other stuff. Use
CPUID2FAMILY() where appropriate.
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible though envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
- make it possible for machine-specific code to provide custom R/W routines
in its i82093*.h headers
- always initialize sc->sc_pins[pin], even in the !ioapic_cold case.
No objections on port-i386 and port-amd64.
static limits to meet modern bloat requirements.
VM_PHYSSEG_MAX needs it to run on Intel's D946GZIS motherboard, as reported
by rix on #NetBSD-code on freenode. This has a consequence on the initial
number of possible extent allocations for iomem_ex, so increase that value
too.
While there, clarify the action to be taken when VM_PHYSSEG_MAX is maxed
out.
Do that on both amd64 and i386 because the causes, the effects and the code
are mostly the same.
est.c:
* Use a quintuplet (vendor, MHz_hi, mV_hi, MHz_lo, mV_lo } to match
CPUs more correctly than parsing the brand string.
* Add support for a bunch of models.
* Create a fake table on the fly if the CPU is unknown (there's no
table for it) with the current/highest/lowest frequency.
specialreg.h:
* Add some MSRs needed to get the bus clock value.
identcpu.c:
* Add functions specific to Pentium III, Pentium M and Pentium 4 to
get the bus clock value.
Note that the new fake table code from Simon Burge is not included on
this commit.
Ok'ed by simonb and dogcow.
will wait for the bit pending wait to be cleared. When this bit is
cleared the CPU is ready to enter to new processor state. [1]
- Remove useless comments.
- Sync boot messages with est.c
[1] Macro taken from FreeBSD.
And new changes tested by elad.