message machinery.
Quiet boots look like this (inspired by BSD/OS):
.
.
Found tlp0 at pci0
.
.
Found wd0 at wdc0
.
.
Silent boots look like this:
.
.
Detecting hardware...<twiddle>done.
.
.
NOTE: This requires cooperation on the part of all device drivers,
changes to which have not yet been checked in.
the number of times it is called. This allows subsystems to report
the number of errors that occurred during a quiet/silent subsystem
startup. aprint_get_error_count() reports this count and resets it
to 0.
Also add printf_nolog(), which is like printf(), but prevents the
output from hitting the system log.
autoconfiguration messages:
aprint_normal: Send to console unless AB_QUIET. Always goes to the log.
aprint_naive: Send to console only if AB_QUIET. Never goes to the log.
aprint_verbose: Send to console only if AB_VERBOSE. Always goes to the log.
aprint_debug: Send to console and log only if AB_DEBUG.
API inspired by the same routines in BSD/OS.
Will be used to address kern/5155.
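
The routing rules above can be modelled in a few lines of C. This is only a
sketch: the SK_ flag values, the enum and sk_aprint_route() are hypothetical
stand-ins invented for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the boot flags named above; values are arbitrary. */
#define SK_AB_QUIET   0x01
#define SK_AB_VERBOSE 0x02
#define SK_AB_DEBUG   0x04

enum sk_level { SK_NORMAL, SK_NAIVE, SK_VERBOSE, SK_DEBUG };

struct sk_route {
	bool console;	/* message is shown on the console */
	bool log;	/* message is written to the system log */
};

/* Decide where a message of the given level goes for the given boot flags. */
struct sk_route
sk_aprint_route(enum sk_level level, int bootflags)
{
	struct sk_route r = { false, false };

	switch (level) {
	case SK_NORMAL:		/* console unless quiet; always logged */
		r.console = (bootflags & SK_AB_QUIET) == 0;
		r.log = true;
		break;
	case SK_NAIVE:		/* console only if quiet; never logged */
		r.console = (bootflags & SK_AB_QUIET) != 0;
		r.log = false;
		break;
	case SK_VERBOSE:	/* console only if verbose; always logged */
		r.console = (bootflags & SK_AB_VERBOSE) != 0;
		r.log = true;
		break;
	case SK_DEBUG:		/* console and log only if debug */
		r.console = r.log = (bootflags & SK_AB_DEBUG) != 0;
		break;
	}
	return r;
}
```

With SK_AB_QUIET set, a "normal" message is logged but kept off the console,
while a "naive" message is the only one the console user sees.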
<sys/kprintf.h> header file. This allows subsystems that need
printf semantics other than what are provided by the standard
kernel printf routines to implement exactly what they want.
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.
possible to use alternate system call tables. This is useful for
correctly displaying the arguments in traces of Mach binaries.
If NULL is given, then the regular system call table for the process is used.
This does not buy us new functionality for now, because we still have to
discover how mach_init (which acts as a name server, enabling processes to
discover each other's ports) is able to receive messages from other processes
(this is a bootstrap problem, and the bootstrap port might be the place to
search).
While we are there:
- removed a lot of debug output which is now available using ktrace.
- reworked message handling to avoid multiple copyin/copyout of the
same data. ktrace of Mach messages now uses the in-kernel copy of the
message instead of copying it from userland.
- packed Mach trap handler arguments into a structure to avoid modifying
everything next time we have to add an argument.
and make the sleep length depend on the value of the variable forkfsleep;
it's set to zero by default (no sleep)
this is in preparation for making the sleep length settable via sysctl
These are of use to userland code which previously depended on the
hard-coded values of LABELSECTOR and LABELOFFSET to figure out the
location of the disklabel for a particular platform.
With the introduction of umbrella ports such as evbarm, evbmips, etc,
the location of the disklabel may vary between kernels for the same
MACHINE. This sysctl will allow userland programs to remain independent
of the particular flavour of MACHINE in such cases.
and seems generally sensible (more sensible than not doing so), so done
in generic code rather than compat glue only
Change proposed in PR kern/18767 by Emmanuel Dreyfus.
- leave 5 processes for root-only use, the previous value of 1
was insufficient to execute additional commands once logged in, and
perhaps also not enough to actually login remotely with recent (open)sshd
- protect the log of "proc: table full" with ratecheck(), so that
the message is only logged once per 10 seconds; though syslogd normally
doesn't pass the repeated messages through, this avoids flooding
syslogd and potentially also screen/logs
- If the process hits either system limit of number of processes in system,
or user's limit of same, force the process to sleep for 0.5 seconds
before returning failure. This turns 2000 rampaging fork monsters into
2000 harmlessly snoozing fork monsters.
The sleep is intentionally uninterruptible by signals.
These are not intended as ultimate protection against fork-bombs.
A determined attacker can eat CPU by means other than repeated
fork() calls. But this is good enough to help protect against
programming mistakes or simple-minded tests.
Based on FreeBSD kern_fork.c change in revision 1.132 by
Mike Silbersack <silby at FreeBSD org>
Change also discussed on tech-kern@NetBSD.org, thread
'Fork bomb protection patch'.
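
The "once per 10 seconds" logging rule can be sketched as follows. This is a
userland model under stated assumptions: sk_ratecheck() and sk_count_logged()
are hypothetical names, time is passed in explicitly as whole seconds, and
the kernel's own ratecheck() reads the clock itself and works on timevals.

```c
#include <stdbool.h>

/* Hypothetical userland model of a rate limiter; not the kernel routine. */
struct sk_rate {
	bool primed;		/* false until the first event was let through */
	long last_sec;		/* time of the last event that was let through */
};

/* Return true if an event at time now_sec may be logged. */
bool
sk_ratecheck(struct sk_rate *r, long now_sec, long min_interval_sec)
{
	if (r->primed && now_sec - r->last_sec < min_interval_sec)
		return false;	/* too soon: suppress this message */
	r->primed = true;
	r->last_sec = now_sec;
	return true;
}

/* Count how many "table full" messages survive a burst of failures. */
int
sk_count_logged(const long *times, int n, long min_interval_sec)
{
	struct sk_rate r = { false, 0 };
	int logged = 0;

	for (int i = 0; i < n; i++)
		if (sk_ratecheck(&r, times[i], min_interval_sec))
			logged++;
	return logged;
}
```

The first failure is always logged; later ones are dropped until the minimum
interval has passed since the last message that got through.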
until after wakeup event, so we can't clear the SI_COLL flag
in selrecord(). Thus, effectively back rev. 1.57 off.
Problem reported in PR kern/17517 by David Laight, program triggering
the problem is in regress/sys/kern/poll/poll3w.c.
need to reparent the process to initproc, so that the child doesn't
have its p_pptr pointer still pointing to the exited parent
pointed out by Dave Sainty in private mail (the patch in kern/14443
didn't have this bug)
in this case, and even if not, the process would be already woken up by the
wakeup() call
change sent as part of kern/17517 by David Laight
XXX perhaps should KASSERT() sel_pid is zero in the SI_COLL case
the same file multiple times because of recursive loading (i.e. libx requires
liby and libz, and liby requires libz, so libz would be loaded twice)
This is probably suboptimal, but it enables /bin/sh to load on the PowerPC,
so it's a good interim solution until we figure out precisely how things
should work.
I'm not sure whether this makes the excessive recursive check useless or not.
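
One common way to keep a shared dependency from being loaded twice is a
per-file "already loaded" flag checked before recursing. The sketch below
illustrates that idea only; the commit does not say which mechanism it uses,
and struct sk_lib and sk_load() are hypothetical.

```c
#include <stddef.h>

#define SK_MAXDEPS 4

/* Hypothetical model of a library and the libraries it requires. */
struct sk_lib {
	const char *name;
	struct sk_lib *deps[SK_MAXDEPS];	/* NULL-terminated if not full */
	int loaded;				/* set once the file is loaded */
	int load_count;				/* how often it was really loaded */
};

/* Load lib and, recursively, its dependencies, skipping loaded files. */
void
sk_load(struct sk_lib *lib)
{
	if (lib->loaded)
		return;		/* already pulled in by another path */
	lib->loaded = 1;
	lib->load_count++;
	for (size_t i = 0; i < SK_MAXDEPS && lib->deps[i] != NULL; i++)
		sk_load(lib->deps[i]);
}
```

For the libx/liby/libz case described above, libz is reached through both
libx and liby but its load_count stays at 1.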
reparented back to original parent before it's killed.
This makes the original parent aware that the child has exited if
the debugger failed to wait() on the debugged zombie before exiting.
Since we clear tracing flags before killing the child, the reparenting
logic in wait4() wouldn't be triggered, so it's necessary to do it here.
Problem reported and fix provided in kern/14443 by David Sainty.
macho_hdr, argc, *argv, NULL, *envp, NULL, progname, NULL,
*progname, **argv, **envp
Where progname is a pointer to the program name as given in the first
argument to execve(), and macho_hdr a pointer to the Mach-O header at
the beginning of the executable file.
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
original system call number, which can be negative for a Mach trap.
We cannot just replace code by realcode, because ktrsyscall uses it as
an index in the system call table, thus crashing the kernel when the
value is negative.
this gives:
* linux sysconf(_SC_CLK_TCK) gives correct value for linux binaries (hz)
even if hz != 100
* glibc gets proper information on real/effective uid and enables
secure mode for suid binaries
g/c LINUX_COPYARGS_FUNCTION, replaced by linux ELF copyargs function
g/c alpha-specific linux ELF copyargs function and linux ELF defines
device. Should help performance when no fingerprints are loaded.
* Back down the securelevel: a securelevel of 2 now makes a missing
fingerprint or a fingerprint mismatch a fatal error. Previously this
was done at securelevel 3 or greater.
and friends should either be made first-class citizens and moved
to an include file (systm.h perhaps), or nuked completely, but
not be redefined in a lot of files.
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.
This feature is designed so that it is easy to attach the process using gdb
before it has done anything.
It works also with sproc, kthread_create, clone...
in the event that it needs to use a special VM range (x86_64 falls
into this category). We fall back onto kernel_map if machine-dependent
code doesn't create a special map.
wanted sizeof(struct disk_sysctl), use the old size. for non-COMPAT_16,
however, we return EINVAL so that all future programs are forced into
passing the wanted size. 1.6 iostat(8) works with -current kernel again.
as seen on tech-kern.
userland structure size (if passed in).
Use the supplied userland structure size (if passed in) to check if
there is enough room to copyout the next structure.
- disk_unbusy() gets a new parameter to tell the IO direction.
- struct disk_sysctl gets 4 new members for read/write bytes/transfers.
when processing hw.diskstats, add the read&write bytes/transfers for
the old combined stats to attempt to keep backwards compatibility.
unfortunately, due to multiple bugs, this will cause new kernels and old
vmstat/iostat/systat programs to fail. however, the next time this is
changed, they will not fail again.
this is just the kernel portion.
be32toh produces an unsigned long result, causing a printf argument
mismatch. This is the wrong fix, but I am not going to change the
powerpc macros; fix the powerpc macros and revert my change.
While a hard link to a symbolic link is not ruled out by POSIX-2001,
the link(2) interface is to perform normal pathname resolution,
which includes the resolution of symbolic links.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
with privilege elevation no suid or sgid binaries are necessary any
longer. Applications can be executed completely unprivileged. Systrace
raises the privileges for a single system call depending on the
configured policy.
Idea from discussions with Perry Metzger, Dug Song and Marcus Watts.
Approved by christos and thorpej.
now carries the name of the attachment (e.g. "tlp_pci" or "audio"),
and cfattach structures are registered at boot time on a per-driver
basis. The cfdriver and cfattach pointers are cached in the device
structure when attached.
devices have been discovered. All finalizer routines are iteratively
invoked until all of them report that they have done no work.
Use this hook to fix a latent bug in RAIDframe autoconfiguration of
RAID sets exposed by the rework of SCSI device discovery.
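
The "run until nobody did any work" rule can be sketched like this. The
types and function names are hypothetical and only model the loop described
above, not the kernel's registration interface.

```c
#include <stddef.h>

/* Hypothetical finalizer type: returns nonzero if it did any work. */
typedef int (*sk_finalizer_t)(void *arg);

struct sk_finalizer {
	sk_finalizer_t fn;
	void *arg;
};

/*
 * Invoke every finalizer repeatedly until one full pass reports no work.
 * Returns the number of passes that were made.
 */
int
sk_run_finalizers(struct sk_finalizer *list, size_t n)
{
	int passes = 0;
	int did_work;

	do {
		did_work = 0;
		for (size_t i = 0; i < n; i++)
			did_work |= (list[i].fn(list[i].arg) != 0);
		passes++;
	} while (did_work);
	return passes;
}

/* Example finalizer: handles one pending unit per call. */
int
sk_drain_one(void *arg)
{
	int *pending = arg;

	if (*pending == 0)
		return 0;	/* nothing left to do */
	(*pending)--;
	return 1;
}
```

Every finalizer is called again on each pass because work done by one of
them may expose new work for another.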
the most "official-looking" is IEC 60027-2, i.e. "Ki", "Mi", ...,
which is not too popular, and which would require more code changes.
So stick with the traditional capital "K" for (divisor==1024), and use
the SI "k" otherwise (i.e. (divisor==1000)).
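
A minimal sketch of the prefix choice, assuming only the two divisors
mentioned above. sk_kilo_prefix() and sk_format_kilo() are hypothetical
helpers, not the routine that was actually changed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick the kilo prefix: traditional "K" for 1024, SI "k" for 1000. */
char
sk_kilo_prefix(int divisor)
{
	assert(divisor == 1000 || divisor == 1024);
	return divisor == 1024 ? 'K' : 'k';
}

/* Format a byte count in kilo units; the result is truncated, not rounded. */
int
sk_format_kilo(char *buf, size_t len, unsigned long long bytes, int divisor)
{
	return snprintf(buf, len, "%llu %cB",
	    bytes / (unsigned long long)divisor, sk_kilo_prefix(divisor));
}
```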
a vector of indices into the cfdata table to specify potential parents,
record the interface attributes that devices have and add a new "parent
spec" structure which lists the iattr, as well as optionally listing
specific parent device instances.
See:
http://mail-index.netbsd.org/tech-kern/2002/09/25/0014.html
...for a detailed description.
While here, const poison some things, as suggested by Matt Thomas.
This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.
Also provides an opportunity for optimisations if "switching to self".
Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.
All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
memory fault handler. IRIX uses irix_vm_fault, and all other emulations
use NULL, which means to use uvm_fault.
- While we are there, explicitly set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed to by e_fault does not
have exactly the same prototype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This forces us to have different VM spaces and synchronize modifications
to the VM space across share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
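
The NULL-means-default dispatch described above might look roughly like
this in a trap handler. Everything here is a stand-in: the sk_ types replace
the kernel's structures, plain ints replace vm_fault_t and vm_prot_t, and
sk_emul_fault() only counts where a real hook would propagate the change.

```c
#include <stddef.h>

/* Stand-in types; names are hypothetical, not the kernel's. */
struct sk_vm_map { int generic_faults; };
struct sk_proc;

struct sk_emul {
	/* NULL means "use the generic fault handler". */
	int (*e_fault)(struct sk_proc *, unsigned long, int, int);
};

struct sk_proc {
	const struct sk_emul *emul;
	struct sk_vm_map *map;
	int propagated;		/* faults propagated by the emulation hook */
};

/* Stand-in for the generic handler: takes the map, not the process. */
int
sk_uvm_fault(struct sk_vm_map *map, unsigned long va, int ftype, int prot)
{
	(void)va; (void)ftype; (void)prot;
	map->generic_faults++;
	return 0;
}

/* Emulation-specific hook: resolve the fault, then note the propagation. */
int
sk_emul_fault(struct sk_proc *p, unsigned long va, int ftype, int prot)
{
	int error = sk_uvm_fault(p->map, va, ftype, prot);

	if (error == 0)
		p->propagated++;
	return error;
}

/* Trap handler side: prefer e_fault, else fall back to the generic one. */
int
sk_trap_fault(struct sk_proc *p, unsigned long va, int ftype, int prot)
{
	if (p->emul->e_fault != NULL)
		return p->emul->e_fault(p, va, ftype, prot);
	return sk_uvm_fault(p->map, va, ftype, prot);
}
```

Taking the process rather than the map as first argument is what lets the
hook find the other members of the group.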
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
This merge changes the device switch tables from static array to
dynamically generated by config(8).
- All device switches are defined as constant structures in device drivers.
- The new grammar ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed in the port dependent majors.<arch>
by using this grammar.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- Backward compatibility for loading block/character device
switches via the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.
- The restriction on assigning device majors by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switches.
- At compile time, the list of device major numbers is packed into the kernel
and the LKM framework refers to it to assign device major numbers dynamically.
being written to. Breakpoints aren't good in a RAS. This test isn't
infallible, since we can't protect memory which will be registered
as a RAS in the future.
Also, set the PC before attempting to single-step, so we can back out
of single-stepping, just in case we try to single-step into a RAS.
name to start up as init (rather than just cycling thru initpaths[]
and panicking when out of options). if RB_ASKNAME isn't set, the old
behaviour remains. inspired by changes in der Mouse's patchtree.
resolves [kern/18027] from me.
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.
pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
- avoid race conditions by having seqno in ioctl
- better uid/gid tracking
- "replace" policy to replace args
- fewer diffs, as many of the local changes were fed back to OpenBSD already
due to the 1st item, it was impossible for us to provide backward-compatibility
(new kernel + old bin/systrace won't work). upgrade both.
* In pool_prime_page(), assert that the object being placed onto the
free list meets the alignment constraints (that "ioff" within the
object is aligned to "align").
* In pool_init(), round up the object size to the alignment value (or
ALIGN(1), if no special alignment is needed) so that the above invariant
holds true.
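
Why rounding the object size keeps the invariant can be checked with a small
model. This is a sketch with hypothetical helpers: offsets are relative to an
aligned page start, and the first item is placed so that its "ioff" offset is
aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be nonzero). */
size_t
sk_roundup(size_t size, size_t align)
{
	assert(align != 0);
	return ((size + align - 1) / align) * align;
}

/* Count items in a page whose (offset + ioff) is NOT a multiple of align. */
size_t
sk_misaligned_items(size_t pagesz, size_t itemsz, size_t align, size_t ioff)
{
	size_t bad = 0;
	size_t off;

	assert(itemsz != 0 && align != 0);
	/* Place the first item so that its ioff lands on an aligned offset. */
	off = (align - ioff % align) % align;
	for (; off + itemsz <= pagesz; off += itemsz)
		if ((off + ioff) % align != 0)
			bad++;
	return bad;
}
```

With a 20-byte object and 8-byte alignment, the second item already breaks
the constraint; after rounding the size up to 24, every item in the page
satisfies it.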
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.
Reviewed by Christos Zoulas.
One basic struct, a function to setup a queue with a specific strategy and
three macros to put buf's into the queue, get and remove the next buf or
get the next buf without removal.
The BUFQ_XXX interface will be removed in the future.
The B_ORDERED flag is no longer supported.
Approved by: Jason R. Thorpe <thorpej@wasabisystems.com>
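
The shape described above (one struct, a setup function that selects a
strategy, and three macros) can be sketched as follows. All sk_/SK_ names
are hypothetical and only a first-come-first-served strategy is shown; the
real interface and its strategies may differ.

```c
#include <stddef.h>

struct sk_buf {
	struct sk_buf *next;
	int blkno;
};

/* One basic struct: strategy hooks plus the queue itself. */
struct sk_bufq {
	void (*put)(struct sk_bufq *, struct sk_buf *);
	struct sk_buf *(*get)(struct sk_bufq *, int remove);
	struct sk_buf *head, *tail;
};

/* Three macros: put a buf, get and remove the next, or just look at it. */
#define SK_BUFQ_PUT(q, bp)	((q)->put((q), (bp)))
#define SK_BUFQ_GET(q)		((q)->get((q), 1))
#define SK_BUFQ_PEEK(q)		((q)->get((q), 0))

/* FCFS strategy: append at the tail. */
static void
sk_fcfs_put(struct sk_bufq *q, struct sk_buf *bp)
{
	bp->next = NULL;
	if (q->tail != NULL)
		q->tail->next = bp;
	else
		q->head = bp;
	q->tail = bp;
}

/* FCFS strategy: return the head, unlinking it only if remove is set. */
static struct sk_buf *
sk_fcfs_get(struct sk_bufq *q, int remove)
{
	struct sk_buf *bp = q->head;

	if (bp != NULL && remove) {
		q->head = bp->next;
		if (q->head == NULL)
			q->tail = NULL;
	}
	return bp;
}

/* Setup function: initialise an empty queue with the FCFS strategy. */
void
sk_bufq_init_fcfs(struct sk_bufq *q)
{
	q->put = sk_fcfs_put;
	q->get = sk_fcfs_get;
	q->head = q->tail = NULL;
}
```

Callers only ever use the macros, so a different ordering strategy can be
swapped in by the setup function without touching them.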
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sendsig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
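
The version-based choice can be pictured with a small model. struct
sk_sigdesc and sk_select_tramp() are hypothetical; the only rule taken from
the text above is that version 0 means the legacy kernel-provided trampoline.

```c
#include <stddef.h>

/* Hypothetical per-signal descriptor: handler plus trampoline and version. */
struct sk_sigdesc {
	void (*handler)(int);	/* stands in for the sigaction */
	const void *tramp;	/* trampoline registered with the handler */
	int version;		/* 0 = legacy kernel-provided trampoline */
};

/* Pick the trampoline to use when delivering the signal. */
const void *
sk_select_tramp(const struct sk_sigdesc *sd, const void *kernel_tramp)
{
	if (sd->version == 0)
		return kernel_tramp;	/* old binaries: kernel's trampoline */
	return sd->tramp;		/* trampoline supplied at registration */
}
```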
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.