external storage. Highlights:
- additional "void *" argument to (*ext_free)(), an opaque
cookie for use by the free function.
- MCLALLOC() and MCLFREE() calls are gone. They are replaced
by MEXTADD() (add external storage to mbuf), MEXTMALLOC()
(malloc() external storage and attach to mbuf), and
MEXTREMOVE() (remove external storage from mbuf).
- completely new external storage reference counting
mechanism; mclrefcnt[] is gone.
These changes will eventually be used to pass driver DMA buffers up
the network stack, and reduce/eliminate copies in certain code paths
(e.g. NFS writes).
From Matt Thomas <matt@3am-software.com> and myself <thorpej@nas.nasa.gov>,
with some input from Chris Demetriou <cgd@cs.cmu.edu> and review by
Charles Hannum <mycroft@mit.edu>.
Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.
For the detailed change history, look at the commit log entries for
the is-newarp branch.
work. Not quite as good as with the Lite2 merges, but it'll do until then.
* dounmount() expects to be called with the mountpoint marked busy
* all callers of dounmount() thus make the call themselves
* if a filesystem was being unmounted, and we're woken up in vfs_busy(),
don't reference the mountpoint struct pointer, as it has very probably
been freed.
in the assembly file genassym.s into the usual assym.h file. The
assym.h file generated this way is identical to the output generated
if I simply compile and run the genassym.s file. "Heh, Kewl!"
Thanks to Matthias Pfaller for the "translate the .s file" idea!
* prototype it before it is used (several ports compile with
-Wstrict-prototypes -Wmissing-prototypes), so this is _necessary_.
* conform to C syntax (yes, that's right, it wouldn't parse).
* make error check less error-prone, + style fixups.
mount the root file system. If the operator specified the root
file system type in the kernel configuration file, attempt to
mount that file system type on the root device. If the root
file system type was wildcarded (or unspecified), try all of
the file systems statically built into the kernel until one
succeeds. If no file systems succeed, return an error. The
system will recover from this condition.
- Implement vfs_getopsbyname(). This function returns the file
system ops vector given a file system name.
from the version used by NetBSD/alpha, with several changes by me.
Support for asking for root device and root file system type on any
kernel, obsoleting "options GENERIC".
- Make my mountroothook implementation used by the sparc and x68k
ports machine-independent, and use it here. Mountroothooks allow
devices to execute special functions before being mounted as the
root device (such as ejecting the floppy and prompting for a new
floppy disk).
- Make swapconf() machine-independent. It was identical on all ports.
- Run mountroot hooks before we attempt to mount the root device, and
destroy mountroot hooks after the root file system has been sucessfully
mounted.
- Don't panic if we can't mount root. Instead, set RB_ASKNAME and
call setroot(), which will prompt the operator for the root device
and file system type.
is running (and NTP is not enabled), the adjtime()-handling code clobbers
any tickfix that may be necessary for systems with clocks with frequency
greater than 1000Hz.
Eliminate obsolete global kernel variable "struct timezone tz"
Add RTC_OFFSET option
Add global kernel variable rtc_offset, which is initialized by
RTC_OFFSET at kernel compile time.
on i386, x68k, mac68k, pc532 and arm32, RTC_OFFSET indicates how many
minutes west (east) of GMT the hardware RTC runs. Defaults to 0.
Places where tz variable was used to indicate this in the past have
been replaced with rtc_offset.
Add sysctl interface to rtc_offset.
Kill obsolete DST_* macros in sys/time.h
gettimeofday now always returns zeroed timezone if zone is requested.
settimeofday now ignores and logs attempts to set non-existant kernel
timezone.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.
* change in-kernel syscall prototypes to match user-land prototypes in
the following ways:
+ add 'const' where appropriate.
+ make the following "safe" type changes where appropriate:
caddr_t -> struct msghdr *
caddr_t -> struct sockaddr *
caddr_t -> void *
char * -> void *
int -> uid_t (safe because uid_t not used as index/count)
int -> gid_t (safe because gid_t not used as index/count)
u_int -> size_t
+ change "int" to "u_long" in flags arguments to chflags() and
fchflags(). This is safe because the arguments are used as
flag bits and there's nothing that would cause the top bit
of the int to be set yet, and because the user-land definitions
already specified u_long, so a u_long's worth of argument was
already being passed in.
wrong for a bunch of functions:
void: sys_exit, sys_sync
ssize_t: sys_read, sys_write, sys_recvmsg, sys_sendmsg,
sys_recvfrom, sys_readv, sys_writev, sys_sendto
long: sys_pathconf, sys_fpathconf
void *: sys_shmat
* Note that sys_open, sys_ioctl, and sys_fcntl are defined such that their
last argument is optional.
These changes should not have any real effect, because right now this
information is not actually used for anything.
* Don't output prototypes for INDIR syscalls (since they always show up as
sys_nosys() in the syscall table).
* Add "indir" to the comment for INDIR syscalls in the syscalls table, so
it's more obvious why they call sys_nosys().
* Deal with multi-word system call return types (i.e. foo *, or
struct foo *, or struct foo, etc.).
* Add a new class of system calls "INDIR" (for "indirect"), which
is to be used to represent indirect syscalls like syscall() and
__syscall() which are implemented in MD code and which don't want
args structures defined. (The old way of declaring this type of
syscalls still works.)
* Allow system calls to be marked as having a variable number of
arguments, by inserting "..." (no trailing comma) before the
first hf the optional arguments in the syscall definition. Because
of the way syscall arguments are handled by MI code, _ALL_ syscall
arguments must actually be included in the definition, i.e.
"optional" arguments are either "are there or aren't," i.e. these
aren't really varargs functions. Therefore, for normal syscalls,
there _must_ be arguments listed after the "...". For INDIR
syscalls, which really do have a variable number of arguments and
which aren't handled via the normal mechanism, that requirement is
not in force.
* output primitive (machine-parsable) syscall descriptions as comments
in <sys/syscall.h>. These can be used to easily build real function
prototypes, or to build stub functions for use by lint.
>- Optional systems calls are "UNIMPL" if the support is not being
> compiled into the kernel.
It had implications that didn't occurr to me at the time. *sigh*
>- Optional systems calls are "UNIMPL" if the support is not being
> compiled into the kernel.
It had implications that didn't occur to me at the time. *sigh*
defined:
define match functions to take a struct cfdata * as their second
argument, config_search() to take a struct cfdata * as its second
argument, and config_{root,}search() to return struct cfdata *.
remove 'cd_indirect' cfdriver element.
remove config_scan().
remove config_make_softc() as a seperate function, reintegrating
its functionality into config_attach().
Ports will define __BROKEN_INDIRECT_CONFIG until their drivers prototypes
are updated to work with the new definitions, and until it is sure that
their indirect-config drivers do not assume that they have a softc
in their match routine.
the client and server/shared data initialization into separate functions,
and calling the server/shared initialization directly from main().
Problem noted in PR #1308 (Kenneth Stailey) and PR #1780 (Chris Demetriou).
Fix suggested in PR #1780 by Chris Demetriou, and munged a bit by me,
and OK'd by Frank van der Linden <fvdl@netbsd.org>.
(1) after removing a shutdown hook (in shutdownhook_disestablish()),
free it. We created it, we have to free it. Without this,
shutdownhook_disestablish() leaks memory.
(2) in doshutdownhooks(), before running each hook, remove it from the
shutdown hook list. This makes sure that every hook is tried once
(because doshutdownhooks() is called from before rebooting, and
a fault in a shutdown hook will cause doshutdownhooks() to be called
again), but prevents the hooks from potentially being run infinitely
(as used to be possible, in the above-mentioned situation).
If not compiled with -D_KERNEL, include different includes and
do so macro magic so that this will fit sanely into test harnesses.
When used in user-land, this should be compiled with -D_EXTENT_TESTING.
Bug fixes:
(extent_insert_and_optimize) You can't do things like:
LIST_REMOVE(elem->...le_next, ...);
free(elem->...le_next, ...);
They just don't work (and will corrupt your list and/or malloc free list).
(extent_alloc_region_descriptor) Unless you wait, malloc can fail.
Don't accidentally deref a potentially-NULL pointer.
- The functions that implement them and the argument names are
prepended with "sys_".
- Optional systems calls are "UNIMPL" if the support is not being
compiled into the kernel.
representing the names of those bits, prints them into a buffer
provided by the caller, and returns a pointer to that buffer.
Functionality is identical to that of the (non-standard) `%b' printf()
format, which will be deprecated.
Rename the non-exported function ksprintn() to ksnprintn(), and change
it to use a buffer provided by the caller, rather than at static
buffer.
* handle interpreters with nonzero virtual address of entry-point:
subtract p_vaddr from computed entrypoint, as the mips elf exec did.
* Add #ifdef ELF_INTERP_NON_RELOCATABLE/#endif around the code
that tries to choose a `good' address at which to load an interpreter,
if none was set by the emul probe function.
(the address chosen could be improved to avoid fragmenting the
process virtual address space).
* define ELF_INTERP_NON_RELOCATABLE in machine/elf_machdep.h for mips CPUs,
which currently use a GNU-derived ld.so.
ELF_INTERP_NON_RELOCATABLE is not necessary for native NetBSD/alpha ELF
binaries. It may be required for GNU-derived ELF dynamic loaders (Linux/i386?)
Keep queue of pending sockets in a double linked list. Previously,
a singly linked list was used, giving O(N) insertion/deletion times,
and was a major time consumer for sockets with large pending queues.
The double linked list give O(C) insertion/deletion times with only
a small cost in complexity.
Since a socket can be on, at most, one queue at a time, both so_q and
so_q0 can safely be used as (forward and backward, respectively) queue
pointers.
Submitted my Matt Thomas <matt@3am-software.com>, a long time ago.
(Geez, I've been running with this patch for _months_, and had completely
forgotten about it!)
struct member cn_nameptr 'const', since they should never be used to
modify the path name. (Only the pathname buffer, cn_pnbuf, should be
modified.) Propagate the const poisoning to code that uses the namei
and componentname structs.
not used by anything, for now), and implement MNT_NOCOREDUMP by checking
whether or not MNT_NOCOREDUMP is set on the file system where the dump
would land (i.e. the file system of the process's current working
directory), and disallowing the core dump if it's set.
- Rename EX_NOBLOB to EX_NOCOALESCE; it's much more descriptive of
what's going on.
- In extent_free_region_descriptor(), if we're a fixed extent,
freeing a dynamically allocated region descriptor, and someone
is waiting on the freelist, let the waiter have it, rather than
free'ing it back to the system.
- Use ALIGN(), rather than our homegrown EXTENT_ALIGN(), when dealing
with map overhead. Privatize the EXTENT_ALIGN() macro; there's no need
to export it.
- Implement EX_BOUNDZERO flag. This changes the boundary line policy in
extent_alloc() and extent_alloc_subregion(); boundary lines are
computed relative to 0, rather then the start of the extent.
- Fix a nasty race between multiple participants doing region and
descriptor allocation.
- Add a new flag to specify that it's ok to wait for space in the
extent: EX_WAITSPACE.
- Blow away an unnecessary splhigh()/splx().
- Put a bunch of sanity code inside #ifdef DIAGNOSTIC/#endif.
of using it directly, use a local, and set that local to be curproc
if curproc is not NULL else a pointer to process 0's proc struct.
If syncing disks while handling a panic that occurred while 'curproc'
was NULL, the old code would dereference NULL and die.
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.
section. Patch come up with by Bob Baron <rvb+@cs.cmu.edu> and myself.
This entire bit of code (the code which sets daddr/dsize and taddr/tsize)
is very bogus, but it's not clear what the 'right' way to fix it is
and this patch fixes a problem preventing some ELF executables from
being run.
ELF_ROUND (round to higher alignment boundary), and use them properly.
Also, change a bit of code in elf_load_psection to use the next ELF_ROUND
macro. This fixes a bug found by Robert Baron <rvb+@cs.cmu.edu> where
elf_load_psection, if given a properly aligned address at which to load
the section, would round actually load it at the next highest alignment
boundary.
for NOEXEC and NOSUID, and make sure the interpreter file is executable.
The mount point checks are done because, even though the interpreter
is not the program being 'executed', code from the interpreter is being
executed, and so the mount point's flags should be respected.
and shell script support to be optional (conditioned on EXEC_SCRIPT).
Remove the implicit inclusion of EXEC_ECOFF when COMPAT_OSF1 and/or
COMPAT_ULTRIX is included, and of EXEC_ELF32 when COMPAT_LINUX and/or
COMPAT_SVR4 is included.
queue.h list/queue head initializer macros. mountlist was converted so
that panics (or other reboots) early on in kernel startup don't cause
sys_sync() to croak. vnode_free_list was converted because it was nearby.
macros to use to remove #ifdefs from the machine ID case check.
Eventually, these headers will contain other information, e.g.
machine-dependent relocation information, etc.
p->p_vmspace->vm_shm would be NULL. Protected the rest of the cases where
that might happen too. This was the reason why sunxdoom would panic the
system in SVR4 emulation.
programs which attach their own header) can crash the machine. The problem
in this case was:
a variable "space" was set to the total data to copy,
len was used to remember how much to copy in this chunk (mbuf),
in one case, len = min(MCLBYTES - max_hdr, resid) but
size -= MCLBYTES;
instead of
size -= len;
Note that userland programs can still crash the machine by providing
bogus data in the ip->ip_len field I suspect. I haven't verified this,
but will soon be doing so and applying a fix of some sort. Probably
clamping the ip->ip_len value to the true packet size will be ok.
a boot string for firmware that can do this, such as the SPARC and
the sun3 models. It is currently silently ignored on all other
hardware now, however. The MD function "boot()" has been changed to
also take a char *.
Understands allocation aligment and boundary restrictions, "specific region"
allocations, and suballocations. Capable of statically or dynamically
allocating map overhead.
Many thanks to Matthias Drochner for running the code for me, and sending
me bug fixes, optimizations, and suggestions. Also, many thanks to
Chris Demetriou for his extremely helpful suggestions.
XXX No manual page yet. One is forthcoming, as soon as I can scare up
the time to write one. This has been sitting on my plate for quite a
while, and several projects are waiting for it. Time to move on.
delayed write is logically converted to a sync write, mirroring the async case.
In bdwrite(), move the tape case earlier to avoid needless reassignbuf()s.
and free some space by calling m_reclaim(). Also, log the "mb_map full"
error message (at most) every 60-seconds. The old code would log it
once over the lifetime of the system, but that's not a useful diagnostic.
(More useful is the new behaviour, which roughly indicates how often
periods of heavy load occur, without spamming the console and system
logs with messages.)
Right now, this code just panic()s (same as kmem_malloc() used to do
before, but different message), but in the future it should be modified
to try to reclaim wasted memory.
automatic array rather than an array allocated with alloca().
(This was the only use of alloca() in the kernel, and it wasn't
necessary or consistent with the way other functions in this file
work.)
structure and 'aux', right before ca_attach is called for the
newly-attached device. This allows the alpha port to do root device
autodetection without modifying every bus and device driver which could
be in the 'boot path.' In the long run, it may make sense to make
this machine-independent.
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
documented behavior. sysctl(3) is documented to return 0 on success,
-1 on failure. The previous behavior was to return -1 on failure
and the number of bytes copied back down to user space.
Fixes part of PR #1999, from Kevin M. Lahey <kml@nas.nasa.gov>
return a struct device * of attached device, or NULL if device attach failed,
rather than 1/0 for success/failure, so that code that bus code which needs
to know what the child device is doesn't have to open-code a hacked variant
of config_found(). Make config_attach() return struct device *, rather than
void, to facilitate that.
- split softc size and match/attach out from cfdriver into
a new struct cfattach.
- new "attach" directive for files.*. May specify the name of
the cfattach structure, so that devices may be easily attached
to parents with different autoconfiguration semantics.