in the mbuf header.
* Use the new cached paddr feature of the pool_cache API to record
the physical address of mbuf clusters. (We cannot use a ctor for
clusters, since clusters have no constructed form; they are merely
buffers).
Bus_dma back-ends may use the cached physical addresses to save having to
extract the physical address from virtual.
* Provide space in m_ext recording the vm_page *'s for an SOSEND_LOAN_CHUNK-
sized non-cluster external buffer. Use this in the sosend_loan code to
save having to extract the physical address from virtual and then look
up the vm_page *'s.
* Provide an indication that an external buffer is mapped read-only at
the MMU. Set this flag for the external buffer in the sosend_loan
case, since loaned pages are always mapped read-only. Bus_dma back-ends
may use this information to save cache flushing, since a cache flush of
a read-only mapping is redundant on some architectures (the cache would
have already been flushed when making the mapping read-only).
Part 2 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
objects. Clients of the pool_cache API must consistently use
the "paddr" variants or not, otherwise behavior is undefined.
Enable this on Alpha, ARM, MIPS, and x86. Other platforms must
define POOL_VTOPHYS() in the appropriate manner in order to enable
the feature.
Part 1 of a series of simple patches contributed by Wasabi Systems
to improve network performance.
64 bit block pointers, extended attribute storage, and a few
other things.
This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.
Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
As discussed on tech-net.
* Define a range of bits in the new wider m_flags which are reserved for
M_EXT mbufs. Define M_EXT_CLUSTER to indicate the ext is a cluster, and
re-define M_CLUSTER as M_EXT_CLUSTER for source-level compatibility.
* Define a new M_EXTCOPYFLAGS to be used when manipulating M_EXT state.
Avoids a lot of casting and removes the need for some line breaks.
Removed a load of (caddr_t) casts from calls to copyin/copyout as well.
(approved by christos - he has a plan to remove caddr_t...)
lookup and allocation, and any dependency on NPROC or MAXUSERS.
NO_PID changed to -1 (and renamed NO_PGID) to remove artificial limit
on PID_MAX.
As discussed on tech-kern.
having some #ifdef UNION code in vfs_vnops.c, introduce variable
'vn_union_readdir_hook' which is set to address of appropriate
vn_readdir() hook by union filesystem when it's loaded & mounted
people would read that list in a more timely fashion!), change the new
64-bit memory reporting sysctl nodes to report bytes. This should not
be a problem, since it's only a week old, and no applications use the
new nodes yet.
We cannot store LWP pointers permanently in lock structures, for two reasons:
1) They are somewhat ephemeral. Dangling pointers are bad.
2) A different LWP may issue the unlock, and in this case, we were not actually
doing the unlock at all. This was causing processes to exit without undoing
fcntl(2) locks. Furthermore, the locks are process-specific to begin with,
so the test was just plain wrong.
Instead, we go back to storing a proc pointer for POSIX locks. In addition, we
add an extra pointer to the LWP, which is used in deadlock detection. After
the lock is granted, this pointer is 0ed and there is no reference to the LWP.
Now evolution can inc my mail again.
and in libkvm. Then teach ps how to show them to you.
Also, teach ps how to show the names for all the uids, the rest of the
group numbers, and the "group access list".
VM_DEFAULT_ADDRESS from elf*_makecmds to elf*_load_file. In load_file,
actually determine ahead of time how much space will be needed and pass
that to VM_DEFAULT_ADDRESS. Now we have a relatistic starting address
so we can do the loading of psections normally with no extra topdown
code in load_psection. Also, if there is a gap in betweeen psections
zero map an inaccessible region between (just like ld.elf_so does) to
avoid inadvertant mmaps in the gap.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.
This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.
This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.
Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.
This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:
* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.
* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.
* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.
And a few that are not strictly necessary:
* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."
* Unified GOP_ALLOC between FFS and LFS.
* Update LFS copyright headers to correct values.
* Actually cast to unsigned in lfs_shellsort, like the comment says.
* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
timeout
the semantics of 'timeout' parameter differ to POSIX for the syscall
(not const, may be modified by kernel if interrupted from the wait) -
libc will provide appropriate wrapper
since sigwaitinfo(2) will be implemented as wrapper around sigtimedwait()
too, remove it's reserved slot and move sigqueue slot 'up', freeing
slot #246
conditions at points where it's necessary to access both the up-stream
and down-stream parts of the bi-directional pipe data structure. These
are marked `XXXSMP' in the code.
Also, since the changes are pretty invasive, there little point in keeping
all the "#ifdef FreeBSD" code around; so all of that has been stripped out.
kernel config option) that controls whether the kernel dumps to the
dump device on panic. Dumps can still be forced via the ``sync''
command from ddb. Defaults to ``on''.
arbitrary number of timers, both oneshot and periodic.
from FreeBSD, only adapted to NetBSD kernel API - mstohz() instead
of tvtohz(), and takes advantage of callout_schedule() in filt_timerexpire()
done by Artur Grabowski and Thomas Nordin for OpenBSD, which is more
efficient in several ways than the callwheel implementation that it is
replacing. It has been adapted to our pre-existing callout API, and
also provides the slightly more efficient (and much more intuitive)
API (adapted to the callout_*() naming scheme) that the OpenBSD version
provides.
Among other things, this shaves a bunch of cycles off rescheduling-in-
the-future a callout which is already scheduled, which the common case
for TCP timers (notably REXMT and KEEP).
The API has been simplified a bit, as well. The (very confusing to
a good many people) "ACTIVE" state for callouts has gone away. There
is now only "PENDING" (scheduled to fire in the future) and "EXPIRED"
(has fired, and the function called).
Kernel version bump not done; we'll ride the 1.6N bump that happened
with the malloc(9) change.
mechanism by keeping a list (bitset) of which timers have fired and using
that list in the upcall (Does this sound familiar? SEND HELP NEED SIGINFO).
Provoke the idle LWP into running again with setrunnable(sa->sa_idle)
instead of a wakeup() call, since we know what it is.
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
(1) ELFNAME(load_file)() now takes a pointer to the entry point
offset, instead of taking a pointer to the entry point itself. This
allows proper adjustment of the ultimate entry point at a higher level
if the object containing the entry point is moved before the exec is
finished.
(2) Introduce VMCMD_FIXED, which means the address at which a given
vmcmd describes a mapping is fixed (ie, should not be moved). Don't
set this for entries pertaining to ld.so.
Also some minor comment/whitespace tweaks.
the number of times it is called. This allows subsystems to report
the number of errors that occurred during a quiet/silent subsystem
startup. aprint_get_error_count() reports this count and resets it
to 0.
Also add printf_nolog(), which is like printf(), but prevents the
output from hitting the system log.
autoconfiguration messages:
aprint_normal: Send to console unless AB_QUIET. Always goes to the log.
aprint_naive: Send to console only if AB_QUIET. Never goes to the log.
aprint_verbose: Send to console only if AB_VERBOSE. Always goes to the log.
aprint_debug: Send to console and log only if AB_DEBUG.
API inspired by the same routines in BSD/OS.
Will be used to address kern/5155.
<sys/kprintf.h> header file. This allows subsystems that need
printf semantics other than what are provided by the standard
kernel printf routines to implement exactly what they want.
for forking the traditional UNIX init(8) and it does the Mach port naming
service. We need mach_init for the naming service, but unfortunately, it
will only act as such if its PID is 1. We introduce a sysctl
(emul.darwin.init_pid) to fool a given process into thinking its PID is 1.
That way we can get mach_init into behaving as the name server.
Typical use:
/sbin/sysctl -w emul.darwin.init_pid=$$ ; exec /emul/darwin/sbin/mach_init
possible to use alternate system call tables. This is usefull for
displaying correctly the arguments in Mach binaries traces.
If NULL is given, then the regular systam call table for the process is used.
These are of use to userland code which previously depended on the
hard-coded values of LABELSECTOR and LABELOFFSET to figure out the
location of the disklabel for a particular platform.
With the introduction of umbrella ports such as evbarm, evbmips, etc,
the location of the disklabel may vary between kernels for the same
MACHINE. This sysctl will allow userland programs to remain independent
of the particular flavour of MACHINE in such cases.
the same file multiple times because of recursive loading (ie: libx require
liby and libz and liby require libz, so libz would be loaded twice)
This is probably suboptimal, but it enable /bin/sh to load on the PowerPC,
so it's a good interim solution until we figure precisely how things should
work.
I'm not sure whether this makes the excessive recursive check useless or not.
macho_hdr, argc, *argv, NULL, *envp, NULL, progname, NULL,
*progname, **argv, **envp
Where progname is a pointer to the program name as given in the first
argument to execve(), and macho_hdr a pointer to the Mach-O header at
the beginning of the executable file.
and friends should either be made first-class citizens and moved
to an include file (systm.h perhaps), or nuked completely, but
not be redefined in a lot of files.
that can be used to block a process after fork(2) or exec(2) calls. The
new process is created in the SSTOP state and is never scheduled for running.
This feature is designed so that it is esay to attach the process using gdb
before it has done anything.
It works also with sproc, kthread_create, clone...
in the event that it needs to use a special VM range (x86_64 falls
into this category). We fall back onto kernel_map if machine-dependent
code doesn't create a special map.
- disk_unbusy() gets a new parameter to tell the IO direction.
- struct disk_sysctl gets 4 new members for read/write bytes/transfers.
when processing hw.diskstats, add the read&write bytes/transfers for
the old combined stats to attempt to keep backwards compatibility.
unfortunately, due to multiple bugs, this will cause new kernels and old
vmstat/iostat/systat programs to fail. however, the next time this is
change it will not fail again.
this is just the kernel portion.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
removing has no next element, verify that the queue head agrees that
the current element is the last one. (this is how I found the recent
ppc pmap bugs).
children nodes have reached their final state before augmenting the
parent. This fixes an obscure inconsistency of the space in the
vm_map tree that gcc 3.2 triggers when compiling isp.o on alpha. (this
only led to some leaked space). from art@openbsd.org
with privilege elevation no suid or sgid binaries are necessary any
longer. Applications can be executed completely unprivileged. Systrace
raises the privileges for a single system call depending on the
configured policy.
Idea from discussions with Perry Metzger, Dug Song and Marcus Watts.
Approved by christos and thorpej.
now carries the name of the attachment (e.g. "tlp_pci" or "audio"),
and cfattach structures are registered at boot time on a per-driver
basis. The cfdriver and cfattach pointers are cached in the device
structure when attached.
devices have been discovered. All finalizer routines are iteratively
invoked until all of them report that they have done no work.
Use this hook to fix a latent bug in RAIDframe autoconfiguration of
RAID sets exposed by the rework of SCSI device discovery.
This is the bulk of PR #17345
The general approach is to use a run time deteriminable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
a vector of indices into the cfdata table to specify potential parents,
record the interface attributes that devices have and add a new "parent
spec" structure which lists the iattr, as well as optionally listing
specific parent device instances.
See:
http://mail-index.netbsd.org/tech-kern/2002/09/25/0014.html
...for a detailed description.
While here, const poison some things, as suggested by Matt Thomas.
This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.
Also provides an opportunity for optimisations if "switching to self".
Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.
All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.
- While we are there, explicitely set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
has exactly the same protoype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space accross share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
with humanize_number(3). In reality, the kernel version should be changed
to more closely match the userland version so that there are no prototype
conflicts.
This merge changes the device switch tables from static array to
dynamically generated by config(8).
- All device switches is defined as a constant structure in device drivers.
- The new grammer ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammer.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.
- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.
- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.