unions `union_elem: ...', and use c99 syntax `.union_elem = ...' only
where necessary.
in this case, there's no need to tag elf_probe_func because that's the
first union element, and therefore, the implicit case. only specifically
mention ecoff_probe_func where necessary.
if we decide to not use this c99 feature for now, at least there's now
less stuff to rip out.
Previously, we passed __FILE__ and __LINE__ on all pool_get/pool_set calls.
This change results in a measured 1.2% performance improvement in
ping-flood packets-per-second as reported by ping(8).
that the caller allocate the pool_item_header when it allocates the
pool page, so we can avoid a locking pitfall (sleeping with a simple
lock held).
Also revive pool_prime(), as there are some letigimate uses of it,
but in doing so, eliminate some of the bogosities of the old version
(i.e. don't do an implicit "setlowat", just prime the pool, and incr
the minpages for each additional page we add, and compute the number
of pages to prime in a way that callers would expect).
to 512. Apparently, there are ELF binaries with more than 128 section
headers - an example is one of Linux Word Perfect 8 utilities.
This fixes kern/12455 by Mark Davies.
flag.
EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.
EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.
EMUL_NO_SIGIO_ON_READ notes that the emulated binaries that requested
asynchrnous I/O expect the reader process to be notified by a SIGIO, but
not the writer process. OSes without this flag expect the reader and the
writer to be notified when some data has arrived or when some data have been
read. As far as we know, the OSes that need EMUL_NO_SIGIO_ON_READ are Linux
and SunOS.
SPINLOCK_SPIN_HOOK, so that we actually check for
pending IPIs on the Alpha more than once. Also,
when we call alpha_ipi_process(), make sure to go
to splipi().
vfs_busy'ing just before the dounmount() call. This is to avoid
sleeping with the mountlist_slock held -- but we must acquire
syncer_lock before vfs_busy because the syncer itself uses
syncer_lock -> vfs_busy locking order.
callers and appropriate routines to cope. This makes fo_stat more
consistent with rest of fileops routines and also makes the fo_stat
match FreeBSD as an added bonus.
Discussed with Luke Mewburn on tech-kern@.
Use a relative path (../..) instead of /sys.
Enhance the sed expression to work with .'s in paths.
Quote sed expressions in single quotes rather than double
quotes unless there's a good reason otherwise.
mappings (vnode -> name) in the reverse mapping hash table. Without
this option, there is no change; only directories will be entered to
speed up getcwd. This is an option because it will cause getcwd
to hit longer hash chains, and at the moment its usefulness is
still limited.
return NULL instead of restarting the loop since we might sleep
while starting the i/o. this tells getblk() to check if someone else
created the buffer while we slept. from OpenBSD.
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-setable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
(only if cmd exited successfully) use tmp file as input to sed pipeline.
This works around two issues:
(1) a pathological case where the script would fail in ... interesting
ways if the command being executed closed its stdout. (Certain
commands are used only for their side effects, but not their output,
and doing some testing on my own i got into hot water when one
of my mods caused a command to close its output).
(2) the fact that genassym would succeed even when the command in
fact failed (because the last cmd in the pipeline is the one whose
exit status would be reported).
space is already torn down in uvmspace_free() when the vmspace
refrence count reaches 0. Move the shmexit() call into uvmspace_free().
Note that there is a beneficial side-effect of deferring the unmap
to uvmspace_free() -- on systems where TLB invalidations are
particularly expensive, the unmapping of the address space won't
have to cause TLB invalidations; uvmspace_free() is going to be
run in a context other than the exiting process's, so the "pmap is
active" test will evaluate to FALSE in the pmap module.
is disconnected by RST right before accept(2). fixes PR 10698/12027.
checked with SUSv2, XNET 5.2, and Stevens (unix network programming
vol 1 2nd ed) section 5.11.
on memory shortage. Instead, use the same wait/nowait condition with the
item requested, and just cleanup and return failure if we can't allocate
page header while we aren't allowed to wait.
do not return junk data in mbuf (= sockaddr on accept(2)'s 2nd arg).
set the length zero.
behavior checked with bsdi and freebsd.
partial solution to PR 12027 and 10698 (need more investigation).
and number of ops, not touch anything - vnode_if.sh now generated
proper offset numbers; vfs_op_check() is only defined and called for DEBUG
kernels
constify extern declaration of vfs_op_descs[]
g/c vfs_opv_numops, use VNODE_OPS_COUNT instead
make vfs_opv_init_explicit() and vfs_opv_init_default() static
then don't need to be patched at runtime
add new define VNODE_OPS_COUNT (to vnode_if.h) so that the number is known
at compile-time
make stuff const, it now can be
This structures are actually modified at kernel init time by vfs_op_init.
XXX - looks like the state after initialization is pretty const and with
some magic in the generator script (and appropriate changes to vfs_op_init)
it could be made const.
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.
Per discussion w/ mycroft.
between write i/os in a disk-based filesystem vs. the disk block being
freed by a truncation, allocated to a new file, and written again with
different data. if the disk driver reorders the requests and does
the second i/o first, the old data will clobber the new, corrupting
the new file.
are done inside of wakeup which is holding the sched lock. Printf can cause
wakeup to get called again (pty redirection of console message) which will
panic with sched lock already held.
This isn't a long term fix as not being able to printf vs. sched lock should
be cleaned up better but this avoids continual panics with lockdebug running
and an xterm -C.
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx
This addresses kern/10981 by Matthew Orgass.
* __HAVE_SYSCALL_INTERN. If this is defined, e_syscall is replaced by
e_syscall_intern, which is called at key places in the kernel. This can be
used to set a MD syscall handler pointer. This obsoletes and replaces the
*_HAS_SEPARATED_SYSCALL flags.
* __HAVE_MINIMAL_EMUL. If this is defined, certain (deprecated) elements in
struct emul are omitted.
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).
defined, call addupc_intr() directly from statclock() in the system time case,
using the same P_OWEUPC path if the copyin/copyout fails.
Use this in i386 to remove profiling code from the normal userret() path.
* Make the syscallnames[] table const.
* Add a separator between the #include section and the syscalls section, so
that #if/#else/#endif can be handled differently in the two.
* Add support for rounding up the size of the sysent table.
constructed objects in the pool allocator, similar to caching
of constructed objects in the Solaris SLAB allocator.
This implementation is a separate API (pool_cache_*()) layered
on top of pools to keep the caching complexity out of the way
of pools that won't benefit from it.
While we're here, allow pool items to be as large as the pool
page size.
*_emul_path variables
change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation
remove no longer needed header files
add e_flags and e_syscall to struct emul; these are unsed and empty for now
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] how has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures
there are direct use of MFREE() from sys/kern.
(we experienced no memory leak so far, but if we use m_aux for other purposes,
we will need this change)
and adding it to allproc) after it's fully initialized.
this prevents the scheduler from coming in via a clock interrupt
and tripping over a partially-initialized proc.
emulation-specific data
add process exit, exec and fork function hooks into struct emul:
* e_proc_fork() - called in fork1() after the new forked process is setup
* e_proc_exec() - called in sys_execve() after the executed process is setup
* e_proc_exit() - called in exit1() after all the other process cleanups are
done, right before machine-dependant switch to new context; also called
for "old" emulation from sys_execve() if emulation of executed program and
the original process is different
This was discussed on tech-kern.
setgroups() did not result in actual changes. This has the nice
side effect that we don't needlesly allocate new credential and
resource limit data structures.
This is so that non setuid programs that call seteuid(getuid()),
don't end up setting P_SUGID, resulting in broken behavior [i.e.
non setuid ssh, doesn't read ~/.hostaliases...].
This is a good candidate for a pullup, if someone reviews it.
`struct vmspace' has a new field `vm_minsaddr' which is the user TOS.
PS_STRINGS is deprecated in favor of curproc->p_pstr which is derived
from `vm_minsaddr'.
Bump the kernel version number.
file to write out. If both are 0, the whole file is synced. A filesystem
that is not able to sync out a range of a file may elect to sync
the whole file anyway.
of the vnode ops, and if LKM support is included in the kernel,
always include the non-inline stubs regardless of whether or not
they're being used in the static kernel iamge.
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
make all local variables static
use simplelocks - we really need only mutexes, full locks are not necessary
update couple of comments to be more accurate
add function pty_maxptys(), which provides a safe way to get&set maxptys - this
also supports setting maxptys to lower than current value, if the
value is lower or equal current number of ptys
to support arbitrary number of ptys without need of kernel recompile
(the extra device special files in /dev/ still need to be created, of course)
upper limit of supported ptys is controlled via new sysctl variable
kern.maxptys (KERN_MAXPTYS), which is raise-only and defaults to 512.
in SSTOP state, execpt P_SYSTEM and curproc processes. We have to way to
find the original state of the process so we can't restart scheduling,
so this can only be used at shutdown time.
XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.
tsleep() instead of DELAY. Also, keep trying flushing buffers when the
number of dirty buffers decreases (20 rounds may not be enouth for a
very large buffer cache).
Using tsleep instead of delay gives a chance to others kernel threads to run,
which is needed for raidframe. With this change I've not been able to
reproduce the 'dirty buffer not flushed' problem with raidframe.
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.
sched_suspend() suspends the scheduling of users process, by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. Also mark all user process but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clear the P_SUSPEND
flag.
to update it, so don't bother with <machine/atomic.h>
Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.
- In simple_lock_switchcheck(), allow/enforce exactly one lock to be
held: sched_lock.
- Per e-mail to tech-smp from Bill Sommerfeld, r/w spin locks have
an interlock at splsched(), rather than splhigh().
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.
Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.
- LOCK_ASSERT(), which expands to KASSERT() if LOCKDEBUG.
- new simple_lock_held(), which tests if the calling CPU holds
the specified simple lock.
From Bill Sommerfeld, modified slightly by me.
instead test for (p->p_flag & I_INMEM), and don't access the U-area
(via p->p_stats) if that bit is clear. Fixes the hangs people have
seen when the system is paging and the user runs top/ps/w.
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doens't
change size.
NTP is not defined.
Also removes sysctl_ntptime, since that's unreferenced without NTP.
ntp_gettime(2) is left alone, since it doesn't raise SIGSYS, which sys_nosys()
does.
sig = (int)(long)*(caddr_t *)data;
to *properly* dereference the passed data. this makes signals on
ptys actually *work* on the sparc64 port. from mycroft.
XXX: the release branch version needs this ASAP as it is probably
unstable on ILP32BE.
int lf_advlock __P((struct lockf **,
off_t, caddr_t, int, struct flock *, int));
to
int lf_advlock __P((struct vop_advlock_args *, struct lockf **, off_t));
This matches common usage and is also compatible with similar change
in FreeBSD (though they use u_quad_t as last arg).
Move platform db_trap callback from arch/mips into ddb as suggested by
jhawk. This callback is used by platform code to manage things like
watchdogs that should be disabled while in ddb. Done as a callback
for processors such as mips that support lots of different systems.
vslock the user pages for the data being copied out to userspace,
so that we won't sleep while holding a lock in case we need to
fault the pages in.
- Sprinkle some const and ANSI'ify some things while here.
Stops sleeps from returning early (by up to a clock tick), and return 0
ticks for timeouts that should happen now or in the past.
Returning 0 is different from the legacy hzto() interface, and callers
need to check for it.
check that newstart + size - 1 doesn't overflow the end of the extent, rather
than the "dontcross" value, which can easily overflow the end of an extent
when being asked for an object with a large boundary requirement. this test
is more valid, in any case, and fixes extent_alloc() failure when the start of
the extent is not "aligned".
to machine memory size upon boot if the number has not been specified
explicitly in kernel config - at this moment, 0.5% of system
memory is used for vnodes (but minimum NVNODE vnodes)
use that to inform about way to raise current limit when we reach maximum
number of processes, descriptors or vnodes
XXX hopefully I catched all users of tablefull()
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.
developed by Christopher G. Demetriou for the NetBSD Project.") with
a generic NetBSD one ("This product includes software developed for the
NetBSD Project. See http://www.netbsd.org/ for information about NetBSD.")
so that this same set of terms can be used by others if they so desire.
(Eventually i'll be converting more/all of my code.)
* Handle KERN_PROC_SESSION that has been defined in <sys/sysctl.h> from
day one.
* Add handlers for KERN_PROC_GID and KERN_PROC_RGID.
* If "op" doesn't valid, return EINVAL.
- document a data structure invariant in lockf.h
- add KASSERT() to check the invariant.
- be more consistent about dequeuing ourselves from the blocked list
after a tsleep().
- Fix two places where the invariant is violated.
- correct a few comments here and there
- If we're still following a lock dependancy chain after maxlockdepth
processes and haven't gotten back to the start, assume that we're in a
cycle anyway and return EDEADLK.
Fix is a superset of an existing fix in FreeBSD, but independantly
derived.
Fixes kern/3860.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.
Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
"KERN_SYSVIPC_SEM_INFO" and "KERN_SYSVIPC_SHM_INFO" to return the
info and data structures for the relevent SysV IPC types. The return
structures use fixed-size types and should be compat32 safe. All
user-visible changes are protected with
#if !defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)
Make all variable declarations extern in msg.h, sem.h and shm.h and
add relevent variable declarations to sysv_*.c and remove unneeded
header files from those .c files.
Make compat14 SysV IPC conversion functions and sysctl_file() static.
Change the data pointer to "void *" in sysctl_clockrate(),
sysctl_ntptime(), sysctl_file() and sysctl_doeproc().
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.
While I'm here, comment a few places where there are known issues
for the SMP implementation.
- need deep compare of open files, not a shallow pointer compare.
- reorder fdrelease()/FILE_UNUSE() invocations so fdrelease doesn't
block waiting for something which can't happen until after it returns.
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.
This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.
sys_semconfig into a placebo system call, to avoid giving folks an
easy way to wedge processes which use semaphores.
NOTE: unlike 386bsd and freebsd, processes which did not have
semaphore undo records would not be affected by this problem (reducing
it from a serious local denial-of-service problem to a largely
cosmetic problem, since virtually nobody uses semaphores). But the
code is just Wrong so we're ripping it out anyway.
by Anders Magnusson.
Honor elem_count in the KERN_PROC2 case, as well as overall buffer
space. The only user-land code to use this set the elem_count to
"buffer_space / elem_size", so we've had no incorrect behaviour to date.
- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.
- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.
- Add a second proc argument for inferior() since callers all had
curproc handy.
Also, miscellaneous cleanups in ktrace:
- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.
- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.
- simplify interface to ktrwrite()
state into global and per-CPU scheduler state:
- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.
- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).
- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.
- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.
Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
db_stack_trace_print(__builtin_frame_address(0),...), to printf() the
stack trace to the message bufffer and console. Idea from SunOS/Solaris.
Useful when dumping fails.
errors from ps(1) and some other kernel grovellers, and return some
data that has previously only been accessable with /dev/kmem read
access. The sysctls are:
+ KERN_PROC2 - return an array of fixed sized "struct kinfo_proc2"
structures that contain most of the useful user-level data in
"struct proc" and "struct user". The sysctl also takes the size of
each element, so that if "struct kinfo_proc2" grows over time old
binaries will still be able to request a fixed size amount of data.
+ KERN_PROC_ARGS - return the argv or envv for a particular process id.
envv will only be returned if the process has the same user id as the
requestor or if the requestor is root.
+ KERN_FSCALE - return the current kernel fixpt scale factor.
+ KERN_CCPU - return the scheduler exponential decay value.
+ KERN_CP_TIME - return cpu time state counters.
With input and suggestions from many people on tech-kern.
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
Change #define's of the form
#define panic(a) printf(a)
to
#define \
panic(a) printf(a)
to prevent ctags(1) from detecting there is a tag.
Otherwise, the tags file claims panic() is in subr_extent.c
instead of subr_prf.c.
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.
Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.
Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be wait for, or not, accordingly.
Closes PR#8996.
contains the values __SIMPLELOCK_LOCKED and __SIMPLELOCK_UNLOCKED, which
replace the old SIMPLELOCK_LOCKED and SIMPLELOCK_UNLOCKED. These files
are also required to supply inline functions __cpu_simple_lock(),
__cpu_simple_lock_try(), and __cpu_simple_unlock() if locking is to be
supported on that platform (i.e. if MULTIPROCESSOR is defined in the
_KERNEL case). Change these functions to take an int * (&alp->lock_data)
rather than the struct simplelock * itself.
These changes make it possible for userland to use the locking primitives
by including <machine/lock.h>.
MALLOC()/FREE().
- In ktrgenio():
- Don't allocate the entire size of the I/O for the temporary
buffer used to write the data to the trace file. Instead,
do it in page-sized chunks.
- As in uiomove(), preempt the process if we are hogging the CPU.
- If writing to the trace file errors, abort rather than continuing
to loop through the buffer.
From Artur Grabowski <art@stacken.kth.se>, with some additional cleanup
by me.
symlinks, and thus can operate on symlinks. remove a bogus comment in
chflags(1) that claims symlinks do not have file flags.
XXX: todo -- make chflags(1) use lchflags(2) when given the right options.
getnewvnode() has been changed to virtually guarantee that we'll have more
vnodes than "desired", so previously there would always be more vnodes
than namecache entries. this fixes PR 9792.
* Remove the casts to vaddr_t from the round_page() and trunc_page() macros to
make them type-generic, which is necessary i.e. to operate on file offsets
without truncating them.
* In due course, cast pointer arguments to these macros to an appropriate
integral type (paddr_t, vaddr_t).
Originally done by Chuck Silvers, updated by myself.
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.
For each leaf filesystem, add appropriate vfs_done routine.
metadata. If you do call it, there's actually a fair chance that it
will panic because its metadata dependencies were not cleared in
the VOP_FSYNC above (with FSYNC_DATAONLY).
mode and ownership bits are flushed to disk before the vnode is
reclaimed.
The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk. This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.
Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
disk space rather than doing it in timeout handler. This fixes long
standing bug that accounting file can't be put on NFS file system (so,
e.g, we couldn't turn on accounting on diskless system).
between protocol handlers.
ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.
due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.
take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).
this will bump kernel version.
(as discussed in tech-net, tested in kame tree)
the change constitutes binary compatibility issue hen sizeof(long) !=4.
there's no way to be backward compatible, and only guys affected
are IPv6 userland tools.
From: =?iso-8859-1?Q?G=F6ran_Bengtson?= <goeran@cdg.chalmers.se>