the list of routines that need to be called for setting up sysctl
variables. This worked great for all code included in the kernel
itself, but didn't deal with modules that want to create their own
sysctl data. So, we ended up with a lot of #ifdef _MODULE blocks
so modules could explicitly call their setup functions when loaded
as non-built-in modules.
So today, we complete the task that was started so many years ago.
When modules are loaded, after we've called xxx_modcmd(INIT...) we
check if the module contains its own __link_set_sysctl_funcs, and
if so we call the functions listed. We add a struct sysctllog member
to the struct module so we can call sysctl_teardown() when the module
gets unloaded. (The sequence of events ensures that the sysctl stuff
doesn't get created until the rest of the module's init code does any
required memory allocation.)
So, no more need to explicitly call the sysctl setup routines when
built as a loadable module.
There's no good reason for these build-time parameters to be allowed to
panic the kernel when it is easy to simply disable the module code and
fail gracefully.
While we're at it, similarly replace panic() when malloc fails to also
fail gracefully.
instead of the whole ds structure.
besides triggering a recently added assert in netbsd32, this stops
exposing kernel addresses.
copy the mode clamping to 0777 from sem to shm and msg.
while here, make sure that the compat callers to sysv_ipc clear
the contents of the compat structure before setting the result
members to ensure padding bytes are cleared.
don't set/copy _sem_base, _msg_first, _msg_last or _shm_internal.
even if used, which seems very dodgy, they leak KVAs as well.
possibly this may affect linux binaries, in particular, the
comments around _shm_internal ("XXX Oh well.") may mean apps
rely upon these but hopefully not -- the comments date back to
rev 1.1 in 1995.
the _key, _seq and _msg_cbytes members are exported as before as
i found multiple consumers of these (no less than ipcs(1), and
they appear to be useful for debugging and more.
XXX: the naming of compat functions have too many styles. there
are at least 3 different ones changed here.
These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.
HOWEVER! Some subsystems have
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.
I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:
cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))
It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.
Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account wether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.
module code from module_init() rather than waiting until after calling
exec_init(). Use a RUN_ONCE routine at entry to each sys_sem* syscall
to establish the exithook, and no longer KASSERT that the hook has
been set before removing it. (A manually loaded module can be unloaded
before any syscalls have been invoked.)
Remove the conditional calls to the various xxx_init() routines from
init_main.c - we now rely on module_init() to handle initialization.
Let each sub-component's xxx_init() routine handle its own sysctl
sub-tree initialization; this removes another set of #ifdef ugliness.
Tested both built-in and loadable versions and verified that atf
test kernel/t_sysv passes.
we can correctly clean up on process exit or fork.
Without this, firefox attaches to a shared memory segment but doesn't
detach before exit. Thus once firefox causes an autoload for sysv_ipc
it cannot be unloaded since the segment still retains references.
clean up after ourselves. Mostly, this checks to make sure that
there are no active itmes, and then deallocates wired kernel virtual
memory. For SYSVSEM, we also disestablish the exithook() so we
won't try to call it after destroying its memory pool!
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
- Lock processes, credentials, filehead etc correctly.
- Acquire a read hold on sysctl_treelock if only doing a query.
- Don't wire down the output buffer. It doesn't work correctly and the code
regularly does long term sleeps with it held - it's not worth it.
- Don't hold locks other than sysctl_lock while doing copyout().
- Drop sysctl_lock while doing copyout / allocating memory in a few places.
- Don't take kernel_lock for sysctl.
- Fix a number of bugs spotted along the way
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.