fhstatvfs1 and getvfsstat)
o Move the statfs family out of netbsd32_fs.c and netbsd32_netbsd.c to
netbsd_compat_20.c, compiled with COMPAT_20
Reviewed by christos@.
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
from here: set p_execsw to the new thing, and call
the new emulation's syscall_intern()
XXX there are more differences to kern_exec.c, sa/ras
related afaics, this is harmliss for now since
netbsd32 doesn't support multithreaded programs yet --
one day one execve() implementation should be shared
by native and netbsd32 code.
- delete ktrsyscall32()
- add a check #ifdef _LP64 to do the conversion if P_32 is set to the
standard ktrsyscall()
- add a couple of similar _LP64/P_32 checks to the systrace code.
this should get systrace working for 32 bit apps as well as complete
ktrace support for "trace_enter/trace_exit" using platforms such as amd64.
XXX: systrace isn't supported on sparc64 currently... (it doesn't use
trace_enter/trace_exit, or have it's own calls to systrace_xxx()...)
the main purpose of this function is to adjust the "argsize" value of
the ktrace syscall record, otherwise userland will see N/2 (rounded
down) arguments instead of N.
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsability of the emulation to
take appropriaye action about lwp_emuldata in e_proc_exec.
Patch reviewed by Christos.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
so that a specific emulation has the oportunity to filter out some signals.
if sigfilter returns 0, then no signal is sent by kpsignal2().
There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.
This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate sematics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl
change discussed on tech-kern@
and make the stack and heap non-executable by default. the changes
fall into two basic catagories:
- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5
- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
originally from openbsd, adapted for netbsd by me.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
- The MD netbsd32_machdep.h header now defines the 32-bit pointer type
instead of using u_int32_t everywhere,
- The MD netbsd32_machdep.h header now defines a macro (at least on
current implementations) which converts a 32-bit pointer to its 64-bit
equivalent,
- Change the MI code to utilise the above two items in all the right places,
- Implement netbsd32___sigaction_sigtramp().
Tested on Sparc64 by Matt Green.
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.
- While we are there, explicitely set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
has exactly the same protoype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space accross share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sensig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.
at all, it's only needed in LKM case
use #if defined(LKM) || defined(_LKM) condition for netbsd32_execve.c,
to DTRT when either compiled statically into kernel with LKM support,
or compiled as a LKM
executable mappings. Stop overloading VTEXT for this purpose (VTEXT
also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
than making a function call to do this (it no longer makes sense to
use a function call, since we no longer overload VTEXT with VEXECMAP's
meaning).
VEXECMAP suggested by Chuq Silvers.
format specific.
Struct emul has a e_setregs hook back, which points to emulation-specific
setregs function. es_setregs of struct execsw now only points to
optional executable-specific setup function (this is only used for
ECOFF).
arrange things as needed. Unfortunately, the check in sockargs()
have to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(
This code was tested to pass regression tests.
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.
This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.
MNT_NOSUID, just check MNT_NOSUID to clear the S{U,G}ID bits
in the attributes for the vnode we're about to exec.
We now check P_TRACED right before we would actually perform
the s{u,g}id function in the exec code.
This closes a race condition between exec of a setuid binary
and ptrace(2).
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
callers and appropriate routines to cope. This makes fo_stat more
consistent with rest of fileops routines and also makes the fo_stat
match FreeBSD as an added bonus.
Discussed with Luke Mewburn on tech-kern@.
now obsolete, so that kernels will at least compile. I guess it was too
much trouble to change all 10 call sites, or perhaps, these days, only
things that build on i386 are important. Maybe it's the full moon tonight.
de-static netbsd32_from_stat43().
move the guts of netbsd32_execve() into netbsd32_execve2().
all of are for the forthcoming sunos32 compat mode (for sparc64).
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx
This addresses kern/10981 by Matthew Orgass.
>revision 1.25
>date: 2000/12/01 08:59:02; author: mrg; state: Exp; lines: +2 -2
>in netbsd32_elf32_probe(), 'pos' is really a pointer to an Elf_Addr, not a
>vaddr_t. cast the pointer before dereferencing it to avoid the alignment
>fault that broke compat_netbsd32, cuz pos is defined like:
> Elf_Addr phdr = 0, pos = 0;
>in exec_elf32.c.
*_emul_path variables
change macros CHECK_ALT_{CREAT|EXIST} to use that, 'root' doesn't need
to be passed explicitly any more and *_CHECK_ALT_{CREAT|EXIST} are removed
change explicit emul_find() calls in probe functions to get the emulation
path from the checked exec switch entry's emulation
remove no longer needed header files
add e_flags and e_syscall to struct emul; these are unsed and empty for now
vaddr_t. cast the pointer before dereferencing it to avoid the alignment
fault that broke compat_netbsd32, cuz pos is defined like:
Elf_Addr phdr = 0, pos = 0;
in exec_elf32.c.
* move all exec-type specific information from struct emul to execsw[] and
provide single struct emul per emulation
* elf:
- kern/exec_elf32.c:probe_funcs[] is gone, execsw[] how has one entry
per emulation and contains pointer to respective probe function
- interp is allocated via MALLOC() rather than on stack
- elf_args structure is allocated via MALLOC() rather than malloc()
* ecoff: the per-emulation hooks moved from alpha and mips specific code
to OSF1 and Ultrix compat code as appropriate, execsw[] has one entry per
emulation supporting ecoff with appropriate probe function
* the makecmds/probe functions don't set emulation, pointer to emulation is
part of appropriate execsw[] entry
* constify couple of structures
`struct vmspace' has a new field `vm_minsaddr' which is the user TOS.
PS_STRINGS is deprecated in favor of curproc->p_pstr which is derived
from `vm_minsaddr'.
Bump the kernel version number.
NTP is not defined.
Also removes sysctl_ntptime, since that's unreferenced without NTP.
ntp_gettime(2) is left alone, since it doesn't raise SIGSYS, which sys_nosys()
does.
vslock the user pages for the data being copied out to userspace,
so that we won't sleep while holding a lock in case we need to
fault the pages in.
- Sprinkle some const and ANSI'ify some things while here.
- fix pread/pwrite return values (plus some other syscalls that looked
similarly broken).
- prototypes and clean up for netbsd32_ioctl.c
now getpw*() works under compat32!
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.
- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.
- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.
- Add a second proc argument for inferior() since callers all had
curproc handy.
Also, miscellaneous cleanups in ktrace:
- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.
- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.
- simplify interface to ktrwrite()
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
sys_write().
netbsd32_getfsstat() cannot just copyin()/copyout(), convert the structures,
and call sys_getffstat(). sys_getffstat() wants to do its own
copyin()/copyout(). So we need to implent the whole of sys_getffstat()
in netbsd32_getfsstat().
core filename format, which allow to change the name of the core dump,
and to relocate it in a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resources limits). Process is designed by its pid
at the second level name. These values are inherited on fork, and the corename
fomat is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.
that is priority is rasied. Add a new spllowersoftclock() to provide the
atomic drop-to-softclock semantics that the old splsoftclock() provided,
and update calls accordingly.
This fixes a problem with using the "rnd" pseudo-device from within
interrupt context to extract random data (e.g. from within the softnet
interrupt) where doing so would incorrectly unblock interrupts (causing
all sorts of lossage).
XXX 4 platforms do not have priority-raising capability: newsmips, sparc,
XXX sparc64, and VAX. This platforms still have this bug until their
XXX spl*() functions are fixed.
body of reaper(), right before the call to uvm_exit(). cpu_wait() must
be done before uvm_exit() because the resources it frees might be located
in the PCB.
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.
XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
vslocking here?! copyout() on its own seems to suffice just about everwhere
else, and it's not like the process is going to exit; it's in a system
call!
count is 0, wait for use count to drain before finishing the close.
This is necessary in order for multiple processes to safely share file
descriptor tables.