which can either be copied directly to userspace, or converted then copied.
Saves replicating a lot of code in the compat functions (esp. for
getvfsstat) at a cast of an extra function call in the non-emulated case -
which is unlikely to be measurable given the other costs of the actions
involved (even on vax).
Remove dofhstat() and dofhstatvfs() (and the last caller).
Remove some redundant stackgap_init() calls.
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a processes emulation root is saved in the cwdi structure
during process exec.
If the emulation root the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root dircetory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
pointers to and from 64bit kernel pointers. Instead use the defines
NETBSD32PTR64(p32) to read a 32bit pointer and (the new) NETBSD32PTR32(p32,p64)
to write a 32bit pointer throughout.
The 32bit pointer is now a struct to enforce the above.
amd64 (with linux emul) and sparc64 will both compile (when the arch stuff
goes in soon), and amd64 still runs some i386 binaries.
sys_stat() and friends, instead use do_sys_stat() and do_sys_fstat()
that write the answer into a kernel buffer (on stack) that can be
converted to the correct form and written the userspace.
I've test compiled a few kernels, and tested i386 netbsd1.6 ls.
Given I think I've fixed some bugs, it might be 50-50 with new ones.
netbsd32_lwp.c, and remove remaining traces of SA.
This still needs some MD (and possibly MI, depending on the chosen
solution) changes to actually work.
Patch by Slava Semushin <slava.semushin@gmail.com>
Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
> It seems that 32bits programs, running under compat_netbsd32, using
> setrlimit force all other programs to have their maximum data size
> fixed at 3GB, where native 64bits apps used 8GB previously.
I tracked this one to the `netbsd32_adjust_limits()' function (called
when creating a new process under compat_netbsd32), where data and
stack limits are set without checking for shared `p_limit' structure
(p_limit->p_refcnt > 1). This explain the side effect where processes
have their limits changed when a compat_netbsd32 (or compat_linux32)
program is run.
The fix is to use `dosetrlimit()' to ensure the needed copy-on-write
behaviour for shared structure.
- Remove useless (thanks to COMPAT_XX behaviour) #ifdefs in
syscalls.master
- Make netbsd32_compat_43.c compiled per COMPAT_LINUX32 because the latter
needs stuff from it.
Fixes Perry's PR#34951.
implies that _UC_CPU must be set in the context passed. Check for this
and return EINVAL if not; this gives a cheap test for corrupted
ucontexts eg on a signal handler stack which would go unnoticed otherwise.
-Don't ckeck for NULL ucontext pointers explicitely. This is an error,
except in the swapcontext() case where it can be easily caught in
userland.
- remove the support of variable-sized filehandle from compat version of
syscalls. (strictly speaking, it breaks abi. i don't think it's a problem
because this feature is short-lived and there are no affected in-tree
filesystems.)
- unify vfs_copyinfh_alloc and vfs_copyinfh_alloc_size.
- vfs_copyinfh_alloc_size: check fhsize strictly.
- reduce code duplication between compat and current syscalls.
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.
Discussed on tech-kern, with lots of help from yamt (thanks!).
EPROTONOSUPPORT instead of EAFNOSUPPORT.
from pavel@ with a little bit of clean up from myself.
XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
will change "struct ntptimeval", so some translation would be necessary.
ntp_gettine is considered dispensable, the only userland program known
to use it is "ntptime".
- proc_is_traced_p() -> trace_is_enabled(), to match trace_enter() and
trace_exit().
- trace_is_enabled() becomes a real function.
- Remove unnecessary include files from various files that used to care
about KTRACE and SYSTRACE, but do no more.
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
code.
- To achieve COMPAT_NETBSD32 compatibility, introduce a parameter to
kevent1 that points to functions that do the actual copyin/copyout
operations. This is similar to what was done in FreeBSD by Paul Saab.
- Add the COMPAT_NETBSD32 definitions and hooks.
discussed in the PR.
- introduce sys/timevar.h to hold kernel-specific stuff relevant to
sys/time.h. Ideally, timevar.h would contain all (or almost) of the
#ifdef _KERNEL part of time.h, but that's a pretty big and tedious
change to make. For now, it will contain only the prototypes I
introduced when working on COMPAT_NETBSD32.
- split copyinout_t into copyin_t and copyout_t, it makes prototypes more
explicit about the meaning of a given argument. Suggested by yamt@.
- move copyinout_t definition in sys/time.h to systm.h as copyin_t and
copyout_t
- make everything uses the new types and include the proper headers at
the proper places.
as an argument a function that will retrieve an element of the pointer
arrays in user space. This allows COMPAT_NETBSD32 to share the code for
the emulated version of execve(2), and fixes various issues that came from
the slow drift between the two implementations.
Note: when splitting up a syscall function, I'll use two different ways
of naming the resulting helper function. If it stills does
copyin/out operations, it will be named <syscall>1(). If it does
not (as it was the case for get/setitimer), it will be named
do<syscall>.
relevant code with the COMPAT_NETBSD32 version, and make the latter use
the new functions.
This fixes netbsd32_setitimer() which had drifted from the native syscall
and did not work properly anymore.
so, it introduces breakage because a lot of applications make assumptions
from its value. It's especially bad in the sparc64 case, where 64-bits
instructions can be used in 32-bits addressing mode. However, there are
other means to know the capabilities of the CPU.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is nolonger random access and is suitable for shipping over a stream.
New features:
- Add a veriexec_report() routine to make most reporting consistent and
remove some common code.
- Add 'strict' mode that controls how veriexec behaves.
- Add sysctl knobs:
o kern.veriexec.verbose controls verbosity levels. Value: 0, 1.
o kern.veriexec.strict controls strict level. Values: 0, 1, 2. See
documentation in sysctl(3) for details.
o kern.veriexec.algorithms returns a string with a space separated
list of supported hashing algorithms in veriexec.
- Updated documentation in man pages for sysctl(3) and sysctl(8).
Bug fixes:
- veriexec_removechk(): Code cleanup + handle FINGERPRINT_NOTEVAL
correctly.
- exec_script(): Don't pass 0 as flag when executing a script; use the
defined VERIEXEC_INDIRECT - which is 1. Makes indirect execution
enforcement work.
- Fix some printing formats and types..
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
do { ... } while(/*CONSTCOND*/0)
so that they can be used unadorned in if/else blocks, etc. This means
that you now *have* to put a ; at the end of the "call" to these
macros.
fhstatvfs1 and getvfsstat)
o Move the statfs family out of netbsd32_fs.c and netbsd32_netbsd.c to
netbsd_compat_20.c, compiled with COMPAT_20
Reviewed by christos@.
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
from here: set p_execsw to the new thing, and call
the new emulation's syscall_intern()
XXX there are more differences to kern_exec.c, sa/ras
related afaics, this is harmliss for now since
netbsd32 doesn't support multithreaded programs yet --
one day one execve() implementation should be shared
by native and netbsd32 code.
- delete ktrsyscall32()
- add a check #ifdef _LP64 to do the conversion if P_32 is set to the
standard ktrsyscall()
- add a couple of similar _LP64/P_32 checks to the systrace code.
this should get systrace working for 32 bit apps as well as complete
ktrace support for "trace_enter/trace_exit" using platforms such as amd64.
XXX: systrace isn't supported on sparc64 currently... (it doesn't use
trace_enter/trace_exit, or have it's own calls to systrace_xxx()...)
the main purpose of this function is to adjust the "argsize" value of
the ktrace syscall record, otherwise userland will see N/2 (rounded
down) arguments instead of N.
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsability of the emulation to
take appropriaye action about lwp_emuldata in e_proc_exec.
Patch reviewed by Christos.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
so that a specific emulation has the oportunity to filter out some signals.
if sigfilter returns 0, then no signal is sent by kpsignal2().
There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.
This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate sematics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl
change discussed on tech-kern@
and make the stack and heap non-executable by default. the changes
fall into two basic catagories:
- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5
- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
originally from openbsd, adapted for netbsd by me.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
- The MD netbsd32_machdep.h header now defines the 32-bit pointer type
instead of using u_int32_t everywhere,
- The MD netbsd32_machdep.h header now defines a macro (at least on
current implementations) which converts a 32-bit pointer to its 64-bit
equivalent,
- Change the MI code to utilise the above two items in all the right places,
- Implement netbsd32___sigaction_sigtramp().
Tested on Sparc64 by Matt Green.
memory fault handler. IRIX uses irix_vm_fault, and all other emulation
use NULL, which means to use uvm_fault.
- While we are there, explicitely set to NULL the uninitialized fields in
struct emul: e_fault and e_sysctl on most ports
- e_fault is used by the trap handler, for now only on mips. In order to avoid
intrusive modifications in UVM, the function pointed by e_fault does not
has exactly the same protoype as uvm_fault:
int uvm_fault __P((struct vm_map *, vaddr_t, vm_fault_t, vm_prot_t));
int e_fault __P((struct proc *, vaddr_t, vm_fault_t, vm_prot_t));
- In IRIX share groups, all the VM space is shared, except one page.
This bounds us to have different VM spaces and synchronize modifications
to the VM space accross share group members. We need an IRIX specific hook
to the page fault handler in order to propagate VM space modifications
caused by page faults.
* struct sigacts gets a new sigact_sigdesc structure, which has the
sigaction and the trampoline/version. Version 0 means "legacy kernel
provided trampoline". Other versions are coordinated with machine-
dependent code in libc.
* sigaction1() grows two more arguments -- the trampoline pointer and
the trampoline version.
* A new __sigaction_sigtramp() system call is provided to register a
trampoline along with a signal handler.
* The handler is no longer passed to sensig() functions. Instead,
sendsig() looks up the handler by peeking in the sigacts for the
process getting the signal (since it has to look in there for the
trampoline anyway).
* Native sendsig() functions now select the appropriate trampoline and
its arguments based on the trampoline version in the sigacts.
Changes to libc to use the new facility will be checked in later. Kernel
version not bumped; we will ride the 1.6C bump made recently.