While here, don't make too much use of one random value, and call
arc4random() directly. Allows for the removal of 'ep_random' from the
exec_package.
Prompted by and okay christos@.
For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack
For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack
(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie
This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.
Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
- the code to convert from a vnode to a path is commented out now until
a better solution is implemented. Only absolute paths work for now
(which is most of the cases).
requested by core
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.
Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.
reviewed by tech-kern & wrstuden
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a processes emulation root is saved in the cwdi structure
during process exec.
If the emulation root the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root dircetory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.
introduce fileassoc(9), a kernel interface for associating meta-data with
files using in-kernel memory. this is very similar to what we had in
veriexec till now, only abstracted so it can be used more easily by more
consumers.
this also prompted the redesign of the interface, making it work on vnodes
and mounts and not directly on devices and inodes. internally, we still
use file-id but that's gonna change soon... the interface will remain
consistent.
as a result, veriexec went under some heavy changes to conform to the new
interface. since we no longer use device numbers to identify file-systems,
the veriexec sysctl stuff changed too: kern.veriexec.count.dev_N is now
kern.veriexec.tableN.* where 'N' is NOT the device number but rather a
way to distinguish several mounts.
also worth noting is the plugging of unmount/delete operations
wrt/fileassoc and veriexec.
tons of input from yamt@, wrstuden@, martin@, and christos@.
For example, this used to give false logs of matching fingerprint for
foo.sh while foo.sh don't have an entry, and the program executed (and
matching the fingerprint) is the interpreter - /bin/sh.
as an argument a function that will retrieve an element of the pointer
arrays in user space. This allows COMPAT_NETBSD32 to share the code for
the emulated version of execve(2), and fixes various issues that came from
the slow drift between the two implementations.
Note: when splitting up a syscall function, I'll use two different ways
of naming the resulting helper function. If it stills does
copyin/out operations, it will be named <syscall>1(). If it does
not (as it was the case for get/setitimer), it will be named
do<syscall>.
Fix an issue when scripts are executed under systrace where the argv[0]
would be normalized, and hence break scripts that depend on how they were
called.
* We now use hash tables instead of a list to store the in kernel
fingerprints.
* Fingerprint methods handling has been made more flexible, it is now
even simpler to add new methods.
* the loader no longer passes in magic numbers representing the
fingerprint method so veriexecctl is not longer kernel specific.
* fingerprint methods can be tailored out using options in the kernel
config file.
* more fingerprint methods added - rmd160, sha256/384/512
* veriexecctl can now report the fingerprint methods supported by the
running kernel.
* regularised the naming of some portions of veriexec.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.
while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
size 0.
This way, individual ports can circumvent sigcode mapping
by setting sigcode/esigcode.
(would be better to clean up the __HAVE_SIGINFO/COMPAT_XX
stuff, but it is not a good moment now)
exec case, as the emulation already has the ability to intercept that
with the e_proc_exec hook. It is the responsability of the emulation to
take appropriaye action about lwp_emuldata in e_proc_exec.
Patch reviewed by Christos.
so that a specific emulation has the oportunity to filter out some signals.
if sigfilter returns 0, then no signal is sent by kpsignal2().
There is another place where signals can be generated: trapsignal. Since this
function is already an emulation hook, no call to the sigfilter hook was
introduced in trapsignal.
This is needed to emulate the softsignal feature in COMPAT_DARWIN (signals
sent as Mach exception messages)
- use splay tree for the pagefault check if the thread was running on
an upcall stack.
=> removes the limitation that all upcall stacks need to be
adjoining and that all upcall stacks have to be loaded with the
1st sys_sa_stacks call.
=> enables keeping information associated with a stack in the kernel
which makes it simpler to find out which LWP is using a stack.
=> allows increasing the SA_MAXNUMSTACKS without having to
allocate an array of that size.
of the sibling list so that find_stopped_child can be optimised to avoid
traversing the entire sibling list - helps when a process has a lot of
children.
- Modify locking in pfind() and pgfind() to that the caller can rely on the
result being valid, allow caller to request that zombies be findable.
- Rename pfind() to p_find() to ensure we break binary compatibility.
- Remove svr4_pfind since p_find willnow do the job.
- Modify some of the SMP locking of the proc lists - signals are still stuffed.
Welcome to 1.6ZF
and make the stack and heap non-executable by default. the changes
fall into two basic catagories:
- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5
- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
originally from openbsd, adapted for netbsd by me.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V