use with mprotect(2), but without enabling them immediately.
Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses references the same physical
pages. Duplicated mappings can have different effective protections.
Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.
Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.
Improve test cases to ensure correct operation of the changed
interfaces.
and reduce the length of the randomization bits since this is unused.
- call the pax aslr stack function sooner so we don't need to re-adjust the
stack size.
- adjust the stack max resource limit to account for the maximum space that
can be lost by aslr
- tidy up debugging printfs
months ago, but no one reviewed it - probably because it's not a trivial
change.
This change fixes the following bug: when loading a PaX'ed binary, the
kernel updates the PaX flag of the calling process before it makes sure
the new process is actually launched. If the kernel fails to launch the
new process, it does not restore the PaX flag of the calling process,
leaving it in an inconsistent state.
Actually, simply restoring it would be horrible as well, since in the
meantime another thread may have used the flag.
The solution is therefore: modify all the functions used by PaX so that
they take as argument the exec package instead of the lwp, and set the PaX
flag in the process *right before* launching the new process - it cannot
fail in the meantime.
- Use MAXSSIZ32 instead of MAXSSIZ for 32 bit binaries
- Default MAXXSIZ32 to a quarter of MAXSSIZ (good enough?)
- Add debugging
XXX: Note that:
- sparc32 MAXSSIZ is larger than sparc64 MAXSSIZ
- sparc64 MAXSSIZ32 != sparc32 MAXSSIZ
- since the size is unsigned, don't check just that it is > 0, but limit
it to the MAXSSIZ
- if the stack size is reduced because of aslr, make sure we reduce the
actual allocation by the same size so that the size does not wrap around.
NB: Must be pulled up to 5.x!
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if availble.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most espcially
chuq and yamt for critical suggestions that lead to this patch not
having a special ugliness i wasn't happy with anyway :-)
For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack
For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack
(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie
This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.
Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.
Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.
reviewed by tech-kern & wrstuden
code and not trying to use temporary solutions.
Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)
Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().
Next time I suggest to commit a temporary solution just revoke my
commit bit.
and make the stack and heap non-executable by default. the changes
fall into two basic catagories:
- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5
- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
originally from openbsd, adapted for netbsd by me.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.
This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.
This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.
Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.
This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
(1) ELFNAME(load_file)() now takes a pointer to the entry point
offset, instead of taking a pointer to the entry point itself. This
allows proper adjustment of the ultimate entry point at a higher level
if the object containing the entry point is moved before the exec is
finished.
(2) Introduce VMCMD_FIXED, which means the address at which a given
vmcmd describes a mapping is fixed (ie, should not be moved). Don't
set this for entries pertaining to ld.so.
Also some minor comment/whitespace tweaks.