This means that the full executable path is always available.
- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change
TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.
the kernel and module symbols, and when relocating a symbol that has
SHN_ABS, take its value as-is and don't return an error if it equals zero.
Sent on tech-kern@.
PROC_MACHINE_ARCH32(P) to override the value for sysctl hw.machine_arch
(native and netbsd32 commpat resp.).
Use these for arm and mips instead of the (not working, noisy, in case
of arm) sysctl override and #ifdef __mips__ in architecture neutral
code.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.
Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.
This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.
XXX pullup-8
XXX pullup-7
XXX pullup-6 (requires tweaking the patch)
XXX pullup-5...
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
vmin is only an optional hint since we're not passing UVM_FLAG_FIXED,
but that doesn't mean we should use uninitialized stack garbage as
the hint.
Noted by chs@.
Candidate fix for PR kern/45718: `processes sometimes get stuck and
spin in vm_map', a problem that has been plaguing all our 32-bit
ports for years.
Since we currently use large (256k) buffers for execargs, and since
nobody has stepped up to tackle breaking them into bite-sized (or at
least page-sized) chunks, after KVA gets sufficiently fragmented we
can't allocate new execargs buffers from kernel_map.
Until 2008, we always carved out KVA for execargs on boot with a uvm
submap exec_map of kernel_map. Then ad@ found that the uvm_km_free
call, to discard them when done, cost about 100us, which a pool
avoided:
https://mail-index.NetBSD.org/tech-kern/2008/06/25/msg001854.htmlhttps://mail-index.NetBSD.org/tech-kern/2008/06/26/msg001859.html
ad@ _simultaneously_ introduced a pool _and_ eliminated the reserved
KVA in the exec_map submap. This change preserves the pool, but
restores exec_map (with less code, by putting it in MI code instead
of copying it in every MD initialization routine).
Patch proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/19/msg022461.html
Patch tested by bouyer@:
https://mail-index.NetBSD.org/tech-kern/2017/10/20/msg022465.html
I previously discussed the issue on tech-kern before I knew of the
history around exec_map:
https://mail-index.NetBSD.org/tech-kern/2012/12/09/msg014695.html
The candidate workaround I proposed of using pool_setlowat to force
preallocation of KVA would also force preallocation of physical RAM,
which is a waste not incurred by using exec_map, and which is part of
why I never committed it.
There may remain a general problem that if thread A calls pool_get
and tries to service that request by a uvm_km_alloc call that hangs
because KVA is scarce, and thread B does pool_put, the pool_put in
thread B will not notify the pool_get in thread A that it doesn't
need to wait for KVA, and so thread A may continue to hang in
uvm_km_alloc. However,
(a) That won't apply here, because there is exactly as much KVA
available in exec_map as exec_pool will ever try to use.
(b) It is possible that may not even matter in other cases as long as
the page daemon eventually tries to shrink the pool, which will cause
a uvm_km_free that can unhang the hung uvm_km_alloc.
XXX pullup-8
XXX pullup-7
XXX pullup-6
XXX pullup-5, perhaps...
There is a race here, as seen on arm with FPU:
LWP L is running but not on CPU, has its FPU state on CPU2 which
has not been released yet, so fpexc still has VFP_FPEXC_EN set in the PCB copy.
LWP L is scheduled on CPU1, CPU1 calls cpu_switchto() for L in mi_switch().
cpu_switchto() will set VFP_FPEXC_EN in the FPU's fpexc register per the
PCB fpexc copy.
Before CPU1 calls pcu_switchpoint() for L, CPU2 calls
pcu_do_op(PCU_CMD_SAVE | PCU_CMD_RELEASE) for L because it still holds its
FPU state and wants to load another lwp. This cause VFP_FPEXC_EN to
be cleared in the PCB copy, but not in CPU1's register. L's l_pcu_cpu is
set to NULL.
When CPU1 calls pcu_switchpoint() for L it see l_pcu_cpu is NULL, and doesn't
call the release callback.
Now CPU1 has its FPU enabled but with the wrong FPU state.
Fix by releasing the PCU even if l_pcu_cpu is NULL.
disable it - which defaults to disabled. The following command is now
required to use linux binaries:
sysctl -w emul.linux.enabled=1
After a discussion on tech-kern@. All the other ideas to reduce the attack
surface have drawbacks, and this sysctl seems to be the best option.
handling into the backend and doing an optimistic (unlocked) check
first. Always taking the vnode interlock makes this assertion otherwise
very heavy for multi-processor machines. Ride the kernel version bump.
all the same values (except for the filename) just ignore it. Otherwise
report the duplicate-entry error.
This allows the user to create a signature file with veriexegen(8) and
not worry about duplicate entries (due to hard-linked files) which will
otherwise cause /etc/rc.d/veriexec to report an error.
Fixes PR kern/52512
XXX Pull-up for -8
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.
This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.
This change won't affect other procfs files neither Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.
Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h
Reduce code complexity after removal of this functionality.
Update TODO.ptrace accordingly: remove two entries about /proc tracing.
Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).
Proposed on tech-kern@.
All filesystem tracing utility users are encouraged to switch to ptrace(2).
Sponsored by <The NetBSD Foundation>
of this file only.
Rather than adding meaningless {} around all uses of functions that
are #defined to nothing for userland, #define the funcs to something
that is functionally equivalent (but which appeases gcc).
Also, define KASSERT() to nothing for userland, which avoids the need
to add a #definee for mutex_owned which would otherwise be needed,
and simmultaneoiusly stops gcc from complaining about a lack of a prototype.
autoloaded modules. These options are disabled everywhere (except ibcs2
on Vax, but Vax does not support kernel modules, so doesn't matter),
therefore there is no issue in removing them from the list. Interested
users will now have to do a 'modload' first, or uncomment the entries in
GENERIC.
will now have to type 'modload' to use it, or uncomment the entry in
GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by
default, sorry about that.