resolved pathname. We need this in the case of scripts where p_path needs
to point to the interpreter and not the script itself. Otherwise things
like perl scripts that depend on /proc/$$/exe to re-exec themselves end up
being fork bombs.
In reality we should be using the fully resolved/canonicalized path here, but
namei is not giving it back to us.
This means that the full executable path is always available.
- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
NULL check for p->p_path should perhaps be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change
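A rough sketch of the resulting p_path lifecycle (hedged; helper and
variable names are illustrative, not the exact committed code):

/* kern_exec.c: record the resolved executable path */
p->p_path = kmem_strdupsize(pathstring, NULL, KM_SLEEP);
/* kern_fork.c: give the child its own copy */
p2->p_path = kmem_strdupsize(p1->p_path, NULL, KM_SLEEP);
/* kern_exit.c: release it when the process goes away */
if (p->p_path != NULL)
	kmem_strfree(p->p_path);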
TODO:
1. reference count the path string, to save copy at fork and free
just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
vnode and then using getcwd() on the parent directory?
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.
the kernel and module symbols, and when relocating a symbol that has
SHN_ABS, take its value as-is and don't return an error if it equals zero.
Sent on tech-kern@.
PROC_MACHINE_ARCH32(P) to override the value for sysctl hw.machine_arch
(native and netbsd32 compat resp.).
Use these for arm and mips instead of the (non-working and noisy, in the case
of arm) sysctl override and #ifdef __mips__ in architecture-neutral
code.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.
Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.
This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.
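For illustration, a caller now needs to tolerate (or retry on) the new
error; a hedged sketch:

int error = pool_prime(&mypool, nitems);
if (error == EWOULDBLOCK) {
	/*
	 * Another thread already has a request in flight to the backing
	 * allocator; we are not necessarily out of memory, so retrying
	 * (or simply ignoring the failure) is reasonable here.
	 */
}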
XXX pullup-8
XXX pullup-7
XXX pullup-6 (requires tweaking the patch)
XXX pullup-5...
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
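For illustration, a typical call site changes roughly like this (hedged
example; the history and variables are invented):

/* before: pointer printed with %p, size with %d */
UVMHIST_LOG(maphist, "map=%p size=%d", map, size, 0, 0);
/* after: 'j' length modifiers throughout, pointers go via uintptr_t */
UVMHIST_LOG(maphist, "map=%#jx size=%jd", (uintptr_t)map, size, 0, 0);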
vmin is only an optional hint since we're not passing UVM_FLAG_FIXED,
but that doesn't mean we should use uninitialized stack garbage as
the hint.
Noted by chs@.
Candidate fix for PR kern/45718: `processes sometimes get stuck and
spin in vm_map', a problem that has been plaguing all our 32-bit
ports for years.
Since we currently use large (256k) buffers for execargs, and since
nobody has stepped up to tackle breaking them into bite-sized (or at
least page-sized) chunks, after KVA gets sufficiently fragmented we
can't allocate new execargs buffers from kernel_map.
Until 2008, we always carved out KVA for execargs on boot with a uvm
submap exec_map of kernel_map. Then ad@ found that the uvm_km_free
call, to discard them when done, cost about 100us, which a pool
avoided:
https://mail-index.NetBSD.org/tech-kern/2008/06/25/msg001854.html
https://mail-index.NetBSD.org/tech-kern/2008/06/26/msg001859.html
ad@ _simultaneously_ introduced a pool _and_ eliminated the reserved
KVA in the exec_map submap. This change preserves the pool, but
restores exec_map (with less code, by putting it in MI code instead
of copying it in every MD initialization routine).
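Roughly, the restored arrangement looks like this hedged sketch (sizes,
flags and helper names are illustrative, not the literal committed code):

/* MI init: carve a dedicated pageable submap for execargs out of kernel_map */
vaddr_t minaddr = 0, maxaddr = 0;
exec_map = uvm_km_suballoc(kernel_map, &minaddr, &maxaddr,
    16 * NCARGS, VM_MAP_PAGEABLE, false, NULL);

/* exec_pool keeps its cheap reuse, but its backing KVA now comes from
 * exec_map instead of the (fragmentable) general kernel_map */
static void *
exec_pool_alloc(struct pool *pp, int flags)
{
	return (void *)uvm_km_alloc(exec_map, NCARGS, 0,
	    UVM_KMF_PAGEABLE | UVM_KMF_WAITVA);
}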
Patch proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/19/msg022461.html
Patch tested by bouyer@:
https://mail-index.NetBSD.org/tech-kern/2017/10/20/msg022465.html
I previously discussed the issue on tech-kern before I knew of the
history around exec_map:
https://mail-index.NetBSD.org/tech-kern/2012/12/09/msg014695.html
The candidate workaround I proposed of using pool_setlowat to force
preallocation of KVA would also force preallocation of physical RAM,
which is a waste not incurred by using exec_map, and which is part of
why I never committed it.
There may remain a general problem that if thread A calls pool_get
and tries to service that request by a uvm_km_alloc call that hangs
because KVA is scarce, and thread B does pool_put, the pool_put in
thread B will not notify the pool_get in thread A that it doesn't
need to wait for KVA, and so thread A may continue to hang in
uvm_km_alloc. However,
(a) That won't apply here, because there is exactly as much KVA
available in exec_map as exec_pool will ever try to use.
(b) It is possible that this may not even matter in other cases as long as
the page daemon eventually tries to shrink the pool, which will cause
a uvm_km_free that can unhang the hung uvm_km_alloc.
XXX pullup-8
XXX pullup-7
XXX pullup-6
XXX pullup-5, perhaps...
There is a race here, as seen on arm with FPU:
LWP L is running but not on CPU, has its FPU state on CPU2 which
has not been released yet, so fpexc still has VFP_FPEXC_EN set in the PCB copy.
LWP L is scheduled on CPU1, CPU1 calls cpu_switchto() for L in mi_switch().
cpu_switchto() will set VFP_FPEXC_EN in the FPU's fpexc register per the
PCB fpexc copy.
Before CPU1 calls pcu_switchpoint() for L, CPU2 calls
pcu_do_op(PCU_CMD_SAVE | PCU_CMD_RELEASE) for L because it still holds its
FPU state and wants to load another lwp. This causes VFP_FPEXC_EN to
be cleared in the PCB copy, but not in CPU1's register. L's l_pcu_cpu is
set to NULL.
When CPU1 calls pcu_switchpoint() for L it sees l_pcu_cpu is NULL, and doesn't
call the release callback.
Now CPU1 has its FPU enabled but with the wrong FPU state.
Fix by releasing the PCU even if l_pcu_cpu is NULL.
disable it - which defaults to disabled. The following command is now
required to use linux binaries:
sysctl -w emul.linux.enabled=1
After a discussion on tech-kern@. All the other ideas to reduce the attack
surface have drawbacks, and this sysctl seems to be the best option.
handling into the backend and doing an optimistic (unlocked) check
first. Always taking the vnode interlock makes this assertion otherwise
very heavy for multi-processor machines. Ride the kernel version bump.
all the same values (except for the filename) just ignore it. Otherwise
report the duplicate-entry error.
This allows the user to create a signature file with veriexegen(8) and
not worry about duplicate entries (due to hard-linked files) which will
otherwise cause /etc/rc.d/veriexec to report an error.
Fixes PR kern/52512
XXX Pull-up for -8
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
application use beyond simplistic tracing scenarios.
This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.
This change won't affect other procfs files nor Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.
Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h
Reduce code complexity after removal of this functionality.
Update TODO.ptrace accordingly: remove two entries about /proc tracing.
Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).
Proposed on tech-kern@.
All filesystem tracing utility users are encouraged to switch to ptrace(2).
Sponsored by <The NetBSD Foundation>
of this file only.
Rather than adding meaningless {} around all uses of functions that
are #defined to nothing for userland, #define the funcs to something
that is functionally equivalent (but which appeases gcc).
Also, define KASSERT() to nothing for userland, which avoids the need
to add a #define for mutex_owned which would otherwise be needed,
and simultaneously stops gcc from complaining about a lack of a prototype.
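A hedged illustration of the idea (macro names invented here, not the
actual ones in this file):

#ifdef _KERNEL
#define	FOO_LOCK(l)	mutex_enter(l)
#define	FOO_UNLOCK(l)	mutex_exit(l)
#else
/*
 * Expanding to a void expression keeps "FOO_LOCK(l);" valid C and keeps
 * gcc quiet, without adding {} around every call site.  KASSERT() expands
 * to (effectively) nothing, so no userland #define for mutex_owned is needed.
 */
#define	FOO_LOCK(l)	((void)(l))
#define	FOO_UNLOCK(l)	((void)(l))
#define	KASSERT(e)	((void)0)
#endif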
autoloaded modules. These options are disabled everywhere (except ibcs2
on Vax, but Vax does not support kernel modules, so it doesn't matter),
therefore there is no issue in removing them from the list. Interested
users will now have to do a 'modload' first, or uncomment the entries in
GENERIC.
will now have to type 'modload' to use it, or uncomment the entry in
GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by
default, sorry about that.
Takes a struct bintime maximum delay, and decrements it in place so
that you can use it in a loop in case of spurious wakeups.
Discussed on tech-kern a couple years ago:
https://mail-index.netbsd.org/tech-kern/2015/03/23/msg018557.html
Added a parameter for expressing desired precision -- not currently
interpreted, but intended for a future tickless kernel with a choice
of high-resolution timers.
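A hedged usage sketch of the loop this enables (variable names invented;
the precision argument is assumed to also be a struct bintime and is
currently ignored; EWOULDBLOCK on timeout is assumed, as with
cv_timedwait(9)):

struct bintime timeout = { .sec = 1, .frac = 0 };	/* wait at most ~1s total */
struct bintime epsilon = { .sec = 0, .frac = 0 };	/* desired precision */
int error = 0;

mutex_enter(&sc->sc_lock);
while (!sc->sc_done && error != EWOULDBLOCK)
	/* the timeout is decremented in place, so spurious wakeups
	 * never extend the total wait */
	error = cv_timedwaitbt(&sc->sc_cv, &sc->sc_lock, &timeout, &epsilon);
mutex_exit(&sc->sc_lock);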
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include
user_stack_guard_size in the size reservation.
Don't check for negative; it does not matter since we clamp anyway. The
check broke the compat32 getsockname(), where an uninitialized socklen_t
ended up randomly negative, causing it to fail.
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
this sequence is used by ZFS in a couple places and by supporting it
natively we can undo our local ZFS changes that avoided it.
note that this is only legal when all of the waiters use cv_wait()
and not any of the other variations, and lockdebug will catch
any violations of this rule.
use FSTRANS_SHARED as the lock type, so remove the lock type argument.
File system state FSTRANS_SUSPENDING is now unused so remove it.
Regen vnode_if files.
Ride the 8.99.1 bump from less than an hour ago.
Add two "static inline" functions to vnode_if.c to handle MPSAFE
and FSTRANS before and after the "VCALL()".
Take FSTRANS and handle error before "VCALL(...vop_lock...)" and
release it after "VCALL(...vop_unlock...)".
node and a usecount greater than zero. Therefore rename state "VS_ACTIVE"
to "VS_LOADED" and add a new synthetic state "VS_ACTIVE" for VSTATE_ASSERT()
to assert an active vnode.
Add VSTATE_ASSERT_UNLOCKED() to be used with v_interlock unheld and
move the state assertion macros to sys/vnode_impl.h.
kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()
all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
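In other words, patterns like this hedged example are now redundant:

sc = kmem_zalloc(sizeof(*sc), KM_SLEEP);
KASSERT(sc != NULL);	/* cannot fail with KM_SLEEP; this assert can be dropped */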
CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developed for marine devices uses a CAN network as the link layer.
This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).
This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameters of
CAN hardware. Also included is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).
There is also the canloop(4) pseudo-device, which allows using
the socketcan API without CAN hardware.
At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpdump(8) can also be used to record frames, which can be
decoded with ethereal.
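A minimal hedged example against the socketcan-style API (the header
path and interface name are assumptions; error handling omitted):

#include <sys/socket.h>
#include <net/if.h>
#include <netcan/can.h>		/* assumed header location */
#include <string.h>
#include <unistd.h>

int
can_send_one(const char *ifname)
{
	struct sockaddr_can sa;
	struct can_frame cf;
	int s = socket(AF_CAN, SOCK_RAW, CAN_RAW);

	memset(&sa, 0, sizeof(sa));
	sa.can_family = AF_CAN;
	sa.can_ifindex = if_nametoindex(ifname);	/* CAN interface name */
	bind(s, (struct sockaddr *)&sa, sizeof(sa));

	memset(&cf, 0, sizeof(cf));
	cf.can_id = 0x123;		/* standard 11-bit identifier */
	cf.can_dlc = 2;
	cf.data[0] = 0xde;
	cf.data[1] = 0xad;
	write(s, &cf, sizeof(cf));	/* one frame per write */
	return close(s);
}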
VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.
We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
use with mprotect(2), but without enabling them immediately.
Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses referencing the same physical
pages. Duplicated mappings can have different effective protections.
Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.
Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.
Improve test cases to ensure correct operation of the changed
interfaces.
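A hedged sketch of the JIT double-mapping pattern this enables
(PROT_MPROTECT() and MAP_REMAPDUP per the NetBSD mmap(2)/mremap(2) API,
treated here as assumptions; error handling omitted):

size_t len = 65536;	/* size of the code buffer */
/* writable view; PROT_MPROTECT() reserves the right to add X later */
void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_MPROTECT(PROT_EXEC),
    MAP_ANON | MAP_PRIVATE, -1, 0);
/* duplicate mapping of the same physical pages */
void *rx = mremap(rw, len, NULL, len, MAP_REMAPDUP);
/* ... emit machine code through rw ... */
mprotect(rx, len, PROT_READ | PROT_EXEC);	/* X view; no mapping is ever W&X */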
PT_SETSTEP and PT_CLEARSTEP in the current design must unlock proc_lock and
t->p_lock. These functions use lwp_delref() for a tracee with more than one
LWP; this function internally locks (t->)p_lock, and that is a lock against
self.
New ATF tests with PT_*STEP and multiple LWPs are coming to catch
these bugs in future changes.
Sponsored by <The NetBSD Foundation>
at the place we expected it to be attached!
As mentioned several times (on tech-kern@ mailing list) over the past
18 months or so, I've seen a few instances where this will trigger,
although I've been unable to reproduce them. Hopefully some wider
exposure will reveal the under-lying cause of this rare phenomenon.
Commit was proposed on tech-kern list, and no objections raised.
lwp_create() has acquired more arguments; the latest one was missing here.
By analogy with the changes to other source files in the same commit,
go for &SS_INIT.
Print e_ident[EI_MAG3] (it was missed)
Print e_ident[EI_CLASS] as it is used to determine the correct ELF magic.
No functional change for non-debug (without option DEBUG_ELF) build.
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.
Change pstat.c to retrieve vnodes by lru lists.
Don't count arguments that have WILLRELE/WILLPUT; count arguments
that are struct vnode *.
No functional change currently because it happens that every released
or put vnode argument comes first or after other ones.
Breaks file systems for which VOP_UNLOCK doesn't work on a reclaimed
vnode.
The only case in tree right now is sys/fs/union -- most file systems
use genfs_unlock, which does work on a reclaimed vnode.
Maybe we can work around this -- and still enable VOP_RECLAIM's
callees to assert lock ownership -- by having VOP_RECLAIM unlock the
vnode instead.
No bump because it wouldn't have been possible to acquire the lock in
VOP_RECLAIM anyway -- instant deadlock because vn_lock waits to
transition out of the RECLAIMING state first. Benefit is that we can
now assert ownership of the lock in any operations called by
VOP_RECLAIM.
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html
by default, add sysctl vfs.wapbl.journal_iobufs to control it
this also removes need to allocate iobuf during commit, so it
might help to avoid deadlock during memory shortages like PR kern/47030
These operations allow marking a thread as single-stepping.
Among other things, this allows:
- single step and emit a signal (PT_SETSTEP & PT_CONTINUE)
- single step and trace syscall entry and exit (PT_SETSTEP & PT_SYSCALL)
The former is useful for debuggers like GDB or LLDB. The latter can be used
to singlestep a usermode kernel. These examples don't limit use-cases of
this interface.
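For example, a debugger could combine the requests roughly like this
(hedged; needs <sys/ptrace.h> and <sys/wait.h>; the data argument is
assumed to select the LWP, 0 meaning the only one):

ptrace(PT_SETSTEP, pid, NULL, 0);	/* mark the thread single-stepping */
ptrace(PT_CONTINUE, pid, (void *)1, 0);	/* or PT_SYSCALL, as described above */
waitpid(pid, &status, 0);		/* SIGTRAP after one instruction */
ptrace(PT_CLEARSTEP, pid, NULL, 0);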
Define PT_*STEP only for platforms defining PT_STEP.
Add new ATF tests setstep[1234].
These ptrace(2) operations first appeared in FreeBSD.
Sponsored by <The NetBSD Foundation>
calling the operation on the lower vnode.
Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.
Layered file systems now use genfs_lock()/_unlock/_islocked().
Welcome to 7.99.67
Pointed out by Christos Zoulas that the ELF_AUX_ENTRIES * sizeof(AuxInfo)
assumption is incomplete. There is emulation code that can use different
values (smaller and larger).
Previously PT_DUMPCORE and PIOD_READ_AUXV and regular core dumping retrieved
the vector of AuxInfo {a_type, a_v} + MAXPATHLEN + ALIGN(1).
The extra data is not actually needed in the returned chunk. It can be
retrieved with PT_READ_I operations, which is the preferred way to access
it, as the AuxInfo fields contain pointers (in void * form) to the data.
This changes the behavior of the kernel; no stable releases are affected
by this change. Current software is not affected, as other systems already
stop generating data at AT_NULL. This streamlines the NetBSD behavior with
other ELF-format OSes. This move also simplifies determining, inside the
debugger, whether we got all the needed data, and we no longer need to
eliminate the unneeded chunk at the end.
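A hedged illustration of reading the vector from a debugger (PT_IO with
PIOD_READ_AUXV; bounds and error checks omitted):

AuxInfo aux[64];
struct ptrace_io_desc pio = {
	.piod_op = PIOD_READ_AUXV,
	.piod_offs = 0,
	.piod_addr = aux,
	.piod_len = sizeof(aux),
};
ptrace(PT_IO, pid, &pio, 0);
for (size_t i = 0; i < pio.piod_len / sizeof(aux[0]); i++) {
	if (aux[i].a_type == AT_NULL)
		break;			/* generation now stops here */
	/* aux[i].a_v is the value, or a pointer into the tracee that can
	 * be followed with PIOD_READ_D/PT_READ_I as mentioned above */
}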
Sponsored by <The NetBSD Foundation>
1. Supply the siginfo we expect TRAP_SC{E,X} to process_stoptrace() and set it.
2. Change the meaning of the second argument of proc_stop from 'notify' to
'stop right now'. Wait in process_stoptrace until that has happened.
3. While here, fix the locking order in process_stoptrace().
- Extend alldevs_mtx section in deviter_init.
- Assert ownership of alldevs_mtx in private functions:
. deviter_reinit
. deviter_next1
. deviter_next2
- Acquire alldevs_mtx in deviter_next.
(alldevs_mtx is not relevant to the struct deviter object, which is
private to the caller who must guarantee exclusive access to it.)
- Omit mutex_exit before panic. No need.
- Sprinkle some more information into a few messages.
- Prefer __diagused over #if DIAGNOSTIC for declarations,
to reduce conditionals.
ok mrg@
by the number of concurrent I/O requests. Also introduce a new disk_wait()
function to measure requests waiting in a bufq.
iostat -y now reports data about waiting and active requests.
So far only drivers using dksubr and dk, ccd, wd and xbd collect data about
waiting requests.
to read-only and vice versa:
- Add an internal flag IMNT_WANTRDONLY.
- Set either IMNT_WANTRDWR or IMNT_WANTRDONLY if going from or to read-only.
- After a successful call to VFS_MOUNT() set or clear MNT_RDONLY.
Adapt tmpfs and rumpfs to the new protocol. Other file systems will be
updated when they get the IMNT_CAN_RWTORO property.
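Roughly, the update path now works like this hedged sketch (simplified;
field and flag names per the text above, not the literal committed code):

/* before calling the file system */
if (flags & MNT_UPDATE) {
	if ((flags & MNT_RDONLY) && !(mp->mnt_flag & MNT_RDONLY))
		mp->mnt_iflag |= IMNT_WANTRDONLY;	/* rw -> ro */
	else if (!(flags & MNT_RDONLY) && (mp->mnt_flag & MNT_RDONLY))
		mp->mnt_iflag |= IMNT_WANTRDWR;		/* ro -> rw */
}

error = VFS_MOUNT(mp, path, data, &data_len);

/* only after the file system agreed does the generic flag change */
if (error == 0) {
	if (mp->mnt_iflag & IMNT_WANTRDONLY)
		mp->mnt_flag |= MNT_RDONLY;
	else if (mp->mnt_iflag & IMNT_WANTRDWR)
		mp->mnt_flag &= ~MNT_RDONLY;
}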
Welcome to 7.99.64
This interface is modeled after the FreeBSD API and its usage.
It replaces the previous watchpoint API. That API was introduced only
recently in NetBSD-current, so its remnants are removed without any
backward compatibility.
Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible for setting global watchpoints/breakpoints with the
debug registers; to achieve this, the PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring functionality is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- a debugger is responsible for retrieving the debug register state to distinguish
the exact debug register trap (DR6 is the Status Register on x86)
- the kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate PT_SETDBREGS
call (DR7 is the Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
Implementation notes on i386 and amd64:
- the initial state of the debug registers is retrieved on boot and stored in
a local copy (initdbregs); this value is used to initialize the dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored
Further ideas:
- restrict this interface with securelevel
Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).
This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.
This commit does not cover netbsd32 compat code. Currently another interface,
PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code in order to
reliably validate PT_GETDBREGS/PT_SETDBREGS.
This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:
--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)
#ifdef HAVE_PT_GETDBREGS
+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{
Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.
GDB demo:
(gdb) c
Continuing.
Watchpoint 2: traceme
Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);
(Currently the GDB interface is not reliable due to NetBSD support bugs)
Sponsored by <The NetBSD Foundation>
Use proper check for LW_SYSTEM, don't depend on PT_GETREGS/PT_SETREGS.
Don't allow masking SA_CANTMASK signals with PT_SET_SIGMASK (this covers
SIGSTOP and SIGKILL).
Add new ATF tests:
- setsigmask5
Verify that sigmask cannot be set to SIGKILL
- setsigmask6
Verify that sigmask cannot be set to SIGSTOP
Sponsored by <The NetBSD Foundation>
Introduce new API for debuggers to allow/prevent execution of the specified
thread.
New ptrace(2) operations:
PT_RESUME Allow execution of a specified thread, change its state
from suspended to continued. The addr argument is unused.
The data argument specifies the LWP ID.
This call is equivalent to _lwp_continue(2) called by a
traced process. This call does not change the general
process state from stopped to continued.
PT_SUSPEND Prevent execution of a specified thread, change its state
from continued to suspended. The addr argument is unused.
The data argument specifies the requested LWP ID.
This call is equivalent to _lwp_suspend(2) called by a
traced process. This call does not change the general
process state from continued to stopped.
This interface is modeled after FreeBSD, however with NetBSD specific arguments
passed to ptrace(2) -- FreeBSD passes only thread id, NetBSD passes process and
thread id.
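In practice (hedged example; error handling omitted):

/* with the tracee stopped: keep one thread from running when we continue */
ptrace(PT_SUSPEND, pid, NULL, lwpid);	/* addr unused, data = LWP ID */
ptrace(PT_CONTINUE, pid, (void *)1, 0);	/* the rest of the process runs */
/* ... later, at the next stop, let it run again ... */
ptrace(PT_RESUME, pid, NULL, lwpid);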
Extend the PT_LWPINFO operation in ptrace(2) to report suspended threads. In the
ptrace_lwpinfo structure, add a new pl_event value, PL_EVENT_SUSPENDED, next to
PL_EVENT_NONE and PL_EVENT_SIGNAL.
Add a new errno(2) value, EDEADLK, that might be returned by ptrace(2). It prevents
deadlocking when resuming a process or thread that is prevented from execution.
This fixes a bug: the old API was vulnerable to this scenario.
Kernel bump delayed till introduction of PT_GETDBREGS/PT_SETDBREGS soon.
Add new ATF tests:
- resume1
Verify that a thread can be suspended by a debugger and later
resumed by the debugger
- suspend1
Verify that a thread can be suspended by a debugger and later
resumed by a tracee
- suspend2
Verify that while the only thread within a process is
suspended, the whole process cannot be unstopped
Sponsored by <The NetBSD Foundation>
sections. They point to the same data in the file, but sections are
for linkers and are not necessarily present in an executable.
The original switch from phdrs to shdrs seems to be just a cop-out to
avoid parsing multiple notes per segment, which doesn't really avoid
the problem, because sections can also contain multiple notes.
Add a new interface for getting/setting the signal mask of a tracee.
It has been inspired by Linux PTRACE_GETSIGMASK and PTRACE_SETSIGMASK, but
adapted for the NetBSD API.
This interface can be used by checkpointing software such as criu to set/restore
the context of a process, including the signal mask, or to track this property
in reverse-execution software like the Record and Replay Framework (rr).
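A hedged usage sketch (the data argument is assumed to carry the structure
size, as with other recent ptrace(2) requests; check ptrace(2) for the
exact convention):

sigset_t mask;
ptrace(PT_GET_SIGMASK, pid, &mask, sizeof(mask));	/* read tracee's mask */
sigaddset(&mask, SIGUSR1);				/* e.g. for a checkpoint restore */
ptrace(PT_SET_SIGMASK, pid, &mask, sizeof(mask));	/* write it back */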
Add new ATF tests for this interface
====================================
getsigmask1:
Verify that plain PT_GET_SIGMASK can be called
getsigmask2:
Verify that PT_GET_SIGMASK reports correct mask from tracee
setsigmask1:
Verify that plain PT_SET_SIGMASK can be called with empty mask
setsigmask2:
Verify that sigmask is preserved between PT_GET_SIGMASK and
PT_SET_SIGMASK
setsigmask3:
Verify that sigmask is preserved between PT_GET_SIGMASK, process
resumed and PT_SET_SIGMASK
setsigmask4:
Verify that new sigmask is visible in tracee
Kernel ABI bump delayed as there are more interfaces to come in ptrace(2).
Sponsored by <The NetBSD Foundation>
Currently a tracer is prohibited from reading and writing memory of a tracee.
Prohibit reading and faking signal information.
Sponsored by <The NetBSD Foundation>
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.
Add interface in ptrace(2) to track thread (LWP) events:
- birth,
- termination.
The purpose of this interface is to keep track of the current thread state in
a tracee and apply e.g. per-thread hardware-assisted watchpoints.
This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.
Change the following structure:
typedef struct ptrace_state {
int pe_report_event;
pid_t pe_other_pid;
} ptrace_state_t;
to
typedef struct ptrace_state {
int pe_report_event;
union {
pid_t _pe_other_pid;
lwpid_t _pe_lwp;
} _option;
} ptrace_state_t;
#define pe_other_pid _option._pe_other_pid
#define pe_lwp _option._pe_lwp
This keeps the size of ptrace_state_t unchanged, as both pid_t and lwpid_t are
defined as int32_t-like integers. This change does not break existing
prebuilt software and has minimal effect on necessity for source-code
changes. In summary, this change should be binary compatible and shouldn't
break build of existing software.
Introduce a new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change will help debuggers to distinguish the exact source of
SIGTRAP.
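A hedged sketch of consuming these events (structure and request names per
ptrace(2); error handling omitted):

ptrace_event_t ev = { .pe_set_event = PTRACE_LWP_CREATE | PTRACE_LWP_EXIT };
ptrace(PT_SET_EVENT_MASK, pid, &ev, sizeof(ev));
ptrace(PT_CONTINUE, pid, (void *)1, 0);
waitpid(pid, &status, 0);		/* SIGTRAP with si_code TRAP_LWP */

ptrace_state_t st;
ptrace(PT_GET_PROCESS_STATE, pid, &st, sizeof(st));
if (st.pe_report_event == PTRACE_LWP_CREATE)
	printf("new LWP %d\n", st.pe_lwp);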
Add two basic t_ptrace_wait* tests:
lwp_create1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_CREATE
lwp_exit1:
Verify that 1 LWP creation is intercepted by ptrace(2) with
EVENT_MASK set to PTRACE_LWP_EXIT
All tests are passing.
Surfing the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.
Sponsored by <The NetBSD Foundation>
PTRACE_VFORK is supposed to be used to track vfork(2)-like events, when the
parent gives birth to a new child process and stops until the child exits or
calls exec().
Currently PTRACE_VFORK is a stub.
PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE throws SIGTRAP with TRAP_CHLD.
Sponsored by <The NetBSD Foundation>
The SIGTRAP signal is thrown from the kernel if EVENT_MASK (ptrace_event)
enables PTRACE_FORK. This new si_code helps debuggers to distinguish the
exact source of the signal delivered to a debugger.
Another purpose of TRAP_CHLD is to retain the same behavior inside the
NetBSD kernel for process child traps and have an interface to monitor it.
Retrieving exact event and extended properties of process child trap is
available with PT_GET_PROCESS_STATE.
There is no behavior change for existing software.
This si_code value is a NetBSD extension.
Sponsored by <The NetBSD Foundation>
This removes dead code introduced with the following commit:
date: 2012-07-27 22:52:49 +0200; author: christos; state: Exp; lines: +8 -2;
revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind