warnings:
1. this one: add a void * cast (which I think is the least intrusive)
2. add pragmas to elide the warning
3. add intermediate inline conversion functions
4. change the called function prototypes, adding unused arguments and
converting some of the pointer arguments to void *.
5. make the functions variadic (which defeats the purpose of checking)
6. pass command line flags to elide the warning
I did try 3 and 4 and I was not pleased with the result (sys_ptrace_common.c)
(3) added too much code and defines, and (4) made the regular use clumsy.
sigswitch() can be called from exit1() through:
ttywait()->ttysleep()-> cv_timedwait_sig()->sleepq_block()->issignal()->sigswitch()
lwp_exit() called for the last LWP triggers exit1() and this causes a panic.
The debugger related signals have short-circuit demise paths in
eventswitch() and other functions, before calling sigswitch().
This change restores the original behavior, but there is an open question
whether the kernel crash is a red herring of misbehavior of ttywait().
This should fix PR kern/54618 by David H. Gutteridge
second argument, and the compiler is free to perform optimizations knowing
that this argument is never NULL.
In this particular case, it was harmless. But still good to fix.
Reported-by: syzbot+6f504255accb795eb6b7@syzkaller.appspotmail.com
For the PTRACE_LWP_EXIT event, the eventswitch() call is triggered from
lwp_exit(). In the case of setting the program status to PS_WEXIT, do not
try to demise in place, by calling lwp_exit() as it causes panic.
In this scenario bail out from the function and resume the lwp_exit()
procedure.
In case of sigswitching away in issignal() and continuing the execution on
PT_CONTINUE (or equivalent call), there is a time window when another
thread could cause the process state to be changed to PS_STOPPING.
In the current logic, a thread would receive signal 0 (no-signal) and exit
from issignal(), returning to userland and never finishing the process of
stopping all LWPs. This causes hangs in waitpid() waiting for SIGCHLD and
the callout polling for the state of the process in an infinite loop.
Instead of prompting for a returned signal from a debugger, repeat the
issignal() loop; this will cause checking the PS_STOPPING flag again and
sigswitching away in the scenario of stopping the process.
Make the function static as it is now local to kern_sig.c.
Rename the 'relock' argument to 'proc_lock_held' as it is more verbose.
This was suggested by mjg@freebsd. While there this flips the users between
true<->false.
Add additional KASSERT(9) calls here to validate whether proc_lock is used
accordingly.
This field is not needed as it duplicates p_opptr, which is already safe to
use, unless proven otherwise.
eventswitch() already contained a check for != initproc (pid1).
Ride ABI bump for 9.99.16.
it may deadlock on suspension of this file system.
Add fstrans type LAZY and use it for VOP_STRATEGY().
Address PR kern/53624 (dom0 freeze on domU exit) is still there
It works like:
- kill(SIGSTOP) for unstopped tracee
- ptrace(PT_CONTINUE,SIGSTOP) for stopped tracee
The child will be stopped and can always be waited for (with wait(2)-like
calls).
For a stopped tracee kill(SIGSTOP) has no effect. PT_CONTINUE+SIGSTOP cannot
be used on an unstopped process (EBUSY).
This operation is modeled after PT_KILL that is similar for the SIGKILL
call. While there, allow PT_KILL on unstopped traced child.
This operation is useful in an abnormal exit of a debugger from a signal
handler, usually followed by waitpid(2) and ptrace(PT_DETACH).
Stop competing between threads which one emits event signal quicker and
overwriting the signal from another thread.
This fixes missed in action signals.
NetBSD truss can now report reliably all TRAP_SCE/SCX/etc events without
reports of missed ones.
This was one of the reasons why debuggee with multiple threads misbehaved
under a debugger.
This change is v.2 of the previously reverted commit for the same fix.
This version contains a recovery path that stops triggering event SIGTRAP
for a detached debugger.
friendly methods for sys/conf.h that needs it.
one alias per return type and first function are needed,
though they can be stubbed to existing code. the only cost is
the symbol itself, the codegen is the same.
An alternative approach would be to check the value in settime1(), but
it would result in multiple checks for valid tv_nsec, as there are
settime1() users that need to check the ranges earlier.
Reported-by: syzbot+96e5ce2c2c704d96c2f0@syzkaller.appspotmail.com
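
The range check discussed above can be sketched in isolation. This is only
an illustration: timespec_nsec_valid() is a hypothetical helper name, not a
function from the kernel sources, and the placement of the real check is
whatever the commit chose.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not a kernel API): true when tv_nsec lies in
 * the valid range [0, 1000000000).  Rejecting out-of-range values
 * up front avoids relying on later arithmetic to catch them.
 */
static bool
timespec_nsec_valid(const struct timespec *ts)
{
	return ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}
```

A caller would typically return EINVAL when the helper reports false,
before the value reaches any code that subtracts or adds time values.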
The condition would be rechecked later again after subtracting start time
and most invalid inputs rejected. In corner cases the current code can
accept certain invalid inputs that will pass checks later and behave like
valid ones (due to signed integer overflow).
Reported-by: syzbot+3a4a07b62558bbbd3baa@syzkaller.appspotmail.com
the map, and check the buffer on each bus_dmamap_sync. This allows us to
find DMA buffer overflows and UAFs, which couldn't be found before because
the device accesses to memory are outside of KASAN's control.
Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by accident.
This was a Windows-style behavior that makes threading tracing fragile.
sizeof(pid) and sizeof(lwp) are unlikely to ever change and the check can
confuse.
The assert has been moved to ATF t_ptrace_wait.c r.1.132.
Requested by <christos>
Solves kernel panic in NetBSD 8.1 amd64 on VirtualBox 6.0.12 r133076.
Triggered with an NVMe controller without any actual discs behind it:
nvme0 at pci0 dev 14 function 0: vendor 80ee product 4e56 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: interrupting at ioapic0 pin 22
nvme0: ORCL-VBOX-NVME-VER12, firmware 1.0, serial VB1234-56789
ld0 at nvme0 nsid 1
ld0: 0, 0 cyl, 16 head, 63 sec, 1 bytes/sect x 0 sectors
Code path is reached 4 times during normal boot, each time after wd0a
is already mounted; this patch avoids a crash with a dirty filesystem.
Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.
Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to a slight race condition.
Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when no appropriate event was
registered.
Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.
Add a couple of compile-time asserts for assumptions in the code.
No functional change intended in existing ptrace(2) software.
All ATF ptrace(2) and ATF GDB tests pass.
This change improves reliability of the threading ptrace(2) code.
rather than discarding-after-assignment. Introduced from the
[pgoyette-compat] branch work.
Welcome to 9.99.14 !!! (Module hook routine prototype changed.)
Found by the lgtm bot, reported via private Email from maxv@
The new member is called f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
- get the vnode from the fd passed instead of calling namei() on the
path
- try to reverse resolve the vnode to extract the pathname
- deal with not having a resolved path available
- rename variable that was not a pathbuf
- compare with USHRT_MAX since the max length grew (we could make it NAME_MAX)
- use kmem_alloc for entries > NCHNAMLEN so the namecache contains all
possible entries.
Benefits:
- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests
Drawbacks:
- performance hit: throughput is reduced to about 1/3 in naive measurements
=> possible to mitigate by using hardware SHA-256 instructions
=> all you really need is 32 bytes to seed a userland PRNG anyway
=> if we just used ChaCha this would go away...
XXX pullup-7
XXX pullup-8
XXX pullup-9
behavior in wapbl_start() which extended int to size_t.
Error message was:
> UBSan: Undefined Behavior in ../../../../kern/vfs_wapbl.c:609:41, signed integer overflow: 3345138 * 1024 cannot be represented in type 'int'
> /* XXX maybe use filesystem fragment size instead of 1024 */
> /* XXX fix actual number of buffers reserved per filesystem. */
> wl->wl_bufcount_max = (buf_nbuf() / 2) * 1024;
Need more work?
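
The overflow in the quoted expression can be reproduced outside the kernel.
A minimal sketch, assuming a non-negative buffer count; bufcount_max() and
int_product_overflows() are hypothetical names standing in for the
wl_bufcount_max computation, and the committed fix may be shaped
differently.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the wl_bufcount_max computation; nbuf
 * plays the role of the value returned by buf_nbuf() and is assumed
 * to be non-negative.  Converting to size_t before multiplying keeps
 * the product out of signed int arithmetic.
 */
static size_t
bufcount_max(int nbuf)
{
	return (size_t)(nbuf / 2) * 1024;
}

/* Returns 1 if (nbuf / 2) * 1024 would overflow when evaluated in int. */
static int
int_product_overflows(int nbuf)
{
	return nbuf / 2 > INT_MAX / 1024;
}
```

With nbuf = 6690276 the halved value is 3345138, the operand from the
UBSan report; the product 3425421312 does not fit in a 32-bit int but is
representable in size_t.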
MI pools on amd64 from linked lists to bitmaps, which have higher security
properties.
Then, change the computation of the size of the PH pools: take into account
the bitmap area available by default in the ph_u2 union, and don't go with
&phpool[>0] if &phpool[0] already has enough space to embed a bitmap.
The pools that are migrated in this change all use bitmaps small enough to
fit in &phpool[0], therefore there is no increase in memory consumption.
powerful, has much more coverage - far beyond just kmem(9) -, and also
consumes less memory.
KMEM_GUARD was a debug-only option that required special DDB tweaking, and
had no use in releases or even diagnostic kernels.
As a general rule, the policy now is to harden the pool layer by default
in GENERIC, and use kASan as a diagnostic/debug/fuzzing feature to verify
each memory allocation & access in the system.
the module's modcmd(CMD_FINI) code. If the modcmd() call returns an
error, we attempted to re-instate the module's sysctl stuff.
This doesn't work well for built-in modules (where "unload" actually
means "disable"), since they don't have any ``struct kobj''.
So check first, and don't try to find the __link_set_sysctl_funcs for
built-in modules.
the list of routines that need to be called for setting up sysctl
variables. This worked great for all code included in the kernel
itself, but didn't deal with modules that want to create their own
sysctl data. So, we ended up with a lot of #ifdef _MODULE blocks
so modules could explicitly call their setup functions when loaded
as non-built-in modules.
So today, we complete the task that was started so many years ago.
When modules are loaded, after we've called xxx_modcmd(INIT...) we
check if the module contains its own __link_set_sysctl_funcs, and
if so we call the functions listed. We add a struct sysctllog member
to the struct module so we can call sysctl_teardown() when the module
gets unloaded. (The sequence of events ensures that the sysctl stuff
doesn't get created until the rest of the module's init code does any
required memory allocation.)
So, no more need to explicitly call the sysctl setup routines when
built as a loadable module.
Anything we confirmed about the world before callout_halt may cease
to be true afterward, so make sure to start over in that case.
Add some comments explaining what's going on.
Reported-by: syzbot+d58da99969f58c1a024a@syzkaller.appspotmail.com
Strictly speaking, what we want to avoid is poisoning buffers that were
referenced in a global list as part of the ctor. But, if a buffer indeed
got referenced as part of the ctor, it necessarily has to be unreferenced
in the dtor; which implies it has to have a dtor. So we want both a ctor
and a dtor, and not just one of them.
Note that POOL_QUARANTINE already implicitly provides this increased
coverage.
directly, to immediately detect certain bugs that would otherwise have
been detected only later on the pool layer, if the buffer ever reached
the pool layer.
Replace lwp_delref() + mutex_enter() with: mutex_enter() + lwp_delref2().
This avoids extra taking and exiting from a mutex.
Add missing mutex_exit() for LW_SYSTEM.
Do not switch lwp for PT_SET_SIGINFO. This operation is not needed and
avoids panic for >2 LWPs as p_lock is attempted to be entered again in a
critical section.
Without this, a rumpkernel (appropriately modified) built with SCTP
enabled will try to assign the function pointers, but the targets
are only available in rumpnet. We cannot link the rumpkernel against
rumpnet because rumpnet is already linked against rumpkernel and we
would end up with a circular dependency.
As reported in private Email by rjs@
variable from rodata. The compound gets pushed on the stack, the padding
of the structure was therefore not initialized, and was getting leaked to
userland in sys___sigaltstack14().
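
The usual defence against this class of leak is to clear the whole object
before filling in its members. A minimal sketch, assuming a made-up
structure; struct demo_stack and demo_fill() are hypothetical and are not
the types or code touched by this commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical structure.  On LP64 platforms there are typically
 * 4 bytes of padding between 'flags' and 'size'.
 */
struct demo_stack {
	void	*sp;
	int	 flags;
	size_t	 size;
};

/*
 * Fill *out so that no stale stack bytes remain in it.  memset()
 * clears every byte, padding included; the member assignments that
 * follow leave the padding zeroed on common compilers (the C standard
 * itself does not promise this, so treat it as an implementation
 * assumption).
 */
static void
demo_fill(struct demo_stack *out, void *sp, int flags, size_t size)
{
	memset(out, 0, sizeof(*out));
	out->sp = sp;
	out->flags = flags;
	out->size = size;
}
```

Only after such a fill is it safe to copy the whole structure, byte for
byte, to a less privileged consumer.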
an immediate, then the 64 bits of nnode.sysctl_data may not all be
initialized, since this is a union.
Obviously, this is harmless; but still a bug, so fix it.
there is no actual bug here, since the buffer is guaranteed to be NUL
terminated.
With KASAN we check the whole buffer to cover the "worst" case, and here
it triggered false positives because the buffer size was not filtered.
Stop competing between threads which one emits event signal quicker and
overwriting the signal from another thread.
This fixes missed in action signals.
NetBSD truss can now report reliably all TRAP_SCE/SCX/etc events without
reports of missed ones.
This was one of the reasons why debuggee with multiple threads misbehaved
under a debugger.
This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.
This property was removed in NetBSD-8.0 and had no users.
This change simplifies the signal code, removing dead branches.
NFCI
mknod with mode & S_IFIFO and dev=0 shall behave like mkfifo.
Update the documentation to reflect this state.
Add ATF tests.
This is an in-kernel implementation as typically user-space programs use
mkfifo(2) directly, however whenever there is need to bypass libc (like in
valgrind) then portable POSIX software calls the mknod syscall.
Noted on tech-kern@ by Greg Troxel.
module's name still being available - it may be destroyed when
kobj_affix() unloads the object. So make a copy of the name first
so we can use it in a useful error message.
(Without this, I've had affix errors go into an infinite loop
trying to print the error message!)
Previously it was disabled due to vfork(2) synchronization issues.
These problems are now gone.
While there, set l_vforkwaiting to false in posix_spawn. This is not
strictly needed but it does no harm to keep it initialized explicitly.
the thread currently suspending this mount.
Remove now unneeded state FSTRANS_EXCL.
It is now possible to suspend a file system from a thread
already holding fstrans locks. Use with care ...
In the previous behavior vforking parent was keeping pointer to a child
and checking whether it clears a PL_PPWAIT in its bitfield p_lflag. However
a child can go invalid between exec/exit event from child and waking up
vforked parent and this can cause invalid pointer read and in the worst
scenario kernel crash.
In the new behavior vforked child keeps a reference to vforked parent LWP
and sets a value l_vforkwaiting to false. This means that vforked child
can finish its work, exec/exit and be terminated and once parent will be
woken up it will read its own field whether its child is still blocking.
Add new field in struct lwp: l_vforkwaiting protected by proc_lock.
In future it should be refactored and all PL_PPWAIT users transformed to
l_vforkwaiting and next l_vforkwaiting probably transformed into a bit
field.
This is another attempt of fixing this bug after <rmind> from 2012 in
commit:
Author: rmind <rmind@NetBSD.org>
Date: Sun Jul 22 22:40:18 2012 +0000
fork1: fix use-after-free problems. Addresses PR/46128 from Andrew Doran.
Note: PL_PPWAIT should be fully replaced and modificaiton of l_pflag by
other LWP is undesirable, but this is enough for netbsd-6.
The new version no longer performs unsafe access in l_lflag changing the
LP_VFORKWAIT bit.
Verified with ATF t_vfork and t_ptrace* tests and they are no longer
causing any issues in my local setup.
Fixes PR/46128 by Andrew Doran
If a process is exiting and it was not asked to relock proc_lock, do not
free the mutex as it causes panic. This bug is a timing bug as the faulty
condition is not deterministic and fires only sometimes, but is quickly
triggerable when executed in an infinite loop.
Detected and reported with LLDB test-suite by <mgorny>
posix_spawn(3) is a first class syscall in NetBSD, different to
(V)FORK+EXEC as these operations are executed in one go. This differs to
Linux and FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.
Typically LLDB and GDB software is aware of FORK/VFORK events. As discussed
with the LLDB community, instead of slicing the posix_spawn(3) operation
into phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning intermediate
state to the debugger, that might have abnormal state, introduce new event
type: PTRACE_POSIX_SPAWN.
A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.
There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.
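
For reference, the traced program's side of such an event is a single
library call. A minimal sketch of a spawn followed by a wait;
spawn_and_wait() is a hypothetical helper name, and the example shows only
the userland call, not the ptrace(2) event handling in a debugger.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: spawn 'file' (looked up via PATH) with the
 * given argv, wait for it, and return its exit status.  Returns -1
 * if the spawn or wait fails or the child did not exit normally.
 */
static int
spawn_and_wait(const char *file, char *const argv[])
{
	pid_t pid;
	int status;

	/* No file actions and no attributes: inherit everything. */
	if (posix_spawnp(&pid, file, NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

There is no separate fork or exec step visible to the caller here, which
is why a dedicated event type is easier for a debugger to map than a
synthesized sequence of fork and exec reports.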