10725 Commits

Author SHA1 Message Date
christos
0263994f06 regen 2019-10-21 14:23:53 +00:00
tnn
4d0eb0fca4 mcl_cache: align items to COHERENCY_UNIT
Because we do cache incoherent DMA to/from mbufs we cannot safely share
cache lines with adjacent items that may be concurrently accessed.
2019-10-19 06:36:47 +00:00
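A minimal sketch of the alignment idea in the commit above, in portable C11 rather than kernel code. The 64-byte line size and the names `EXAMPLE_COHERENCY_UNIT` and `example_item` are assumptions for illustration; the kernel's COHERENCY_UNIT is defined per architecture.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size; the real COHERENCY_UNIT is per-architecture. */
#define EXAMPLE_COHERENCY_UNIT 64

/* Hypothetical pool item: alignas pads each item to a full cache line. */
struct example_item {
	alignas(EXAMPLE_COHERENCY_UNIT) unsigned char data[40];
};

/* Returns 1 when adjacent array items can never share a cache line. */
static int
example_items_isolated(void)
{
	return sizeof(struct example_item) % EXAMPLE_COHERENCY_UNIT == 0 &&
	    alignof(struct example_item) == EXAMPLE_COHERENCY_UNIT;
}
```

Because both the size and the alignment are multiples of the line size, a DMA transfer touching one item cannot overlap a line that holds a neighbouring item.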
christos
626e72c16b print which process asked for an unsupported event so we can fix it. 2019-10-18 19:43:49 +00:00
christos
176ada4b2b Add and use __FPTRCAST, requested by uwe@ 2019-10-16 18:29:49 +00:00
christos
d2348edc56 Add void * function pointer casts. There are different ways to "fix" those
warnings:
    1. this one: add a void * cast (which I think is the least intrusive)
    2. add pragmas to elide the warning
    3. add intermediate inline conversion functions
    4. change the called function prototypes, adding unused arguments and
       converting some of the pointer arguments to void *.
    5. make the functions variadic (which defeats the purpose of checking)
    6. pass command line flags to elide the warning
I did try 3 and 4 and I was not pleased with the result (sys_ptrace_common.c)
(3) added too much code and defines, and (4) made the regular use clumsy.
2019-10-16 15:27:38 +00:00
kamil
29be9f8e91 Remove the short-circuit lwp_exit() path from sigswitch()
sigswitch() can be called from exit1() through:

   ttywait()->ttysleep()-> cv_timedwait_sig()->sleepq_block()->issignal()->sigswitch()

lwp_exit() called for the last LWP triggers exit1() and this causes a panic.

The debugger related signals have short-circuit demise paths in
eventswitch() and other functions, before calling sigswitch().

This change restores the original behavior, but there is an open question
whether the kernel crash is a red herring of misbehavior of ttywait().

This should fix PR kern/54618 by David H. Gutteridge
2019-10-15 13:59:57 +00:00
maxv
7b43da9e77 Add a check before the memcpy. memcpy is defined to never take NULL as
second argument, and the compiler is free to perform optimizations knowing
that this argument is never NULL.

In this particular case, it was harmless. But still good to fix.

Reported-by: syzbot+6f504255accb795eb6b7@syzkaller.appspotmail.com
2019-10-14 16:27:03 +00:00
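The kind of check described in the commit above might look like the following sketch. `example_copy` is a hypothetical helper, not the function touched by the commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes only when there is something to
 * copy. Passing NULL to memcpy is undefined behavior even when len is 0,
 * so the pointer is tested before the call.
 */
static void
example_copy(void *dst, const void *src, size_t len)
{
	if (src != NULL && len != 0)
		memcpy(dst, src, len);
}
```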
christos
843ff516d3 cast nullop through void * 2019-10-13 22:31:19 +00:00
kamil
305335a1e9 Avoid double lwp_exit() in eventswitch()
For the PTRACE_LWP_EXIT event, the eventswitch() call is triggered from
lwp_exit(). In the case of setting the program status to PS_WEXIT, do not
try to demise in place, by calling lwp_exit() as it causes panic.

In this scenario bail out from the function and resume the lwp_exit()
procedure.
2019-10-13 03:50:26 +00:00
kamil
130e572a10 Fix one of the root causes of unreliability of the ptrace(2)ed threads
In case of sigswitching away in issignal() and continuing the execution on
PT_CONTINUE (or equivalent call), there is a time window when another
thread could cause the process state to be changed to PS_STOPPING.

In the current logic, a thread would receive signal 0 (no-signal) and exit
from issignal(), returning to userland and never finishing the process of
stopping all LWPs. This causes hangs, with waitpid() waiting for SIGCHLD and
the callout polling for the state of the process in an infinite loop.

Instead of prompting for a returned signal from a debugger, repeat the
issignal() loop, this will cause checking the PS_STOPPING flag again and
sigswitching away in the scenario of stopping the process.
2019-10-13 03:19:57 +00:00
kamil
0998dd273e Add sigswitch_unlock_and_switch_away(), extracted from sigswitch()
Use sigswitch_unlock_and_switch_away() whenever there is no need for
sigswitch().
2019-10-13 03:10:22 +00:00
kamil
1249b6bf7e Refactor sigswitch()
Make the function static as it is now local to kern_sig.c.

Rename the 'relock' argument to 'proc_lock_held' as it is more verbose.
This was suggested by mjg@freebsd. While there this flips the users between
true<->false.

Add additional KASSERT(9) calls here to validate whether proc_lock is used
accordingly.
2019-10-12 19:57:09 +00:00
kamil
c18c9a670f Avoid signed integer overflow for -lwp where lwp is INT_MIN
Reported-by: syzbot+68b80b44b898e66da3fc@syzkaller.appspotmail.com
2019-10-12 12:04:37 +00:00
kamil
b3bca7a74f Remove p_oppid from struct proc
This field is not needed as it duplicates p_opptr that is already safe to
use, unless proven otherwise.

eventswitch() already contained a check for != initproc (pid1).

Ride ABI bump for 9.99.16.
2019-10-12 10:55:23 +00:00
hannken
1abb473536 Regen. 2019-10-11 08:05:19 +00:00
hannken
f8da5187dc As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
it may deadlock on suspension of this file system.

Add fstrans type LAZY and use it for VOP_STRATEGY().

Address PR kern/53624 (dom0 freeze on domU exit) is still there
2019-10-11 08:04:52 +00:00
maxv
1677a78849 Add KASAN instrumentation on ucas and ufetch. 2019-10-10 13:45:14 +00:00
chs
cf529c6de1 simpler fix for the race between shmat() and shmdt():
change shmat() to hold shm_lock until it is completely done.
2019-10-09 17:47:13 +00:00
chs
a851cc5747 revert rev 1.139 (fixing a race between shmat() and shmdt())
that approach turned out to be too complicated.
2019-10-09 17:44:45 +00:00
kamil
2e7e73e2ed Introduce new ptrace(2) operation PT_STOP
It works like:

 - kill(SIGSTOP) for unstopped tracee
 - ptrace(PT_CONTINUE,SIGSTOP) for stopped tracee

The child will be stopped and can always be waited for (with wait(2)-like
calls).

For stopped tracee kill(SIGSTOP) has no effect. PT_CONTINUE+SIGSTOP cannot
be used on an unstopped process (EBUSY).

This operation is modeled after PT_KILL that is similar for the SIGKILL
call. While there, allow PT_KILL on unstopped traced child.

This operation is useful in an abnormal exit of a debugger from a signal
handler, usually followed by waitpid(2) and ptrace(PT_DETACH).
2019-10-09 13:19:43 +00:00
skrll
208170f3b1 Trailing whitespace 2019-10-09 05:59:51 +00:00
christos
b0424b9dde - cast through void * for rump
- don't generate bogus filenames /dev/null.bottom etc.
2019-10-09 01:43:00 +00:00
kamil
f3a317a980 Enhance reliability of ptrace(2) in a debuggee with multiple LWPs
Stop competing between threads which one emits event signal quicker and
overwriting the signal from another thread.

This fixes missed in action signals.

NetBSD truss can now report reliably all TRAP_SCE/SCX/etc events without
reports of missed ones.

This was one of the reasons why debuggee with multiple threads misbehaved
under a debugger.


This change is v.2 of the previously reverted commit for the same fix.

This version contains recovery path that stops triggering event SIGTRAP
for a detached debugger.
2019-10-08 18:02:46 +00:00
kamil
60274cdd77 Correct the same expression on both sides of |
PR sw-bug/54610 by David Binderman
2019-10-08 12:29:57 +00:00
mrg
a2fd483377 steal an idea from uwe@ and implement gcc-8 function type cast
friendly methods for sys/conf.h that needs it.

one alias per return type and first function are needed,
though they can be stubbed to existing code.  the only cost is
the symbol itself, the codegen is the same.
2019-10-08 07:33:14 +00:00
kamil
815185c6dc Fix typo in a comment 2019-10-07 21:32:51 +00:00
uwe
edcef67ec2 xc_barrier - convenience function to xc_broadcast() a nop.
Make the intent more clear and also avoid a bunch of (xcfunc_t)nullop
casts that gcc 8 -Wcast-function-type is not happy about.
2019-10-06 15:11:16 +00:00
uwe
9d5b26a9e3 Define cpu_xc_* functions with unused second argument to make them
conform to xcfunc_t callback typedef (-Wcast-function-type).
Same object code is generated.
2019-10-06 02:04:26 +00:00
kamil
8e3fd5b698 Check for valid timespec in clock_settime1()
An alternative approach would be to check the value in settime1(), but
it would result in multiple checks for valid tv_nsec, as there are
settime1() users that need to check the ranges earlier.

Reported-by: syzbot+96e5ce2c2c704d96c2f0@syzkaller.appspotmail.com
2019-10-05 12:57:40 +00:00
kamil
fa6363e636 Avoid -LONG_MIN msgtyp in msgrcv(2) and treat it as LONG_MAX
This logic (found in Linux) avoids undefined behavior.

Reported-by: syzbot+8af00519a8688d9903ca@syzkaller.appspotmail.com
2019-10-04 23:20:22 +00:00
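A sketch of the clamping described in the commit above. `example_msgtyp_limit` is a hypothetical name; it is meant for the negative-msgtyp case, where the magnitude of the argument is used as an upper bound on the message type.

```c
#include <limits.h>

/*
 * Hypothetical helper: magnitude of a negative msgtyp without overflow.
 * -LONG_MIN is not representable in a long, so it is clamped to LONG_MAX.
 */
static long
example_msgtyp_limit(long msgtyp)
{
	if (msgtyp == LONG_MIN)
		return LONG_MAX;
	return -msgtyp;
}
```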
kamil
ffd5d3e30b Avoid signed integer overflow in ts2timo() for ts->tv_nsec
The condition would be rechecked later again after subtracting start time
and most invalid inputs rejected. In corner cases the current code can
accept certain invalid inputs that will pass checks later and behave like
valid ones (due to signed integer overflow).

Reported-by: syzbot+3a4a07b62558bbbd3baa@syzkaller.appspotmail.com
2019-10-04 14:17:07 +00:00
maxv
36beaf9ddd Add DMA instrumentation in KASAN. We note the original buffer and length in
the map, and check the buffer on each bus_dmamap_sync. This allows us to
find DMA buffer overflows and UAFs, which couldn't be found before because
the device accesses to memory are outside of KASAN's control.
2019-10-04 06:27:42 +00:00
kamil
96755fb8d4 Add two KASSERTS in the ptrace(2) kernel code
Verify that we will never return empty ptrace_state for CHILD/LWP event.
2019-10-03 23:11:11 +00:00
kamil
a35a4fe3b8 Separate flag for suspended by _lwp_suspend and suspended by a debugger
Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by accident.

This was a Windows-style behavior that makes thread tracing fragile.
2019-10-03 22:48:44 +00:00
kamil
2b5fbe86ad Remove compile-time asserts checking whether intptr_t and void* are compat
The checks were requested by core@ as a prerequisite for kevent::udata type
switch from intptr_t to void*.
2019-10-03 22:29:17 +00:00
kamil
2f629ee89a Remove 2 static asserts from the kernel ptrace code
sizeof(pid) and sizeof(lwp) are unlikely to ever change and the check can
confuse.

The assert has been moved to ATF t_ptrace_wait.c r.1.132.

Requested by <christos>
2019-10-01 21:49:50 +00:00
kamil
c1b8181461 Restore the old behavior in PT_GET_PROCESS_STATE
For !child and !lwp events return zeroed struct ptrace_state.

There is code that depends on it (GDB).

Fixes PR toolchain/54590 by martin@
2019-10-01 18:44:22 +00:00
chs
db38f3713d in shmdt(), wait until shmat() completes before detaching.
Reported-by: syzbot+8f470a1bf36b47ae0040@syzkaller.appspotmail.com
Reported-by: syzbot+45810b4c41ed65d9148d@syzkaller.appspotmail.com
2019-10-01 16:36:58 +00:00
cnst
da5825f8ed kern/subr_disk: bounds_check_with_label: really protect against div by zero
Solves kernel panic in NetBSD 8.1 amd64 on VirtualBox 6.0.12 r133076.

Triggered with an NVMe controller without any actual discs behind it:

nvme0 at pci0 dev 14 function 0: vendor 80ee product 4e56 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: interrupting at ioapic0 pin 22
nvme0: ORCL-VBOX-NVME-VER12, firmware 1.0, serial VB1234-56789
ld0 at nvme0 nsid 1
ld0: 0, 0 cyl, 16 head, 63 sec, 1 bytes/sect x 0 sectors

Code path is reached 4 times during normal boot, each time after wd0a
is already mounted; this patch avoids a crash with a dirty filesystem.
2019-09-30 23:23:59 +00:00
kamil
5e4bbc4985 Move TRAP_CHLD/TRAP_LWP ptrace information from struct proc to siginfo
Storing struct ptrace_state information inside struct proc was vulnerable
to synchronization bugs, as multiple events emitted at the same time were
overwriting other ones.

Cache the original parent process id in p_oppid. Reusing here p_opptr is
in theory prone to slight race condition.

Change the semantics of PT_GET_PROCESS_STATE, returning EINVAL for calls
prompting for the value in cases when no appropriate event was
registered.

Add an alternative approach to check the ptrace_state information, directly
from the siginfo_t value returned from PT_GET_SIGINFO. The original
PT_GET_PROCESS_STATE approach is kept for compat with older NetBSD and
OpenBSD. New code is recommended to keep using PT_GET_PROCESS_STATE.

Add a couple of compile-time asserts for assumptions in the code.

No functional change intended in existing ptrace(2) software.

All ATF ptrace(2) and ATF GDB tests pass.

This change improves reliability of the threading ptrace(2) code.
2019-09-30 21:13:33 +00:00
rhialto
1c7f0224e7 Do all delta calculations strictly using uint32_t. Avoid integer
overflows in calculating absolute deltas by subtracting the right way
around.

Reported-by: syzbot+68c37d09c833f8ec1341@syzkaller.appspotmail.com
2019-09-29 12:07:52 +00:00
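"Subtracting the right way around" in the commit above can be illustrated with a small helper. `example_abs_delta` is a hypothetical name, and the sketch assumes both operands are already uint32_t values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: absolute difference of two unsigned 32-bit values.
 * The smaller value is always subtracted from the larger one, so the
 * result never wraps and no signed arithmetic is involved.
 */
static uint32_t
example_abs_delta(uint32_t a, uint32_t b)
{
	return a > b ? a - b : b - a;
}
```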
jmcneill
a20d501e5a mbstat_conver_to_user_cb -> mbstat_convert_to_user_cb 2019-09-28 16:02:12 +00:00
pgoyette
ccc3f35b62 Actually return the updated pointer-to-mbuf-pointer to the caller
rather than discarding-after-assignment.  Introduced from the
[pgoyette-compat] branch work.

Welcome to 9.99.14 !!!  (Module hook routine prototype changed.)

Found by the lgtm bot, reported via private Email from maxv@
2019-09-27 00:32:03 +00:00
christos
fc72d154af make nmountcompatnames unsigned (assigned from __arraycount, compared with
unsigned in compat code)
2019-09-26 01:34:16 +00:00
kamil
6f22d54e25 Add a temporary ctassert checking whether void* and intptr_t are compatible 2019-09-24 19:21:45 +00:00
skrll
aeef4b9a0b Enable POOL_REDZONE with DIAGNOSTIC.
The bug in the arm pmap was fixed long ago.
2019-09-23 05:39:59 +00:00
christos
ff17893526 regen 2019-09-22 23:03:20 +00:00
christos
02cdd248ec Add a new member to struct vfsstat and grow the unused members
The new member is called f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
2019-09-22 22:59:37 +00:00
maxv
7b2608b508 Fix KASAN on aarch64: the bus_space_* functions are macros, so we can't
redefine them. Introduce __HAVE_KASAN_INSTR_BUS, which indicates whether
to instrument the bus functions. Defined on amd64 only.
2019-09-22 10:35:12 +00:00
kamil
0af3675487 Validate usec ranges in sys___select50()
Later in the code selcommon() checks for proper timespec, check only
correct usec of timeval before type conversions.
2019-09-20 15:00:47 +00:00
kamil
8978d4e527 Validate usec ranges in settimeofday1() 2019-09-20 14:12:57 +00:00
kamil
43bc9355ea Validate usec ranges in do_sys_utimes()
sys/kern/vfs_syscalls.c:3939:4, signed integer overflow: 503923632 * 1000 cannot be represented in type 'int'

Reported-by: syzbot+4cfc86ffd30e8678f68d@syzkaller.appspotmail.com
2019-09-20 13:29:31 +00:00
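The usec validation in the three commits above follows one pattern: reject an out-of-range tv_usec before it is multiplied by 1000. A userland sketch, with `example_tv_to_ts` as a hypothetical name:

```c
#include <errno.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical conversion: validate tv_usec before scaling it to
 * nanoseconds, so the multiplication below cannot overflow.
 */
static int
example_tv_to_ts(const struct timeval *tv, struct timespec *ts)
{
	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return EINVAL;
	ts->tv_sec = tv->tv_sec;
	ts->tv_nsec = tv->tv_usec * 1000;	/* at most 999999000 */
	return 0;
}
```

With the value from the sanitizer report (503923632), the helper returns EINVAL instead of reaching the multiplication.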
maxv
9b0d65da51 Handle M_EXT with M_BUFADDR, and introduce M_BUFSIZE. Use them to dedup
code.
2019-09-18 16:18:12 +00:00
kamil
8a4b4e84ff Decorate percpu_cpu_swap() with __noubsan 2019-09-18 15:33:32 +00:00
christos
eb654c054f Add a boolean argument to indicate if we have a path/true (execve) or an
fd/false (fexecve). This is needed to differentiate between them because
NULL/-1 can be readily passed from userland.
2019-09-17 15:19:27 +00:00
christos
0e8ea4cdfb PR/54549: ng0: always initialize execname. 2019-09-16 11:11:34 +00:00
manu
d45bbebabd Accept root device specification as NAME=label 2019-09-16 00:01:16 +00:00
manu
a351ce9f52 Rollback change to accept NAME=label root device specification
As suggested by Michael van Elst, the operation should be done
in getwedgename()
2019-09-15 23:59:33 +00:00
christos
9246f6c782 Prevent O_EXEC for mq_open(2), and O_EXEC with a writable fd for open(2). 2019-09-15 20:51:03 +00:00
christos
5c0785385f set VEXEC if FEXEC is set. 2019-09-15 20:24:25 +00:00
christos
35de5f3a54 - Add support for fexecve
  - get the vnode from the fd passed instead of calling namei() on the
    path
  - try to reverse resolve the vnode to extract the pathname
  - deal with not having a resolved path available
- rename variable that was not a pathbuf
2019-09-15 20:23:50 +00:00
christos
09c169e26a adjust for new check_exec signature. 2019-09-15 20:21:12 +00:00
christos
ae5efbe2cd Don't set AT_SUN_EXECNAME if we don't have a fully resolved name. 2019-09-15 20:20:26 +00:00
maya
5516c5e1ec More indentation 2019-09-15 17:37:25 +00:00
maya
3afe36de52 indentation and whitespace 2019-09-15 17:36:43 +00:00
christos
0a76d2ed5a Add F_GETPATH, presented to tech-kern. 2019-09-15 16:25:57 +00:00
christos
ca2fce2e75 - add missing error check
- use '\0' for char const
2019-09-14 21:23:34 +00:00
mlelstv
e7e76ef7bf Fix build. 2019-09-14 15:06:33 +00:00
christos
f216ee4b1a PR/54527: Anthony Mallet: Don't clear socket errors for MSG_PEEK. 2019-09-14 14:09:54 +00:00
christos
3dca96614f - expose the now hidden namecache
- compare with USHRT_MAX since the max length grew (we could make it NAME_MAX)
- use kmem_alloc for entries > NCHNAMLEN so the namecache contains all
  possible entries.
2019-09-13 14:01:33 +00:00
manu
9749fdddc3 Accept root device specification as NAME=label 2019-09-13 01:33:20 +00:00
maxv
66e967eff0 Introduce sigaction_copy(), to copy sigaction structures without padding,
and use it in sigaction1(). This is to fix info leaks all at once in the
signal functions.
2019-09-08 07:00:20 +00:00
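The idea of copying a structure "without padding" can be sketched as below. This is not the kernel's sigaction_copy(); `example_act` and `example_act_copy` are hypothetical, and the layout assumes an ABI that inserts padding between the two members.

```c
#include <string.h>

/* Hypothetical structure with padding after 'flag' on common ABIs. */
struct example_act {
	char flag;
	long value;
};

/*
 * Zero the destination first, then copy member by member. A plain
 * struct assignment or memcpy could carry stale padding bytes from
 * the source into memory that is later copied out to userland.
 */
static void
example_act_copy(struct example_act *dst, const struct example_act *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->flag = src->flag;
	dst->value = src->value;
}
```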
christos
2a0890e7cd - move quadruplicated code into a function
- delete #if 1 and #if 0 code
2019-09-07 15:34:44 +00:00
maxv
b589c54d24 Add KASAN instrumentation on the bus_space functions that handle buffers. 2019-09-07 10:24:01 +00:00
maxv
1f8d4ff48b Add KASAN instrumentation for memmove. 2019-09-07 09:46:07 +00:00
maxv
79654dda09 Reorder for clarity, and localify pool_allocator_big[], should not be used
outside.
2019-09-06 09:19:06 +00:00
maxv
d1f5019879 Add KASAN instrumentation on the atomic functions. Use macros to simplify.
These macros are prerequisites for future changes.
2019-09-05 16:19:16 +00:00
ryo
491e67c353 requires memory barrier before IPI ack.
Problem was seen on the aarch64 cpus.

Fixes PR/54009
2019-09-05 09:20:05 +00:00
riastradh
8e07b51739 Switch from NIST CTR_DRBG with AES to NIST Hash_DRBG with SHA-256.
Benefits:

- larger seeds -- a 128-bit key alone is not enough for `128-bit security'
- better resistance to timing side channels than AES
- a better-understood security story (https://eprint.iacr.org/2018/349)
- no loss in compliance with US government standards that nobody ever
  got fired for choosing, at least in the US-dominated western world
- no dirty endianness tricks
- self-tests

Drawbacks:

- performance hit: throughput is reduced to about 1/3 in naive measurements
  => possible to mitigate by using hardware SHA-256 instructions
  => all you really need is 32 bytes to seed a userland PRNG anyway
  => if we just used ChaCha this would go away...

XXX pullup-7
XXX pullup-8
XXX pullup-9
2019-09-02 20:09:29 +00:00
maxv
d6e22b2c07 Revert r1.254, put back || for KASAN, some destructors like lwp_dtor()
caused false positives. Needs more work.
2019-08-26 10:35:35 +00:00
msaitoh
a1f88951ea Change buf_nbuf()'s return value from int to u_int to avoid undefined
behavior in wapbl_start() which extended int to size_t.

Error message was:
> UBSan: Undefined Behavior in ../../../../kern/vfs_wapbl.c:609:41, signed integer overflow: 3345138 * 1024 cannot be represented in type 'int'

>        /* XXX maybe use filesystem fragment size instead of 1024 */
>         /* XXX fix actual number of buffers reserved per filesystem. */
>         wl->wl_bufcount_max = (buf_nbuf() / 2) * 1024;

Need more work?
2019-08-26 10:24:39 +00:00
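The overflow in the report above comes from doing the multiplication in int. The commit changes the return type of buf_nbuf() to u_int; the sketch below shows a related but different approach, widening to size_t before multiplying. `example_scale` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical helper: scale a buffer count by 1024 in size_t, so the
 * multiplication is unsigned and wide enough for the reported value.
 */
static size_t
example_scale(unsigned int nbuf)
{
	return (size_t)nbuf * 1024;
}
```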
maxv
aff86c9e3f Reject negative offsets, to prevent panics later in genfs_getpages(). 2019-08-26 10:19:08 +00:00
maxv
542f82ceb4 Fix stupid bugs in linux_sys_shmctl(): the index could be out of bound
(page fault) and there was no proper locking.

Maybe we should just remove LINUX_SHM_STAT, like compat_linux32.
2019-08-23 10:22:14 +00:00
msaitoh
638734ca31 Use unsigned to avoid undefined behavior. Found by kUBSan. 2019-08-20 01:56:21 +00:00
christos
5d96c08a38 If we could not start extattr for some reason, don't advertise extattr in the
mount.
2019-08-19 09:32:42 +00:00
mlelstv
4e7617c5be Align parsing of boot devices. This allows device names to be
specified as strings in the boot loader and the kernel configuration.
2019-08-18 06:28:42 +00:00
maxv
31589aab59 Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29
MI pools on amd64 from linked lists to bitmaps, which have higher security
properties.

Then, change the computation of the size of the PH pools: take into account
the bitmap area available by default in the ph_u2 union, and don't go with
&phpool[>0] if &phpool[0] already has enough space to embed a bitmap.

The pools that are migrated in this change all use bitmaps small enough to
fit in &phpool[0], therefore there is no increase in memory consumption.
2019-08-17 12:37:49 +00:00
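A much simplified sketch of tracking free items with a bitmap instead of a linked list threaded through the items. It is not the pool(9) implementation; the page layout, the 32-item limit and all `example_` names are assumptions. One property a bitmap makes easy to check is a double free, since the state of each item lives outside the item itself.

```c
#include <stdint.h>

#define EXAMPLE_NITEMS 32	/* assumed items per page, fits one word */

/* Hypothetical page header: bit i set means item i is free. */
struct example_page {
	uint32_t freemap;
};

static void
example_page_init(struct example_page *pg)
{
	pg->freemap = UINT32_MAX;	/* every item starts out free */
}

/* Allocate the lowest free item; returns its index or -1 if full. */
static int
example_item_get(struct example_page *pg)
{
	for (int i = 0; i < EXAMPLE_NITEMS; i++) {
		uint32_t bit = UINT32_C(1) << i;
		if (pg->freemap & bit) {
			pg->freemap &= ~bit;
			return i;
		}
	}
	return -1;
}

/* Free item idx; returns -1 on a bad index or a double free. */
static int
example_item_put(struct example_page *pg, int idx)
{
	if (idx < 0 || idx >= EXAMPLE_NITEMS)
		return -1;
	uint32_t bit = UINT32_C(1) << idx;
	if (pg->freemap & bit)
		return -1;	/* already free: double free detected */
	pg->freemap |= bit;
	return 0;
}
```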
maxv
d927327758 Initialize pp->pr_redzone to false. For some reason with KUBSAN GCC does
not eliminate the unused branch in pr_item_linkedlist_put(), and this
leads to an unused uninitialized access which triggers KUBSAN messages.
2019-08-16 10:41:35 +00:00
maxv
3808726a03 Unlink KMEM_GUARD leftovers. 2019-08-15 12:24:08 +00:00
maxv
46fb8844fe Retire KMEM_GUARD. It has been superseded by kASan, which is much more
powerful, has much more coverage - far beyond just kmem(9) -, and also
consumes less memory.

KMEM_GUARD was a debug-only option that required special DDB tweaking, and
had no use in releases or even diagnostic kernels.

As a general rule, the policy now is to harden the pool layer by default
in GENERIC, and use kASan as a diagnostic/debug/fuzzing feature to verify
each memory allocation & access in the system.
2019-08-15 12:06:42 +00:00
skrll
0b1a41caff More diagnostic 2019-08-15 09:04:22 +00:00
skrll
d065a43358 Indentation and wrap the resulting long line 2019-08-15 09:03:09 +00:00
pgoyette
0c750f3eef When modules are unloaded, we call sysctl_teardown() before calling
the module's modcmd(CMD_FINI) code.  If the modcmd() call returns an
error, we attempted to re-instate the module's sysctl stuff.

This doesn't work well for built-in modules (where "unload" actually
means "disable"), since they don't have any ``struct kobj''.

So check first, and don't try to find the __link_set_sysctl_funcs for
built-in modules.
2019-08-08 18:08:41 +00:00
mrg
c0545f4db1 mark a variable __diagused to fix this problem affecting many builds:
kern/kern_time.c:1413:6: error: variable 'error' set but not used [-Werror=unused-but-set-variable]
2019-08-07 07:22:12 +00:00
pgoyette
97b627eca5 Many years ago someone created a new __link_set_sysctl_funcs to hold
the list of routines that need to be called for setting up sysctl
variables.  This worked great for all code included in the kernel
itself, but didn't deal with modules that want to create their own
sysctl data.  So, we ended up with a lot of #ifdef _MODULE blocks
so modules could explicitly call their setup functions when loaded
as non-built-in modules.

So today, we complete the task that was started so many years ago.

When modules are loaded, after we've called xxx_modcmd(INIT...) we
check if the module contains its own __link_set_sysctl_funcs, and
if so we call the functions listed.  We add a struct sysctllog member
to the struct module so we can call sysctl_teardown() when the module
gets unloaded.  (The sequence of events ensures that the sysctl stuff
doesn't get created until the rest of the module's init code does any
required memory allocation.)

So, no more need to explicitly call the sysctl setup routines when
built as a loadable module.
2019-08-07 00:38:01 +00:00
riastradh
6eb7fd2b53 Acquire shmseg uobj reference while we hold shm_lock.
Otherwise nothing prevents it from being detached under our feet when
we drop shm_lock.

Reported-by: syzbot+a76c618a6808a0fda475@syzkaller.appspotmail.com
2019-08-06 15:48:06 +00:00
riastradh
80a06cecc7 Fix race in timer destruction.
Anything we confirmed about the world before callout_halt may cease
to be true afterward, so make sure to start over in that case.

Add some comments explaining what's going on.

Reported-by: syzbot+d58da99969f58c1a024a@syzkaller.appspotmail.com
2019-08-06 15:47:55 +00:00
maxv
20e0cdbea1 Replace || by && in KASAN, to increase the pool coverage.
Strictly speaking, what we want to avoid is poisoning buffers that were
referenced in a global list as part of the ctor. But, if a buffer indeed
got referenced as part of the ctor, it necessarily has to be unreferenced
in the dtor; which implies it has to have a dtor. So we want both a ctor
and a dtor, and not just one of them.

Note that POOL_QUARANTINE already implicitly provides this increased
coverage.
2019-08-03 09:31:07 +00:00
kamil
75a7ede06f Update our vm resource use for sysctl(3) call reading kinfo_proc*
Without this change RSS properties are zeroed unless a process exits or
calls getrusage(2).
2019-08-02 22:46:44 +00:00
maxv
588ff51eb6 Kernel Heap Hardening: perform certain sanity checks on the pool caches
directly, to immediately detect certain bugs that would otherwise have
been detected only later on the pool layer, if the buffer ever reached
the pool layer.
2019-08-02 05:22:14 +00:00
msaitoh
cbdf3e3a2d Whitespace fixes. No functional change. 2019-07-31 02:21:31 +00:00
maxv
e24cd28ffe Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().
2019-07-29 09:42:17 +00:00
msaitoh
01b1ce77ab Set kcpuset's bit correctly to avoid undefined behavior. Found by KUBSan. 2019-07-26 05:39:55 +00:00
msaitoh
6caaf10168 Set sc_mask correctly in selsysinit() to avoid undefined behavior.
Found by KUBSan.
2019-07-26 05:37:59 +00:00
christos
3ce1b34236 add a register validation hook for ptrace on netbsd32 to be used for
64 -> 32 debugging.
2019-07-20 18:23:05 +00:00
kamil
615deb79c3 Enhance locking of ptrace_update_lwp
Replace lwp_delref() + mutex_enter() with: mutex_enter() + lwp_delref2().
This avoids extra taking and exiting from a mutex.

Add missing mutex_exit() for LW_SYSTEM.

Do not switch lwp for PT_SET_SIGINFO. This operation is not needed and
avoids panic for >2 LWPs as p_lock is attempted to be entered again in a
critical section.
2019-07-18 20:10:46 +00:00
hannken
823dcab2e3 Make namei() work with no root dir yet.
From David Holland with minor tweaks from me.

Should fix PR kern/54378 (panic with TLB miss when attempting to reboot)
2019-07-18 09:39:40 +00:00
pgoyette
40a27fe72c Move the assignment of SCTP-specific function hooks/pointers.
Without this, a rumpkernel (appropriately modified) built with SCTP
enabled will try to assign the function pointers, but the targets
are only available in rumpnet.  We cannot link the rumpkernel against
rumpnet because rumpnet is already linked against rumpkernel and we
would end up with a circular dependency.

As reported in private Email by rjs@
2019-07-16 22:57:55 +00:00
maxv
c88009ff0d Fix info leaks: the alignment of the structures causes uninitialized heap
memory to be copied to userland in sys_recvmsg().
2019-07-11 17:30:44 +00:00
maxv
3583c449f2 Fix info leak: instead of using SS_INIT as a literal compound, use a global
variable from rodata. The compound gets pushed on the stack, the padding
of the structure was therefore not initialized, and was getting leaked to
userland in sys___sigaltstack14().
2019-07-10 17:52:22 +00:00
maxv
85d8cf0368 Zero out 'cprng->cs_name' entirely. Otherwise the RND pool gets polluted
by uninitialized bits from the end of the string.
2019-07-10 17:32:37 +00:00
maxv
5d6ad4f735 The whole 'tv' structure gets added to the RND pool, so clear it first,
otherwise each random buffer gets tainted by uninitialized bytes from the
padding.
2019-07-07 15:12:59 +00:00
maxv
7d872de20d Fix bug: if seg == UIO_SYSSPACE, tv[] is not initialized. The branches
should depend on tptr[] instead.
2019-07-06 14:37:24 +00:00
maxv
469366add0 Fix (harmless) uninitialized variable. In the path
namei_tryemulroot -> namei_oneroot-> namei_start

There was a branch where 'ndp->ni_erootdir' was not initialized.
2019-07-06 14:27:38 +00:00
maxv
c68a43b15e Fix info leak. The padding of 'sigact' is not initialized, it gets copied
in the proc, and can later be obtained by userland.
2019-07-05 17:14:48 +00:00
maxv
54e07df912 Invert two conditions, to fix uninitialized memory access. If the node is
an immediate, then the 64 bits of nnode.sysctl_data may not all be
initialized, since this is a union.

Obviously, this is harmless; but still a bug, so fix it.
2019-07-03 17:31:32 +00:00
maxv
413b53f543 Restrict the size given to copyoutstr. It is safer to do that; even if
there is no actual bug here, since the buffer is guaranteed to be NUL
terminated.

With KASAN we check the whole buffer to cover the "worst" case, and here
it triggered false positives because the buffer size was not filtered.
2019-07-01 17:15:43 +00:00
maxv
e4c2eafeb5 Fix bug, don't release the reflock if we didn't take it in the first place.
Looks like there are other locking issues in here.

Reported-by: syzbot+81d2c90809163ab1e13c@syzkaller.appspotmail.com
2019-06-29 11:37:17 +00:00
maxv
e27db06e76 The big pool allocators use pool_page_alloc(), which allocates page-aligned
storage. So if we switch to a big pool, set PR_NOALIGN, because the address
of the storage is not aligned to the item size.

Should fix PR/54319.
2019-06-29 11:13:23 +00:00
maxv
7c8c87789c Fix this fucking shit once and for all, for fuck's sake. 2019-06-27 19:56:10 +00:00
christos
0bbbbda30f remove offs initialization and XXX gcc comment. Offs should always be
initialized. Pointed out by maxv.
2019-06-27 17:09:31 +00:00
christos
cfc6bad607 Return an error if the path was too long. Pointed out by maxv 2019-06-27 17:07:51 +00:00
maxv
a7232de6b1 Remove useless debugging messages which achieved nothing but hiding bugs. 2019-06-26 20:28:59 +00:00
christos
b68b72d797 whitespace around operators 2019-06-26 00:30:39 +00:00
christos
286517a8d8 Fail if getcwd fails. Pointed out by maxv@ 2019-06-25 21:32:58 +00:00
wiz
79f6debed4 Fix word (direct -> directory) in comment. 2019-06-25 19:47:35 +00:00
christos
3879522092 add a comment explaining what this does. 2019-06-25 18:06:29 +00:00
maxv
abb1684df1 Fix buffer overflow. It seems that some people need to go back to the
basics of C programming.

Reported-by: syzbot+8665827f389a9fac5cc9@syzkaller.appspotmail.com
2019-06-25 16:58:02 +00:00
rjs
8e33725173 Split out the prototypes for add/delete address into a separate header file. 2019-06-25 15:33:55 +00:00
christos
92a7bfb176 the tracer, not the tracee, determines if we are going to convert the ptrace
data from 64 to 32.
2019-06-24 20:29:41 +00:00
skrll
f89a668ad5 Fix 'unknown' spellos 2019-06-24 06:24:33 +00:00
kamil
da062e4b24 Restore ability to create regular files with mknod(2)
This behavior is requested in ATF tests.
2019-06-21 14:58:32 +00:00
kamil
bd08c835ff Revert previous
There is fallout in gdb that will be investigated before relanding this.
2019-06-21 04:28:12 +00:00
kamil
14d51c2ac0 Enhance reliability of ptrace(2) in a debuggee with multiple LWPs
Stop competing between threads which one emits event signal quicker and
overwriting the signal from another thread.

This fixes missed in action signals.

NetBSD truss can now report reliably all TRAP_SCE/SCX/etc events without
reports of missed ones.

This was one of the reasons why debuggee with multiple threads misbehaved
under a debugger.
2019-06-21 04:02:57 +00:00
kamil
21a72dea25 Eliminate PS_NOTIFYSTOP remnants from the kernel
This flag used to be useful in /proc (BSD4.4-style) debugging semantics.
Traced child events were notified without signaling the parent.

This property was removed in NetBSD-8.0 and had no users.

This change simplifies the signal code, removing dead branches.

NFCI
2019-06-21 01:03:51 +00:00
kamil
177438b064 Add mkfifo{,at}(2) mode in mknod{,at}(2) as requested by POSIX
mknod with mode & S_IFIFO and dev=0 shall behave like mkfifo.

Update the documentation to reflect this state.

Add ATF tests.

This is an in-kernel implementation as typically user-space programs use
mkfifo(2) directly, however whenever there is need to bypass libc (like in
valgrind) then portable POSIX software calls the mknod syscall.

Noted on tech-kern@ by Greg Troxel.
2019-06-20 03:31:53 +00:00
pgoyette
e30b8a7c9d In case of error resolving symbol references, we cannot rely on the
module's name still being available - it may be destroyed when
kobj_affix() unloads the object.  So make a copy of the name first
so we can use it in a useful error message.

(Without this, I've had affix errors go into an infinite loop
trying to print the error message!)
2019-06-19 15:01:01 +00:00
kamil
75d0f3a753 Correct wrong type of uio_seg passed to do_sys_mknodat()
It was introduced by accident in the previous commit to this file.

Detected by syzbot:
https://syzkaller.appspot.com/text?tag=CrashLog&x=16635d9ea00000
2019-06-19 14:16:06 +00:00
kamil
f32d6f14d4 Add support for KTR logs of SIGTRAP for TRAP_CHILD events
Previously it was disabled due to vfork(2) synchronization issues.
These problems are now gone.

While there, set l_vforkwaiting to false in posix_spawn. This is not
strictly needed, but it does no harm to initialize it explicitly.
2019-06-18 23:53:55 +00:00
kamil
3c0a7f49e5 Drop unused retval pointer from do_sys_mknod{,at}()
No functional change intended.
2019-06-18 22:34:25 +00:00
christos
d284430a9c include #ifdefs in the syscalls autoload file and make it standalone.
XXX: This needs to be re-thought
2019-06-18 16:24:17 +00:00
christos
54eec0e664 remove XXX from the quota call. 2019-06-18 16:06:45 +00:00
hannken
7eef6f5745 Add an owner field to fstrans mount info and use it to hold
the thread currently suspending this mount.

Remove now unneeded state FSTRANS_EXCL.

It is now possible to suspend a file system from a thread
already holding fstrans locks.  Use with care ...
2019-06-17 08:07:27 +00:00
maxv
71e9a696c0 Add KASAN_PANIC, an option to turn KASAN warnings into kernel panics,
requested by Siddharth. While here clarify a little.
2019-06-15 06:40:34 +00:00
kamil
6aa1291e37 Correct use-after-free issue in vfork(2)
Previously, the vforking parent kept a pointer to the child and checked
whether the child had cleared PL_PPWAIT in its p_lflag bitfield. However,
the child can become invalid between its exec/exit event and the wakeup of
the vforked parent, which can cause an invalid pointer read and, in the
worst case, a kernel crash.

Now the vforked child keeps a reference to the vforked parent LWP and sets
l_vforkwaiting to false. The vforked child can thus finish its work,
exec/exit, and be terminated; once the parent is woken up, it reads its own
field to learn whether its child is still blocking.

Add a new field in struct lwp: l_vforkwaiting, protected by proc_lock.
In the future it should be refactored: all PL_PPWAIT users should be
converted to l_vforkwaiting, and l_vforkwaiting should then probably become
a bit field.

This is another attempt at fixing this bug, following the 2012 attempt by
<rmind> in commit:

Author: rmind <rmind@NetBSD.org>
Date:   Sun Jul 22 22:40:18 2012 +0000

    fork1: fix use-after-free problems.  Addresses PR/46128 from Andrew Doran.
    Note: PL_PPWAIT should be fully replaced and modification of l_pflag by
    other LWP is undesirable, but this is enough for netbsd-6.

The new version no longer performs unsafe access in l_lflag changing the
LP_VFORKWAIT bit.

Verified with ATF t_vfork and t_ptrace* tests and they are no longer
causing any issues in my local setup.

Fixes PR/46128 by Andrew Doran
2019-06-13 20:20:18 +00:00
christos
ed735e6546 make pool assertion messages consistent. 2019-06-13 01:13:12 +00:00
kamil
293d38fbef Correct inverted condition for dying process in sigswitch()
If a process is exiting and it was not asked to relock proc_lock, do not
free the mutex, as doing so causes a panic. This is a timing bug: the faulty
condition is not deterministic and fires only sometimes, but it is quickly
triggerable when executed in an infinite loop.

Detected and reported with LLDB test-suite by <mgorny>
2019-06-13 00:07:19 +00:00
kamil
4c91c5e80e Add support for PTRACE_POSIX_SPAWN to report posix_spawn(3) events
posix_spawn(3) is a first-class syscall in NetBSD, unlike (V)FORK+EXEC,
as these operations are executed in one go. This differs from Linux and
FreeBSD, where posix_spawn(3) is implemented with existing kernel
primitives (clone(2), vfork(2), exec(3)) inside libc.

Typically LLDB and GDB are aware of FORK/VFORK events. As discussed with
the LLDB community, instead of slicing the posix_spawn(3) operation into
phases emulating (V)FORK+EXEC(+VFORK_DONE) and returning an intermediate,
possibly abnormal, state to the debugger, introduce a new event type:
PTRACE_POSIX_SPAWN.

A debugger implementor can easily map it into existing fork+exec semantics
or treat as a distinct event.

There is no functional change for existing debuggers as there was no
support for reporting posix_spawn(3) events on the kernel side.
2019-06-11 23:18:55 +00:00
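For context on the commit above, the call whose events are being reported looks like this from the caller's side: a single posix_spawnp(3) call creates the child and runs the program, with no separate fork and exec steps visible to the caller. This is a minimal sketch; spawn_and_wait() is a hypothetical helper, and the ptrace side of the commit is not shown.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Spawn argv[0] (looked up via PATH) with the given arguments and wait
 * for it to finish. Returns the child's exit status, or -1 if the spawn
 * or wait failed or the child did not exit normally.
 * The function name is hypothetical, chosen for this sketch.
 */
int
spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);
	/* posix_spawnp() returns an error number directly, 0 on success. */
	if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing NULL for the file actions and attributes keeps the child's descriptors and signal state inherited from the parent, which is the simplest case.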
pgoyette
2448a81742 Improve error message 2019-06-11 15:20:57 +00:00
chs
c7c4f4753b shmctl(SHM_LOCK) does not need to mess with mappings of the shm segment,
uvm_obj_wirepages() is sufficient.  this fixes the problem reported in
https://syzkaller.appspot.com/bug?id=71f9271d761f5b6ed517a18030dc04f0135e6179
2019-06-10 00:35:47 +00:00