Commit Graph

9849 Commits

Author SHA1 Message Date
riastradh f7b8b20d17 Initialize the in/out parameter vmin.
vmin is only an optional hint since we're not passing UVM_FLAG_FIXED,
but that doesn't mean we should use uninitialized stack garbage as
the hint.

Noted by chs@.
2017-10-20 19:06:46 +00:00
riastradh 4691bf4bd7 Carve out KVA for execargs on boot from an exec_map like we used to.
Candidate fix for PR kern/45718: `processes sometimes get stuck and
spin in vm_map', a problem that has been plaguing all our 32-bit
ports for years.

Since we currently use large (256k) buffers for execargs, and since
nobody has stepped up to tackle breaking them into bite-sized (or at
least page-sized) chunks, after KVA gets sufficiently fragmented we
can't allocate new execargs buffers from kernel_map.

Until 2008, we always carved out KVA for execargs on boot with a uvm
submap exec_map of kernel_map.  Then ad@ found that the uvm_km_free
call, to discard them when done, cost about 100us, which a pool
avoided:

https://mail-index.NetBSD.org/tech-kern/2008/06/25/msg001854.html
https://mail-index.NetBSD.org/tech-kern/2008/06/26/msg001859.html

ad@ _simultaneously_ introduced a pool _and_ eliminated the reserved
KVA in the exec_map submap.  This change preserves the pool, but
restores exec_map (with less code, by putting it in MI code instead
of copying it in every MD initialization routine).

Patch proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/19/msg022461.html

Patch tested by bouyer@:
https://mail-index.NetBSD.org/tech-kern/2017/10/20/msg022465.html

I previously discussed the issue on tech-kern before I knew of the
history around exec_map:
https://mail-index.NetBSD.org/tech-kern/2012/12/09/msg014695.html

The candidate workaround I proposed of using pool_setlowat to force
preallocation of KVA would also force preallocation of physical RAM,
which is a waste not incurred by using exec_map, and which is part of
why I never committed it.

There may remain a general problem that if thread A calls pool_get
and tries to service that request by a uvm_km_alloc call that hangs
because KVA is scarce, and thread B does pool_put, the pool_put in
thread B will not notify the pool_get in thread A that it doesn't
need to wait for KVA, and so thread A may continue to hang in
uvm_km_alloc.  However,

(a) That won't apply here, because there is exactly as much KVA
available in exec_map as exec_pool will ever try to use.

(b) It is possible that may not even matter in other cases as long as
the page daemon eventually tries to shrink the pool, which will cause
a uvm_km_free that can unhang the hung uvm_km_alloc.

XXX pullup-8
XXX pullup-7
XXX pullup-6
XXX pullup-5, perhaps...
2017-10-20 14:48:43 +00:00
martin f115d566a4 Make check_exec() errors print the name of the binary that fails to
execute.
2017-10-20 12:11:34 +00:00
bouyer d4ce271380 PR port-arm/52603:
There is a race here, as seen on arm with FPU:
LWP L is running but not on CPU, has its FPU state on CPU2 which
has not been released yet, so fpexc still has VFP_FPEXC_EN set in the PCB copy.

LWP L is scheduled on CPU1, CPU1 calls cpu_switchto() for L in mi_switch().
cpu_switchto() will set VFP_FPEXC_EN in the FPU's fpexc register per the
PCB fpexc copy.

Before CPU1 calls pcu_switchpoint() for L, CPU2 calls
pcu_do_op(PCU_CMD_SAVE | PCU_CMD_RELEASE) for L because it still holds its
FPU state and wants to load another lwp. This cause VFP_FPEXC_EN to
be cleared in the PCB copy, but not in CPU1's register. L's l_pcu_cpu is
set to NULL.

When CPU1 calls pcu_switchpoint() for L it see l_pcu_cpu is NULL, and doesn't
call the release callback.

Now CPU1 has its FPU enabled but with the wrong FPU state.

Fix by releasing the PCU even if l_pcu_cpu is NULL.
2017-10-16 15:03:57 +00:00
christos bb321f6151 Setting AT_BASE on static binaries breaks TLS because they assume that
it is 0, will fix it differently.
2017-10-16 01:50:55 +00:00
christos 3df3b581f3 For static PIE set the interpreter address to be the entry offset so we
don't lose it.
2017-10-08 15:00:40 +00:00
maxv 252ca9c54a Remove compat_linux32 from the autoload list and add a enable/disable
sysctl, like compat_linux.
2017-09-29 17:47:29 +00:00
maxv aef145dda9 Remove compat_linux from the autoload list, and add a sysctl to enable or
disable it - which defaults to disabled. The following command is now
required to use linux binaries:

	sysctl -w emul.linux.enabled=1

After a discussion on tech-kern@. All the other ideas to reduce the attack
surface have drawbacks, and this sysctl seems to be the best option.
2017-09-29 17:08:00 +00:00
joerg 0e5b5aa88a Fix non-DIAGNOSTICS build by adjusting _vstate_assert here too. 2017-09-22 06:05:20 +00:00
joerg 5db0939512 Change the VSTATE_ASSERT_UNLOCKED code by pushing the potential lock
handling into the backend and doing an optimistic (unlocked) check
first. Always taking the vnode interlock makes this assertion otherwise
very heavy for multi-processor machines. Ride the kernel version bump.
2017-09-21 18:19:44 +00:00
jakllsch 6b34528ad5 Initialize ex_lock and ex_cv only in the not-EX_EARLY case. 2017-09-18 13:22:56 +00:00
christos e7f0067cbe more const 2017-09-16 23:55:33 +00:00
christos c483c7cba9 more debug info 2017-09-16 23:55:16 +00:00
christos 9d349e2adb add missing const 2017-09-16 23:25:34 +00:00
sevan 684872c792 Remove support for VERIFIED_EXEC_FP_RMD160, VERIFIED_EXEC_FP_SHA1, and VERIFIED_EXEC_FP_MD5 options.
These algorithms are either broken or on their way to being broken.

Discussed on tech-security
http://mail-index.netbsd.org/tech-security/2017/08/21/msg000936.html

ok riastradh
2017-09-13 22:24:42 +00:00
joerg 69ab70f077 Fix a race between sysctl_unpcblist and closef. 2017-09-09 14:41:19 +00:00
pgoyette 1c284673f6 When adding a new veriexec_file_entry, if an entry already exists with
all the same values (except for the filename) just ignore it.  Otherwise
report the duplicate-entry error.

This allows the user to create a signature file with veriexegen(8) and
not worry about duplicate entries (due to hard-linked files) which will
otherwise cause /etc/rc.d/veriexec to report an error.

Fixes PR kern/52512

XXX Pull-up for -8
2017-08-31 08:47:19 +00:00
pgoyette c31e1d979d Revert previous changes. They are wrong. The intended clean-up
is already being handled by the call to veriexec_file_free() in
the "out:" path.
2017-08-29 12:48:50 +00:00
pgoyette f18bf91f4d One more resource to release - the filename, if we kept it. 2017-08-29 10:23:12 +00:00
pgoyette 8bdb86c1df Release any allocated resources if we take the error paths.
As posted on tech-kern and discussed on IRC.
2017-08-29 10:19:54 +00:00
dholland cec712d80b If we go to allocate and find someone else has at the same time, don't
trigger a refcount leak of the other guy's object. From mjg@freebsd.

While here also remove a bogus use of lbolt on the same path.
2017-08-28 04:57:11 +00:00
kamil a69b333e73 Remove the filesystem tracing feature
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files neither Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
 - /proc/#/ctl from mount_procfs(8)
 - P_FSTRACE note from the documentation of ps(1)
 - /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
 - KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
 - source code file miscfs/procfs/procfs_ctl.c
 - PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
 - KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
 - PSL_FSTRACE (0x00010000) from sys/sys/proc.h
 - P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
2017-08-28 00:46:06 +00:00
kre 968e76ebe6 Build fix attempt ... changes affect !KERNEL (ie: userland, rump) version
of this file only.

Rather than adding meaningless {} around all uses of functions that
are #defined to nothing for userland, #define the funcs to something
that is functionally equivalent (but which appeases gcc).

Also, define KASSERT() to nothing for userland, which avoids the need
to add a #definee for mutex_owned which would otherwise be needed,
and simmultaneoiusly stops gcc from complaining about a lack of a prototype.
2017-08-24 17:18:55 +00:00
skrll 78070145bc Whitespace fix 2017-08-24 11:37:25 +00:00
jmcneill 7de85ed29e Add EX_EARLY flag for extent_create, which skips locking. Required for
using extent subsystem in early bootstrap code, before caches are enabled.
From skrll@
2017-08-24 11:33:28 +00:00
hannken 28650af9eb Change forced unmount to revert open device vnodes to anonymous devices. 2017-08-21 09:00:21 +00:00
hannken 7801661c06 No need to cache anonymous device vnodes, they will never be looked up.
Set key to (dead_rootmount, 0, NULL) and add assertions.
2017-08-21 08:56:45 +00:00
maxv c778810068 Remove compat_svr4, compat_svr4_32 and compat_ibcs2 from the list of
autoloaded modules. These options are disabled everywhere (except ibcs2
on Vax, but Vax does not support kernel modules, so doesn't matter),
therefore there is no issue in removing them from the list. Interested
users will now have to do a 'modload' first, or uncomment the entries in
GENERIC.
2017-08-08 16:57:32 +00:00
maxv 1d68b497f2 Remove compat_freebsd from the list of autoloaded modules. Interested users
will now have to type 'modload' to use it, or uncomment the entry in
GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by
default, sorry about that.
2017-08-08 08:12:14 +00:00
christos 7869295617 use the same string for the log and uprintf. 2017-08-06 09:14:14 +00:00
mrg 65d1d4aa12 normalise a BIOHIST log message 2017-08-04 07:00:17 +00:00
riastradh 56272c962e Don't walk off the end of the dirent buffer.
From Ilja Van Sprundel.
2017-07-28 15:37:23 +00:00
riastradh cf5a000fe5 Clamp the length we use, not the length we don't.
Avoids uninitialized memory disclosure to userland.

From Ilja Van Sprundel.
2017-07-28 15:16:39 +00:00
martin f08cc415b0 Avoid integer overflow in kern_malloc(). Reported by Ilja Van Sprundel.
XXX Time to kill malloc() completely!
2017-07-28 12:28:48 +00:00
skrll 111cbb5944 Add a condition variable (ex_flwanted) to struct extent so that ex_flags
becomes an invariant.

Remove strange locking for ex_flags as a result.
2017-07-24 19:56:07 +00:00
maxv d245e6f22a Should be loadfactor(). 2017-07-14 13:23:48 +00:00
maxv bcdfaccefa Revert rev1.26. l_estcpu is increased by only one cpu, not all of them. 2017-07-14 13:02:20 +00:00
hannken 31624a0218 Regen. 2017-07-12 09:31:59 +00:00
hannken d29c150b3b As VOP_ADVLOCK() may block indefinitely we cannot take fstrans here.
Fixes PR kern/52364: System hangs not much before showing the login prompt
2017-07-12 09:31:07 +00:00
dholland 9a94872476 Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
2017-07-09 22:48:44 +00:00
maxv 5dc461da23 explain a bit 2017-07-08 15:15:43 +00:00
christos c85be1e9c7 move the timestamp stuff to uipc_socket.c because it already has the compat
includes.
2017-07-06 17:42:39 +00:00
christos 2b50acc97b Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
2017-07-06 17:08:57 +00:00
christos c3a5f17a00 don't print diagnostic for AF_LINK 2017-07-05 17:54:46 +00:00
riastradh 0a89dacf06 Add cv_timedwaitbt, cv_timedwaitbt_sig.
Takes struct bintime maximum delay, and decrements it in place so
that you can use it in a loop in case of spurious wakeup.

Discussed on tech-kern a couple years ago:

https://mail-index.netbsd.org/tech-kern/2015/03/23/msg018557.html

Added a parameter for expressing desired precision -- not currently
interpreted, but intended for a future tickless kernel with a choice
of high-resolution timers.
2017-07-03 02:12:47 +00:00
riastradh a18efaac6b Nix trailing whitespace. No functional change. 2017-07-03 00:53:33 +00:00
joerg 5f391f4ae2 Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribut accessors do the
right thing. Raise the default guard size for threads to 64KB.
2017-07-02 16:41:32 +00:00
christos 6d52cc85b8 don't warn about AF_LINK sockets with sa_len less than the size of the sockaddr 2017-07-02 02:39:18 +00:00
christos c4aed00fad fix file descriptor locking (from joerg).
fixes kernel crashes by running go
XXX: pullup-7
2017-07-01 20:08:56 +00:00
christos 7700e78cab put the code that returns the sizeof the socket by family in one place. 2017-07-01 16:59:12 +00:00