10132 Commits

Author SHA1 Message Date
christos
dce2bc7064 grab a copy of the absolute pathbuf, before namei() munges it. 2017-11-13 22:01:45 +00:00
christos
c84ec9f755 Use the pathbuf which we pass to namei() (which is always absolute) as the
resolved pathname. We need this in the case of scripts where p_path needs
to point to the interpreter and not the script itself. Otherwise things
like perl scripts that depend on /proc/$$/exe to re-exec themselves end up
being fork bombs.

In reality we should be using the fully resolved/canonicalized path here, but
namei is not giving it back to us.
2017-11-13 20:38:31 +00:00
riastradh
227d009714 Apply same treatment to cv_timedwaitbt. 2017-11-12 20:04:51 +00:00
riastradh
4800a332d3 Clarify interpretation of timeout/epsilon in cv_timedwaitbt. 2017-11-12 19:46:34 +00:00
christos
b328643cbf Don't add kevents to closing file descriptors (from riastradh) 2017-11-11 03:58:01 +00:00
riastradh
509483453b Assert KM_SLEEP xor KM_NOSLEEP in all kmem allocation. 2017-11-09 23:20:12 +00:00
christos
17c894d283 Add assertions that either PR_WAITOK or PR_NOWAIT are set. 2017-11-09 22:52:26 +00:00
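
A minimal userland sketch of the "exactly one of two flags" check that these assertions describe. The flag names and values (WAIT_SLEEP, WAIT_NOSLEEP) and both functions are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define WAIT_SLEEP	0x01u
#define WAIT_NOSLEEP	0x02u

/* True when exactly one of the two wait flags is set. */
bool
wait_flags_valid(unsigned int flags)
{
	bool sleep = (flags & WAIT_SLEEP) != 0;
	bool nosleep = (flags & WAIT_NOSLEEP) != 0;

	return sleep != nosleep;	/* logical xor */
}

/* Allocation prologue that rejects "neither" and "both" up front. */
void
checked_alloc_prologue(unsigned int flags)
{
	assert(wait_flags_valid(flags));
}
```

Passing 0 fails the check, which is why a "no wait" flag that is itself 0 cannot be told apart from a forgotten flag.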
christos
b3254cd976 Don't use 0 for PR_NOWAIT 2017-11-09 22:21:27 +00:00
christos
b368d720c2 don't pass 0 to the pool flags 2017-11-09 21:57:06 +00:00
christos
9b7a6414b8 Add O_REGULAR to enforce opening of only regular files
(like we have O_DIRECTORY for directories).
This is better than open(, O_NONBLOCK), fstat()+S_ISREG() because opening
devices can have side effects.
2017-11-09 20:30:01 +00:00
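
A hedged sketch of how a caller might use the new flag, with the older open()+fstat()+S_ISREG() sequence as a fallback where O_REGULAR is not defined. The function name open_regular() and the EINVAL error choice in the fallback are assumptions made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open `path` read-only, succeeding only for regular files.
 * Returns a file descriptor, or -1 with errno set.
 */
int
open_regular(const char *path)
{
#ifdef O_REGULAR
	/* The kernel rejects non-regular files as part of the open itself. */
	return open(path, O_RDONLY | O_REGULAR);
#else
	/* Fallback: a device open may already have had side effects here. */
	struct stat st;
	int fd = open(path, O_RDONLY | O_NONBLOCK);

	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	if (!S_ISREG(st.st_mode)) {
		close(fd);
		errno = EINVAL;	/* arbitrary choice for this sketch */
		return -1;
	}
	return fd;
#endif
}
```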
christos
948108c143 Handle the ERESTART case from pool_grow() 2017-11-09 19:34:17 +00:00
christos
a20c95d549 make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan' 2017-11-09 15:53:40 +00:00
christos
56ae922037 Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
2017-11-09 15:40:23 +00:00
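
The "loop for the condition we wanted" pattern can be sketched in userland with POSIX threads. This is an illustration only: struct shared, its fields, and both functions are hypothetical and do not mirror the pool code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state: one lock/condvar pair shared by two wait reasons. */
struct shared {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool growing;	/* a grow operation is in progress */
	bool wanted;	/* someone is waiting for an item */
};

/* Wait until no grow is in progress.  Caller holds s->lock. */
void
wait_not_growing(struct shared *s)
{
	/*
	 * A wakeup may have been meant for the other condition, or be
	 * spurious, so re-check our own predicate after every wait.
	 */
	while (s->growing)
		pthread_cond_wait(&s->cv, &s->lock);
}

/* Mark the grow as finished and wake every waiter. */
void
finish_growing(struct shared *s)
{
	pthread_mutex_lock(&s->lock);
	s->growing = false;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->lock);
}
```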
christos
f890274f96 add a "booted_method" string to aid in debugging double boot matches. 2017-11-09 01:02:55 +00:00
christos
0891190b55 hack around namei problem. 2017-11-07 20:58:23 +00:00
christos
0011aa658c Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->p_path to set AT_SUN_EXECNAME, and since this is
  always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
  and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
  NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
   just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
   vnode and then using getcwd() on the parent directory?
2017-11-07 19:44:04 +00:00
christos
3afe107bee Add two utility functions to help use kmem with strings: kmem_strdupsize,
kmem_strfree.
2017-11-07 18:35:57 +00:00
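
A userland analogue of the idea, assuming an allocator whose free routine must be told the allocation size. The names str_dup_size(), str_free() and sized_free() are hypothetical; the real kmem functions and their signatures are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an allocator whose free routine needs the size. */
void
sized_free(void *p, size_t size)
{
	(void)size;	/* malloc-based stand-in ignores it */
	free(p);
}

/*
 * Duplicate `str`.  If `size` is non-NULL, store the allocation size
 * (string length plus the terminating NUL) there.  NULL on failure.
 */
char *
str_dup_size(const char *str, size_t *size)
{
	size_t len = strlen(str) + 1;
	char *copy = malloc(len);

	if (copy == NULL)
		return NULL;
	if (size != NULL)
		*size = len;
	memcpy(copy, str, len);	/* length is already known */
	return copy;
}

/* Free a string from str_dup_size(), recomputing its size. */
void
str_free(char *str)
{
	if (str == NULL)
		return;
	sized_free(str, strlen(str) + 1);
}
```

The helper pair keeps the "length plus one" bookkeeping in one place instead of repeating it at every call site.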
christos
cd1c6201df We computed the length of the string already, so use it... 2017-11-07 15:57:38 +00:00
riastradh
d6585e3401 Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.
2017-11-06 18:41:22 +00:00
mlelstv
cc92bcd96f pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.
2017-11-05 07:49:45 +00:00
christos
ad97afb146 use Elf_Sym ** instead of casting. 2017-11-04 22:17:55 +00:00
martin
640f0abac6 Make kobj_sym_lookup's result type an Elf_Addr.
Fixes the arm builds.
2017-11-04 12:14:41 +00:00
pgoyette
64ae8753d8 Remove the ABI version-and-length check that was recently introduced;
sysctl(9) ABIs should be stable across versions.

XXX Pull-up to -8
2017-11-03 22:45:14 +00:00
maxv
4e8a8f71db Handle absolute relocations coming from the kernel: preserve SHN_ABS in
the kernel and module symbols, and when relocating a symbol that has
SHN_ABS, take its value as-is and don't return an error if it equals zero.

Sent on tech-kern@.
2017-11-03 09:59:07 +00:00
riastradh
50a782dc6e C99ify initialization of dummy_timecounter. 2017-11-02 15:28:23 +00:00
martin
a6bab1a764 Allow architectures to define a macro PROC_MACHINE_ARCH(P) and
PROC_MACHINE_ARCH32(P) to override the value for sysctl hw.machine_arch
(native and netbsd32 compat resp.).

Use these for arm and mips instead of the (not working, noisy, in case
of arm) sysctl override and #ifdef __mips__ in architecture neutral
code.
2017-10-31 12:37:23 +00:00
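
One way such an override hook is commonly structured, shown as a sketch: architecture-neutral code supplies a default macro only when the port has not defined its own. Only the macro name PROC_MACHINE_ARCH comes from the commit message; struct proc_sketch, GENERIC_MACHINE_ARCH and machine_arch_of() are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical process descriptor used only for this sketch. */
struct proc_sketch {
	int is_compat_abi;
};

#define GENERIC_MACHINE_ARCH	"generic"

/* A port's header could define PROC_MACHINE_ARCH(P) before this point. */
#ifndef PROC_MACHINE_ARCH
#define PROC_MACHINE_ARCH(P)	((void)(P), GENERIC_MACHINE_ARCH)
#endif

/* Architecture-neutral code needs no #ifdef per port. */
const char *
machine_arch_of(const struct proc_sketch *p)
{
	return PROC_MACHINE_ARCH(p);
}
```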
riastradh
aca2a29cb6 Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:

https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html

Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory.  That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

XXX pullup-8
XXX pullup-7
XXX pullup-6 (requires tweaking the patch)
XXX pullup-5...
2017-10-28 17:06:43 +00:00
pgoyette
cb32a134a5 Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
  the kernel and in the structures used for exporting the history data
  to userland via sysctl(9).  This avoids problems on some architectures
  where passing a 64-bit (or larger) value to printf(3) can cause it to
  process the value as multiple arguments.  (This can be particularly
  problematic when printf()'s format string is not a literal, since in
  that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
  include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
  updated.  Each format specifier now includes an explicit length
  modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
  updated to replace uses of "%p" with "%#jx", and the pointer
  arguments are now cast to (uintptr_t) before being subsequently cast
  to (uintmax_t).  This is needed to avoid compiler warnings about
  casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
  "%c" format strings replaced with numeric formats; several instances
  of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
  history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
  the -u option does not exist (previously, this condition was silently
  ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
  data exported via sysctl(9) and exits if they do not match the values
  with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
  requirements imposed on the format strings, along with several other
  minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
    uint64_t) for the history arguments.  But that would require another
    "rototill" of all the users in the future when we add support for an
    architecture that supports a larger size.  Also, the printf(3) format
    specifiers for explicitly-sized values, such as "%"PRIu64, are much
    more verbose (and less aesthetically appealing, IMHO) than simply
    using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
    but it is possible that I've missed some of them.  I would be glad to
    update any stragglers that anyone identifies.
2017-10-28 00:37:11 +00:00
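
The format-string rules above can be illustrated with ordinary printf-family code. This is a userland sketch, not kernhist(9) itself; format_record() and its arguments are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one record holding a pointer and a 32-bit counter.  Both are
 * widened to uintmax_t so every argument has the same, known size; the
 * pointer goes through uintptr_t first to avoid a size-mismatch warning.
 */
int
format_record(char *buf, size_t buflen, const void *obj, uint32_t count)
{
	return snprintf(buf, buflen, "obj=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)obj, (uintmax_t)count);
}
```

With "%#jx" replacing "%p" and the 'j' length modifier on every numeric conversion, the consumer of the record never has to guess an argument's width.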
joerg
e64612f440 Revert printf return value change. 2017-10-27 12:25:14 +00:00
utkarsh009
f11595bab5 [syzkaller] Cast all the printf's to (void *)
> as a result of new printf(9) declaration.
2017-10-27 09:59:16 +00:00
maya
18b796d442 Use C99 initializer for filterops
Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- 	{ a, b, c, d
+ 	{
+ 	.f_isfd = a,
+ 	.f_attach = b,
+ 	.f_detach = c,
+ 	.f_event = d,
};
2017-10-25 08:12:37 +00:00
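
For readers unfamiliar with the spatch notation, the before and after forms look roughly like this in plain C. The member names come from the patch above; struct ops, the member types and the my_* functions are assumptions for illustration and are not the real filterops declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a four-member ops table. */
struct ops {
	int f_isfd;
	int (*f_attach)(void *);
	void (*f_detach)(void *);
	int (*f_event)(void *, long);
};

int my_attach(void *k) { (void)k; return 0; }
void my_detach(void *k) { (void)k; }
int my_event(void *k, long hint) { (void)k; return hint != 0; }

/* Positional form: correct only while the member order never changes. */
const struct ops ops_positional = { 1, my_attach, my_detach, my_event };

/* Designated form: each value is tied to its member by name. */
const struct ops ops_designated = {
	.f_isfd = 1,
	.f_attach = my_attach,
	.f_detach = my_detach,
	.f_event = my_event,
};
```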
riastradh
2a7a645aaa Document lock order and locking rules. 2017-10-25 06:02:40 +00:00
jdolecek
d3e642e387 remove counter for 'journal I/O bufs biowait' - it's (total - async), so
superfluous; adjust the description of the other counters a bit to make
them more clear
2017-10-23 19:03:40 +00:00
riastradh
f7b8b20d17 Initialize the in/out parameter vmin.
vmin is only an optional hint since we're not passing UVM_FLAG_FIXED,
but that doesn't mean we should use uninitialized stack garbage as
the hint.

Noted by chs@.
2017-10-20 19:06:46 +00:00
riastradh
4691bf4bd7 Carve out KVA for execargs on boot from an exec_map like we used to.
Candidate fix for PR kern/45718: `processes sometimes get stuck and
spin in vm_map', a problem that has been plaguing all our 32-bit
ports for years.

Since we currently use large (256k) buffers for execargs, and since
nobody has stepped up to tackle breaking them into bite-sized (or at
least page-sized) chunks, after KVA gets sufficiently fragmented we
can't allocate new execargs buffers from kernel_map.

Until 2008, we always carved out KVA for execargs on boot with a uvm
submap exec_map of kernel_map.  Then ad@ found that the uvm_km_free
call, to discard them when done, cost about 100us, which a pool
avoided:

https://mail-index.NetBSD.org/tech-kern/2008/06/25/msg001854.html
https://mail-index.NetBSD.org/tech-kern/2008/06/26/msg001859.html

ad@ _simultaneously_ introduced a pool _and_ eliminated the reserved
KVA in the exec_map submap.  This change preserves the pool, but
restores exec_map (with less code, by putting it in MI code instead
of copying it in every MD initialization routine).

Patch proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/19/msg022461.html

Patch tested by bouyer@:
https://mail-index.NetBSD.org/tech-kern/2017/10/20/msg022465.html

I previously discussed the issue on tech-kern before I knew of the
history around exec_map:
https://mail-index.NetBSD.org/tech-kern/2012/12/09/msg014695.html

The candidate workaround I proposed of using pool_setlowat to force
preallocation of KVA would also force preallocation of physical RAM,
which is a waste not incurred by using exec_map, and which is part of
why I never committed it.

There may remain a general problem that if thread A calls pool_get
and tries to service that request by a uvm_km_alloc call that hangs
because KVA is scarce, and thread B does pool_put, the pool_put in
thread B will not notify the pool_get in thread A that it doesn't
need to wait for KVA, and so thread A may continue to hang in
uvm_km_alloc.  However,

(a) That won't apply here, because there is exactly as much KVA
available in exec_map as exec_pool will ever try to use.

(b) It is possible that may not even matter in other cases as long as
the page daemon eventually tries to shrink the pool, which will cause
a uvm_km_free that can unhang the hung uvm_km_alloc.

XXX pullup-8
XXX pullup-7
XXX pullup-6
XXX pullup-5, perhaps...
2017-10-20 14:48:43 +00:00
martin
f115d566a4 Make check_exec() errors print the name of the binary that fails to
execute.
2017-10-20 12:11:34 +00:00
bouyer
d4ce271380 PR port-arm/52603:
There is a race here, as seen on arm with FPU:
LWP L is running but not on CPU, has its FPU state on CPU2 which
has not been released yet, so fpexc still has VFP_FPEXC_EN set in the PCB copy.

LWP L is scheduled on CPU1, CPU1 calls cpu_switchto() for L in mi_switch().
cpu_switchto() will set VFP_FPEXC_EN in the FPU's fpexc register per the
PCB fpexc copy.

Before CPU1 calls pcu_switchpoint() for L, CPU2 calls
pcu_do_op(PCU_CMD_SAVE | PCU_CMD_RELEASE) for L because it still holds its
FPU state and wants to load another lwp. This causes VFP_FPEXC_EN to
be cleared in the PCB copy, but not in CPU1's register. L's l_pcu_cpu is
set to NULL.

When CPU1 calls pcu_switchpoint() for L it sees l_pcu_cpu is NULL, and doesn't
call the release callback.

Now CPU1 has its FPU enabled but with the wrong FPU state.

Fix by releasing the PCU even if l_pcu_cpu is NULL.
2017-10-16 15:03:57 +00:00
christos
bb321f6151 Setting AT_BASE on static binaries breaks TLS because they assume that
it is 0, will fix it differently.
2017-10-16 01:50:55 +00:00
christos
3df3b581f3 For static PIE set the interpreter address to be the entry offset so we
don't lose it.
2017-10-08 15:00:40 +00:00
maxv
252ca9c54a Remove compat_linux32 from the autoload list and add an enable/disable
sysctl, like compat_linux.
2017-09-29 17:47:29 +00:00
maxv
aef145dda9 Remove compat_linux from the autoload list, and add a sysctl to enable or
disable it - which defaults to disabled. The following command is now
required to use linux binaries:

	sysctl -w emul.linux.enabled=1

After a discussion on tech-kern@. All the other ideas to reduce the attack
surface have drawbacks, and this sysctl seems to be the best option.
2017-09-29 17:08:00 +00:00
joerg
0e5b5aa88a Fix non-DIAGNOSTICS build by adjusting _vstate_assert here too. 2017-09-22 06:05:20 +00:00
joerg
5db0939512 Change the VSTATE_ASSERT_UNLOCKED code by pushing the potential lock
handling into the backend and doing an optimistic (unlocked) check
first. Always taking the vnode interlock makes this assertion otherwise
very heavy for multi-processor machines. Ride the kernel version bump.
2017-09-21 18:19:44 +00:00
jakllsch
6b34528ad5 Initialize ex_lock and ex_cv only in the not-EX_EARLY case. 2017-09-18 13:22:56 +00:00
christos
e7f0067cbe more const 2017-09-16 23:55:33 +00:00
christos
c483c7cba9 more debug info 2017-09-16 23:55:16 +00:00
christos
9d349e2adb add missing const 2017-09-16 23:25:34 +00:00
sevan
684872c792 Remove support for VERIFIED_EXEC_FP_RMD160, VERIFIED_EXEC_FP_SHA1, and VERIFIED_EXEC_FP_MD5 options.
These algorithms are either broken or on their way to being broken.

Discussed on tech-security
http://mail-index.netbsd.org/tech-security/2017/08/21/msg000936.html

ok riastradh
2017-09-13 22:24:42 +00:00
joerg
69ab70f077 Fix a race between sysctl_unpcblist and closef. 2017-09-09 14:41:19 +00:00
pgoyette
1c284673f6 When adding a new veriexec_file_entry, if an entry already exists with
all the same values (except for the filename) just ignore it.  Otherwise
report the duplicate-entry error.

This allows the user to create a signature file with veriexecgen(8) and
not worry about duplicate entries (due to hard-linked files) which will
otherwise cause /etc/rc.d/veriexec to report an error.

Fixes PR kern/52512

XXX Pull-up for -8
2017-08-31 08:47:19 +00:00
pgoyette
c31e1d979d Revert previous changes. They are wrong. The intended clean-up
is already being handled by the call to veriexec_file_free() in
the "out:" path.
2017-08-29 12:48:50 +00:00
pgoyette
f18bf91f4d One more resource to release - the filename, if we kept it. 2017-08-29 10:23:12 +00:00
pgoyette
8bdb86c1df Release any allocated resources if we take the error paths.
As posted on tech-kern and discussed on IRC.
2017-08-29 10:19:54 +00:00
dholland
cec712d80b If we go to allocate and find someone else has at the same time, don't
trigger a refcount leak of the other guy's object. From mjg@freebsd.

While here also remove a bogus use of lbolt on the same path.
2017-08-28 04:57:11 +00:00
kamil
a69b333e73 Remove the filesystem tracing feature
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
 - /proc/#/ctl from mount_procfs(8)
 - P_FSTRACE note from the documentation of ps(1)
 - /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
 - KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
 - source code file miscfs/procfs/procfs_ctl.c
 - PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
 - KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
 - PSL_FSTRACE (0x00010000) from sys/sys/proc.h
 - P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
2017-08-28 00:46:06 +00:00
kre
968e76ebe6 Build fix attempt ... changes affect !KERNEL (ie: userland, rump) version
of this file only.

Rather than adding meaningless {} around all uses of functions that
are #defined to nothing for userland, #define the funcs to something
that is functionally equivalent (but which appeases gcc).

Also, define KASSERT() to nothing for userland, which avoids the need
to add a #define for mutex_owned which would otherwise be needed,
and simultaneously stops gcc from complaining about a lack of a prototype.
2017-08-24 17:18:55 +00:00
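
A sketch of the general technique, assuming the usual idiom of defining a no-op as a complete empty statement. The macro names lock_enter()/lock_exit() and the function bump() are hypothetical; the definitions actually used in the commit are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userland variant: instead of expanding to nothing, each macro expands
 * to a complete statement that evaluates its argument and does nothing.
 */
#define lock_enter(l)	do { (void)(l); } while (0)
#define lock_exit(l)	do { (void)(l); } while (0)

int
bump(int *lock, int *counter)
{
	if (lock != NULL)
		lock_enter(lock);	/* no empty-body warning, no braces */
	*counter += 1;
	if (lock != NULL)
		lock_exit(lock);
	return *counter;
}
```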
skrll
78070145bc Whitespace fix 2017-08-24 11:37:25 +00:00
jmcneill
7de85ed29e Add EX_EARLY flag for extent_create, which skips locking. Required for
using extent subsystem in early bootstrap code, before caches are enabled.
From skrll@
2017-08-24 11:33:28 +00:00
hannken
28650af9eb Change forced unmount to revert open device vnodes to anonymous devices. 2017-08-21 09:00:21 +00:00
hannken
7801661c06 No need to cache anonymous device vnodes, they will never be looked up.
Set key to (dead_rootmount, 0, NULL) and add assertions.
2017-08-21 08:56:45 +00:00
maxv
c778810068 Remove compat_svr4, compat_svr4_32 and compat_ibcs2 from the list of
autoloaded modules. These options are disabled everywhere (except ibcs2
on Vax, but Vax does not support kernel modules, so doesn't matter),
therefore there is no issue in removing them from the list. Interested
users will now have to do a 'modload' first, or uncomment the entries in
GENERIC.
2017-08-08 16:57:32 +00:00
maxv
1d68b497f2 Remove compat_freebsd from the list of autoloaded modules. Interested users
will now have to type 'modload' to use it, or uncomment the entry in
GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by
default, sorry about that.
2017-08-08 08:12:14 +00:00
christos
7869295617 use the same string for the log and uprintf. 2017-08-06 09:14:14 +00:00
mrg
65d1d4aa12 normalise a BIOHIST log message 2017-08-04 07:00:17 +00:00
riastradh
56272c962e Don't walk off the end of the dirent buffer.
From Ilja Van Sprundel.
2017-07-28 15:37:23 +00:00
riastradh
cf5a000fe5 Clamp the length we use, not the length we don't.
Avoids uninitialized memory disclosure to userland.

From Ilja Van Sprundel.
2017-07-28 15:16:39 +00:00
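
The class of bug can be sketched as follows: the length that must be clamped is the one actually passed to the copy, so that no bytes beyond the initialized source are handed out. copy_clamped() and its parameter names are hypothetical and unrelated to the code the commit touched.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy at most `valid` bytes even if the caller requested more, so the
 * bytes after the initialized part of `src` are never disclosed.
 * Returns the number of bytes copied.
 */
size_t
copy_clamped(void *dst, size_t requested, const void *src, size_t valid)
{
	size_t n = requested < valid ? requested : valid;

	memcpy(dst, src, n);	/* use the clamped length, not `requested` */
	return n;
}
```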
martin
f08cc415b0 Avoid integer overflow in kern_malloc(). Reported by Ilja Van Sprundel.
XXX Time to kill malloc() completely!
2017-07-28 12:28:48 +00:00
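
A minimal sketch of an overflow check on an allocation size, assuming the allocator prepends a bookkeeping header to each request. struct alloc_header and padded_size() are hypothetical; the actual fix may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header prepended to every allocation. */
struct alloc_header {
	size_t size;
};

/*
 * Return the request size plus header, or 0 if the addition would wrap
 * around and yield a buffer smaller than the caller asked for.
 */
size_t
padded_size(size_t request)
{
	if (request > SIZE_MAX - sizeof(struct alloc_header))
		return 0;
	return request + sizeof(struct alloc_header);
}
```

The comparison is done by subtracting from SIZE_MAX so that the check itself cannot overflow.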
skrll
111cbb5944 Add a condition variable (ex_flwanted) to struct extent so that ex_flags
becomes an invariant.

Remove strange locking for ex_flags as a result.
2017-07-24 19:56:07 +00:00
maxv
d245e6f22a Should be loadfactor(). 2017-07-14 13:23:48 +00:00
maxv
bcdfaccefa Revert rev1.26. l_estcpu is increased by only one cpu, not all of them. 2017-07-14 13:02:20 +00:00
hannken
31624a0218 Regen. 2017-07-12 09:31:59 +00:00
hannken
d29c150b3b As VOP_ADVLOCK() may block indefinitely we cannot take fstrans here.
Fixes PR kern/52364: System hangs not much before showing the login prompt
2017-07-12 09:31:07 +00:00
dholland
9a94872476 Fix vnode leak on error, introduced by the openat family changes in -r1.200.
From mjg@freebsd.
2017-07-09 22:48:44 +00:00
maxv
5dc461da23 explain a bit 2017-07-08 15:15:43 +00:00
christos
c85be1e9c7 move the timestamp stuff to uipc_socket.c because it already has the compat
includes.
2017-07-06 17:42:39 +00:00
christos
2b50acc97b Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
2017-07-06 17:08:57 +00:00
christos
c3a5f17a00 don't print diagnostic for AF_LINK 2017-07-05 17:54:46 +00:00
riastradh
0a89dacf06 Add cv_timedwaitbt, cv_timedwaitbt_sig.
Takes struct bintime maximum delay, and decrements it in place so
that you can use it in a loop in case of spurious wakeup.

Discussed on tech-kern a couple years ago:

https://mail-index.netbsd.org/tech-kern/2015/03/23/msg018557.html

Added a parameter for expressing desired precision -- not currently
interpreted, but intended for a future tickless kernel with a choice
of high-resolution timers.
2017-07-03 02:12:47 +00:00
riastradh
a18efaac6b Nix trailing whitespace. No functional change. 2017-07-03 00:53:33 +00:00
joerg
5f391f4ae2 Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
2017-07-02 16:41:32 +00:00
christos
6d52cc85b8 don't warn about AF_LINK sockets with sa_len less than the size of the sockaddr 2017-07-02 02:39:18 +00:00
christos
c4aed00fad fix file descriptor locking (from joerg).
fixes kernel crashes by running go
XXX: pullup-7
2017-07-01 20:08:56 +00:00
christos
7700e78cab put the code that returns the sizeof the socket by family in one place. 2017-07-01 16:59:12 +00:00
snj
4e609ee710 fix typo 2017-06-25 04:10:47 +00:00
joerg
b77121f193 Recommit exec_subr.c revision 1.79:
  Always include a 1MB guard area beyond the end of stack. While ASLR will
  normally create a guard area as well, this provides a deterministic area
  for all binaries.

  Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
  Qualys.

Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include
user_stack_guard_size in the size reservation.
2017-06-23 21:28:38 +00:00
skrll
34397172e3 Unwrap two lines. NFC. 2017-06-22 09:05:09 +00:00
martin
8ee7e18703 Change a KASSERT to KASSERTMSG and print the xcall function to be
invoked as a debugging help.
2017-06-21 07:39:04 +00:00
christos
f4961bd8ed Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative causing it to fail.
2017-06-20 20:34:49 +00:00
joerg
2e851f5508 Revert for the moment, creates problems on i386. 2017-06-19 19:02:16 +00:00
joerg
5bcc4a51d6 Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.

Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
2017-06-19 15:53:16 +00:00
hannken
a94bf97d25 Make the fast path of fstrans_get_lwp_info() "static inline". 2017-06-18 14:00:17 +00:00
hannken
90e2dee24a Clear fstrans entries whose mount is gone from the last fstrans_done() only. 2017-06-18 13:59:45 +00:00
chs
2b3f157429 create an nmap table for module symtabs too.
needed by dtrace.
2017-06-14 00:52:37 +00:00
riastradh
26bd73f202 Add heading comment for private localcount_adjust subroutine. 2017-06-12 21:08:34 +00:00
riastradh
44df486bb8 Move forward declaration to top of file.
Keep header comment above localcount_init adjoined to it.

No functional change.
2017-06-12 21:07:14 +00:00
chs
20bf3061d4 define a copy of getnanotime() named dtrace_getnanotime() so that
dtrace can know from the name that it should not allow setting
fbt probes on it.  needed by dtrace.
2017-06-09 01:16:33 +00:00
chs
3756187172 add some pool_allocators for pool item sizes larger than PAGE_SIZE.
needed by dtrace.
2017-06-08 04:00:01 +00:00
chs
ec5ea71a90 move some buffer cache internals declarations from buf.h to vfs_bio.c.
this is needed to avoid name conflicts with ZFS and also
makes it clearer that other code shouldn't be messing with these.
remove the LFS debug code that poked around in bufqueues and
remove the BQ_EMPTY bufqueue since nothing uses it anymore.
provide a function to let LFS and wapbl read the value of nbuf for now.
2017-06-08 01:23:01 +00:00
chs
67c81802f1 allow cv_signal() immediately followed by cv_destroy().
this sequence is used by ZFS in a couple places and by supporting it
natively we can undo our local ZFS changes that avoided it.
note that this is only legal when all of the waiters use cv_wait()
and not any of the other variations, and lockdebug will catch
any violations of this rule.
2017-06-08 01:09:52 +00:00
hannken
287643b0da Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
2017-06-04 08:05:41 +00:00
hannken
775d23a76b Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.
2017-06-04 08:03:26 +00:00
hannken
f5647f853e Locking a layer vnode using the regular bypass routine is no longer
racy.  Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
2017-06-04 08:02:26 +00:00
hannken
48c67e7912 Regen. 2017-06-04 08:00:27 +00:00
hannken
dfcc54aa9c Add "FSTRANS=LOCK" and "FSTRANS=UNLOCK" to vop_lock and vop_unlock.
Add two "static inline" functions to vnode_if.c to handle MPSAFE
and FSTRANS before and after the "VCALL()".

Take FSTRANS and handle error before "VCALL(...vop_lock...)" and
release it after "VCALL(...vop_unlock...)".
2017-06-04 07:59:17 +00:00
hannken
8e1cefd98c A vnode is usually called "active", if it has an associated file system
node and a usecount greater than zero.  Therefore rename state "VS_ACTIVE"
to "VS_LOADED" and add a new synthetic state "VS_ACTIVE" for VSTATE_ASSERT()
to assert an active vnode.

Add VSTATE_ASSERT_UNLOCKED() to be used with v_interlock unheld and
move the state assertion macros to sys/vnode_impl.h.
2017-06-04 07:58:29 +00:00
chs
ffb3d80455 localcount_init() can't fail because percpu_alloc() can't fail.
remove the check and change the return type to void.
2017-06-02 00:32:12 +00:00
chs
fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
  kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
chs
1f0e167178 vmem_alloc() with VM_SLEEP cannot fail, so percpu_alloc() cannot fail either. 2017-05-31 23:54:17 +00:00
chs
c85613c074 assert that vmem_alloc() with VM_SLEEP does not fail. 2017-05-31 23:53:30 +00:00
hannken
e4e82d96c7 Restrict vgone() to suspended file systems only.
Welcome to 7.99.75, old file system modules would cause a diagnostic
assertion with new kernel.
2017-05-28 16:39:41 +00:00
hannken
a8045334ce Add a helper to propagate file system suspension for vrevoke().
Take care to retry suspension on interrupt as vrevoke must succeed.
2017-05-28 16:35:47 +00:00
bouyer
6e4cb2b9ab merge the bouyer-socketcan branch to HEAD.
CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developed for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also included is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows using
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpdump(8) can also be used to record frames, which can be
decoded with ethereal.
2017-05-27 21:02:54 +00:00
riastradh
c921bd9b79 Check VOP_INACTIVE contract with a judicious assert. 2017-05-26 14:40:09 +00:00
riastradh
51e152b5ce Clarify comment. 2017-05-26 14:39:20 +00:00
riastradh
93562e3f53 Eliminate crusty debugging sludge.
We have a mostly sane vnode lifecycle now.  If this needs debugging,
it should be done once at the call site of VOP_RECLAIM.
2017-05-26 14:34:19 +00:00
riastradh
f4ad397b3e regen 2017-05-26 14:21:54 +00:00
riastradh
7f7aad09bd Make VOP_RECLAIM do the last unlock of the vnode.
VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
2017-05-26 14:20:59 +00:00
christos
9aa2075330 switch to a switch 2017-05-25 20:42:36 +00:00
pgoyette
3b2df19edf When logging a history record for biowait(), include the return address
as a parameter, to identify to which of the many calls to biowait() the
record refers.
2017-05-25 02:28:07 +00:00
hannken
69174779b1 With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73
2017-05-24 09:53:55 +00:00
hannken
c2c49e1ed2 Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.
2017-05-24 09:52:59 +00:00
pgoyette
cb99404632 Fix a comment - in localcount_fini(), we don't care whether it was the
caller or some other code that drained the localcount;  all we care is
that it has been drained.
2017-05-19 02:20:24 +00:00
pgoyette
a372bceac2 Introduce new localcount(9) reference-count primitives. 2017-05-19 00:01:33 +00:00
hannken
9fc3ca45b3 Suspend file system while revoking a vnode. This way no operations run
on the mounted file system during revoke and all operations see
the state before or after the revoke.
2017-05-17 12:46:14 +00:00
hannken
677cf1d8b4 Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.
2017-05-17 12:45:03 +00:00
christos
f6b964d39b protect against NULL, from PaulG 2017-05-11 23:50:17 +00:00
nat
5e34165f16 Explicitly set the flags instead of masking set values in.
This fixes FNONBLOCK weirdness seen in audio.c

OK christos@ and martin@.
2017-05-11 22:38:56 +00:00
riastradh
9c32900485 regen 2017-05-10 06:19:47 +00:00
riastradh
913618cd04 Forward-declare `struct lwp' so we can use `struct lwp *' here. 2017-05-10 06:08:56 +00:00
christos
21e6c9452c fp == NULL in the DIAGNOSTIC, so use the real fp and also print the errno. 2017-05-09 21:18:51 +00:00
christos
1e7fb326f1 de-triplicate. 2017-05-07 22:54:54 +00:00
hannken
4f4cfe27b2 Enter fstrans from _vfs_busy() and leave from vfs_unbusy().
Adapt sched_sync() and do_sys_sync().
2017-05-07 08:26:58 +00:00
hannken
01d31ceb6d Return ENOENT if trying to suspend an unmounted file system. 2017-05-07 08:25:54 +00:00
hannken
c18a56f135 Move fstrans initialization to vfs_mountalloc(). 2017-05-07 08:24:20 +00:00
hannken
12ad3b05fd Handle the case where the mount is gone and its mnt_transinfo is NULL. 2017-05-07 08:23:28 +00:00
hannken
853d034c97 Remove now invalid comment. 2017-05-07 08:21:08 +00:00
joerg
4f77b889d0 Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
creating a second range of virtual addresses that references the same
physical pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
2017-05-06 21:34:51 +00:00
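The protection rules described in the commit above can be modeled in a few lines. This is only a sketch of the policy, not the kernel code: the `sk_` names and bit values are hypothetical, and the choice of EACCES as the error is an assumption.

```c
#include <errno.h>

/* Hypothetical protection bits for this sketch; not the <sys/mman.h> values. */
#define SK_PROT_READ	0x1
#define SK_PROT_WRITE	0x2
#define SK_PROT_EXEC	0x4

/*
 * A request for W and X together fails outright instead of having the
 * X bit silently dropped. Returns 0 on success or an errno value.
 */
static int
sk_check_prot(int prot)
{
	if ((prot & SK_PROT_WRITE) && (prot & SK_PROT_EXEC))
		return EACCES;	/* assumed error code for this sketch */
	return 0;
}

/*
 * mprotect-style check: the new protection must stay within the set
 * that was requested up front at mapping time, and must not be W&X.
 */
static int
sk_check_mprotect(int maxprot, int newprot)
{
	if ((newprot & ~maxprot) != 0)
		return EACCES;
	return sk_check_prot(newprot);
}
```

Under this model a JIT would keep two mappings of the same pages, one writable and one executable, and neither mapping ever holds both bits.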
kamil
1627fdf3a4 Set clear comment about EI_OSABI and EI_ABIVERSION
/*
 * NetBSD sets generic SYSV OSABI and ABI version 0
 * Native ELF files are distinguishable with NetBSD specific notes
 */

No functional change.
2017-05-04 11:12:23 +00:00
kamil
ec80600208 Use consistently "bufq_private(bufq)" instead of "bufq->bq_private"
No functional change.
2017-05-04 11:03:27 +00:00
kamil
df97a42593 Correct typo in the comment
No functional change.
2017-05-04 11:01:16 +00:00
kamil
88e477a387 Fix kernel panic triggered with LLDB
PT_SETSTEP and PT_CLEARSTEP in the current design must unlock proc_lock and
t->p_lock. These functions use lwp_delref() for a tracee with more than one
LWP. This function internally locks (t->)p_lock, which is locking against
self.

New ATF tests exercising PT_*STEP with multiple LWPs are coming, to catch
these bugs in future changes.

Sponsored by <The NetBSD Foundation>
2017-05-03 15:53:31 +00:00
pgoyette
48e395b1b8 Introduce mutex_ownable() to determine if it is possible for the current
process to acquire a mutex.
2017-05-01 21:35:25 +00:00
ryo
d9ee24f798 whitespace police 2017-05-01 10:00:43 +00:00
abhinav
39132b9e2d Rearrange the if conditions in order to get rid of unnecessary indentation.
No functional change intended. ok christos@
2017-04-27 16:52:22 +00:00
riastradh
8e5c8dbff1 regen 2017-04-26 03:04:24 +00:00
riastradh
6fa7b15833 Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.
No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
2017-04-26 03:02:47 +00:00
pgoyette
ca22f64915 Add a check to ensure that a new sysctl node was attached in the tree
at the place we expected it to be attached!

As mentioned several times (on tech-kern@ mailing list) over the past
18 months or so, I've seen a few instances where this will trigger,
although I've been unable to reproduce them.  Hopefully some wider
exposure will reveal the underlying cause of this rare phenomenon.

Commit was proposed on tech-kern list, and no objections raised.
2017-04-25 22:07:10 +00:00
pgoyette
ab5e69493e Use __func__ for routine name in printf() calls. NFC intended. 2017-04-25 08:46:38 +00:00
kamil
795febebbd Try to fix build of sys_lwp.c
lwp_create() has acquired more arguments, and the latest one was missing
here. By analogy with changes in the same commit to other source files,
go for &SS_INIT.
2017-04-21 19:38:35 +00:00
christos
d7746f2ee3 - Propagate the signal mask from the ucontext_t to the newly created thread
as specified by _lwp_create(2)
- Reset the signal stack for threads created with _lwp_create(2)
2017-04-21 15:10:34 +00:00
kamil
34e270cb64 Enhance verbosity of debug message for ELF magic mismatch
Print e_ident[EI_MAG3] (it was missed)
Print e_ident[EI_CLASS] as it is used to determine correct ELF magic.

No functional change for non-debug (without option DEBUG_ELF) build.
2017-04-21 13:17:42 +00:00
christos
5d75b0065e simplify. 2017-04-19 15:54:45 +00:00
pgoyette
05aa8c5f12 Be consistent about checking for text section address being 0, and
don't ignore errors by falling through to the next section(s).

As discussed on tech-kern@
2017-04-19 06:19:02 +00:00
christos
6ef342f61a PR/52174: Remove root test, it is too verbose. XXX: need to come up with
something better.
2017-04-18 18:07:29 +00:00
hannken
bd152b56b5 Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer. 2017-04-17 08:34:27 +00:00
hannken
eb8533a8b6 No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?
2017-04-17 08:32:55 +00:00
hannken
20bb034f5b Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
2017-04-17 08:32:00 +00:00
hannken
ebb8f73b4b Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount.  Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
2017-04-17 08:31:01 +00:00
hannken
256581e1f9 Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.
2017-04-17 08:29:58 +00:00
riastradh
629022bd8f regen to confirm no functional change 2017-04-16 17:18:54 +00:00
riastradh
f2ed57297a Count vnode arguments correctly.
Don't count arguments that have WILLRELE/WILLPUT; count arguments
that are struct vnode *.

No functional change currently because it happens that every released
or put vnode argument comes first or after other ones.
2017-04-16 17:18:28 +00:00
riastradh
d08e9ec7c8 regen 2017-04-16 16:49:25 +00:00
riastradh
6f8a4faacd Back out previous.
Breaks file systems for which VOP_UNLOCK doesn't work on a reclaimed
vnode.

The only case in tree right now is sys/fs/union -- most file systems
use genfs_unlock, which does work on a reclaimed vnode.

Maybe we can work around this -- and still enable VOP_RECLAIM's
callees to assert lock ownership -- by having VOP_RECLAIM unlock the
vnode instead.
2017-04-16 16:48:08 +00:00
riastradh
5a3d793f2a regen to confirm no functional change 2017-04-15 23:21:46 +00:00
riastradh
ce1c68db98 Keep vnode locked during VOP_RECLAIM.
No bump because it wouldn't have been possible to acquire the lock in
VOP_RECLAIM anyway -- instant deadlock because vn_lock waits to
transition out of the RECLAIMING state first.  Benefit is that we can
now assert ownership of the lock in any operations called by
VOP_RECLAIM.

Discussed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html
2017-04-15 23:16:53 +00:00
skrll
070497e366 Paranoia... keep vmspace reference while doing pmap_procwr 2017-04-13 07:58:45 +00:00
christos
cd306a0c3c use opt_kmem.h for the KMEM_ variables. 2017-04-12 20:05:54 +00:00
hannken
e08a8c4104 Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.
Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
2017-04-12 10:35:10 +00:00
hannken
6058fea9b5 Switch veriexec_dump() and veriexec_flush() to mountlist iterator. 2017-04-12 10:30:02 +00:00
hannken
a315c73868 Switch do_sys_sync() and do_sys_getvfsstat() to mountlist iterator. 2017-04-12 10:28:39 +00:00
hannken
3137e0cee1 Switch vfs_vnode_lock_print() and printlockedvnodes() to _mountlist_next().
Switch sched_sync() and sysctl_kern_vnode() to mountlist iterator.
2017-04-12 10:26:33 +00:00
hannken
5ff843c227 Switch fstrans_dump() to _mountlist_next(). 2017-04-12 10:23:35 +00:00
christos
d8c52c37b1 use a different root vnode variable to appease the rump gods. 2017-04-11 21:15:57 +00:00
riastradh
6d3ccf9762 Simplify: eliminate a now-needless unlock/lock cycle. 2017-04-11 14:45:46 +00:00
christos
b23251f1fa return EPERM like the other failures. 2017-04-11 14:37:07 +00:00
christos
e85d5cbc14 Don't try to autoload modules before root is mounted. 2017-04-11 14:31:55 +00:00
riastradh
b7fb52a55b regen to confirm no functional change 2017-04-11 14:30:33 +00:00
riastradh
d20cc14aa7 Eliminate now-unused WILLUNLOCK vop flag. 2017-04-11 14:29:32 +00:00
riastradh
2b4f5f70bd regen 2017-04-11 14:26:13 +00:00
riastradh
87fb32292e Make VOP_INACTIVE preserve vnode lock on return.
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
2017-04-11 14:24:59 +00:00
hannken
2f4fa4f94f Add an iterator over the currently mounted file systems.
Ride 7.99.68
2017-04-11 07:46:37 +00:00
jdolecek
6ef596151b rename allow_fuadpo to allow_dpofua, so it's the same order as the SCSI flag 2017-04-10 21:36:05 +00:00
jdolecek
75f6d4fd1a improve performance of journal writes by parallelizing the I/O - use 4 bufs
by default, add sysctl vfs.wapbl.journal_iobufs to control it

this also removes need to allocate iobuf during commit, so it
might help to avoid deadlock during memory shortages like PR kern/47030
2017-04-10 21:34:37 +00:00
jdolecek
946ca69f6d change b_wapbllist to TAILQ, to preserve the LRU order 2017-04-10 19:52:38 +00:00
kamil
05ffc73c35 Add new ptrace(2) API: PT_SETSTEP & PT_CLEARSTEP
These operations allow marking a thread as a single-stepping one.

Among other things, this allows one to:
 - single step and emit a signal (PT_SETSTEP & PT_CONTINUE)
 - single step and trace syscall entry and exit (PT_SETSTEP & PT_SYSCALL)

The former is useful for debuggers like GDB or LLDB. The latter can be used
to singlestep a usermode kernel. These examples don't limit use-cases of
this interface.

Define PT_*STEP only for platforms defining PT_STEP.

Add new ATF tests setstep[1234].

These ptrace(2) operations first appeared in FreeBSD.

Sponsored by <The NetBSD Foundation>
2017-04-08 00:25:49 +00:00
jdolecek
046f6d9783 optionally use FUA instead of full cache sync, and DPO for journal writes,
when supported by disk device; controlled by sysctl vfs.wapbl.allow_fuadpo,
default off for now

discussed on tech-kern
2017-04-05 20:38:53 +00:00
jdolecek
6801660c77 expose disk device FUA/DPO support via DIOCGCACHE, and allow the flags
to be set for I/O; implement support in sd(4) and nvme(4)

discussed on tech-kern
2017-04-05 20:15:49 +00:00
skrll
bdf6985b50 spaces to tab 2017-03-31 08:50:54 +00:00
martin
1fd4f01ae0 PR kern/52117: move stop code for debugged children after fork into MI code.
XXX we might want to revisit this when handling the same event for vfork
better.
2017-03-31 08:47:04 +00:00
msaitoh
eabd5e1de9 Remove extra 0x. This bug was added when replacing bitmask_snprintf(9) with
snprintb(3) (in between NetBSD 5 and 6). Old bitmask_snprintf(9) didn't add
"0x" automatically for hexadecimal values, so old code used it with "0x%s".
2017-03-31 08:38:13 +00:00
msaitoh
913e06bbd4 Remove extra 0x in m_print(). 2017-03-31 05:44:05 +00:00
christos
6e0bd5329a factor out getauxv code. 2017-03-30 20:17:11 +00:00
hannken
d0dc55acf0 Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
2017-03-30 09:16:52 +00:00
hannken
799c5cfefa Change the operations vector before changing the mount.
Vnode operations enter the mount before using the vector.
2017-03-30 09:15:51 +00:00
hannken
1a31dbf3eb Change vrelel() to defer the test for a reclaimed vnode until
we hold both the interlock and the vnode lock.

Add a common operation to deallocate a vnode in state LOADING.
2017-03-30 09:14:59 +00:00
hannken
cf9ded4af4 Add flag VRELEL_FORCE_RELE to vrelel() to force release and
use it from vdrain_vrele() and vrele_flush() to prevent a
possible live lock from vrele_flush().
2017-03-30 09:14:08 +00:00
hannken
a644d1ecc8 Lock the vnode before changing its writecount. 2017-03-30 09:13:37 +00:00
hannken
dd67c605a3 Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.
2017-03-30 09:13:01 +00:00
hannken
3332a1029e Change last users of FSTRANS_LAZY to FSTRANS_SHARED and change
genfs_suspendctl() to move from FSTRANS_NORMAL to FSTRANS_SUSPENDED
and vice versa.
2017-03-30 09:12:21 +00:00
kamil
7c54169f6c Revert previous.
Pointed out by Christos Zoulas that ELF_AUX_ENTRIES * sizeof(AuxInfo)
assumption is incomplete. There is emulation code that can use different
values (smaller and larger).
2017-03-29 22:48:03 +00:00
kamil
1b2bf0e7ae Generate ELF AUXV for core(5) and ptrace(2) limited to the vector TYPE x V
Previously PT_DUMPCORE and PIOD_READ_AUXV and regular core dumping retrieved
the vector of AuxInfo {a_type, a_v} + MAXPATHLEN + ALIGN(1).

The extra data is not actually needed in the returned chunk. It can be
retrieved with PT_READ_I operations and it's the preferred way to access
them as the AuxInfo fields contain pointers (void* format) to them.

This changes the behavior of the kernel, no stable releases are affected
with this move. Current software is not affected as other systems already
stop generating data on AT_NULL. This streamlines the NetBSD behavior with
other ELF format OSes. This move also simplifies determination if we got
all the needed data inside the debugger and we no longer need to eliminate
the unneeded chunk at the end.

Sponsored by <The NetBSD Foundation>
2017-03-29 19:52:30 +00:00
pgoyette
af651997d7 Add new sysctl variable proc.curproc.paxflags so a process can determine
which flags were set for it.  Define some values for the variable:

	CTL_PROC_PAXFLAGS_{ASLR,MPROTECT,GUARD}
2017-03-24 21:43:20 +00:00
christos
5915ff2ca3 Instead of copying parts of sigswitch to process_stoptrace, use it directly.
Rename process_stoptrace -> proc_stoptrace and put it in kern_sig.c so we
don't need to expose any more functions from it.
2017-03-24 17:40:44 +00:00
christos
02a58fe84b kern/5201{2,8,9}: Fix PT_SYSCALL stopping.
1. Supply the siginfo we expect TRAP_SC{E,X} to process_stoptrace() and set it.
2. Change the second argument of proc_stop from notify, to now meaning that
   we want to stop right now. Wait in process_stoptrace until that has happened.
3. While here, fix the locking order in process_stoptrace().
2017-03-23 21:59:54 +00:00
skrll
f2b6c11aeb Reduce #ifdefs 2017-03-22 22:11:47 +00:00
skrll
fa90529ca0 Use brelsel while the bufcache_lock is held rather than dropping it
and re-taking / dropping it in brelse
2017-03-21 10:46:49 +00:00
riastradh
27568595fc #if DIAGNOSTIC panic ---> KASSERT; __diagused police 2017-03-20 01:24:06 +00:00
riastradh
fe6928ab88 Gather alldevs into a cacheline-aligned struct. 2017-03-20 01:13:07 +00:00
riastradh
8a00741dd2 Omit needless volatile qualifiers.
All these variables are used exclusively with alldevs_mtx held, not
atomics.  No need for volatile voodoo here.
2017-03-20 01:06:29 +00:00
riastradh
1d7e0698c7 Assert ownership of alldevs_mtx, as required for config_makeroom.
The one caller in config_unit_alloc guarantees ownership, via
config_alldevs_enter and preserved by config_makeroom.
2017-03-20 01:05:03 +00:00
riastradh
f7e6ff90f6 Make sure we hold alldevs_mtx for access to alldevs in deviter.
- Extend alldevs_mtx section in deviter_init.
- Assert ownership of alldevs_mtx in private functions:
  . deviter_reinit
  . deviter_next1
  . deviter_next2
- Acquire alldevs_mtx in deviter_next.

(alldevs_mtx is not relevant to the struct deviter object, which is
private to the caller who must guarantee exclusive access to it.)
2017-03-20 00:30:03 +00:00
riastradh
4829d818c5 Summarize lifetime of cache entries. 2017-03-18 22:36:56 +00:00
riastradh
6eda01c527 Omit duplicate forward declaration of cache_invalidate. 2017-03-18 22:04:52 +00:00
riastradh
e156548b77 Fix lock order statement. Annotate with references to examples. 2017-03-18 22:02:11 +00:00
riastradh
026f3c0e6d Rework namecache locking commentary.
- Annotate struct namecache declaration in namei.src/namei.h.
- Bulleted lists, not walls of text.
- Identify lock order too.
- Note subtle exceptions.
2017-03-18 21:03:28 +00:00
riastradh
5434f6e180 Nix trailing whitespace. 2017-03-18 20:01:44 +00:00
riastradh
db0eb34479 Sort #includes. 2017-03-18 20:00:10 +00:00
riastradh
f1f6969b4f Omit vestigial comment.
- We have not dropped the cache entry on vget failure since 2008.
- We have not had `generation numbers' since 2001.
2017-03-18 19:59:20 +00:00
riastradh
67ff6e1480 Make cache_lookup return bool for clarity. 2017-03-18 19:43:31 +00:00
riastradh
2a3b9d8872 Need membar_datadep_consumer here. 2017-03-18 05:49:56 +00:00
riastradh
068914dcb9 Nix trailing whitespace. 2017-03-18 05:45:48 +00:00
riastradh
29079373a1 Back out part of previous: missed a caller of wapbl_write_inodes. 2017-03-17 03:19:46 +00:00
riastradh
ba1ca8c6fb Nix trailing whitespace. 2017-03-17 03:17:07 +00:00
riastradh
eafae67faa Sort includes. 2017-03-17 03:16:29 +00:00
riastradh
76681fe093 Assert write lock in wapbl_write_revocations, wapbl_write_inodes.
Only one call site, so trivial to prove correct.
2017-03-17 03:06:17 +00:00
chs
877a3ccf7c allow pcu_save() and pcu_discard() to be called on other threads,
ptrace needs to use it that way.
2017-03-16 16:13:19 +00:00
ozaki-r
0eaf4e5356 Use if_acquire and if_release instead of using psref API directly
- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
2017-03-14 09:03:08 +00:00
riastradh
6a970e990b #if DIAGNOSTIC panic ---> KASSERT
- Omit mutex_exit before panic.  No need.
- Sprinkle some more information into a few messages.
- Prefer __diagused over #if DIAGNOSTIC for declarations,
  to reduce conditionals.

ok mrg@
2017-03-14 03:13:50 +00:00
hannken
4ac834d923 Fix a logic error introduced with Rev. 1.507: defer setting MNT_RDONLY
only if going from read-write to read-only.

Should fix PR kern/52045 (panic: ffs_sync: rofs mod, fs=/ after fsck)
2017-03-07 11:54:16 +00:00
hannken
6e4272e4af Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.
2017-03-06 10:11:21 +00:00
hannken
0f10eb2124 Deny unmounting file systems below layered file systems. 2017-03-06 10:10:43 +00:00
hannken
6caedad35c Change vrecycle() and vgone() to lock with LK_RETRY. If this node is
a layerfs node the lower node(s) may already be reclaimed.
2017-03-06 10:07:52 +00:00
mlelstv
ba576b71a7 Enhance disk metrics by calculating a weighted sum that is incremented
by the number of concurrent I/O requests. Also introduce a new disk_wait()
function to measure requests waiting in a bufq.
iostat -y now reports data about waiting and active requests.

So far only drivers using dksubr and dk, ccd, wd and xbd collect data about
waiting requests.
2017-03-05 23:07:12 +00:00
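One way to read the weighted sum described in the commit above is as busy time scaled by the number of requests in flight. The sketch below is an assumption about the idea, not the code in the disk subsystem; `struct sk_disk`, its field names and the integer time units are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-disk counters; time is in arbitrary integer ticks. */
struct sk_disk {
	unsigned busy;		/* requests currently in flight */
	uint64_t last;		/* timestamp of the last state change */
	uint64_t busy_time;	/* total time with at least one request */
	uint64_t busy_sum;	/* elapsed time weighted by concurrency */
};

/* Account for the interval since the last change at the old concurrency. */
static void
sk_disk_update(struct sk_disk *d, uint64_t now)
{
	uint64_t delta = now - d->last;

	if (d->busy > 0) {
		d->busy_time += delta;
		d->busy_sum += delta * d->busy;
	}
	d->last = now;
}

/* A request starts. */
static void
sk_disk_busy(struct sk_disk *d, uint64_t now)
{
	sk_disk_update(d, now);
	d->busy++;
}

/* A request completes. */
static void
sk_disk_unbusy(struct sk_disk *d, uint64_t now)
{
	sk_disk_update(d, now);
	d->busy--;
}
```

Dividing `busy_sum` by `busy_time` then gives an average queue depth over the busy period, which is the kind of figure a tool such as iostat could report.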
mrg
e99ba17226 add missing sys/evcnt.h include. 2017-03-05 20:45:49 +00:00
jdolecek
54d8d0371a add some event counters, for commits, writes, cache flush 2017-03-05 13:57:29 +00:00
hannken
a57a3961af Add an operation to test a mount for fstrans support and use it for
_fstrans_start(), fstrans_done(), fstrans_is_owner(), vfs_suspend()
and vfs_resume().

Test for fstrans support before ASSERT_SLEEPABLE().
2017-03-02 10:41:27 +00:00
hannken
b038eea5e1 Suspend the mounted file system while updating. 2017-03-01 10:45:24 +00:00
hannken
90ead62d2f Change the protocol to update a mounted file system from read-write
to read-only and vice versa:

- Add an internal flag IMNT_WANTRDONLY.
- Set either IMNT_WANTRDWR or IMNT_WANTRDONLY if going from or to read-only.
- After a successful call to VFS_MOUNT() set or clear MNT_RDONLY.

Adapt tmpfs and rumpfs to the new protocol.  Other file systems will be
updated when they get the IMNT_CAN_RWTORO property.

Welcome to 7.99.64
2017-03-01 10:44:47 +00:00
hannken
0d6fbaf0a0 Must always lock the parent -> lock the child -> unlock the parent. 2017-03-01 10:43:37 +00:00
jakllsch
aa28e4fbed pi_bsize must be at least pi_secsize
Allows block device accesses to 4KiB logical sector disks to function on the
vast majority of ports with 2KiB BLKDEV_IOSIZE.
2017-02-28 00:33:36 +00:00
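The constraint in the commit above amounts to taking the larger of two sizes. A minimal sketch, assuming a 2KiB default I/O size as the message states for most ports; `sk_part_bsize` and `SK_BLKDEV_IOSIZE` are hypothetical names, not the kernel's.

```c
#include <stdint.h>

/* Assumed default block I/O size for this sketch (2KiB). */
#define SK_BLKDEV_IOSIZE	2048u

/*
 * Pick the block size reported for a partition: never smaller than
 * the logical sector size, otherwise the default I/O size.
 */
static uint32_t
sk_part_bsize(uint32_t secsize)
{
	return secsize > SK_BLKDEV_IOSIZE ? secsize : SK_BLKDEV_IOSIZE;
}
```

With 512-byte sectors the result stays at 2048, while a disk with 4KiB logical sectors gets 4096, so a block-sized transfer always covers at least one whole sector.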
hannken
d0ef892c64 Test for fstrans support before trying to allocate per-thread info.
PR kern/51996 (kmem_alloc called from intr context in fstrans_get_lwp_info)
2017-02-23 11:23:22 +00:00
kamil
5c4cff4517 Fix build of ports without PT_STEP
Fallout after PT_*DBREGS introduction.

Sponsored by <The NetBSD Foundation>
2017-02-23 04:48:36 +00:00
kamil
988eb7ed71 Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64
This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
only recently in NetBSD-current, so its traces are removed without any
backward compatibility.

Design choices for Debug Register accessors:
 - exec() (TRAP_EXEC event) must remove debug registers from LWP
 - debug registers are only per-LWP, not per-process globally
 - debug registers must not be inherited after (v)forking a process
 - debug registers must not be inherited after forking a thread
 - a debugger is responsible to set global watchpoints/breakpoints with the
   debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
   monitoring function is designed to be used
 - debug register traps must generate SIGTRAP with si_code TRAP_DBREG
 - debugger is responsible to retrieve debug register state to distinguish
   the exact debug register trap (DR6 is Status Register on x86)
 - kernel must not remove debug register traps after triggering a trap event
   a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
   call (DR7 is Control Register on x86)
 - debug registers must not be exposed in mcontext
 - userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
 - the initial state of debug register is retrieved on boot and this value is
   stored in a local copy (initdbregs), this value is used to initialize dbreg
   context after PT_GETDBREGS
 - struct dbregs is stored in pcb as a pointer and by default not initialized
 - reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
 - restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig	2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

 #ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define	DBREG_DRX(d,x)	((d)->dr[(x)])
+#endif
+
 static unsigned long
 amd64bsd_dr_get (ptid_t ptid, int regnum)
 {


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
  (gdb) c
  Continuing.

  Watchpoint 2: traceme

  Old value = 0
  New value = 16
  main (argc=1, argv=0x7f7fff79fe30) at test.c:8
  8               printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>
2017-02-23 03:34:22 +00:00
kamil
fb0af2ac33 Improve PT_SET_SIGMASK and PT_GET_SIGMASK API in ptrace(2)
Use proper check for LW_SYSTEM, don't depend on PT_GETREGS/PT_SETREGS.
Don't allow to mask SA_CANTMASK signals with PT_SET_SIGMASK (this covers
SIGSTOP and SIGKILL).

Add new ATF tests:
 - setsigmask5
   Verify that sigmask cannot be set to SIGKILL

 - setsigmask6
   Verify that sigmask cannot be set to SIGSTOP

Sponsored by <The NetBSD Foundation>
2017-02-23 00:50:09 +00:00
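The rule in the commit above, that SIGKILL and SIGSTOP can never end up in a tracee's signal mask, can be illustrated in userland with the POSIX sigset functions. This is a sketch only: `sk_strip_unmaskable` is a hypothetical helper, and whether the kernel strips these signals or rejects the request is an assumption here (the sketch strips them).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>

/*
 * Remove the signals that must never be blocked from a requested mask,
 * leaving every other member of the set untouched.
 */
static void
sk_strip_unmaskable(sigset_t *mask)
{
	sigdelset(mask, SIGKILL);
	sigdelset(mask, SIGSTOP);
}
```

A caller would apply this to the mask supplied by the tracer before installing it, so an ordinary signal such as SIGUSR1 stays blocked while SIGKILL and SIGSTOP do not.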
kamil
f9b2093d06 Introduce new ptrace(2) API to allow/prevent execution of LWP
Introduce new API for debuggers to allow/prevent execution of the specified
thread.

New ptrace(2) operations:

     PT_RESUME     Allow execution of a specified thread, change its state
                   from suspended to continued.  The addr argument is unused.
                   The data argument specifies the LWP ID.

                   This call is equivalent to _lwp_continue(2) called by a
                   traced process.  This call does not change the general
                   process state from stopped to continued.

     PT_SUSPEND    Prevent execution of a specified thread, change its state
                   from continued to suspended.  The addr argument is unused.
                   The data argument specifies the requested LWP ID.

                   This call is equivalent to _lwp_suspend(2) called by a
                   traced process.  This call does not change the general
                   process state from continued to stopped.

This interface is modeled after FreeBSD, however with NetBSD specific arguments
passed to ptrace(2) -- FreeBSD passes only thread id, NetBSD passes process and
thread id.

Extend PT_LWPINFO operation in ptrace(2) to report suspended threads. In the
ptrace_lwpinfo structure in pl_event next to PL_EVENT_NONE and PL_EVENT_SIGNAL
add new value PL_EVENT_SUSPENDED.

Add new errno(2) value EDEADLK that might be returned by ptrace(2). It prevents
dead-locking in a scenario of resuming a process or thread that is prevented
from execution. This fixes bug that old API was vulnerable to this scenario.

Kernel bump delayed till introduction of PT_GETDBREGS/PT_SETDBREGS soon.

Add new ATF tests:
 - resume1
   Verify that a thread can be suspended by a debugger and later
   resumed by the debugger

 - suspend1
   Verify that a thread can be suspended by a debugger and later
   resumed by a tracee

 - suspend2
   Verify that while the only thread within a process is
   suspended, the whole process cannot be unstopped

Sponsored by <The NetBSD Foundation>
2017-02-22 23:43:43 +00:00
hannken
a378d58ecb Enable fstrans on all file systems.
Welcome to 7.99.61
2017-02-22 09:50:13 +00:00
hannken
8c2ff4e99d Regen. 2017-02-22 09:47:18 +00:00
hannken
99694efaee Prepare to move fstrans into vnode_if.c, allow "FSTRANS=YES"
and "FSTRANS=NO" in the vop description.
Add fstrans_start()/fstrans_done() to all vops that have FSTRANS=YES
or have the first vnode unlocked.
2017-02-22 09:45:51 +00:00
rin
ede747a0c4 PR kern/51208
Add DISKLABEL_EI (``Endian-Independent'' disklabel) kernel option to machines
that support Master Boot Record (MBR)
2017-02-19 07:43:42 +00:00
chs
006dc29ca6 obey the executable's ELF alignment constraints for PIE.
this fixes gdb of PIE binaries on mac68k (and other platforms
which use an ELF alignment that is larger than PAGE_SIZE).
2017-02-18 01:29:09 +00:00
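Honoring an ELF alignment larger than the page size comes down to rounding the chosen load address up to that alignment. A minimal sketch of the arithmetic, assuming power-of-two alignments; `sk_align_up` and `sk_pie_base` are hypothetical names and do not appear in the kernel.

```c
#include <stdint.h>

/* Round addr up to a multiple of align (a nonzero power of two). */
static uintptr_t
sk_align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Choose a base for a position-independent executable: use the larger
 * of the page size and the executable's own alignment requirement.
 */
static uintptr_t
sk_pie_base(uintptr_t hint, uintptr_t pagesize, uintptr_t elf_align)
{
	uintptr_t align = elf_align > pagesize ? elf_align : pagesize;

	return sk_align_up(hint, align);
}
```

For example, with 4KiB pages and an 8KiB ELF alignment, a hint of 0x1001 is rounded to 0x2000 rather than to 0x2000's page-aligned neighbours, so the segment offsets inside the file stay congruent with their addresses.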
hannken
7599fb1f37 Bring back vrele_flush() to flush deferred vrele() on a suspended file system. 2017-02-17 08:30:00 +00:00
hannken
4f18a321ca Make sure vcache_reclaim() will complete before file system suspension. 2017-02-17 08:27:58 +00:00
hannken
90afdff6e3 Take fstrans_start before syncing a file system. 2017-02-17 08:26:07 +00:00
hannken
a863cd745e Let syncer try fstrans_start() before running VFS_SYNC() to get rid
of the syncer lock/unlock from vfs_suspend().
2017-02-17 08:25:15 +00:00
hannken
b62f0c07fe Protect attaching and detaching lwp_info to mount with a mutex. 2017-02-17 08:24:07 +00:00
zafer
6914423cda fix number of arguments of kmem_alloc and kmem_zalloc macro. ok skrll. 2017-02-13 16:53:41 +00:00
uwe
1159401280 netbsd_elf_signature - look at note segments (phdrs) not note
sections.  They point to the same data in the file, but sections are
for linkers and are not necessarily present in an executable.

The original switch from phdrs to shdrs seems to be just a cop-out to
avoid parsing multiple notes per segment, which doesn't really avoid
the problem b/c sections also can contain multiple notes.
2017-02-12 21:52:46 +00:00
maxv
8fdaa9399d Add a KASSERT, otherwise it looks like a NULL deref; from Mootja. 2017-02-12 18:43:56 +00:00
kamil
61aff29627 Introduce new interface in ptrace(2) - PT_GET_SIGMASK and PT_SET_SIGMASK
Add new interface to add ability to get/set signal mask of a tracee.
It has been inspired by Linux PTRACE_GETSIGMASK and PTRACE_SETSIGMASK, but
adapted for NetBSD API.

This interface is used for checkpointing software to set/restore context
of a process including signal mask like criu or just to track this property
in reverse-execution software like Record and Replay Framework (rr).


Add new ATF tests for this interface
====================================
getsigmask1:
    Verify that plain PT_GET_SIGMASK can be called

getsigmask2:
    Verify that PT_GET_SIGMASK reports correct mask from tracee

setsigmask1:
    Verify that plain PT_SET_SIGMASK can be called with empty mask

setsigmask2:
    Verify that sigmask is preserved between PT_GET_SIGMASK and
    PT_SET_SIGMASK

setsigmask3:
    Verify that sigmask is preserved between PT_GET_SIGMASK, process
    resumed and PT_SET_SIGMASK

setsigmask4:
    Verify that new sigmask is visible in tracee


Kernel ABI bump delayed as there are more interfaces to come in ptrace(2).

Sponsored by <The NetBSD Foundation>
2017-02-12 06:09:52 +00:00
kamil
9a6383f067 Be paranoid about PT_SET_SIGINFO and PT_GET_SIGINFO in ptrace(2)
Currently a tracer is prohibited to read and write memory of a tracee.
Prohibit reading and faking signal information.

Sponsored by <The NetBSD Foundation>
2017-02-11 19:32:41 +00:00
christos
b4abffbdeb expose sendmsg_so and recvmsg_so. 2017-02-03 16:06:45 +00:00
christos
8c06ed2feb expose copyout_sockname_sb 2017-02-02 15:37:42 +00:00
maya
1aa8013394 restore r1.118 2017-02-01 01:51:07 +00:00
christos
a4ac56487b We need to define COMPAT_NETBSD32 before we include other files;
otherwise things like ucontext32_t will be missing.
2017-01-28 16:43:59 +00:00
hannken
748bb65685 Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references.  On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.
2017-01-27 10:50:10 +00:00
hannken
8e09b56de2 When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.
2017-01-27 10:46:18 +00:00
christos
eb4e2ff6fe rump does not have ucontext32_t 2017-01-27 03:53:01 +00:00
christos
914b3cbf1a use __HAVE_COMPAT_NETBSD32 2017-01-26 15:54:31 +00:00
martin
ee2ea00dd5 Restrict the forcing of COMPAT_NETBSD32 to _LP64 kernels - this is probably
not the right thing to do, but unbreaks the build for now.
2017-01-26 08:09:27 +00:00
martin
d57a70bd13 No COMPAT_NETBSD32 for rump 2017-01-26 07:54:05 +00:00
christos
9be065fb89 For LOCKDEBUG:
Always provide the location of the caller of the lock as __func__, __LINE__.
2017-01-26 04:11:56 +00:00
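A minimal sketch of the caller-location idea from the LOCKDEBUG commit above,
not the kernel code itself. The names demo_lock, demo_lock_enter_at,
demo_lock_exit, DEMO_LOCK_ENTER and demo_caller are hypothetical; the only
assumption is that a wrapper macro passes __func__ and __LINE__ so that they
expand at the call site rather than inside the lock routine.

```c
#include <assert.h>
#include <string.h>     /* strcmp(), handy when inspecting the recorded name */

/* Hypothetical lock that remembers where it was last acquired. */
struct demo_lock {
        int         held;
        const char *func;       /* name of the function that took the lock */
        int         line;       /* source line of the acquisition */
};

/* Acquire the lock and record the location supplied by the caller. */
void
demo_lock_enter_at(struct demo_lock *l, const char *func, int line)
{
        assert(!l->held);       /* this toy lock is not recursive */
        l->held = 1;
        l->func = func;
        l->line = line;
}

/* Release the lock; the recorded location is kept for inspection. */
void
demo_lock_exit(struct demo_lock *l)
{
        l->held = 0;
}

/* The macro expands __func__ and __LINE__ in the caller, not in the helper. */
#define DEMO_LOCK_ENTER(l)      demo_lock_enter_at((l), __func__, __LINE__)

/* Example caller: the lock records "demo_caller" and this line number. */
void
demo_caller(struct demo_lock *l)
{
        DEMO_LOCK_ENTER(l);
}
```

After demo_caller() runs on a zeroed lock, l.func names demo_caller and l.line
holds the line of the DEMO_LOCK_ENTER() call, which is the kind of information
a lock debugger can print when it detects misuse.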
christos
655a10972a always compile in the COMPAT32 code; it is tiny and if we don't it breaks
the modules.
2017-01-26 03:54:54 +00:00
christos
46149e83cc don't return early holding a lock! 2017-01-26 03:54:01 +00:00
christos
4705defbf3 es_arglen is already in bytes... 2017-01-25 17:57:14 +00:00
christos
44c43f62df The argument length is in bytes; don't use howmany() 2017-01-25 17:56:45 +00:00
christos
908d408b7e PR/51916: Kamil Rytarowski: Don't multiply es_arglen with ptrsz since it is
already in bytes and contains the maximum possible size:
	ELF_AUX_ENTRIES * sizeof(auxv) + MAXPATHLEN + ALIGN
2017-01-25 17:55:47 +00:00
skrll
c8226a8b4f Fix build 2017-01-20 09:45:13 +00:00
skrll
fd0caf00f0 Simplify getiobuf. buf_init already does bp->b_objlock == &buffer_lock 2017-01-20 08:16:31 +00:00
ryo
28df50d7bb Make pfil(9) MP-safe (applying psref(9)) 2017-01-16 09:28:40 +00:00
christos
7a2c2e75f4 put linux_handler_t in the right place. 2017-01-15 17:00:59 +00:00
christos
0dec36ce08 need intptr_t cast for linux_handler_t 2017-01-15 15:18:52 +00:00
maya
f6be953d31 use a bound string copy 2017-01-15 01:47:24 +00:00
maya
8341f84221 use a bound string copy 2017-01-15 01:28:14 +00:00
kamil
c52f1ed048 Fix generation of PTRACE_LWP_EXIT event
Set p_lwp_exited instead of p_lwp_created for PTRACE_LWP_EXIT.

This makes the lwp_exit1 ATF test pass.

Sponsored by <The NetBSD Foundation>
2017-01-14 19:32:10 +00:00
kamil
6413a1acf0 Introduce PTRACE_LWP_{CREATE,EXIT} in ptrace(2) and TRAP_LWP in siginfo(5)
Add interface in ptrace(2) to track thread (LWP) events:
 - birth,
 - termination.

The purpose of this interface is to keep track of the current thread state in
a tracee and to apply e.g. per-thread hardware-assisted watchpoints.

This interface reuses the EVENT_MASK and PROCESS_STATE interface, and
shares it with PTRACE_FORK, PTRACE_VFORK and PTRACE_VFORK_DONE.

Change the following structure:

typedef struct ptrace_state {
        int     pe_report_event;
        pid_t   pe_other_pid;
} ptrace_state_t;

to

typedef struct ptrace_state {
        int     pe_report_event;
        union {
                pid_t   _pe_other_pid;
                lwpid_t _pe_lwp;
        } _option;
} ptrace_state_t;

#define pe_other_pid    _option._pe_other_pid
#define pe_lwp          _option._pe_lwp

This keeps the size of ptrace_state_t unchanged, as both pid_t and lwpid_t
are defined as int32_t-like integers. This change does not break existing
prebuilt software and requires only minimal source-code changes. In summary,
this change should be binary compatible and should not break the build of
existing software.
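
The size and compatibility argument can be checked with a small sketch. This
is not the NetBSD header: demo_pid_t and demo_lwpid_t are stand-ins assumed
to be 32-bit integers, the demo_* struct names are hypothetical, and the old
member is renamed pe_other_pid_v1 only so that it does not collide with the
compatibility macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for pid_t and lwpid_t; assumed here to be 32-bit integers. */
typedef int32_t demo_pid_t;
typedef int32_t demo_lwpid_t;

/* Old layout: one pid field after the event code. */
typedef struct demo_ptrace_state_old {
        int             pe_report_event;
        demo_pid_t      pe_other_pid_v1;
} demo_ptrace_state_old_t;

/* New layout: the pid field shares storage with an LWP id. */
typedef struct demo_ptrace_state_new {
        int     pe_report_event;
        union {
                demo_pid_t      _pe_other_pid;
                demo_lwpid_t    _pe_lwp;
        } _option;
} demo_ptrace_state_new_t;

/* Compatibility macros so that old-style member access keeps compiling. */
#define pe_other_pid    _option._pe_other_pid
#define pe_lwp          _option._pe_lwp

/* Returns 1 when both layouts agree on size and on the field offset. */
int
demo_layouts_match(void)
{
        return sizeof(demo_ptrace_state_old_t) ==
                sizeof(demo_ptrace_state_new_t) &&
            offsetof(demo_ptrace_state_old_t, pe_other_pid_v1) ==
                offsetof(demo_ptrace_state_new_t, _option._pe_other_pid) &&
            offsetof(demo_ptrace_state_new_t, _option._pe_other_pid) ==
                offsetof(demo_ptrace_state_new_t, _option._pe_lwp);
}

/* Writes through the old name and reads back through the new one. */
int
demo_roundtrip(int32_t id)
{
        demo_ptrace_state_new_t st = { .pe_report_event = 0 };

        st.pe_other_pid = id;           /* old-style access via the macro */
        return st.pe_lwp == id;         /* same storage, same value */
}
```

Under these assumptions both functions return 1, which matches the claim that
code written against the old field name still compiles and sees the same
bytes.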


Introduce a new siginfo(5) type for LWP events under the SIGTRAP signal:
TRAP_LWP. This change helps debuggers distinguish the exact source of
SIGTRAP.


Add two basic t_ptrace_wait* tests:
lwp_create1:
    Verify that 1 LWP creation is intercepted by ptrace(2) with
    EVENT_MASK set to PTRACE_LWP_CREATE

lwp_exit1:
    Verify that 1 LWP termination is intercepted by ptrace(2) with
    EVENT_MASK set to PTRACE_LWP_EXIT

All tests are passing.


Riding the previous kernel ABI bump to 7.99.59 for PTRACE_VFORK{,_DONE}.

Sponsored by <The NetBSD Foundation>
2017-01-14 06:36:52 +00:00
kamil
0e96af0f53 Add support for PTRACE_VFORK_DONE and stub for PTRACE_VFORK in ptrace(2)
PTRACE_VFORK is meant to track vfork(2)-like events, in which a parent
creates a new child process and stops until the child exits or calls
exec().
Currently PTRACE_VFORK is a stub.

PTRACE_VFORK_DONE notifies a debugger that a parent has resumed after a
vfork(2)-like action.
PTRACE_VFORK_DONE raises SIGTRAP with TRAP_CHLD.

Sponsored by <The NetBSD Foundation>
2017-01-13 23:00:35 +00:00
hannken
cfa69dcf1b Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().
2017-01-13 10:10:32 +00:00
christos
d8dfcd6c2a regen 2017-01-13 06:18:31 +00:00
christos
a1a8fc3617 const police! 2017-01-13 06:11:27 +00:00
hannken
0365dd0e1a Adapt to the recent vnode changes. 2017-01-11 14:52:02 +00:00
joerg
6ff696c6b4 Add ddb command to find a vnode by the address of its lock.
This makes it much easier to convert lockstat traces into understandable
data.
2017-01-11 12:17:34 +00:00
hannken
e2f2c94b67 Move vnode member v_lock as vi_lock to vnode_impl.h. 2017-01-11 09:08:58 +00:00
hannken
dcc198a3f8 Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.
Add an ugly hack so pstat.c may still traverse the list.
2017-01-11 09:07:57 +00:00
hannken
6e1af6b1d7 Move vnode members v_synclist_slot and v_synclist as vi_synclist_slot and
vi_synclist to vnode_impl.h.
2017-01-11 09:06:57 +00:00
hannken
2b4a4af133 Move vnode members v_dnclist and v_nclist as vi_dnclist and
vi_nclist to vnode_impl.h.
2017-01-11 09:04:37 +00:00
pgoyette
4869ce0a43 Use membar_{producer,consumer}() to ensure proper access to the "ready"
flag.
2017-01-10 22:08:14 +00:00
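A rough sketch of the "ready" flag pattern named in the commit above, not the
code it touched. It uses C11 fences as a portable stand-in for
membar_producer() and membar_consumer(); the release and acquire fences are
somewhat stronger than those barriers, and demo_payload, demo_ready,
demo_publish and demo_consume are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static int demo_payload;        /* data guarded by the flag */
static atomic_int demo_ready;   /* 0 until the payload is published */

/* Writer: store the data first, then the barrier, then set the flag. */
void
demo_publish(int value)
{
        demo_payload = value;
        /* Plays the role of membar_producer(): data before flag. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&demo_ready, 1, memory_order_relaxed);
}

/* Reader: check the flag first, then the barrier, then read the data. */
int
demo_consume(int *out)
{
        if (atomic_load_explicit(&demo_ready, memory_order_relaxed) == 0)
                return 0;       /* nothing published yet */
        /* Plays the role of membar_consumer(): flag before data. */
        atomic_thread_fence(memory_order_acquire);
        *out = demo_payload;
        return 1;
}
```

A reader that sees the flag set is then guaranteed to see the payload written
before it; without the paired barriers the two accesses could be reordered on
weakly ordered CPUs.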
pgoyette
5a30768de5 Rework the sysctl initialization to avoid creating new nodes from
within the helper function.  This should avoid the "locking against
myself" error reported earlier.
2017-01-10 00:50:57 +00:00
kamil
687ff8a6ad Introduce new si_code for SIGTRAP: TRAP_CHLD - process child trap
The SIGTRAP signal is raised by the kernel if EVENT_MASK (ptrace_event)
enables PTRACE_FORK. This new si_code helps debuggers distinguish the
exact source of a signal delivered to a debugger.

Another purpose of TRAP_CHLD is to retain the same behavior inside the
NetBSD kernel for process child traps and have an interface to monitor it.

The exact event and the extended properties of a process child trap can be
retrieved with PT_GET_PROCESS_STATE.

There is no behavior change for existing software.

This si_code value is a NetBSD extension.

Sponsored by <The NetBSD Foundation>
2017-01-10 00:48:37 +00:00
christos
69f0023338 If we had an error, don't do the debug checks because they will most certainly
fail and we'll panic.
2017-01-09 14:25:52 +00:00
kamil
e6f79d077f Cleanup dead code after revert of racy vfork(2) commit
This removes dead code introduced with the following commit:

date: 2012-07-27 22:52:49 +0200;  author: christos;  state: Exp;  lines: +8 -2;
revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind
2017-01-09 00:31:30 +00:00