9589 Commits

Author SHA1 Message Date
hannken
6e1af6b1d7 Move vnode members v_synclist_slot and v_synclist as vi_synclist_slot and
vi_synclist to vnode_impl.h.
2017-01-11 09:06:57 +00:00
hannken
2b4a4af133 Move vnode members v_dnclist and v_nclist as vi_dnclist and
vi_nclist to vnode_impl.h.
2017-01-11 09:04:37 +00:00
pgoyette
4869ce0a43 Use membar_{producer,consumer}() to ensure proper access to the "ready"
flag.
2017-01-10 22:08:14 +00:00
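
The "ready" flag pattern named in 4869ce0a43 can be sketched in portable C11. This is only an illustration: atomic_thread_fence() stands in for the kernel's membar_producer() and membar_consumer(), and the names payload, ready, publish and try_consume are hypothetical, not taken from the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a payload guarded by a "ready" flag. */
static int payload;
static atomic_bool ready;

/* Writer: store the data, fence, then set the flag. */
void publish(int value)
{
	payload = value;
	/* Plays the role of membar_producer(): data before flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: check the flag, fence, then read the data. */
bool try_consume(int *out)
{
	assert(out != NULL);
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return false;
	/* Plays the role of membar_consumer(): flag before data. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return true;
}
```

Without the two fences, a reader could observe the flag as set and still read a stale payload.
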
pgoyette
5a30768de5 Rework the sysctl initialization to avoid creating new nodes from
within the helper function.  This should avoid the "locking against
myself" error reported earlier.
2017-01-10 00:50:57 +00:00
kamil
687ff8a6ad Introduce new si_code for SIGTRAP: TRAP_CHLD - process child trap
The SIGTRAP signal is thrown from the kernel if EVENT_MASK (ptrace_event)
enables PTRACE_FORK. This new si_code helps debuggers to distinguish the
exact source of signal delivered for a debugger.

Another purpose of TRAP_CHLD is to retain the same behavior inside the
NetBSD kernel for process child traps and have an interface to monitor it.

Retrieving exact event and extended properties of process child trap is
available with PT_GET_PROCESS_STATE.

There is no behavior change for existing software.

This si_code value is NetBSD extension.

Sponsored by <The NetBSD Foundation>
2017-01-10 00:48:37 +00:00
christos
69f0023338 If we had an error, don't do the debug checks because they will most certainly
fail and we'll panic.
2017-01-09 14:25:52 +00:00
kamil
e6f79d077f Cleanup dead code after revert of racy vfork(2) commit
This removes dead code introduced with the following commit:

date: 2012-07-27 22:52:49 +0200;  author: christos;  state: Exp;  lines: +8 -2;
revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind
2017-01-09 00:31:30 +00:00
christos
f896811791 fix build without ddb. 2017-01-08 19:49:25 +00:00
kamil
e4281b2073 Introduce new ptrace(2) interface: PT_SET_SIGINFO and PT_GET_SIGINFO
This interface is designed to read signal information emitted to tracee and
fake this signal with new value.

This functionality is required to distinguish types of events that occurred
in the tracee and intercepted by a debugger.

These accessors introduce a new structure type ptrace_siginfo:
/*
 * Signal Information structure
 */
typedef struct ptrace_siginfo {
       siginfo_t       psi_siginfo;    /* signal information structure */
       lwpid_t         psi_lwpid;      /* destination LWP of the signal
                                        * value 0 means the whole process
                                        * (route signal to all LWPs) */
} ptrace_siginfo_t;

Include <sys/siginfo.h> in <sys/ptrace.h> in order to not break existing
software due to unknown symbol siginfo_t.

This interface has been proposed to the tech-kern@ mailing list.

Sponsored by <The NetBSD Foundation>
2017-01-06 22:53:17 +00:00
kamil
239e90be56 Introduce new SIGTRAP code: TRAP_EXEC
On exec() events under a debugger generate the SIGTRAP signal with
TRAP_EXEC property. This allows tracer to distinguish exec() events easily.

Sponsored by <The NetBSD Foundation>
2017-01-06 22:42:58 +00:00
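
A debugger could use the si_code values from 687ff8a6ad and 239e90be56 roughly as sketched below. The function name sigtrap_reason is hypothetical, and the fallback #define values are placeholders used only when the system headers do not provide TRAP_EXEC or TRAP_CHLD; they should not be read as authoritative.

```c
#include <assert.h>
#include <signal.h>

/* Placeholders for systems whose headers lack these NetBSD extensions. */
#ifndef TRAP_EXEC
#define TRAP_EXEC 3	/* assumed value, for illustration only */
#endif
#ifndef TRAP_CHLD
#define TRAP_CHLD 4	/* assumed value, for illustration only */
#endif

/*
 * Map the si_signo/si_code pair of a stopped tracee to a short label.
 * A caller would pass the fields of the siginfo_t it obtained for the tracee.
 */
const char *sigtrap_reason(int signo, int code)
{
	assert(signo > 0);	/* signal numbers are positive */
	if (signo != SIGTRAP)
		return "not a SIGTRAP";
	switch (code) {
	case TRAP_EXEC:
		return "exec";		/* tracee called exec() */
	case TRAP_CHLD:
		return "child";		/* process child trap, e.g. fork */
	default:
		return "other";
	}
}
```
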
pgoyette
a62101788a Use the new magic BINTIME_SCALE_* macros instead of magic numbers.
No functional change.
2017-01-05 23:29:14 +00:00
hannken
592be9ae45 Name all "vnode_impl_t" variables "vip".
No functional change.
2017-01-05 10:05:11 +00:00
pgoyette
c9b6361b98 By popular demand, update kernhist to use bintime(9) as the basis for
its timestamps.

As this changes storage structures for data passed between kernel and
userland, welcome to 7.99.55!

XXX Output routines still use microsecond resolution when printf()ing.

XXX Possible future feature would be addition of option to use
XXX getbintime(9) for less time-critical histories.
2017-01-05 03:40:33 +00:00
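
The note about microsecond output in c9b6361b98 can be illustrated with a small conversion sketch. The struct below is a stand-in with the same shape as a bintime (whole seconds plus a 64-bit binary fraction), and the names bintime_sketch and bintime_sketch_usec are hypothetical; the actual kernhist output code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a bintime: seconds plus a fraction in units of 2^-64 s. */
struct bintime_sketch {
	int64_t  sec;
	uint64_t frac;
};

/*
 * Microsecond part of the timestamp. Only the top 32 bits of the
 * fraction are used, which is more than enough for 1 us resolution.
 */
uint32_t bintime_sketch_usec(const struct bintime_sketch *bt)
{
	assert(bt != NULL);
	return (uint32_t)(((uint64_t)1000000 * (uint32_t)(bt->frac >> 32)) >> 32);
}
```

A fraction of 2^63 represents half a second, so the function returns 500000 for it.
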
pgoyette
c42fba4183 Actually initialize the sysctl stuff for kernhist! Missed this file
in earlier commits.
2017-01-05 03:22:20 +00:00
hannken
78a3dd75dc Expand struct vcache to individual variables (vcache.* -> vcache_*).
No functional change.
2017-01-04 17:13:50 +00:00
pgoyette
5a0b3ff699 Rearrange the sysctl export structure for better alignment. 2017-01-04 01:05:58 +00:00
hannken
8b7bed0d14 Now that v_usecount tracks valid references add some "v_usecount == 1"
assertions.
2017-01-02 10:36:58 +00:00
hannken
e0f81f2c02 Change vcache_*vget() to increment v_usecount on success only.
Increment v_holdcnt to prevent the vnode from disappearing while
vcache_vget() waits for a stable state.

Now v_usecount tracks the number of successful references.
2017-01-02 10:35:00 +00:00
hannken
998709c439 Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54
2017-01-02 10:33:28 +00:00
pgoyette
c2efd8c96e Provide a sysctl method of exporting the kernel history data.
XXX vmstat will be updated soon to use the sysctl rather than grovelling
XXX through kvm.
2017-01-01 23:58:47 +00:00
pgoyette
c129bbe940 Remove some extraneous whitespace 2016-12-28 06:25:40 +00:00
hannken
3b04d6a086 It is wrong to block the vnode during vcache_rekey. The vnode may be looked
up using the old key until vcache_rekey_exit changes the key to the new one.

Add an assertion that the temporary key is different from the current one.
2016-12-27 11:59:36 +00:00
maya
441aa9cf25 Revert previous commit (to r1.117)
Superfluous warnings in simple userland programs is not a valid reason to
break a security model.
2016-12-27 09:34:44 +00:00
pgoyette
ee1d5b993e Decouple BIOHIST from other users of KERNHIST. 2016-12-27 04:12:34 +00:00
pgoyette
d05a55c879 #include biohist.h from proper location 2016-12-26 23:49:53 +00:00
pgoyette
6a7e4606d5 Fix locking so we don't release the lock between the time we check the
tailq (for being non-empty) and the time we remove an entry.
2016-12-26 23:15:15 +00:00
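
The locking rule described in 6a7e4606d5 (check and remove under one continuous hold of the lock) can be sketched in userland with pthreads and <sys/queue.h>. All names here (job, job_put, job_get, queue_lock) are hypothetical and the kernel code uses its own mutex primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct job {
	int id;
	TAILQ_ENTRY(job) link;
};
TAILQ_HEAD(job_queue, job);

static struct job_queue queue = TAILQ_HEAD_INITIALIZER(queue);
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a job to the tail of the queue. */
void job_put(struct job *j)
{
	assert(j != NULL);
	pthread_mutex_lock(&queue_lock);
	TAILQ_INSERT_TAIL(&queue, j, link);
	pthread_mutex_unlock(&queue_lock);
}

/*
 * Take the first job, or return NULL if the queue is empty.
 * The emptiness check and the removal happen without dropping the
 * lock in between, so no other thread can empty the queue after the
 * check but before the removal.
 */
struct job *job_get(void)
{
	struct job *j;

	pthread_mutex_lock(&queue_lock);
	j = TAILQ_FIRST(&queue);
	if (j != NULL)
		TAILQ_REMOVE(&queue, j, link);
	pthread_mutex_unlock(&queue_lock);
	return j;
}
```
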
pgoyette
7f0851cee1 Add a BIOHIST option. As mentioned on tech-kern. 2016-12-26 23:12:33 +00:00
mlelstv
46f58a90c6 When balancing threads over multiple CPUs, use fixpoint arithmetic
for averages. Otherwise the decisions can be heavily biased by rounding
errors.

Add sysctl kern.sched_average_weight to change the weight of
historical data, the default is 50%.
2016-12-22 14:11:58 +00:00
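
The rounding bias mentioned in 46f58a90c6 is easy to reproduce. The sketch below compares a plain integer weighted average with a fixed-point one; FP_SHIFT, avg_update_int and avg_update_fixed are hypothetical names and the scheduler's actual representation may differ. The weight is the percentage given to historical data, as with kern.sched_average_weight.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 8	/* assumed number of fractional bits */

/* Plain integer average: the fractional part is lost on every update. */
uint32_t avg_update_int(uint32_t avg, uint32_t sample, uint32_t weight_pct)
{
	assert(weight_pct <= 100);
	return (weight_pct * avg + (100 - weight_pct) * sample) / 100;
}

/*
 * Fixed-point average: avg_fp holds the average scaled by 2^FP_SHIFT.
 * Intended for small sample values; large ones could overflow 32 bits.
 */
uint32_t avg_update_fixed(uint32_t avg_fp, uint32_t sample, uint32_t weight_pct)
{
	uint32_t sample_fp = sample << FP_SHIFT;

	assert(weight_pct <= 100);
	return (weight_pct * avg_fp + (100 - weight_pct) * sample_fp) / 100;
}
```

With a weight of 50 and a steady sample of 1, the integer version stays at 0 forever, while the fixed-point version moves from 0 to 128/256 and then 192/256, converging on 1.
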
hannken
0d2ece78cb Restructure vdrain_vrele(). While it is not possible for another thread
to lock this vnode's v_interlock -> vdrain_lock another vnode sharing the
v_interlock may lock in this order.
While here, restore fstrans_start_nowait arg to FSTRANS_LAZY.

Fixes a deadlock seen recently on some pbulk environments.
2016-12-20 10:02:21 +00:00
cherry
28fcb4a4b5 panic() must be able to take varargs - in userspace testing too. 2016-12-19 13:02:14 +00:00
dholland
b79a953f51 typo in comment 2016-12-18 05:43:20 +00:00
riastradh
51beee07d0 Fix return value of nommap. 2016-12-16 23:35:04 +00:00
kamil
241cf91ddc Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)
Add new ptrace(2) calls:
 - PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
 - PT_READ_WATCHPOINT   - read struct ptrace_watchpoint from the kernel state
 - PT_WRITE_WATCHPOINT  - write new struct ptrace_watchpoint state, this
                          includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
	int		pw_index;	/* HW Watchpoint ID (count from 0) */
	lwpid_t		pw_lwpid;	/* LWP described */
	struct mdpw	pw_md;		/* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
	void	*md_address;
	int	 md_condition;
	int	 md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
2016-12-15 12:04:17 +00:00
hannken
4349535165 Change the freelists to lrulists, all vnodes are always on one
of the lists.  Speeds up namei on cached vnodes by ~3 percent.

Merge "vrele_thread" into "vdrain_thread" so we have one thread
working on the lrulists.  Adapt vfs_drainvnodes() to always wait
for a complete cycle of vdrain_thread().
2016-12-14 15:49:35 +00:00
hannken
70ec436e39 Move vnode members "v_freelisthd" and "v_freelist" from "struct vnode"
to "struct vnode_impl" and rename to "vi_lrulisthd" and "vi_lrulist".

No functional change intended.

Welcome to 7.99.48
2016-12-14 15:48:54 +00:00
hannken
13fa9cae25 Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().
2016-12-14 15:46:57 +00:00
nat
f1631e52a4 Add functions to access device flags. This restores simultaneous audio
open/close.

OK hannken@ christos@
2016-12-09 19:13:47 +00:00
roy
89a9eb7b34 When loading a kernel module, test if it's already loaded before authorizing.
This allows us to return EEXIST instead of EPERM for higher secure levels.

My use case was to stop npfctl complaining that it could not load bpfjit
on ERLITE when it was compiled into the kernel.
It then went on to complain that NPF performance would be degraded,
but this is clearly not the case.
2016-12-09 13:06:41 +00:00
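
The ordering change in 89a9eb7b34 amounts to doing the "already loaded" test before the authorization test. A minimal sketch, with a hypothetical load_request structure and function name that do not come from the commit:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a load request, for illustration only. */
struct load_request {
	bool already_loaded;	/* built in or loaded earlier */
	bool authorized;	/* would the security policy allow a load? */
};

/* Returns 0 on success or an errno value. */
int module_load_sketch(const struct load_request *req)
{
	assert(req != NULL);
	/* Checked first, so callers see EEXIST even at high securelevels. */
	if (req->already_loaded)
		return EEXIST;
	if (!req->authorized)
		return EPERM;
	return 0;
}
```

A caller such as the one in the message can then treat EEXIST as "nothing to do" instead of reporting a permission failure.
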
christos
cdadd9e0af Avoid duplicate definition on statically linking libc+ssp and rumpkern+ssp. 2016-12-06 02:55:42 +00:00
christos
cf786e11e4 set the signal flag when the signal was sent to every lwp, not to just an
individual one.
2016-12-05 22:07:16 +00:00
christos
1d6d63b6d6 PR/51685: Kamil Rytarowski: Fill sigcontext info in kpsignal2 so that the
debugger/core-dump signal info gets filled in in all code paths (including
the lwp_kill one).
2016-12-04 16:40:43 +00:00
christos
840d624913 Add missing ktrkuser 2016-12-03 22:28:16 +00:00
hannken
f3e32599e8 - Change vcache_reclaim() to always call VOP_INACTIVE() before VOP_RECLAIM().
When called from vrecycle() or vgone() there is a window where the refcount
  is greater than zero and another thread could get and release a reference
  that would miss VOP_INACTIVE() as the refcount doesn't drop to zero.

  Adjust test fs/puffs/t_basic:  test VOP_INACTIVE count being greater than zero.

- Make vrecycle() more robust by checking v_usecount first and preventing
  further references across vn_lock().  Fixes a deadlock where one thread
  starts unmount, second thread locks a directory and allocates a vnode
  and first thread tries to vrecycle() the directory.
  First thread holds vfs_busy and wants vnode, second thread holds vnode
  and wants vfs_busy.

- With these fixes in place change cleanvnode() to use vget()/vrecycle()
  to reclaim the vnode.
2016-12-01 14:49:03 +00:00
ozaki-r
6f15561386 Fix a race condition of low priority xcall
xc_lowpri and xc_thread are racy and xc_wait may return during/before
executing all xcall callbacks, resulting in a kernel panic at worst.

xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.

The problem is that a counter that counts the number of finished xcall
callbacks is incremented *before* actually executing a xcall callback
(see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
all xcall callbacks complete and a next job begins to run its xcall callbacks.

Even worse the counter is global and shared between jobs, so if a xcall
callback of the next job completes, the shared counter is incremented,
which confuses xc_wait of the previous job as all xcall callbacks of the
previous job are done and xc_wait of the previous job returns during/before
executing its xcall callbacks.

How to fix: there are actually two counters that count the number of finished
xcall callbacks for low priority xcall for historical reasons (I guess):
xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.

PR kern/51632
2016-11-21 00:54:21 +00:00
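
The core of the fix in 6f15561386 is that the completion counter must be per job and must advance only after the callback has run. The single-threaded sketch below shows just that ordering; the real code also uses a mutex and a condition variable, and every name here (xc_job_sketch, xc_job_run_one, xc_probe) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One job with its own completion counter (not shared between jobs). */
struct xc_job_sketch {
	void (*func)(void *);
	void *arg;
	unsigned int ncallbacks;	/* callbacks that must run */
	unsigned int done;		/* callbacks that have finished */
};

/* What a waiter would test before returning. */
bool xc_job_finished(const struct xc_job_sketch *job)
{
	return job->done == job->ncallbacks;
}

/* Run one callback, then count it. Counting first would be the bug. */
void xc_job_run_one(struct xc_job_sketch *job)
{
	assert(job->done < job->ncallbacks);
	job->func(job->arg);
	job->done++;
}

/* Example callback: records whether the job looked finished while it ran. */
struct xc_probe {
	struct xc_job_sketch *job;
	bool seen_finished;
};

void xc_probe_callback(void *arg)
{
	struct xc_probe *p = arg;

	p->seen_finished = xc_job_finished(p->job);
}
```

With this ordering the probe callback never observes its own job as finished while it is still running.
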
christos
ecb08d7cca Add FALLTHROUGH comment 2016-11-19 19:06:12 +00:00
pgoyette
f48fa2dcc1 By popular request, don't bother initializing a static pointer to NULL. 2016-11-18 02:37:33 +00:00
pgoyette
fdd49fc76c Use compile-time initialization for the list head, and make sure that
the sysctllog is also initialized before being used.
2016-11-17 08:06:49 +00:00
pgoyette
a1889144f5 Initialize the bufq code right before we're ready to load the strategy
modules.
2016-11-16 12:31:33 +00:00
pgoyette
219154eeef Define a new module class for the bufq_strategy modules. These need to
be loaded and initialized before autoconfigure runs, since some devices
(like disks and floppy drives) want to call bufq_alloc().
2016-11-16 10:42:14 +00:00
pgoyette
556c690963 Modularize the various bufq strategies 2016-11-16 00:46:46 +00:00