Commit Graph

1275 Commits

Author SHA1 Message Date
mlelstv 8723855e11 Restore mapping of file id to pid/type/fd.
Use 64bit file id to allow for 32bit fd and 25-26bit pid.
2019-04-25 22:48:42 +00:00
christos dc2e740ec9 add a node for the process resource limits. 2019-03-30 23:28:30 +00:00
hannken f421b3668b Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.
2019-02-20 10:07:27 +00:00
hannken 2df7877a09 Set "mnt_lower" before the first file system operation on the new file system. 2019-02-20 10:05:59 +00:00
hannken b689ec0f78 Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
2019-01-01 10:06:54 +00:00
jdolecek 560071c2bd assert that WAPBL journal write lock is actually held when called with
PGO_JOURNALLOCKED or IO_JOURNALLOCKED

suggested by mrg@, thanks
2018-12-10 21:10:52 +00:00
jdolecek 5f7e430114 support flag PGO_JOURNALLOCKED also for genfs_getpages() 2018-12-09 20:32:37 +00:00
christos dea5460561 As discussed in tech-kern:
- make sysctl kern.expose_address tri-state:
	0: no access
	1: access to processes with open /dev/kmem
	2: access to everyone
  defaults:
	0: KASLR kernels
	1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
  process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
2018-12-05 18:16:51 +00:00
jdolecek bcc384fdef remove M_CANFAIL flag for malloc(9) - it was completely ignored, so had
actually no effect
2018-10-14 17:37:40 +00:00
hannken fef73d22c5 Bring back three state file system suspension:
NORMAL -> SUSPENDING -> SUSPENDED

and add operation fstrans_start_lazy() that only blocks while SUSPENDED.

Change vndthread() support operation handle_with_rdwr() to bracket
its file system operations by fstrans_start_lazy() and fstrans_done().

PR kern/53624 (dom0 freeze on domU exit)
2018-10-05 09:51:55 +00:00
riastradh d1579b2d70 Rename min/max -> uimin/uimax for better honesty.
These functions are defined on unsigned int.  The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER!  Some subsystems have

	#define min(a, b)	((a) < (b) ? (a) : (b))
	#define max(a, b)	((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX.  Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate.  But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all.  (Who knows, maybe in some cases integer
truncation is actually intended!)
2018-09-03 16:29:22 +00:00
chs e406c140eb add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call.  needed by ZFS.
2018-05-28 21:04:37 +00:00
hannken 12bfd1b42c Change procfs_revoke_vnodes() to use vrecycle()/vgone() instead
of VOP_REVOKE().

Gets rid of a bunch of suspensions on /proc as vrecycle() will
succeed most time and we suspend at most once per call.
2018-04-16 20:27:38 +00:00
hannken 31bf7bd0f0 Lock the target cwdi and take an additional reference to the
vnode we are interested in to prevent it from disappearing
before getcwd_common().

Should fix PR kern/53096 (netbsd-8 crash on heavy disk I/O)
2018-04-07 13:42:42 +00:00
christos 2be0b0c7db factor out some repeated code and simplify the logputchar function. 2018-03-31 23:12:01 +00:00
christos 262138d20e rename some "cmdline" stuff now that it is used to print environment too 2017-12-31 03:29:18 +00:00
christos 878c60d1ba Add an environ node 2017-12-31 03:02:23 +00:00
christos a6510c8d12 Allow procfs_kqfilter, since we allow poll. "go" does it. 2017-12-01 19:01:34 +00:00
christos 54ab745a37 fix locking, remove error(1) comments. 2017-11-08 00:51:47 +00:00
christos 95387a97be use p->p_path, remove unused code. 2017-11-08 00:42:12 +00:00
pgoyette cb32a134a5 Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
  the kernel and in the structures used for exporting the history data
  to userland via sysctl(9).  This avoids problems on some architectures
  where passing a 64-bit (or larger) value to printf(3) can cause it to
  process the value as multiple arguments.  (This can be particularly
  problematic when printf()'s format string is not a literal, since in
  that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
  include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
  updated.  Each format specifier now includes an explicit length
  modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
  updated to replace uses of "%p" with "%#jx", and the pointer
  arguments are now cast to (uintptr_t) before being subsequently cast
  to (uintmax_t).  This is needed to avoid compiler warnings about
  casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
  "%c" format strings replaced with numeric formats; several instances
  of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
  history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
  the -u option does not exist (previously, this condition was silently
  ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
  data exported via sysctl(9) and exits if they do not match the values
  with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
  requirements imposed on the format strings, along with several other
  minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
    uint64_t) for the history arguments.  But that would require another
    "rototill" of all the users in the future when we add support for an
    architecture that supports a larger size.  Also, the printf(3) format
    specifiers for explicitly-sized values, such as "%"PRIu64, are much
    more verbose (and less aesthetically appealing, IMHO) than simply
    using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
    but it is possible that I've missed some of them.  I would be glad to
    update any stragglers that anyone identifies.
2017-10-28 00:37:11 +00:00
maya 18b796d442 Use C99 initializer for filterops
Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- 	{ a, b, c, d
+ 	{
+ 	.f_isfd = a,
+ 	.f_attach = b,
+ 	.f_detach = c,
+ 	.f_event = d,
};
2017-10-25 08:12:37 +00:00
kre 6bf17633e9 Use %ju and (intmax_t) to unbreak i386 build. 2017-09-29 17:27:26 +00:00
christos e48fa1237b Split the status printing routines (one for NetBSD and one for Linux) for
simplicity (Robert Swindelis)
2017-09-29 12:57:05 +00:00
kamil a69b333e73 Remove the filesystem tracing feature
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files neither Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
 - /proc/#/ctl from mount_procfs(8)
 - P_FSTRACE note from the documentation of ps(1)
 - /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
 - KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
 - source code file miscfs/procfs/procfs_ctl.c
 - PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
 - KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
 - PSL_FSTRACE (0x00010000) from sys/sys/proc.h
 - P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
2017-08-28 00:46:06 +00:00
hannken 7801661c06 No need to cache anonymous device vnodes, they will never be looked up.
Set key to (dead_rootmount, 0, NULL) and add assertions.
2017-08-21 08:56:45 +00:00
christos f36a117b25 Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
2017-07-01 20:07:00 +00:00
hannken ad2fab4568 Add missing check for dead or dying vnode to the entry of genfs_getpages(). 2017-06-27 08:40:53 +00:00
hannken 3764143ef7 Refuse to open a block device with zero open count when it has
a mountpoint set.  This may happen after forced detach or unplug
of a mounted block device.
2017-06-24 12:14:21 +00:00
hannken 287643b0da Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.
2017-06-04 08:05:41 +00:00
hannken f5647f853e Locking a layer vnode using the regular bypass routine is no longer
racy.  Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
2017-06-04 08:02:26 +00:00
hannken 05a181bf68 Now that FSTRANS is part of VOP_*LOCK() remove FSTRANS and vdead_check()
from genfs_.*lock() and assert the vnode state once the vnode is locked.
2017-06-04 08:01:33 +00:00
chs fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
riastradh 7f7aad09bd Make VOP_RECLAIM do the last unlock of the vnode.
VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
2017-05-26 14:20:59 +00:00
hannken a4311f705f Protect layer_getpages against vnodes disappearing during a
forced unmount.
2017-05-24 09:54:40 +00:00
hannken 69174779b1 With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73
2017-05-24 09:53:55 +00:00
hannken 01d31ceb6d Return ENOENT if trying to suspend an unmounted file system. 2017-05-07 08:25:54 +00:00
hannken 2aedd7ca2a Move v_writecount adjustment from revoke to reclaim. 2017-05-07 08:21:57 +00:00
riastradh 6fa7b15833 Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.
No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
2017-04-26 03:02:47 +00:00
hannken 769f0c92c9 Switch procfs_domounts() to mountlist iterator. 2017-04-13 09:54:18 +00:00
martin c3c564b370 Make the non-DIAGNOSTIC version compile 2017-04-12 06:43:56 +00:00
riastradh d20cc14aa7 Eliminate now-unused WILLUNLOCK vop flag. 2017-04-11 14:29:32 +00:00
riastradh 87fb32292e Make VOP_INACTIVE preserve vnode lock on return.
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
2017-04-11 14:24:59 +00:00
hannken 228d72edde Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
2017-04-11 07:51:37 +00:00
dholland 03a2126f0d Clarify meaning of "glocked" argument of genfs_putpages_read. 2017-04-01 23:34:17 +00:00
riastradh 446694ba12 Simplify genfs_getpages_read async/unlock protocol.
Previously the caller unlocked for error or sync I/O, whereas
genfs_getpages_read unlocked on successful async.

Now caller unlocks in every case, and genfs_getpages_read doesn't
touch the lock.
2017-04-01 19:57:54 +00:00
riastradh 30509f8074 KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector. 2017-04-01 19:35:56 +00:00
christos 7b49b1857c remove comment. 2017-03-30 20:21:00 +00:00
christos 2d1443a6bc add an auxv node. 2017-03-30 20:16:29 +00:00
hannken d0dc55acf0 Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
2017-03-30 09:16:52 +00:00