Commit Graph

114 Commits

Author SHA1 Message Date
thorpej 982ae832c3 Overhaul of the EVFILT_VNODE kevent(2) filter:
- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
  forcing each individual file system to deal with it (except VOP_RENAME(),
  because VOP_RENAME() is a mess and we currently have 2 different ways
  of handling it; at least it's reasonably well-centralized in the "new"
  way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
  compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
  to avoid doing work for events no one cares about (avoiding, e.g.
  taking locks and traversing the klist to send a NOTE_WRITE when
  someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
  to be invoked before and after vop_pre() and vop_post(), respectively.
  Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
  vop_*_args structures.  These context fields are used to convey information
  between the file system VOP function and the VOP wrapper, but do not
  occupy an argument slot in the VOP_*() call itself.  These context fields
  are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses the a context field for the file system to report
  back the resulting link count of the target vnode.  Return this in tmpfs,
  udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
2021-10-20 03:08:16 +00:00
riastradh 9fc453562f Round of uvm.h cleanup.
The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
  to query whether curlwp is the pagedaemon, which should maybe be
  exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
  by it.  We should split up uvm_extern.h but this will serve for now
  to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
  UVMHIST(ubchist), since ubchist is declared in uvm.h but the
  reference evaporates if UVMHIST is not defined, so we reduce header
  file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
  here.

ok chs@
2020-09-05 16:30:10 +00:00
ad 5c373ea86c PR kern/55268: tmpfs is slow
tmpfs_getpages(): handle the PGO_LOCKED case and implement lazy update of
atime/mtime.
2020-05-17 19:39:15 +00:00
christos 9aa2a9c323 Add ACL support for FFS. From FreeBSD. 2020-05-16 18:31:45 +00:00
ad d6844897f1 cache_enter_id(): give it a boolean parameter to indicate whether the cached
identity is valid.
2020-05-12 23:17:41 +00:00
ad f5ad84fdb3 PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)
- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
  somewhere.  Use it to decide whether to do direct-mapped copy, rather than
  poking around directly in the vnode in ubc_uiomove(), which is ugly and
  doesn't work for tmpfs.  It would be nicer to contain all this in UVM but
  the filesystem provides the needed locking here (VV_MAPPED) and to
  reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS().  Pass in UBC_ISMAPPED where
  appropriate.
2020-04-23 21:47:07 +00:00
ad c90f9c8c81 Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
2020-04-04 20:49:30 +00:00
ad f1944288ff tmpfs_reg_resize(): do nothing if newsize == oldsize. 2020-03-14 13:37:49 +00:00
ad d2a0ebb67a UVM locking changes, proposed on tech-kern:
- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart.  v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap.  Others to follow later.
2020-02-23 15:46:38 +00:00
christos 0da0aa0234 Add newly created vnodes to the namei cache. The rest of the filesystems
already did that (or they don't support writing). Discussed in tech-kern.
2019-09-18 17:59:14 +00:00
hannken b689ec0f78 Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
2019-01-01 10:06:54 +00:00
chs 0656708806 allow tmpfs files to be larger than 4GB. 2018-05-28 21:04:35 +00:00
hannken bd6e0af46f Change tmpfs_chsize() to update mtime etc. even if "length == node->tn_size".
Adresses PR kern/51762 "mtime not updated by open(O_TRUNC)"
2017-01-04 10:06:43 +00:00
leot 9df9aefc02 Make sure that nde->td_node is NULL for asserts.
Thanks and from Mindaugas Rasiukevicius.

Fixes PR kern/50381.
2015-10-29 16:19:44 +00:00
justin eec608f983 This enum is likely to be made unsigned by the compiler, so the assertion
will not work and clang objects with -Wtautological-constant-out-of-range-compare
2015-07-07 09:30:24 +00:00
hannken 8c80da52ef Change tmpfs to vcache.
- Use tmpfs node address as key.
- Remove tn_vlock, field tn_vnode now protected by vcache.
- Add a hold count to tmpfs node to prevent nodes from disappearing
  while tmpfs_fhtovp() trys to vcache_get() them.  Last holder
  destroys reclaimed nodes.
- Remove the now unneeded parent unlock/lock for lookup of '..'.
2015-07-06 10:07:12 +00:00
riastradh f6139440c5 Make vget always return vnode unlocked.
Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
2015-04-20 13:44:16 +00:00
gson 1b265e6701 Store symlinks without a NUL terminator so that lstat(2) returns the
correct length.  Fixes the tmpfs part of PR kern/48864.
2014-09-08 14:49:46 +00:00
hannken 04c776e5c8 Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
2014-01-23 10:13:55 +00:00
hannken 1139274440 Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
2014-01-17 10:55:01 +00:00
pedro 0cc071bd88 Allocate direntp on the stack in tmpfs_dir_getdents(), thus saving
calls to kmem_zalloc() and kmem_free(); OK rmind@. From OpenBSD.
2014-01-08 16:11:04 +00:00
hannken 8ea5485de5 Fix a race where thread1 runs VOP_REMOVE() and gets preempted in
tmpfs_reclaim() before the call to tmpfs_free_node().  Thread2
runs VFS_FHTOVP() and gets a new vnode attached to the node thread1
is about to destroy.

Change tmpfs_alloc_node() to always assign non-zero generation number
and tmpfs_inactive() to set the generation number of unlinked nodes
to zero.
2014-01-03 09:53:12 +00:00
rmind ccc45228d5 - tmpfs_construct_node: prevent from the new node construction if the
directory was removed.  Fixes the crash reported by Nicolas Joly.
- tmpfs_reclaim: avoid race by checking tn_links with the vnode locked.
2013-11-24 17:16:29 +00:00
rmind 63faa32f62 tmpfs_reg_resize: use size_t. 2013-11-23 21:53:27 +00:00
rmind e63cf28e67 - Simplify tmpfs_update(), eliminate tmpfs_note_t::tn_status and deferred
timestamp updates.  Fix some incorrect updates and plug some missing ones.
  Should fix PR/48385.
- tmpfs_rmdir: avoid O(n) scan when the directory is not empty and whiteout
  entries were never added.
2013-11-23 16:35:32 +00:00
rmind ace15189ad tmpfs_dir_getdotents: fix the recent regression, set the correct
d_fileno value for dot-dot.  Spotted by Pedro Martelletto, thanks!
2013-11-21 14:39:09 +00:00
rmind 8da90206bc Make tmpfs_node_t::tn_gen a 32-bit number, keep it in sync with tmpfs_fid_t.
Also, change tn_status to unsigned while here.
2013-11-18 01:39:34 +00:00
rmind 3033c7dc60 tmpfs_dir_getdents: avoid leaking kernel memory to the userspace.
From Pedro Martelletto.

XXX: regress/sys/fs/getdents should be a part of the test suite
2013-11-16 17:58:27 +00:00
rmind 6862603939 tmpfs_alloc_node: use cprng_fast64(), the old random(9) shall be removed. 2013-11-11 17:04:06 +00:00
rmind 89433ee6d9 Handle whiteout case in tmpfs_dir_detach() and tmpfs_unmount(). 2013-11-10 12:46:19 +00:00
christos 15bc40ee73 mark variable __diagused 2013-11-10 03:20:20 +00:00
rmind 1f5dbc945b tmpfs: replace the broken tmpfs_dircookie() logic which uses the node
address truncated to 31 bits (required for 32-bit readdir compatibility,
e.g. linux32).  Instead, assign 2^31 range using the following logic:
- The first half of the 2^31 is assigned incrementally (the fast path).
- When exceeded, use the second half of 2^31, but manage with vmem(9).

It will require 2 billion files per-directory to trigger vmem(9) usage.
Also, while here, add some fixes for tmpfs_unmount().

Should fix PR/47739, PR/47480, PR/46088 and PR/41068.
Thanks to wiz@ for stress testing.
2013-11-08 15:44:23 +00:00
rmind 20a51a9773 tmpfs: fix the zero-length symlink target case as NetBSD supports them. 2013-11-01 15:38:45 +00:00
rmind f8abe6cb77 tmpfs_alloc_node: it is less error-prone to store the link path with
the NIL terminator included.  Adjust tmpfs_readlink() to exclude NIL.
Also, remove the check for zero-length and add some asserts.
2013-10-31 00:59:17 +00:00
rmind 49ce9c94dc - tmpfs_remove: check 'appendable' flag for the parent directory as well.
Patch from Pedro Martelletto.
- tmpfs_dir_detach: remove missleading check.
- tmpfs_link: remove unused variable.
2013-10-04 15:14:11 +00:00
elad 0c9d8d15c9 Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

    http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
    http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
    http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
2012-03-13 18:40:26 +00:00
tls 3afd44cf08 First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>.  This change includes
the following:

	An initial cleanup and minor reorganization of the entropy pool
	code in sys/dev/rnd.c and sys/dev/rndpool.c.  Several bugs are
	fixed.  Some effort is made to accumulate entropy more quickly at
	boot time.

	A generic interface, "rndsink", is added, for stream generators to
	request that they be re-keyed with good quality entropy from the pool
	as soon as it is available.

	The arc4random()/arc4randbytes() implementation in libkern is
	adjusted to use the rndsink interface for rekeying, which helps
	address the problem of low-quality keys at boot time.

	An implementation of the FIPS 140-2 statistical tests for random
	number generator quality is provided (libkern/rngtest.c).  This
	is based on Greg Rose's implementation from Qualcomm.

	A new random stream generator, nist_ctr_drbg, is provided.  It is
	based on an implementation of the NIST SP800-90 CTR_DRBG by
	Henric Jungheim.  This generator users AES in a modified counter
	mode to generate a backtracking-resistant random stream.

	An abstraction layer, "cprng", is provided for in-kernel consumers
	of randomness.  The arc4random/arc4randbytes API is deprecated for
	in-kernel use.  It is replaced by "cprng_strong".  The current
	cprng_fast implementation wraps the existing arc4random
	implementation.  The current cprng_strong implementation wraps the
	new CTR_DRBG implementation.  Both interfaces are rekeyed from
	the entropy pool automatically at intervals justifiable from best
	current cryptographic practice.

	In some quick tests, cprng_fast() is about the same speed as
	the old arc4randbytes(), and cprng_strong() is about 20% faster
	than rnd_extract_data().  Performance is expected to improve.

	The AES code in src/crypto/rijndael is no longer an optional
	kernel component, as it is required by cprng_strong, which is
	not an optional kernel component.

	The entropy pool output is subjected to the rngtest tests at
	startup time; if it fails, the system will reboot.  There is
	approximately a 3/10000 chance of a false positive from these
	tests.  Entropy pool _input_ from hardware random numbers is
	subjected to the rngtest tests at attach time, as well as the
	FIPS continuous-output test, to detect bad or stuck hardware
	RNGs; if any are detected, they are detached, but the system
	continues to run.

	A problem with rndctl(8) is fixed -- datastructures with
	pointers in arrays are no longer passed to userspace (this
	was not a security problem, but rather a major issue for
	compat32).  A new kernel will require a new rndctl.

	The sysctl kern.arandom() and kern.urandom() nodes are hooked
	up to the new generators, but the /dev/*random pseudodevices
	are not, yet.

	Manual pages for the new kernel interfaces are forthcoming.
2011-11-19 22:51:18 +00:00
hannken f68873a343 Finish and enable whiteout support for tmpfs:
- Enable VOP tmpfs_whiteout().
- Support ISWHITEOUT in tmpfs_alloc_file().
- Support DOWHITEOUT in tmpfs_remove() and tmpfs_rmdir().
- Make rmdir on a directory containing whiteouts working.

Should fix PR #35112 (tmpfs doesn't play well with unionfs).
2011-08-27 15:32:28 +00:00
enami d281b75f90 Backout previous. May be I need more coffee. 2011-06-30 00:37:07 +00:00
enami 7a1a5c2e85 - Use << PAGE_SHIFT rather than calling round_page again.
- No need to call uao_dropswap_range() here since uao_dropswap()
  is already called for each pages by uvm_vnp_setsize().
2011-06-30 00:09:26 +00:00
hannken d296304e60 Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode.  Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
2011-06-16 09:21:02 +00:00
rmind e225b7bd09 Welcome to 5.99.53! Merge rmind-uvmplock branch:
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
  New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
  the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
  Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
  kernel-lock on some ports).  Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
2011-06-12 03:35:36 +00:00
rmind 9976e82537 Fix non-DEBUG build. 2011-05-29 22:43:32 +00:00
rmind 4781942ccb - Rework and document inode reference counting. Also document inode life
cycle (destruction part).  Perform link counting in tmpfs_dir_attach()
  and tmpfs_dir_detach(), instead of alloc/free and arbitrary places.
  Fixes PR/44285, PR/44288, PR/44657 and likely PR/42484.

- Fix the race between the lookup and inode destruction.  Fixes PR/43167
  and its duplicates PR/40088, PR/40757.

- Improve tmpfs_rename() locking a little, fix kqueue event notifications
  and also fix PR/43617.  Add simplistic tmpfs_parentcheck_p(); to be
  expanded and used for further rename() locking fixes.

- Cache directory entry "hint" in the tmpfs node, add tmpfs_dir_cached(),
  and thus avoid unnecessary lookup in tmpfs_remove() and tmpfs_rmdir().

- Set correct _PC_FILESIZEBITS value in tmpfs_pathconf().  Fixes PR/43576.

- Few minor fixes.
2011-05-29 22:29:06 +00:00
rmind 78c419363e tmpfs_update: comment out assert for now. 2011-05-25 02:03:22 +00:00
rmind 8d8c6221a2 tmpfs_dir_lookup: use 'name' variable in memcmp() as intended; fix warning. 2011-05-25 00:06:45 +00:00
rmind e9d92e9ce9 - tmpfs_lookup: cache (cnp->cn_flags & ISLASTCN) in const bool; de-indent.
- Group tmpfs_{alloc,free}_dirent() with other dirent routines.

No functional changes.
2011-05-24 23:16:16 +00:00
rmind 9d8a062828 - Describe some locking.
- Add VOP argument comments, add some asserts.
- Update/fix/remove outdated/missleading comments.
- Clean up, de-indent, KNF, misc.

No functional changes intended.
2011-05-24 20:17:49 +00:00
rmind cc2c5ce631 tmpfs_free_node: comment out assert, which can fire e.g. on shutdown. 2011-05-24 14:18:03 +00:00
rmind 7384069ad1 - tmpfs_alloc_node/tmpfs_free_node: move inode limiting into tmpfs_node_get()
and tmpfs_node_put(), update outdated/wrong comments and move/add asserts.
- tmpfs_mount: check for the version of arguments a bit earlier.
2011-05-24 01:09:47 +00:00