Commit Graph

157 Commits

Author SHA1 Message Date
riastradh 943e1fb0b5 uvm: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
2023-02-24 11:03:13 +00:00
andvar ff23aff6ad fix various typos in comments, documentation and messages. 2022-05-31 08:43:13 +00:00
riastradh ef3476fb57 sys: Use membar_release/acquire around reference drop.
This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
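
As a concrete illustration, the full pattern with the renamed
primitives might read as follows (a sketch only: struct obj,
obj_release() and obj_free() are hypothetical):

	#include <sys/atomic.h>

	struct obj {
		unsigned int	refcnt;
		/* ... payload ... */
	};

	static void
	obj_release(struct obj *obj)
	{

		/* Publish our prior stores before the decrement. */
		membar_release();
		if (atomic_dec_uint_nv(&obj->refcnt) != 0)
			return;
		/* Last one out: observe every other thread's stores. */
		membar_acquire();
		obj_free(obj);		/* hypothetical destructor */
	}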
2022-04-09 23:38:31 +00:00
riastradh 122a3e8a60 sys: Membar audit around reference count releases.
If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Testing atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

	  Thread A			  Thread B
	obj->foo = 42;			obj->baz = 73;
	mumble(&obj->bar);		grumble(&obj->quux);
	/* membar_exit(); */		/* membar_exit(); */
	atomic_dec -- not last		atomic_dec -- last
					/* membar_enter(); */
					KASSERT(invariant(obj->foo,
					    obj->bar));
					free_stuff(obj);

The memory barriers ensure that

	obj->foo = 42;
	mumble(&obj->bar);

in thread A happens before

	KASSERT(invariant(obj->foo, obj->bar));
	free_stuff(obj);

in thread B.  Without them, this ordering is not guaranteed.

So in general it is necessary to do

	membar_exit();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting.  (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these.  Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that.  It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
2022-03-12 15:32:30 +00:00
skrll e9de112945 Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats 2021-03-13 15:29:55 +00:00
chs 9133d44ed0 In uvmpd_tryownerlock(), if the initial try-lock of the owner lock fails
then rather than do more try-locks and eventually sleep for a tick,
take a hold on the current owner's lock, drop the page interlock,
and acquire the lock that we took the hold on in a blocking fashion.
After we get the lock, check if the lock that we acquired is still
the lock for the owner of the page that we're interested in.
If the owner hasn't changed then we can proceed with this page;
otherwise we skip this page and move on to a different one.
This dramatically reduces the amount of time that the pagedaemon
sleeps trying to get locks: even 1 tick is an eternity to sleep
in this context, and that case was easy to trigger in practice.
With this new method the pagedaemon only very rarely blocks to
acquire the lock that it wants, since the object locks are adaptive,
and when it does block, the time it spends sleeping is generally
much less than 1 tick.
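
A sketch of the idea (names shortened; the rw_obj_hold()/rw_obj_free()
usage is an assumption about the exact interface):

	krwlock_t *slock = pg->uobject->vmobjlock;

	rw_obj_hold(slock);		/* keep the lock object alive */
	mutex_exit(&pg->interlock);
	rw_enter(slock, RW_WRITER);	/* blocking acquire */
	mutex_enter(&pg->interlock);
	if (pg->uobject == NULL || pg->uobject->vmobjlock != slock) {
		/* Owner changed while we slept: skip this page. */
		mutex_exit(&pg->interlock);
		rw_exit(slock);
		rw_obj_free(slock);	/* drop our hold */
		return false;
	}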
2020-11-04 01:30:19 +00:00
chs a8aa7072a7 in uao_get(), if we unlock the uobj to read a page from swap,
we must clear the cached page array because it is now stale.
also add a missing call to uvm_page_array_fini() if the I/O fails.
fixes PR 55493.
2020-08-19 15:36:41 +00:00
simonb bf74807839 Remove trailing \n from UVMHIST_LOG() format strings. 2020-08-19 07:29:00 +00:00
skrll f3bd60e230 Consistently use UVMHIST(__func__)
Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
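
The conversion looks roughly like this (a sketch; pdhist and the
flags argument stand in for any history and arguments):

	/* Before: */
	UVMHIST_FUNC("uao_get"); UVMHIST_CALLED(pdhist);
	UVMHIST_LOG(pdhist, "flags %#jx", (uintptr_t)flags, 0, 0, 0);

	/* After: */
	UVMHIST_FUNC(__func__);
	UVMHIST_CALLARGS(pdhist, "flags %#jx", (uintptr_t)flags, 0, 0, 0);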
2020-07-09 05:57:15 +00:00
skrll 4e62681fb0 Trailing whitespace 2020-07-08 13:26:22 +00:00
ad a6b947c515 uao_get(): in the PGO_SYNCIO case use uvm_page_array and simplify control
flow a little bit.
2020-05-25 22:04:51 +00:00
ad 4bfe043955 - Alter the convention for uvm_page_array slightly, so the basic search
  parameters can't change part way through a search: move the "uobj" and
  "flags" arguments over to uvm_page_array_init() and store those with the
  array.

- With that, detect when it's not possible to find any more pages in the
  tree with the given search parameters, and avoid repeated tree lookups if
  the caller loops over uvm_page_array_fill_and_peek().
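
Under the new convention a caller's loop might look like this (a
sketch; the exact prototypes are assumptions based on the description
above):

	struct uvm_page_array a;
	struct vm_page *pg;

	/* uobj and flags are now fixed for the whole search. */
	uvm_page_array_init(&a, uobj, flags);
	while ((pg = uvm_page_array_fill_and_peek(&a, off, 0)) != NULL) {
		/* ... process pg ... */
		off = pg->offset + PAGE_SIZE;
		uvm_page_array_advance(&a);
	}
	uvm_page_array_fini(&a);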
2020-05-25 21:15:10 +00:00
ad 48081d9a3b PR kern/55300: ubciomove triggers page not dirty assertion
If overwriting an existing page, mark it dirty since there may be no
managed mapping to track the modification.
2020-05-25 20:13:00 +00:00
ad 0fd3595baf uao_get(): handle PGO_OVERWRITE. 2020-05-22 19:02:59 +00:00
hannken b616d2aaa2 Suppress GCC warnings and fix a UVMHIST_LOG() statement.
Kernels ALL/amd64 and ALL/i386 and port sparc64 build again.
2020-05-20 12:47:36 +00:00
ad 812b46df75 PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case.  In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
2020-05-19 22:22:15 +00:00
ad ff872804dc Start trying to reduce cache misses on vm_page during fault processing.
- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter.  Mark
  pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages.  In
  all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
  those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults if they're already enqueued, and
  don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
  the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
2020-05-17 19:38:16 +00:00
ad 9c9ebb954c PR kern/55268: tmpfs is slow
uao_get(): in the PGO_LOCKED case, we're okay to allocate a new page as long
as the caller holds a write lock.  PGO_NOBUSY doesn't put a stop to that.
2020-05-15 22:27:04 +00:00
ad 1d7848ad43 Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core.  Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
2020-03-22 18:32:41 +00:00
ad 1912643ff9 Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
2020-03-17 18:31:38 +00:00
ad 5972ba1600 Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock.  Proposed on tech-kern.
2020-03-14 20:23:51 +00:00
rin 4c1762c6a4 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
2020-02-24 12:38:57 +00:00
ad d2a0ebb67a UVM locking changes, proposed on tech-kern:
- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart.  v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap.  Others to follow later.
2020-02-23 15:46:38 +00:00
ad 05a3457e85 Merge from yamt-pagecache (after much testing):
- Reduce unnecessary page scanning in putpages, especially when an object has
  pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
  precisely in uvm layer.
2020-01-15 17:55:43 +00:00
ad 94843b1390 - Add and use wrapper functions that take and acquire page interlocks, and pairs
  of page interlocks.  Require that the page interlock be held over calls to
  uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state.  Rather than
  updating the global state synchronously, set an intended state on
  individual pages (active, inactive, enqueued, dequeued) while holding the
  page interlock.  After the interlock is released put the pages on a 128
  entry per-CPU queue for their state changes to be made real in batch.
  This results in a ~400-fold decrease in contention on my test system.
  Proposed on tech-kern but modified to use the page interlock rather than
  atomics to synchronise as it's much easier to maintain that way, and
  cheaper.
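
In outline the scheme works roughly like this (a conceptual sketch
with invented helper names, not the actual code):

	/* Under pg->interlock: record only the intended state. */
	pg->pqflags = (pg->pqflags & ~PQ_INTENT_MASK) | PQ_INTENT_A;
	mutex_exit(&pg->interlock);

	/* Without the interlock: defer the real work to a batch. */
	q = curcpu_pdpol_queue();		/* invented helper */
	q->pages[q->count++] = pg;
	if (q->count == 128)
		pdpol_flush(q);			/* invented: apply in batch */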
2019-12-31 22:42:50 +00:00
ad 881d12e6f2 Merge from yamt-pagecache:
- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
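
A gang lookup fetches a batch of consecutive pages in one radix tree
descent; sketched below (the argument details are assumptions about
the <sys/radixtree.h> interface):

	struct vm_page *pgs[16];
	unsigned int npages, i;

	npages = radix_tree_gang_lookup_node(&uobj->uo_pages, startidx,
	    (void **)pgs, __arraycount(pgs), false);
	for (i = 0; i < npages; i++) {
		/* ... pgs[i] is the next page in offset order ... */
	}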
2019-12-15 21:11:34 +00:00
ad 5978ddc663 Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code.  Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
2019-12-13 20:10:21 +00:00
ad f36df6629d Avoid calling pmap_page_protect() while under uvm_pageqlock. 2019-12-01 20:31:40 +00:00
ad 221d5f982e - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
2019-12-01 14:40:31 +00:00
msaitoh 8ccde42c84 Avoid undefined behavior in uao_pagein_page(). Found by kUBSan. OK'd by
riastradh. I think this is a real bug on amd64 at least.
2019-07-28 05:28:53 +00:00
chs 0656708806 allow tmpfs files to be larger than 4GB. 2018-05-28 21:04:35 +00:00
pgoyette cb32a134a5 Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
  the kernel and in the structures used for exporting the history data
  to userland via sysctl(9).  This avoids problems on some architectures
  where passing a 64-bit (or larger) value to printf(3) can cause it to
  process the value as multiple arguments.  (This can be particularly
  problematic when printf()'s format string is not a literal, since in
  that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
  include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
  updated.  Each format specifier now includes an explicit length
  modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
  updated to replace uses of "%p" with "%#jx", and the pointer
  arguments are now cast to (uintptr_t) before being subsequently cast
  to (uintmax_t).  This is needed to avoid compiler warnings about
  casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
  "%c" format strings replaced with numeric formats; several instances
  of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
  history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
  the -u option does not exist (previously, this condition was silently
  ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
  data exported via sysctl(9) and exits if they do not match the values
  with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
  requirements imposed on the format strings, along with several other
  minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
    uint64_t) for the history arguments.  But that would require another
    "rototill" of all the users in the future when we add support for an
    architecture that supports a larger size.  Also, the printf(3) format
    specifiers for explicitly-sized values, such as "%"PRIu64, are much
    more verbose (and less aesthetically appealing, IMHO) than simply
    using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
    but it is possible that I've missed some of them.  I would be glad to
    update any stragglers that anyone identifies.
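
For example, a history call that formerly logged a pointer with "%p"
now follows this shape:

	/* Before: */
	UVMHIST_LOG(maphist, "pg %p", pg, 0, 0, 0);

	/* After: cast through uintptr_t and print with %#jx. */
	UVMHIST_LOG(maphist, "pg %#jx", (uintptr_t)pg, 0, 0, 0);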
2017-10-28 00:37:11 +00:00
chs b4dc18ca94 add assertions that would have caught the recent audio mmap bugs. 2017-05-30 17:09:17 +00:00
martin 07832dcc0c PR kern/51371: fix misleading indentation 2016-07-28 07:52:06 +00:00
pooka d8e04c9094 to garnish, dust with _KERNEL_OPT 2015-08-24 22:50:32 +00:00
riastradh e951f85b1a Allow VM_NFREELIST in uao_set_pgfl, meaning any freelist is OK. 2014-05-25 18:55:11 +00:00
riastradh c20b71f6fe Add uao_set_pgfl to limit a uvm_aobj's pages to a specified freelist.
Brought up on tech-kern:

https://mail-index.netbsd.org/tech-kern/2014/05/20/msg017095.html
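
Typical usage might look like this (a sketch; the freelist constant
is machine-dependent):

	struct uvm_object *uobj = uao_create(size, 0);

	/* Restrict the aobj's pages to a single physical freelist. */
	uao_set_pgfl(uobj, VM_FREELIST_DEFAULT);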
2014-05-22 14:01:46 +00:00
martin c9e83a001d Mark a diagnostic-only variable 2013-10-25 20:22:55 +00:00
matt 76a03311f2 #include <sys/atomic.h> 2012-09-15 06:25:47 +00:00
rmind 1e84067639 - Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
  called from uao_swap_off() with uao_list_lock acquired).  Also removes
  the try-lock dance in uao_swap_off(), since the lock order changes.
2012-09-14 22:20:50 +00:00
rmind 8c26068070 - Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF.  Sprinkle some __cacheline_aligned.
2012-09-14 18:56:15 +00:00
matt 3d901f1f5e Allocate color appropriate pages. 2011-09-06 16:41:55 +00:00
rmind e225b7bd09 Welcome to 5.99.53! Merge rmind-uvmplock branch:
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
  New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
  the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
  Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
  kernel-lock on some ports).  Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
2011-06-12 03:35:36 +00:00
rmind c22a36981f Replace "malloc" in comments, remove unnecessary header inclusions. 2011-04-23 18:14:12 +00:00
rmind cdc76ff97e Replace uvm_aobj_cache with kmem(9). 2011-02-11 00:21:18 +00:00
chuck f9d8cc1a37 update license clauses on chuck^2 code to match the new-style BSD licenses.
based on diff that rmind@ sent me (and confirmed with chs@ via email).

no functional change with this commit.
2011-02-02 15:28:38 +00:00
enami 7f4cb8b53c Remove nop code; the code was moved to uao_dropswap_range1() when it was
introduced in rev. 1.75.
2011-01-25 03:34:29 +00:00
hannken c84e81cad1 Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
2010-07-29 10:54:50 +00:00
rmind 22d67cdea8 uvm_fault_{upper,lower}_done: move drop-swap outside the page-queues lock.
Assert for object lock being held (or ref count 0) in uao_set_swslot().
2010-05-28 23:41:14 +00:00
rmind 40cf6f3659 Remove uarea swap-out functionality:
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code.  Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
2009-10-21 21:11:57 +00:00