Commit Graph

250 Commits

Author SHA1 Message Date
skrll
1f8e1b91a7 Some KNF. NFC. 2020-12-20 11:11:34 +00:00
chs
31e69e6c91 In the current code, CPU_COUNT_FREEPAGES counts pages in the global
freelists AND the per-CPU pgflcache free pages caches, and that is the
number of pages that the pagedaemon considers to be available.
However, most pages in the pgflcache per-CPU free page caches are NOT
actually available for any particular allocation, and thus allocating
a page can fail even though the pagedaemon thinks enough pages are
available.  This change makes CPU_COUNT_FREEPAGES only count pages in
the global freelists and not pages in the pgflcache per-CPU free page
caches, thus better aligning the pagedaemon's view of how many pages
are available with the number of pages that can actually be allocated
by any particular request.  This fixes a hang that Christos was hitting.
2020-10-18 18:31:31 +00:00
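
The accounting change above boils down to which pools count as "available".
A minimal userland sketch (hypothetical names, not the NetBSD code) of why
including the per-CPU caches overstates what a single request can actually
allocate:

    #include <stdio.h>

    /* Hypothetical snapshot of free-page accounting. */
    struct freecounts {
        unsigned long global_free;    /* pages on the global freelists */
        unsigned long pgflcache_free; /* pages parked in per-CPU caches */
    };

    int
    main(void)
    {
        struct freecounts c = { .global_free = 100, .pgflcache_free = 900 };

        /* Old view: 1000 pages look available to the pagedaemon... */
        printf("old count: %lu\n", c.global_free + c.pgflcache_free);

        /*
         * ...but pages sitting in another CPU's cache cannot satisfy this
         * CPU's allocation, so a request can fail while the pagedaemon
         * still believes memory is plentiful.  New view: count only the
         * globally usable pages.
         */
        printf("new count: %lu\n", c.global_free);
        return 0;
    }
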
chs
9d18193c79 Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately.  Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors.  Fixes PR 55702.
2020-10-18 18:22:29 +00:00
skrll
023d9a4b2b G/C uvm_pagezerocheck 2020-09-20 10:30:05 +00:00
tnn
68cb89e9f3 add __diagused to fix the non-DIAGNOSTIC kernel build 2020-08-15 01:27:22 +00:00
chs
19303cecfc centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
2020-08-14 09:06:14 +00:00
skrll
f3bd60e230 Consistently use UVMHIST(__func__)
Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
2020-07-09 05:57:15 +00:00
thorpej
cb34023365 <sys/extent.h> not needed here. 2020-06-17 06:24:15 +00:00
ad
748d5a7e47 Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s, but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases, it's always either break-even or a
slight net loss for me.
2020-06-14 21:41:42 +00:00
ad
c56f4883a6 uvm_pagerealloc(): resurrect the insertion case. 2020-06-13 19:55:39 +00:00
ad
ba90a6ba38 Counter tweaks:
- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
  each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
  for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one().  It has no users and doesn't save a whole lot.
  For the cheap option, give cpu_count_sync() a boolean parameter indicating
  that a cached value is okay, and rate limit the updates for cached values
  to hz.
2020-06-11 22:21:05 +00:00
ad
4b8a875ae2 uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched.  It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
2020-06-11 19:20:42 +00:00
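
The two commits above apply the same pattern: hot callers read a cheap cached
total, and the exact cross-CPU sum is recomputed at most about once per tick.
A self-contained sketch of that idea (hypothetical names; the real logic lives
in the cpu_count/uvm_availmem machinery):

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NCPU 4

    static uint64_t percpu_free[NCPU];  /* per-CPU counts, cheap to update */
    static uint64_t cached_total;       /* last summed total */
    static uint64_t cached_tick;        /* tick at which it was summed */

    /* Exact total: walks every CPU's slot; too costly to do thousands of
     * times a second from the fault path. */
    static uint64_t
    sum_exact(void)
    {
        uint64_t t = 0;

        for (int i = 0; i < NCPU; i++)
            t += percpu_free[i];
        return t;
    }

    /* availmem(cached): most callers are happy with a recent value, so only
     * refresh the cache when the tick changes ("rate limit ... to hz"). */
    static uint64_t
    availmem(bool cached, uint64_t now_tick)
    {
        if (!cached || now_tick != cached_tick) {
            cached_total = sum_exact();
            cached_tick = now_tick;
        }
        return cached_total;
    }

    int
    main(void)
    {
        percpu_free[0] = 10;
        percpu_free[1] = 20;
        printf("%" PRIu64 "\n", availmem(true, 1));   /* refresh: 30 */
        percpu_free[2] = 5;                           /* cheap local update */
        printf("%" PRIu64 "\n", availmem(true, 1));   /* cached, stale: 30 */
        printf("%" PRIu64 "\n", availmem(false, 1));  /* exact: 35 */
        return 0;
    }
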
ad
232803aa39 Add uvm_pagewanted_p(): return true if someone is waiting on the page and
assert caller has correct lock to observe that.
2020-05-24 19:46:59 +00:00
ad
103c607c83 UVM_PAGE_TRKOWN: print the LID too 2020-05-19 20:46:39 +00:00
ad
c28f10c162 Don't set PG_AOBJ on a page unless UVM_OBJ_IS_AOBJ(), otherwise it can
catch pages from e.g. uvm_loanzero_object.
2020-05-17 17:12:28 +00:00
ad
8545b637a5 - If the hardware provided NUMA info, then use it to decide how to set up
  the allocator's buckets, instead of doing round robin distribution.  There
  are open questions here but this is better than doing nothing.

- Kernel reserve pages are for the kernel not realtime threads.
2020-05-17 15:11:57 +00:00
ad
1912643ff9 Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
2020-03-17 18:31:38 +00:00
rin
999e4e2245 Fix build with UVMHIST. 2020-03-15 11:17:22 +00:00
ad
bd8206f32d Don't require a write lock for page enqueue/activate/deactivate. 2020-03-14 21:06:35 +00:00
ad
5972ba1600 Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock.  Proposed on tech-kern.
2020-03-14 20:23:51 +00:00
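
A userland analogue of the wait protocol described above (illustrative only:
the pthread mutex stands in for pg->interlock, and the busy/wanted flags
mirror PG_BUSY/PQ_WANTED).  The point is that a waiter needs only the page's
own interlock to sleep and be woken, not the object's RW lock:

    #include <pthread.h>
    #include <stdbool.h>

    struct page {
        pthread_mutex_t interlock;  /* stand-in for pg->interlock */
        pthread_cond_t  cv;
        bool busy;                  /* ~PG_BUSY: the page is owned by someone */
        bool wanted;                /* ~PQ_WANTED: somebody is sleeping on it */
    };

    /* Wait for a busy page while holding only its interlock. */
    static void
    page_wait(struct page *pg)
    {
        pthread_mutex_lock(&pg->interlock);
        while (pg->busy) {
            pg->wanted = true;      /* record the waiter under the interlock */
            pthread_cond_wait(&pg->cv, &pg->interlock);
        }
        pthread_mutex_unlock(&pg->interlock);
    }

    /* Unbusy path: clear "busy" and wake any waiters, under the interlock. */
    static void
    page_unbusy(struct page *pg)
    {
        pthread_mutex_lock(&pg->interlock);
        pg->busy = false;
        if (pg->wanted) {
            pg->wanted = false;
            pthread_cond_broadcast(&pg->cv);
        }
        pthread_mutex_unlock(&pg->interlock);
    }

    int
    main(void)
    {
        static struct page pg = {
            .interlock = PTHREAD_MUTEX_INITIALIZER,
            .cv = PTHREAD_COND_INITIALIZER,
        };

        page_unbusy(&pg);  /* nothing is waiting; this just clears "busy" */
        page_wait(&pg);    /* returns at once since the page is not busy */
        return 0;
    }
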
skrll
56c82af64d Trailing whitespace 2020-03-03 08:13:44 +00:00
skrll
19a7c6f088 Typo in comment 2020-03-03 07:51:26 +00:00
ad
bf79731039 Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
2020-02-27 22:12:53 +00:00
ad
3b5c9f3eb7 Fix a comment. 2020-02-23 23:54:52 +00:00
ad
d2a0ebb67a UVM locking changes, proposed on tech-kern:
- Change the lock on uvm_object, vm_amap and vm_anon to be an RW lock.
- Break v_interlock and vmobjlock apart.  v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap.  Others to follow later.
2020-02-23 15:46:38 +00:00
ad
090ebf9cc8 uvmpdpol_pageactive(): the change to not re-activate recently activated
pages worked great with uvm_pageqlock, but it doesn't buy anything any more,
because now the busy pages are likely in a per-CPU queue somewhere waiting
to be processed, and changing the intent on those queued pages costs next
to nothing.  Remove this and get back all the bits in pg->pqflags.
2020-01-21 20:37:06 +00:00
ad
05a3457e85 Merge from yamt-pagecache (after much testing):
- Reduce unnecessary page scan in putpages esp. when an object has a ton of
  pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
  precisely in uvm layer.
2020-01-15 17:55:43 +00:00
ad
ca8c1cd33f - uvm_pagezerocheck(): put a global lock around it to protect the single
page mapping (DEBUG only).

- uvm_pagefree(): increment zeropages as needed.
2020-01-11 19:51:01 +00:00
ad
c5b060977a - Many small tweaks to the SMT awareness in the scheduler. It does a much
  better job now at keeping all physical CPUs busy, while using the extra
  threads to help out.  In particular, during preempt() if we're using SMT,
  try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems.  This
  mainly entails rearranging one of the CPU lists so it makes sense in all
  configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
  where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
  Extend the SMT awareness to try and handle that situation too (keep fast
  CPUs busy, use slow CPUs as helpers).
2020-01-09 16:35:03 +00:00
ad
eeff73a80f Page allocator:
The method for assigning pages to buckets in the non-NUMA case sucks.  It
can defeat memory interleaving in the hardware, and not distribute pages
fairly by colour.  To fix this and make things more deterministic, take the
physical PFN and colour into account.

Then when freeing pages, in the non-NUMA case don't change the page's bucket
either.  Keeping the bucket number stable will also permit partitioning page
replacement state by CPU package / NUMA node.
2020-01-05 22:01:09 +00:00
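
One way to picture the first paragraph (the constants and formula here are
invented for illustration, not the committed ones): derive the bucket from
stable properties of the page itself, so a page always maps to the same
bucket and pages of every colour spread across all buckets.

    #include <inttypes.h>
    #include <stdio.h>

    #define NCOLORS  8   /* page colours (hypothetical) */
    #define NBUCKETS 4   /* allocator buckets (hypothetical) */

    /*
     * Deterministic mapping from physical frame number to bucket: the
     * colour comes from the low bits of the PFN, and the bucket mixes in
     * the remaining bits, so consecutive frames of one colour rotate
     * through every bucket and freeing a page never changes its bucket.
     */
    static unsigned
    page_bucket(uint64_t pfn)
    {
        unsigned color = (unsigned)(pfn % NCOLORS);

        return (unsigned)(pfn / NCOLORS + color) % NBUCKETS;
    }

    int
    main(void)
    {
        for (uint64_t pfn = 0; pfn < 16; pfn++)
            printf("pfn %2" PRIu64 ": colour %" PRIu64 ", bucket %u\n",
                pfn, pfn % NCOLORS, page_bucket(pfn));
        return 0;
    }
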
ad
94843b1390 - Add and use wrapper functions that take and acquire page interlocks, and pairs
  of page interlocks.  Require that the page interlock be held over calls to
  uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state.  Rather than
  updating the global state synchronously, set an intended state on
  individual pages (active, inactive, enqueued, dequeued) while holding the
  page interlock.  After the interlock is released put the pages on a 128
  entry per-CPU queue for their state changes to be made real in batch.
  This results in a ~400-fold decrease in contention on my test system.
  Proposed on tech-kern but modified to use the page interlock rather than
  atomics to synchronise as it's much easier to maintain that way, and
  cheaper.
2019-12-31 22:42:50 +00:00
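
A compressed sketch of the second point, as a single-threaded userland
analogue (invented names, one queue instead of one per CPU): record the
intended state on the page while its own lock is held, park the page in a
small array, and only touch the shared replacement queues when the array
fills.  The shared state is then updated once per 128 pages instead of once
per page, which is where the drop in contention comes from.

    #include <stddef.h>

    #define BATCH 128   /* mirrors the 128-entry per-CPU queue */

    enum intent { INTENT_NONE, INTENT_ACTIVE, INTENT_INACTIVE };

    struct page {
        enum intent intent;         /* set while the page interlock is held */
    };

    struct cpu_queue {
        struct page *pgs[BATCH];
        size_t n;
    };

    /* Make a deferred intent real: in the kernel this is where the shared
     * active/inactive queues are finally modified. */
    static void
    apply_intent(struct page *pg)
    {
        /* ... move pg onto the active or inactive list per pg->intent ... */
        pg->intent = INTENT_NONE;
    }

    static void
    flush_queue(struct cpu_queue *q)
    {
        for (size_t i = 0; i < q->n; i++)
            apply_intent(q->pgs[i]);
        q->n = 0;
    }

    /* Caller holds the page's interlock: note the intent, defer the rest. */
    static void
    page_set_intent(struct cpu_queue *q, struct page *pg, enum intent it)
    {
        pg->intent = it;
        q->pgs[q->n++] = pg;
        if (q->n == BATCH)
            flush_queue(q);
    }

    int
    main(void)
    {
        static struct cpu_queue q;
        static struct page pages[BATCH];

        for (size_t i = 0; i < BATCH; i++)
            page_set_intent(&q, &pages[i], INTENT_ACTIVE);  /* one flush */
        return 0;
    }
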
ad
5c06357c90 Rename uvm_free() -> uvm_availmem(). 2019-12-31 13:07:09 +00:00
ad
b78a6618bd Rename uvm_page_locked_p() -> uvm_page_owner_locked_p() 2019-12-31 12:40:27 +00:00
ad
595f59ee30 uvm_pagealloc_pgb(): don't fill cache if we're into the reserves.
uvm_pagereplace(): use radix_tree_replace_node() to avoid alloc/free.
2019-12-30 17:45:53 +00:00
ad
bf8259ba89 Add missing call to uvm_pgflcache_resume(). 2019-12-28 16:07:41 +00:00
martin
995b81af44 Use PRIxPADDR to print a physical address (instead of casting to void*
and printing a pointer - which does not work well if sizeof(paddr_t) !=
sizeof(void*)).
2019-12-28 08:49:41 +00:00
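
The same point holds for any integer type whose width may differ from a
pointer's: print it through a width-matched format macro instead of casting.
A standalone example with the standard <inttypes.h> macros (PRIxPADDR plays
this role for paddr_t inside the kernel):

    #include <inttypes.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t pa = 0x123456789abcULL;   /* stand-in for a paddr_t */

        /*
         * Casting to void * and printing with %p silently drops the upper
         * bits whenever the physical address is wider than a pointer
         * (e.g. a 36-bit paddr_t on a 32-bit kernel).  A format macro of
         * the right width keeps the whole value.
         */
        printf("pa=%#" PRIx64 "\n", pa);
        return 0;
    }
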
ad
364cbbd32e Nothing uses uvm.cpus any more, and we can do the same with cpu_lookup(),
so get rid of it.
2019-12-27 13:19:24 +00:00
ad
9b1e2fa25c Redo the page allocator to perform better, especially on multi-core and
multi-socket systems.  Proposed on tech-kern.  While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
2019-12-27 12:51:56 +00:00
ad
be88751995 uvm_pagealloc_strat(): Tweak the locking to allow for lazy dequeue of pages
in the pdpolicy code.  This means taking pg->interlock if assigning to
an object.  The remaining barrier to lazy dequeue is having a dedicated
TAILQ_ENTRY in the page (it's currently shared with the page allocator).
2019-12-22 16:37:36 +00:00
ad
4d754f38b3 uvm_page_to_phys: mask off the lower bits. 2019-12-21 15:16:14 +00:00
ad
479b92623d Detangle the pagedaemon from uvm_fpageqlock:
- Have a single lock (uvmpd_lock) to protect pagedaemon state that was
  previously covered by uvmpd_pool_drain_lock plus uvm_fpageqlock.
- Don't require any locks be held when calling uvm_kick_pdaemon().
- Use uvm_free().
2019-12-21 14:50:34 +00:00
ad
a7b92da9a8 - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
  something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
  pg->phys_addr.  Begin by using it to cache the freelist index, because
  computing it is expensive and that shows up during profiling.  Discussed
  on tech-kern.
2019-12-21 14:41:44 +00:00
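
A sketch of the packing trick in the second point (hypothetical field and
mask names; the real accessors live in the UVM headers): because page
addresses stored in pg->phys_addr are page-aligned, the low bits are always
zero and can carry a small cached value such as the freelist index.

    #include <assert.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define LOC_MASK   ((UINT64_C(1) << PAGE_SHIFT) - 1)  /* low, unused bits */

    /* Stash a small locator (e.g. a freelist index) in the low bits. */
    static inline uint64_t
    set_locator(uint64_t phys_addr, unsigned loc)
    {
        assert(loc <= LOC_MASK);
        return (phys_addr & ~LOC_MASK) | loc;
    }

    static inline unsigned
    get_locator(uint64_t phys_addr)
    {
        return (unsigned)(phys_addr & LOC_MASK);
    }

    /* The true physical address is recovered by masking the locator off. */
    static inline uint64_t
    get_paddr(uint64_t phys_addr)
    {
        return phys_addr & ~LOC_MASK;
    }

    int
    main(void)
    {
        uint64_t v = set_locator(UINT64_C(0x7f000), 3);

        assert(get_locator(v) == 3);
        assert(get_paddr(v) == UINT64_C(0x7f000));
        return 0;
    }
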
ad
92ec96a550 Counter tweaks:
"zeroaborts" + "free" don't need to be per-CPU counters, and "bucketmiss"
wasn't used.  Remove those and cluster by usage.
2019-12-21 14:33:18 +00:00
ad
f391f83641 Add uvm_free(): returns the number of free pages in the system. 2019-12-21 12:58:26 +00:00
ad
7c88a62545 PR kern/54783: t_mmap crashes the kernel
- Fix various locking & sequencing errors with breaking loans.

- Don't call uvm_pageremove_tree() while holding pg->interlock as radixtree
  can take further locks when freeing nodes.
2019-12-18 20:38:14 +00:00
ad
a98966d3dc - Extend the per-CPU counters matt@ did to include all of the hot counters
  in UVM, excluding uvmexp.free, which needs special treatment and will be
  done with a separate commit.  Cuts system time for a build by 20-25% on
  a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
2019-12-16 22:47:54 +00:00
ad
f9a949d85f Merge from yamt-pagecache:
uvm_pagerealloc(): Don't bother with insert to new.  Nobody uses it and it
can return an error now due to radixtree.
2019-12-16 18:30:18 +00:00
ad
881d12e6f2 Merge from yamt-pagecache:
- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
2019-12-15 21:11:34 +00:00
ad
6857513180 Merge from yamt-pagecache: use radixtree for page lookup.
rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress.  radixtree is the intended
replacement.

Ok yamt@.
2019-12-14 17:28:58 +00:00
ad
5978ddc663 Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code.  Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
2019-12-13 20:10:21 +00:00