freelists AND the per-CPU pgflcache free page caches, and that is the
number of pages that the pagedaemon considers to be available.
However, most pages in the pgflcache per-CPU free page caches are NOT
actually available for any particular allocation, and thus allocating
a page can fail even though the pagedaemon thinks enough pages are
available. This change makes CPU_COUNT_FREEPAGES only count pages in
the global freelists and not pages in the pgflcache per-CPU free page
caches, thus better aligning the pagedaemon's view of how many pages
are available with the number of pages that can actually be allocated
by any particular request. This fixes a hang that Christos was hitting.
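To make the distinction concrete, here is a minimal userland sketch of the
counting change; the structure and field names are illustrative stand-ins,
not the kernel's:

    /*
     * Minimal model: pages in a per-CPU cache are only usable by their
     * owning CPU, so counting them as "available" overstates what any
     * single allocation can actually get.
     */
    #include <stdio.h>

    #define NCPU 4

    static unsigned global_freelist = 1000;     /* allocatable by anyone */
    static unsigned pgflcache_pages[NCPU] = { 30, 12, 0, 45 }; /* per CPU */

    /* Old behaviour: global freelist plus every per-CPU cache. */
    static unsigned
    count_free_old(void)
    {
            unsigned n = global_freelist;

            for (int i = 0; i < NCPU; i++)
                    n += pgflcache_pages[i];
            return n;
    }

    /* New behaviour: only the globally allocatable pages. */
    static unsigned
    count_free_new(void)
    {
            return global_freelist;
    }

    int
    main(void)
    {
            printf("pagedaemon's old view: %u\n", count_free_old());
            printf("pagedaemon's new view: %u\n", count_free_new());
            return 0;
    }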
uvm_page_unbusy() so that no caller of uvm_page_unbusy() needs to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
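A userland model of that error path follows; the page structure and the
helper names are stand-ins for the kernel's uvm_page_unbusy() and
uvm_aio_aiodone_pages(), intended only to show why the error-aware path
matters:

    #include <stdbool.h>
    #include <stdio.h>

    struct page {
            bool busy;
            bool error;     /* analogue of marking a page after failed I/O */
    };

    static void
    page_unbusy(struct page **pgs, int n)
    {
            for (int i = 0; i < n; i++)
                    pgs[i]->busy = false;
    }

    /* Analogue of the aiodone path: record the error, then unbusy. */
    static void
    aio_done_pages(struct page **pgs, int n, int error)
    {
            for (int i = 0; i < n; i++)
                    pgs[i]->error = (error != 0);
            page_unbusy(pgs, n);
    }

    int
    main(void)
    {
            struct page pg = { .busy = true, .error = false };
            struct page *pgs[] = { &pg };

            /* A failed ARC<->page copy takes the error-aware path,
             * so the failure is recorded rather than silently lost. */
            aio_done_pages(pgs, 1, 1);
            printf("busy=%d error=%d\n", pg.busy, pg.error);
            return 0;
    }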
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.
- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.
- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but also other CPUs doing unrelated activity in the VM
system.
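A minimal userland analogue of the cached cpu_count_sync() idea, assuming a
once-per-second refresh in place of the kernel's hz-based rate limit (the
names and layout here are illustrative, not the kernel's):

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    #define NCPU 4

    static int64_t percpu_count[NCPU];      /* per-CPU deltas */
    static int64_t cached_total;
    static time_t cached_when;

    /* Expensive: touches every CPU's counter. */
    static int64_t
    sum_all(void)
    {
            int64_t t = 0;

            for (int i = 0; i < NCPU; i++)
                    t += percpu_count[i];
            return t;
    }

    /*
     * If the caller can accept a cached value, refresh at most once
     * per second; otherwise always fetch the latest total.
     */
    static int64_t
    count_sync(bool cached)
    {
            time_t now = time(NULL);

            if (cached && now == cached_when)
                    return cached_total;
            cached_total = sum_all();
            cached_when = now;
            return cached_total;
    }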
the allocator's buckets, instead of doing round robin distribution. There
are open questions here but this is better than doing nothing.
- Kernel reserve pages are for the kernel, not realtime threads.
- Change the lock on uvm_object, vm_amap and vm_anon to be an RW lock
(see the sketch after this list).
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
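As a rough userland illustration of the RW-lock change (a pthread rwlock
stands in for the kernel's krwlock, and the object model is hypothetical),
read-only lookups can now proceed in parallel while mutations still exclude
everyone:

    #include <pthread.h>

    struct uobj_model {
            pthread_rwlock_t lock;  /* stand-in for the RW vmobjlock;
                                       init with pthread_rwlock_init() */
            int npages;
    };

    /* Read-only lookups may run concurrently on many CPUs. */
    static int
    uobj_npages(struct uobj_model *u)
    {
            int n;

            pthread_rwlock_rdlock(&u->lock);
            n = u->npages;
            pthread_rwlock_unlock(&u->lock);
            return n;
    }

    /* Mutations still take the lock exclusively. */
    static void
    uobj_add_page(struct uobj_model *u)
    {
            pthread_rwlock_wrlock(&u->lock);
            u->npages++;
            pthread_rwlock_unlock(&u->lock);
    }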
pages worked great with uvm_pageqlock, but it doesn't buy anything any more,
because now the busy pages are likely in a per-CPU queue somewhere waiting
to be processed, and changing the intent on those queued pages costs next
to nothing. Remove this and get back all the bits in pg->pqflags.
- Reduce unnecessary page scanning in putpages, especially when an object
has a ton of pages cached but only a few of them are dirty.
- Reduce the number of pmap operations by tracking page dirtiness more
precisely in the uvm layer (sketched below).
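A sketch of that dirtiness tracking, assuming a hypothetical tri-state
(clean / unknown / dirty) kept in the uvm layer so that the expensive pmap
query is needed only in the unknown case:

    #include <stdbool.h>

    enum page_status { STATUS_CLEAN, STATUS_UNKNOWN, STATUS_DIRTY };

    struct page_model {
            enum page_status status;
    };

    static bool pmap_is_modified_model(struct page_model *);

    /*
     * Known-clean and known-dirty pages are decided entirely in the
     * uvm layer; only "unknown" pages pay for the pmap query.
     */
    static bool
    page_is_dirty(struct page_model *pg)
    {
            switch (pg->status) {
            case STATUS_CLEAN:
                    return false;
            case STATUS_DIRTY:
                    return true;
            default:
                    return pmap_is_modified_model(pg);
            }
    }

    static bool
    pmap_is_modified_model(struct page_model *pg)
    {
            (void)pg;
            return false;   /* placeholder for the hardware dirty-bit query */
    }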
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.
- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.
- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
systems where there are fast CPUs and slow CPUs, as with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).
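The fast/slow policy could look roughly like the following userland sketch;
the structure and selection logic are illustrative assumptions, not the
scheduler's actual code:

    #include <stdbool.h>
    #include <stddef.h>

    struct cpu_model {
            bool slow;      /* analogue of the new "slow" topology flag */
            bool busy;
    };

    /*
     * Prefer an idle fast CPU; fall back to an idle slow CPU as a
     * helper; otherwise stay put (return NULL).
     */
    static struct cpu_model *
    pick_cpu(struct cpu_model *cpus, size_t ncpu)
    {
            struct cpu_model *slow_idle = NULL;

            for (size_t i = 0; i < ncpu; i++) {
                    if (cpus[i].busy)
                            continue;
                    if (!cpus[i].slow)
                            return &cpus[i];        /* idle fast CPU wins */
                    if (slow_idle == NULL)
                            slow_idle = &cpus[i];
            }
            return slow_idle;
    }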
The method for assigning pages to buckets in the non-NUMA case sucks. It
can defeat memory interleaving in the hardware, and not distribute pages
fairly by colour. To fix this and make things more deterministic, take the
physical PFN and colour into account.
Then when freeing pages, in the non-NUMA case don't change the page's bucket
either. Keeping the bucket number stable will also permit partitioning page
replacement state by CPU package / NUMA node.
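An illustrative (not the kernel's actual) formula for a deterministic bucket
assignment that takes both PFN and colour into account, assuming hypothetical
power-of-two colour and bucket counts:

    #include <stdint.h>

    #define NCOLORS  8      /* hypothetical page colours, power of two */
    #define NBUCKETS 4      /* hypothetical allocator buckets, power of two */

    /*
     * Strip the colour bits, then fold the remaining PFN bits into a
     * bucket: a given page always lands in the same bucket, and pages
     * of each colour are spread evenly across the buckets.
     */
    static unsigned
    page_bucket(uint64_t pfn)
    {
            return (unsigned)((pfn / NCOLORS) & (NBUCKETS - 1));
    }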
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.
- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
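In outline, and with hypothetical names throughout, the batching scheme
looks like this userland sketch (a pthread mutex stands in for
pg->interlock):

    #include <pthread.h>
    #include <stddef.h>

    #define PDQ_LEN 128     /* batch size from the text */

    enum intent { INTENT_NONE, INTENT_ACTIVE, INTENT_INACTIVE };

    struct page_m {
            pthread_mutex_t interlock;      /* stand-in for pg->interlock */
            enum intent intent;
    };

    struct percpu_q {
            struct page_m *q[PDQ_LEN];
            size_t n;
    };

    static void global_state_update(struct page_m *);

    /*
     * Record the intended state under the page interlock, then queue
     * the page; the contended global state is only touched when a
     * full batch is flushed.
     */
    static void
    page_set_intent(struct percpu_q *pq, struct page_m *pg, enum intent it)
    {
            pthread_mutex_lock(&pg->interlock);
            pg->intent = it;
            pthread_mutex_unlock(&pg->interlock);

            pq->q[pq->n++] = pg;
            if (pq->n == PDQ_LEN) {         /* flush the batch */
                    for (size_t i = 0; i < pq->n; i++)
                            global_state_update(pq->q[i]);
                    pq->n = 0;
            }
    }

    static void
    global_state_update(struct page_m *pg)
    {
            (void)pg;       /* placeholder: realize pg->intent globally */
    }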
in the pdpolicy code. This means taking pg->interlock if assigning to
an object. The remaining barrier to lazy dequeue is having a dedicated
TAILQ_ENTRY in the page (it's currently shared with the page allocator).
- Have a single lock (uvmpd_lock) to protect pagedaemon state that was
previously covered by uvmpd_pool_drain_lock plus uvm_fpageqlock.
- Don't require any locks be held when calling uvm_kick_pdaemon() (see
the sketch after this list).
- Use uvm_free().
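A userland sketch of a kick function whose callers hold no locks: it takes
and drops the (modelled) pagedaemon lock internally. The pthread primitives
stand in for the kernel's mutex and condvar, and the names are illustrative:

    #include <pthread.h>
    #include <stdbool.h>

    /* Single lock covering the modelled pagedaemon state. */
    static pthread_mutex_t pd_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t pd_cv = PTHREAD_COND_INITIALIZER;
    static bool pd_wanted;

    static void
    kick_pagedaemon(void)
    {
            pthread_mutex_lock(&pd_lock);
            if (!pd_wanted) {
                    pd_wanted = true;
                    pthread_cond_signal(&pd_cv);
            }
            pthread_mutex_unlock(&pd_lock);
    }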
something else soon and TBH it matches what this macro does better.
- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern (see the sketch after this list).
- Fix various locking & sequencing errors with breaking loans.
- Don't call uvm_pageremove_tree() while holding pg->interlock as radixtree
can take further locks when freeing nodes.
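A sketch of the locator inlines, assuming a hypothetical 5-bit field in the
always-zero low bits of a page's physical address (the text says the kernel
caches the freelist index this way; the exact widths here are made up):

    #include <stdint.h>

    #define PG_LOC_BITS     5
    #define PG_LOC_MASK     ((1u << PG_LOC_BITS) - 1)

    struct page_m {
            uintptr_t phys_addr;    /* address in upper bits; with 4 KiB
                                       pages the low 12 bits are free */
    };

    static inline void
    page_set_locator(struct page_m *pg, unsigned loc)
    {
            pg->phys_addr = (pg->phys_addr & ~(uintptr_t)PG_LOC_MASK) |
                (loc & PG_LOC_MASK);
    }

    static inline unsigned
    page_get_locator(const struct page_m *pg)
    {
            return (unsigned)(pg->phys_addr & PG_LOC_MASK);
    }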
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.
- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
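The divide-avoidance in the last item follows a general pattern: when the
divisor is a known power of two, a 64-bit divide or modulo reduces to a
mask. A generic illustration, not the actual fault-path change:

    #include <stdint.h>

    /* x % pow2, without the expensive 64-bit divide. */
    static inline uint32_t
    mod_pow2(uint64_t x, uint64_t pow2)
    {
            return (uint32_t)(x & (pow2 - 1));
    }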
rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.
Ok yamt@.
lock for use of the pagedaemon policy code. Discussed on tech-kern.
PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour