Commit Graph

127 Commits

Author SHA1 Message Date
riastradh a83cdbec2d uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B) 2023-04-09 09:00:56 +00:00
andvar 6c0e29fbe8 s/strucure/structure/ and s/structues/structures/ in comments. 2023-02-12 16:28:32 +00:00
simonb 688f1f85ca Add a sysctl hashstat collector for ubchash. 2021-04-01 06:26:26 +00:00
skrll e9de112945 Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats 2021-03-13 15:29:55 +00:00
chs 58f644a817 remove someone's leftover debug printfs. 2020-11-10 04:27:22 +00:00
rin 27b5f91738 PR kern/55658
Revert rev 1.122:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_bio.c#rev1.122

If this commit is applied to NFS client, changes to files in client
side are sometimes invisible in server side, which results in file
corruption.

Demonstrated by test code provided by Anthony Mallet:
https://mail-index.netbsd.org/current-users/2020/10/17/msg039708.html

Whether the test case above passes or not depends on architectures
and size of NFS I/O specified by -r and -w options of mount_nfs(8)
(the default size is 32KB for x86 and 8KB for other archs).

Whereas it fails on amd64 and i386 with the default size, it passes
on other archs (aarch64, arm, alpha, m68k, and powerpc at least) with
their default. On most ports, it fails with some I/O sizes.

However, the condition for failure is still unclear; whereas it fails
with 2KB I/O size on amiga (m68k, 8KB page), it passes with same I/O
size on alpha (8KB page). It may depends on some VM parameters or
details in pmap implementation, or some race conditions are involved.

Great thanks to Anthony Mallet for providing the test code, and sorry
everyone for breakage.
2020-10-18 08:52:15 +00:00
rin d56c56e97e PR kern/55658
ubc_fault_page(): Ignore PG_RDONLY flag and always pmap_enter() the page
with the permissions of the original access_type.

It is the file system's responsibility to allocate blocks that is being
modified by write(), before calling into UBC to fill the pages for that
range. KASSERT() is added there to confirm that no clean page is mapped
writable.

Fix infinite loop in uvm_fault_internal(), observed on 16KB-page systems,
where it continues to try to make a partially-backed page writable.

No regression in ATF and KASSERT() does not fire on several architectures,
as far as I can see.

Fix suggested by chs. Thanks!
2020-10-05 04:48:23 +00:00
rin b0317a421b PR kern/55467
tmpfs calls pmap_kenter_pa(9) with virtual address with page offset

Bisectioning revealed that the failure starts with this commit:

sys/fs/tmpfs/tmpfs_vnops.c rev 1.142:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/fs/tmpfs/tmpfs_vnops.c#rev1.142

by which tmpfs became to use UBC_FAULTBUSY flag for ubc_uiomove(9).
If this flag is specified, pmap_kenter_pa(9) is called with virtual
address with page offset via ubc_alloc(9):

https://nxr.netbsd.org/xref/src/sys/uvm/uvm_bio.c#616

Most ports seem to neglect silently page offset of va argument for
pmap_kenter_pa(9). However, it causes KASSERT failure correctly for
powerpc/booke. So, truncate page offset there.

Now, tmpfs works just fine on evbppc-booke, and I've confirmed that
no new failures are detected by ATF.

Discussed with chs@. Thanks!
2020-07-09 09:24:32 +00:00
skrll f3bd60e230 Consistently use UVMHIST(__func__)
Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
2020-07-09 05:57:15 +00:00
skrll 4e62681fb0 Trailing whitespace 2020-07-08 13:26:22 +00:00
jdolecek e4118f4162 make ubc_winshift / ubc_winsize constant, and based on whatever is bigger
of (1 << UBC_WINSHIFT, MAX_PAGE_SIZE)

given that default UBC_WINSHIFT is 13, this changes behaviour only
for mips and powerpc (BookE/OEA), which will now have twice as much
memory used for UBC windows; if this ever becomes a problem, it's
possible to reduce ubc_nwins in MD code similar to what is done on sparc

this eliminates variable-length arrays in ubc_fault(),
ubc_uiomove(), and ubc_zerorange() so that the stack usage can be
determined and checked in compile time
2020-06-25 14:04:30 +00:00
ad 704e68575e ubc_uiomove_direct(): if UBC_FAULTBUSY, the left-over portion of the final
page needs to be zeroed.
2020-05-25 19:29:08 +00:00
ad 504f478a85 - ubc_uiomove(): Always use direct access in the UBC_FAULTBUSY case, since
it works basically the same way as !direct minus temporary mappings, and
  there are no concurrency issues.

- ubc_alloc_direct(): In the PGO_OVERWRITE case blocks are allocated
  beforehand.  Avoid waking or activating pages unless needed.
2020-05-24 20:05:53 +00:00
ad 5959644557 - In ubc_alloc() take initial offset into account in the UBC_FAULTBUSY case
or one too few pages can be mapped.

- In ubc_release() with UBC_FAULTBUSY, chances are that pages are newly
  allocated and freshly enqueued, so avoid uvm_pageactivate() if possible

- Keep track of the pages mapped in ubc_alloc() in an array on the stack,
  and use this to avoid calling pmap_extract() in ubc_release().
2020-05-23 11:59:03 +00:00
ad 812b46df75 PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case.  In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
2020-05-19 22:22:15 +00:00
thorpej 11d794387e Disable ubc_direct by default again. There are still stability issues
(e.g. panic during 2020.04.25.00.07.27 amd64 releng test run).
2020-04-26 16:16:13 +00:00
ad 18391da5bf ubc_alloc_direct(): for a write make sure pages are always marked dirty
because there's no managed mapping.
2020-04-24 19:47:03 +00:00
ad f6da483c1a Enable ubc_direct by default, but only on systems with no more than 2 CPUs
for now.
2020-04-23 21:53:01 +00:00
ad f5ad84fdb3 PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)
- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
  somewhere.  Use it to decide whether to do direct-mapped copy, rather than
  poking around directly in the vnode in ubc_uiomove(), which is ugly and
  doesn't work for tmpfs.  It would be nicer to contain all this in UVM but
  the filesystem provides the needed locking here (VV_MAPPED) and to
  reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS().  Pass in UBC_ISMAPPED where
  appropriate.
2020-04-23 21:47:07 +00:00
ad e4cdabc9f4 ubc_direct_release(): unbusy the pages directly since pg->interlock is
being taken.
2020-04-23 21:12:06 +00:00
ad 5294ba607b ubc_direct_release(): remove spurious call to uvm_pagemarkdirty(). 2020-04-07 19:12:25 +00:00
ad f3fdb8c6cb PR kern/54759: vm.ubc_direct deadlock when read()/write() into mapping of itself
Prevent ubc_uiomove_direct() on mapped vnodes.
2020-04-07 19:11:13 +00:00
ad 1912643ff9 Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
2020-03-17 18:31:38 +00:00
ad 5972ba1600 Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock.  Proposed on tech-kern.
2020-03-14 20:23:51 +00:00
ad d2a0ebb67a UVM locking changes, proposed on tech-kern:
- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart.  v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap.  Others to follow later.
2020-02-23 15:46:38 +00:00
ad 05a3457e85 Merge from yamt-pagecache (after much testing):
- Reduce unnecessary page scan in putpages esp. when an object has a ton of
  pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
  precisely in uvm layer.
2020-01-15 17:55:43 +00:00
ad 94843b1390 - Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks.  Require that the page interlock be held over calls to
  uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state.  Rather than
  updating the global state synchronously, set an intended state on
  individual pages (active, inactive, enqueued, dequeued) while holding the
  page interlock.  After the interlock is released put the pages on a 128
  entry per-CPU queue for their state changes to be made real in batch.
  This results in in a ~400 fold decrease in contention on my test system.
  Proposed on tech-kern but modified to use the page interlock rather than
  atomics to synchronise as it's much easier to maintain that way, and
  cheaper.
2019-12-31 22:42:50 +00:00
ad 5978ddc663 Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code.  Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
2019-12-13 20:10:21 +00:00
skrll f563530044 Fix a UVMHIST_LOG format broken in 1.91 2019-11-07 07:45:14 +00:00
jdolecek 2386f90fa2 for direct map case, avoid PGO_NOBLOCKALLOC when writing, it makes
genfs_getpages() return unallocated pages using the zero page and
PG_RDONLY; the old code relied on fault logic to get it allocated, which
the direct case can't rely on

instead just allocate the blocks right away; pass PGO_JOURNALLOCKED
so that code wouldn't try to take wapbl lock, this code path is called
with it already held

this should fix KASSERT() due to PG_RDONLY on write with wapbl

towards resolution of PR kern/53124
2018-12-09 20:45:37 +00:00
jdolecek 29944bc553 need to use PGO_NOBLOCKALLOC also in ubc_alloc_direct() case, same
as non-direct code - otherwise the code tries to acquire the wapbl
lock again in genfs_getpages(), and panic due to locking against itself

towards PR kern/53124
2018-11-20 20:07:19 +00:00
chs c40f790c72 add missing boilerplate for UVMHIST. 2018-06-02 15:24:55 +00:00
jdolecek 91c2b8613a uvm_pageactivate() needs to be called _after_ code is done with the page, no reason
to bother pdaemon with PG_BUSY pages; also clear the PG_FAKE and PG_CLEAN after
we are done with the write

this does not make any difference on my machine, but maybe it might fix
the machine check panic on Martin's alpha

while here remove UBC_PARTIALOK handling from ubc_zeropage_direct(), just to be sure
it works exactly the same as the non-direct one
2018-05-26 18:57:35 +00:00
jdolecek 6a369df7db change code to take advantage of direct map when available, avoiding the need
to map pages into kernel

this improves performance of UBC-based (read(2)/write(2)) I/O especially
for cached block I/O - sequential read on my NVMe goes from 1.7 GB/s to 1.9 GB/s
for non-cached, and from 2.2 GB/s to 5.6 GB/s for cached read

the new code is conditional now and off for now, so that it can be tested further;
can be turned on by adjusting ubc_direct variable to true

part of fix for PR kern/53124
2018-05-19 15:13:26 +00:00
jdolecek b6ce67bcb3 make ubc_alloc() and ubc_release() static, they should not be used
outside of ubc_uiomove()/ubc_zeropage(); for now mark as noinline
to keep them available as breakpoints
2018-04-20 18:58:10 +00:00
jdolecek af45d25bfb mark ubc_winshift and ubc_winsize as __read_mostly, they are used often
so might benefit from cache placement
2018-03-26 21:43:30 +00:00
maxv d4848b04b5 Use UVM_PROT_RW instead of UVM_PROT_ALL. This doesn't change anything,
since the protection code is not applied: the pages are manually kentered
as RW.

But fix it anyway, so that "pmap 0" does not say the map is executable.
2018-02-09 09:07:13 +00:00
pgoyette cb32a134a5 Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
  the kernel and in the structures used for exporting the history data
  to userland via sysctl(9).  This avoids problems on some architectures
  where passing a 64-bit (or larger) value to printf(3) can cause it to
  process the value as multiple arguments.  (This can be particularly
  problematic when printf()'s format string is not a literal, since in
  that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
  include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
  updated.  Each format specifier now includes an explicit length
  modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
  updated to replace uses of "%p" with "%#jx", and the pointer
  arguments are now cast to (uintptr_t) before being subsequently cast
  to (uintmax_t).  This is needed to avoid compiler warnings about
  casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
  "%c" format strings replaced with numeric formats; several instances
  of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
  history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
  the -u option does not exist (previously, this condition was silently
  ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
  data exported via sysctl(9) and exits if they do not match the values
  with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
  requirements imposed on the format strings, along with several other
  minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
    uint64_t) for the history arguments.  But that would require another
    "rototill" of all the users in the future when we add support for an
    architecture that supports a larger size.  Also, the printf(3) format
    specifiers for explicitly-sized values, such as "%"PRIu64, are much
    more verbose (and less aesthetically appealing, IMHO) than simply
    using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
    but it is possible that I've missed some of them.  I would be glad to
    update any stragglers that anyone identifies.
2017-10-28 00:37:11 +00:00
chs fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
ozaki-r 20fd1f7ca2 Fix typo 2017-03-21 02:24:35 +00:00
kre cd91c66b27 Ugh. This stuff is disgusting. We really need an arch dependent
PRIxOFF (and PRIdOFF) to print off_t's in a way that matches the
arch's definition of off_t.

In the meantime fall back on %jx and an (intmax_t) cast.   Ugly.
(And the way it is written is even uglier...)
2017-03-20 22:57:04 +00:00
kre 6c932ac002 Third time lucky...
Why is there no PRI[xd]OFF ?   How are off_t's intended to be printed?

If a PRIxOFF gets added in some appropriate place, the XXX lines in this
commit can go away.

(I understand not having PRI[xd]VOFF).
2017-03-20 10:44:24 +00:00
kre dfa5db8c47 One more (should have noticed last time) and this time fix the
format the way it should have been fixed, not just what was easiest...
2017-03-20 07:31:28 +00:00
kre 48097a21f3 Perhaps fix printf format for KASSERTMSG (unbreak i386 build maybe).
This can be revisited by anyone who wants to do things better...
2017-03-20 04:35:04 +00:00
riastradh 291c0f694c #if DIAGNOSTIC panic ---> KASSERT 2017-03-19 23:47:46 +00:00
rmind 657547e6a2 ubc_alloc: perform pmap_update() in the error path as we might have
removed the mapping.
2015-05-27 19:43:40 +00:00
matt 42253a3174 Don't nest structure definitions. 2014-09-05 09:24:21 +00:00
riastradh 4fa7e6020b Initialize ubchist earlier. 2014-07-07 20:14:43 +00:00
martin c9e83a001d Mark a diagnostic-only variable 2013-10-25 20:22:55 +00:00
jym 325494fe33 Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
2011-09-27 01:02:33 +00:00