Commit Graph

3760 Commits

Author SHA1 Message Date
jdolecek 4d49760268 use the new NOTE_SUBMIT to flag if the locking is necessary
for EVFILT_READ/EVFILT_WRITE knotes

fixes PR kern/23915 by Martin Husemann (pipes), and similar locking problem
in tty code
2004-02-22 17:51:25 +00:00
jdolecek 2ae728e7ef mount(2): if vinvalbuf() fails, we must also vput() the mountpoint vnode
fixes stale vnode lock after attempt to mount something on a NTFS directory
2004-02-22 09:56:26 +00:00
dan 5819919614 micro-optimisation - if we're going to return 0, do so before doing
other unnecessary work
2004-02-22 01:00:41 +00:00
enami 06107df871 Modify pool page header allocation strategy as follows:
In addition to current one (i.e., don't wast so large part of the page),
- if the header fitsin the page without wasting any items, put it there.
- don't put the header in the page if it may consume rather big item.

For example, on i386, header is now allocated in the page for the pools
like fdescpl or sigapl, and allocated off the page for the pools like
buf1k or buf2k.
2004-02-22 00:19:48 +00:00
atatat 56392ab40b Use KERN_PROCSLOP for struct kinfo_proc and KERN_LWPSLOP for
struct kinfo_lwp, and not vice versa.

Should solve the issue with top dying because it's unable to "allocate
memory".
2004-02-21 03:27:57 +00:00
atatat 42d379d041 Use new PTRTOUINT64() macro instead of local PTRTOINT64() macro. 2004-02-19 03:57:56 +00:00
atatat caea20e952 Add PTRTOUINT64() and UINT64TOPTR() macros to sys/sysctl.h for use by
kern.proc, kern.proc2, kern.lwp, and kern.buf.

Define more MIB for kern.buf so that specific buffers can be selected
(only all/all is supported right now), and use a 32/64 bit agnostic
structure for communcating buffer information to userland.

Convert systat to the new kern.buf method.

Clean up the vm.buf* handling a little.  There's no actual need to
record the dynamically assigned OIDs, since sysctl_data can tell us
what we're looking at.

Oh, and fix a typo in a comment.
2004-02-19 03:56:30 +00:00
matt cb57c6f8e9 Move detection of a special symbol into a separate function. Add some more
special symbols.
2004-02-19 03:42:01 +00:00
matt f01501b2c6 Support really large LKMs. Find out how much space is needed for symbols
and then allocate it on demand.  Rename some common symbols (__bss_start,
_edata, _end, __start_link_set_*, __stop_link_set_*) so that ".<module>"
is appended to them.  This shrinks an amd64 kernel by 20KB of BSS.
2004-02-18 23:44:49 +00:00
matt 004f0d503a s/sumbols/symbols/ 2004-02-18 20:41:09 +00:00
hannken c59d4851b8 Run pmap_deactivate() earlier in exit1(). Prevents a panic on sparc MP
where p->p_vmspace was 0xdeadbeef in pmap_deactivate().

Approved by: YAMAMOTO Takashi <yamt@netbsd.org>
2004-02-18 14:42:20 +00:00
simonb d7ee872c5f Don't shadow a function name with a parameter. 2004-02-17 11:36:01 +00:00
tron 7008209ace Include "sys/systm.h" to get the prototype for panic() which is required
for diagnostic kernels.
2004-02-17 08:22:12 +00:00
rtr 8845b1e975 split off the evcnt code (which is unrelated to autoconfiguration)
into a separate file

approved by simonb@
2004-02-17 05:03:15 +00:00
enami 456851e71a Some whitespace fix. 2004-02-17 01:45:34 +00:00
enami d59c88c291 The vnode capability id is gone. 2004-02-17 01:35:33 +00:00
enami 6a268a570b Rewind the `bp' advanced backward by cache_revlookup() if getcwd_getcache()
finally returns cache miss.

# Slightly modified from posted version so that it is cleanly patchable
# at least on 1.6 branch.
2004-02-17 01:29:39 +00:00
yamt 0e9e078e22 - raise ipl when calling buf_canrelease() because it traverses buffer queue.
- correct/add comments on buf_canrelease().
2004-02-16 09:34:15 +00:00
jdolecek 159f41eca4 allocate wired memory for the marker kevent in kqueue_scan() instead
of using on-stack memory, so that this wouldn't eventually cause kernel
panic if the process get swapped out and another process runs kqueue_scan()
problem pointed out in kern/24220 by Stephan Uphoff
2004-02-14 11:56:28 +00:00
hannken 142e9d5deb Add a generic copy-on-write hook to add/remove functions that will be
called with every buffer written through spec_strategy().

Used by fss(4). Future file-system-internal snapshots will need them too.

Welcome to 1.6ZK

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
2004-02-14 00:00:56 +00:00
wiz d20841bb64 Uppercase CPU, plural is CPUs. 2004-02-13 11:36:08 +00:00
enami 7ff66821f4 Also defer the writing of KTR_EMUL entry. Otherwise, the parent process
may sleep with setting KTRFAC_ACTIVE of child process and the child will
run without emitting any ktrace entry.
2004-02-12 23:47:21 +00:00
tls eb9b96577c Fix bug noted by yamt@netbsd.org: the UVM free target is in *pages*,
so the last change has us comparing pages to bytes instead of pages
to buffers!  The consequence was to try to free radically less memory
than UVM wanted us to -- though always at least one buffer, which is
probably why the results weren't dire.

This does suggest that buf_canrelease() could be a *lot* more
conservative about how much to release than "2 * page deficit".  In
fact, serious trouble seems to ensue if it's not -- when anything
else on the system demands enough pages, we slam down to the low
water mark nd stay there.  I've adjusted it to use min(page defecit,
buffer memory / 16), which still isn't quite right but seems better.

Another change: consider the case of an infinite loop that does
"tar xzf pkgsrc.tar.gz ; rm -rf pkgsrc".  Each time the rm runs,
all the dead metadata will go on the AGE list -- and, until we hit
the high-water mark, stay there, at which point it may be slowly
recycled.  Two adjustments seem to solve this:  1) whack buf_lotsfree()
to return 0 if there's anything on the AGE list; 2) whack buf_canrelease()
to count the memory used by the AGE list and always return at least
that much.

This basically turns the AGE list into a "delayed free" list, since we
can't entirely eliminate it as we can't free pool items from interrupt
context (e.g. from biodone()).

To consider: with the bookkeeping corrected, should buf_drain() move
back to the _end_ of the pagedaemon, and should the calculation then
try to give back at least the current defecit?
2004-02-11 17:36:31 +00:00
yamt 1e18e59746 - borrow vmspace0 in uvm_proc_exit instead of uvmspace_free.
the latter is not a appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
  pmap_activate/deactivate.
2004-02-09 13:11:21 +00:00
yamt fa47baddee lwp_exit2: grab kernel_lock to preserve locking order. 2004-02-09 13:02:48 +00:00
yamt a45adbd9c7 don't deactivate pmap in exit1 because we'll touch the pmap later.
instead, borrow vmspace0 immediately before destroying the pmap
in uvmspace_free.
2004-02-07 10:05:52 +00:00
christos 13057976a6 include <uvm/uvm_object.h> for the benefit of ports that don't include
it in <machine/pmap.h>
2004-02-06 13:46:27 +00:00
junyoung 48d5030e12 ANSIfy & zap some blank lines. 2004-02-06 08:08:46 +00:00
junyoung 9a410f9ed0 Rename es_check in struct execsw to es_makecmds. 2004-02-06 08:02:58 +00:00
pk f092315b50 pg_delete: re-arrange SESSRELE() calls to allow for better code generation. 2004-02-06 06:59:33 +00:00
pk 7026ce08c8 ioctl TIOCSCTTY: re-arrange SESSHOLD() calls to allow for better code generation. 2004-02-06 06:58:21 +00:00
christos 0283dcd8a6 - Don't use uao_ functions directly; use them through the pgops methods.
- Fix missing reference leak in the error path of shmat() mentioned in
  Full-Disclosure.
2004-02-05 22:28:33 +00:00
christos 6b1b54b981 Don't use uao_reference, directly use the pgops instead. XXX: we should
prolly make all the uao_ functions used in pgops static.
2004-02-05 22:26:52 +00:00
tls aeaf748ff2 Buffer cache fixes to avoid thrashing between high and low water marks
and uncontrolled growth.

The key fix is from Dan Carasone, who noticed that buf_canfree() was
counting in _bytes_ but freeing in _buffers_, which caused the instant
drop to lowater observed by some users.

We now control the rate of growth; the probability of getting a new
allocation is inversely proportional to the current size of the
cache.  This idea is from a long-ago conversation with Kirk McKusick
and, if memory serves, was used for the file-system cache in some
other BSD variant at some point in history.

With growth and shrinkage more or less dealt with, we return the
default maximum cache size to 15%.  The default _minimum_ cache size
is raised from 1/16 of the maximum cache size to 1/8, since 1/16 was
chosen when the maximum size was 30% of memory.

Finally, after observing the behaviour of the pagedaemon and the
buffer cache drainer under pathological workloads (e.g. a benchmark
that steps through 75% of available memory backwards) I have moved
the call to buf_drain() to the beginning of the pagedaemon from the
end; if the pagedaemon bogs down, it still won't get run as often
as it should, but at least this way it will see the state of the
free count and free target _before_ the scan step does its thing.
2004-01-30 11:32:16 +00:00
tsarna 72489e1ea0 uuidgen(2) syscall. Originally from FreeBSD, ported by John Franklin in
PR#23470, with minor updates by me. This is only the syscall support
from that PR, for now.

Changes: port over fix from FreeBSD for multicast address generation.
Changed bcopy to memcpy.  For now, #ifdef notyet the portions of
kern_uuid.c that are meant to be used by (currently nonexistent) other
things in the kernel.  Added syscall to COMPAT_FREEBSD as well, though
that's currently not useful, as any program new enough to use this call
also uses other syscalls we don't (yet) emulate.
2004-01-29 02:00:02 +00:00
dan c6ba3edf9d Reduce the default BUFCACHE to 10% for now. Too many users are
tripping over this getting too large, and suffering other performance
problems due to the lack of good backpressure shrinking the bufcache
when other memory is required.  Again, this tunable should be
revisited when the backpressure mechanism has been improved.

sysctl vm.bufcache can be used to manually tune those rare machines
that might need more than this.

See comments in rev 1.106 for more detail.
2004-01-27 11:35:23 +00:00
hannken 3db4e2acd8 Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.
VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp)  Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp)      Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
2004-01-25 18:06:48 +00:00
hannken d7f6cbf8bc Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern. 2004-01-25 18:02:04 +00:00
hannken b1cb363c11 Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.
VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp)  Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp)      Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
2004-01-25 18:02:03 +00:00
wiz f62661e104 Add semicolons after variable declarations; closes PR 24201. 2004-01-24 01:40:57 +00:00
simonb 2763a4b916 Fix NTP PPSAPI support (enabled with "options PPS_SYNC"):
From PR kern/13702 from Charles Carvalho.  Tested on alpha and
i386 with a Laipac TF10 PPS-capable GPS.  The com.c change was
copied wholesale from Charles' z8530tty.c patch.
2004-01-23 05:01:19 +00:00
atatat 4fe5b245f9 Fix the kern.mbuf tunables. 2004-01-21 02:11:20 +00:00
yamt ce0a402d3c bufpool_page_alloc: for no-wait allocations, specify UVM_KMF_TRYLOCK as well. 2004-01-19 11:57:42 +00:00
atatat 47768f08f2 In sysctl_locate(), use "rnode" like everywhere else, don't call it
"rv".

In sysctl_destroyv(), deal with deleting alias nodes, and pass a token
size_t to sysctl_destroy().

In sysctl_free(), check that "node" has not reached "rnode", not that
"pnode" has.

In sysctl_realloc(), don't bother setting sysctl_clen...the value is
unchanged.
2004-01-17 04:01:14 +00:00
atatat 5d3b89e2f4 Avoid dereferencing l...it might be NULL 2004-01-17 03:33:24 +00:00
yamt 047fc0b378 - fix locking order problem. (pa_slock -> pr_slock)
- protect pr_phtree with pr_slock.
- add some LOCK_ASSERTs.
2004-01-16 12:47:37 +00:00
mrg 23884e8622 clean up a little:
- delete ktrsyscall32()
- add a check #ifdef _LP64 to do the conversion if P_32 is set to the
standard ktrsyscall()
- add a couple of similar _LP64/P_32 checks to the systrace code.

this should get systrace working for 32 bit apps as well as complete
ktrace support for "trace_enter/trace_exit" using platforms such as amd64.

XXX: systrace isn't supported on sparc64 currently... (it doesn't use
trace_enter/trace_exit, or have it's own calls to systrace_xxx()...)
2004-01-16 05:03:02 +00:00
mrg 4c2a3c644a export ktrinitheader() and ktrwrite() for ktrsyscall32(), which is used
to write 32 bit syscall arguments in a 64 bit format.
2004-01-15 14:29:20 +00:00
enami 9e2ac76ac4 Obviously, sizeof(u_int) is not enough to copy struct buf.
Prevents ``sysctl -a'' from dumping core.
2004-01-15 09:03:26 +00:00
yamt 7f20b0c529 bump vnode hold count for page cache as well
to resolve unfairness between page cache and traditional buffer cache.
pointed by enami tsugutomo on current-users@.
2004-01-14 11:28:04 +00:00