NetBSD

Commit Graph

Author	SHA1	Message	Date
perry	a2cd732268	Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.	2005-12-24 19:12:23 +00:00
christos	95e1ffb156	merge ktrace-lwp.	2005-12-11 12:16:03 +00:00
atatat	420d91208b	Properly fix the constipated lossage wrt -Wcast-qual and the sysctl code. I know it's not the prettiest code, but it seems to work rather well in spite of itself.	2005-06-09 02:19:59 +00:00
christos	efb6943313	- add const. - remove unnecessary casts. - add __UNCONST casts and mark them with XXXUNCONST as necessary.	2005-05-29 22:24:14 +00:00
yamt	6b2d8b66a4	merge yamt-km branch. - don't use managed mappings/backing objects for wired memory allocations. save some resources like pv_entry. also fix (most of) PR/27030. - simplify kernel memory management API. - simplify pmap bootstrap of some ports. - some related cleanups.	2005-04-01 11:59:21 +00:00
chs	c92634930b	fix validation of new values when setting vm.{hi,low}water. fixes PR 29651.	2005-03-31 02:34:10 +00:00
perry	da8abec863	nuke trailing whitespace	2005-02-26 21:34:55 +00:00
tls	abcbeb46d9	Users have observed that the amount of memory used by the metadata cache can in some situations exceed the high-water mark, and stay there once it gets there. Adjust the canrelease function so that it will immediately bring us back down to the high-water mark in this situation. How can this happen at all? Consider a machine with two filesystems, one with a much larger blocksize than the other. If the small-block filesystem is very busy, growing the cache up to the high-water mark, and then the large-block filesystem becomes busy, buffers will be recycled (since we are at the high-water mark) but _grow each time they're recycled_. Once we're above the high-water mark, the canrelease call in allocbuf (without this change) doesn't shrink us back down below it; so things get worse and worse.	2005-01-10 15:29:50 +00:00
dbj	b2c6a6a4ea	also define bioops if FFS is not defined.	2004-12-23 20:11:28 +00:00
jrf	ac053b9ef0	Fix previous commit, got bufcache and bufmem messages reversed.	2004-12-05 06:12:54 +00:00
jrf	cd5b5ced44	Change sysctl -d vm.bufcache to say percent of physical memory not kernel memory. Addresses PR misc/27233. Approved by atatat@netbsd.org.	2004-12-05 06:00:20 +00:00
christos	dfa8d84485	PR/25749: Peter Postma: Missing splx() in kernel.	2004-11-13 19:16:18 +00:00
enami	4fa5fd9ed6	- Testing low memory condition to see if we should alloc or not doesn't make sense, since 1) the condition is quite normal condition and 2) there is pool between us and uvm. - Make the step of allocation possibility a bit seamless by moving the origin of curve from 0 to lowater mark. Simon told that this helps for interactive performance when there is heavy disk activity in PR#27057.	2004-10-04 01:24:18 +00:00
enami	51718e92ee	Factor out code to set watermark and ensure high > low.	2004-10-04 00:46:05 +00:00
enami	682c3c9443	- Don't let pagedaemon sleep while draining buf. - Estimate amount of memory to free at a time. Address PR#27057 (and similar hangs I saw several months ago).	2004-10-03 08:47:48 +00:00
enami	ba25820566	x > 15 is always false if x is 0 .. 15. # XXX: testing free memory here is quite doubtful. also, I guess lowater # XXX: is better than 0 as origin.	2004-10-03 08:30:09 +00:00
enami	778d21de43	Cheap test first.	2004-10-03 08:17:54 +00:00
yamt	3362d4ed5b	fix allocbuf() O(n**2) behaviour where n is number of AGE buffers by always tracking amount of buffers on a queue. bump to 2.0H.	2004-09-18 16:37:12 +00:00
yamt	2bf1a4ef17	- add missing function prototypes. - fix prototype mismatches.	2004-09-18 16:01:03 +00:00
yamt	d08391b2a3	buf_trim: a buffer grabbed by getnewbuf() should be clean and anonymous. thus, there's no need to check and handle B_WANTED here.	2004-09-08 10:20:15 +00:00
hannken	7a5be5a9ff	- Add flag L_COWINPROGRESS to struct lwp to avoid recursion when doing copy-on-write. - Change VFS_SNAPSHOT() to return the snapshot vnode locked. - Make the IO path for copy-on-write and snapshot-read more lightweight. Avoids deadlocks where vn_rdwr(...READ...) has a shared lock and needs to copy-on-write. Avoids deadlocks/panics where to clean pages the copy-on-write needs to allocate pages for its VOP_PUTPAGES(). L_COWINPROGRESS part approved by: Jason R. Thorpe <thorpej@netbsd.org>	2004-06-20 18:55:58 +00:00
thorpej	3183ea47c2	When initializing the buffer cache memory pools where the size <= PAGE_SIZE, also use the standard allocator on systems that use a direct-mapped memory segment for mapping pool pages.	2004-06-20 18:29:47 +00:00
thorpej	bbbb3183d6	Don't use PR_IMMEDRELEASE on buffer cache pools. Instead, set a high water mark of 1, which will have the same effect. Pointed out back in January by YAMAMOTO Takashi.	2004-06-20 18:17:09 +00:00
atatat	5b22e79ada	Remaining sysctl descriptions under kern subtree	2004-05-25 04:30:32 +00:00
yamt	ab195ed32f	bio_doread: vp is always non-NULL here.	2004-04-25 12:41:12 +00:00
christos	6bd1d6d4db	Replace the statfs() family of system calls with statvfs(). Retain binary compatibility.	2004-04-21 01:05:31 +00:00
simonb	1c13fd358f	Give buf_lotsfree() a bit of a service: - Fix a 32-bit overflow that could erroneously return true even if the currently allocated buffer memory was greater than the high water mark. - Add an early check for bufmem > hiwater to avoid a needless call to random(). - Sprinkle some comments. Add a vm.bufmem sysctl so the current bufmem value can be easily queried from userland. Reviewed by Thor Simon.	2004-03-26 00:31:55 +00:00
simonb	07056cd3d1	More white space nits.	2004-03-25 23:17:16 +00:00
simonb	c67d420cbf	White-space nit.	2004-03-25 08:22:31 +00:00
atatat	19af35fd0d	Tango on sysctl_createv() and flags. The flags have all been renamed, and sysctl_createv() now uses more arguments.	2004-03-24 15:34:46 +00:00
dan	5819919614	micro-optimisation - if we're going to return 0, do so before doing other unnecessary work	2004-02-22 01:00:41 +00:00
atatat	caea20e952	Add PTRTOUINT64() and UINT64TOPTR() macros to sys/sysctl.h for use by kern.proc, kern.proc2, kern.lwp, and kern.buf. Define more MIB for kern.buf so that specific buffers can be selected (only all/all is supported right now), and use a 32/64 bit agnostic structure for communicating buffer information to userland. Convert systat to the new kern.buf method. Clean up the vm.buf* handling a little. There's no actual need to record the dynamically assigned OIDs, since sysctl_data can tell us what we're looking at. Oh, and fix a typo in a comment.	2004-02-19 03:56:30 +00:00
yamt	0e9e078e22	- raise ipl when calling buf_canrelease() because it traverses buffer queue. - correct/add comments on buf_canrelease().	2004-02-16 09:34:15 +00:00
tls	eb9b96577c	Fix bug noted by yamt@netbsd.org: the UVM free target is in pages, so the last change has us comparing pages to bytes instead of pages to buffers! The consequence was to try to free radically less memory than UVM wanted us to -- though always at least one buffer, which is probably why the results weren't dire. This does suggest that buf_canrelease() could be a lot more conservative about how much to release than "2 * page deficit". In fact, serious trouble seems to ensue if it's not -- when anything else on the system demands enough pages, we slam down to the low water mark and stay there. I've adjusted it to use min(page deficit, buffer memory / 16), which still isn't quite right but seems better. Another change: consider the case of an infinite loop that does "tar xzf pkgsrc.tar.gz ; rm -rf pkgsrc". Each time the rm runs, all the dead metadata will go on the AGE list -- and, until we hit the high-water mark, stay there, at which point it may be slowly recycled. Two adjustments seem to solve this: 1) whack buf_lotsfree() to return 0 if there's anything on the AGE list; 2) whack buf_canrelease() to count the memory used by the AGE list and always return at least that much. This basically turns the AGE list into a "delayed free" list, since we can't entirely eliminate it as we can't free pool items from interrupt context (e.g. from biodone()). To consider: with the bookkeeping corrected, should buf_drain() move back to the _end_ of the pagedaemon, and should the calculation then try to give back at least the current deficit?	2004-02-11 17:36:31 +00:00
tls	aeaf748ff2	Buffer cache fixes to avoid thrashing between high and low water marks and uncontrolled growth. The key fix is from Dan Carasone, who noticed that buf_canfree() was counting in _bytes_ but freeing in _buffers_, which caused the instant drop to lowater observed by some users. We now control the rate of growth; the probability of getting a new allocation is inversely proportional to the current size of the cache. This idea is from a long-ago conversation with Kirk McKusick and, if memory serves, was used for the file-system cache in some other BSD variant at some point in history. With growth and shrinkage more or less dealt with, we return the default maximum cache size to 15%. The default _minimum_ cache size is raised from 1/16 of the maximum cache size to 1/8, since 1/16 was chosen when the maximum size was 30% of memory. Finally, after observing the behaviour of the pagedaemon and the buffer cache drainer under pathological workloads (e.g. a benchmark that steps through 75% of available memory backwards) I have moved the call to buf_drain() to the beginning of the pagedaemon from the end; if the pagedaemon bogs down, it still won't get run as often as it should, but at least this way it will see the state of the free count and free target _before_ the scan step does its thing.	2004-01-30 11:32:16 +00:00
dan	c6ba3edf9d	Reduce the default BUFCACHE to 10% for now. Too many users are tripping over this getting too large, and suffering other performance problems due to the lack of good backpressure shrinking the bufcache when other memory is required. Again, this tunable should be revisited when the backpressure mechanism has been improved. sysctl vm.bufcache can be used to manually tune those rare machines that might need more than this. See comments in rev 1.106 for more detail.	2004-01-27 11:35:23 +00:00
hannken	b1cb363c11	Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern. VOP_STRATEGY(bp) is replaced by one of two new functions: - VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp. - DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp. DEV_STRATEGY(bp) is used only for block-to-block device situations.	2004-01-25 18:02:03 +00:00
yamt	ce0a402d3c	bufpool_page_alloc: for no-wait allocations, specify UVM_KMF_TRYLOCK as well.	2004-01-19 11:57:42 +00:00
enami	9e2ac76ac4	Obviously, sizeof(u_int) is not enough to copy struct buf. Prevents ``sysctl -a'' from dumping core.	2004-01-15 09:03:26 +00:00
yamt	8c55727694	reset i/o priority in geteblk() as well.	2004-01-10 14:43:05 +00:00
yamt	7266a95907	store a i/o priority hint in struct buf for buffer queue discipline.	2004-01-10 14:39:50 +00:00
thorpej	4aeba6790d	Initialize buffer pools with PR_IMMEDRELEASE. Don't use pool_reclaim() on those pools; it is no longer necessary.	2004-01-09 19:01:01 +00:00
tls	e4758a97ae	Change BUFCACHE (default hard limit on physmem consumption by metadata cache) from 30% to 20%. This seems to significantly smooth the oscillation between "almost no memory available" and "UVM free target available" caused by the current sudden, heavy backpressure on the metadata cache. We should revisit this again once the backpressure mechanism is better tuned; ideally, the hard limit should almost never come into play, because the metadata cache should gradually give back pages as buffers hit the AGE list and as the page cache demands them, rather than giving back a big slug of pages all at once when UVM decides it's in a hurry and fires off the page daemon. Just how well this adjustment works is likely to vary significantly from machine to machine depending on I/O mix, filesystem frag size, and total memory. However, 20% seems to be quite a bit better than 30% on several systems I've tested and is, coincidentally, more than enough to cache the entire metadata working set of the AnonCVS server with 100 clients, which is a useful worst-case stake in the ground...	2004-01-09 06:26:15 +00:00
tls	28364b01be	Add pool_reclaim() on pool to which we just pool_put() a buffer in buf_mrelease(). Without this, though the pages are returned to the relevant pool, they are never available for any other use in the system. Now the backpressure on the physical size of the buffer cache through the buf_drain() call in the pagedaemon works correctly. If anything, it may be a bit more aggressive than intended. On my 256MB system, with vm.bufcache set to the default 30% of physmem, a kernel with this fix can do 5 simultaneous config/makedep/builds of different NetBSD kernels in 1313 seconds; with the "traditional" buffer cache code it requires 1320 seconds. Running "find / -type d -exec ls -l {}" while the build is going demonstrates that the backpressure is working correctly: free memory oscillates slowly between close to none and the UVM target free, and vmstat -m shows a large number of releases for the buffer pools. For future work: how is "bufpl" memory returned to the system? This is not obvious to me (I must be looking in the wrong place). Also, buf_mrelease() is also called from brelse() in some cases. Would it be better to add a pool flag causing automatic release of full pages as they become available (not fragmented)? Jason Thorpe proposed this and it seems more elegant than cleaning the _entire_ pool only upon memory pressure. Greg Oster did a lot of the work of figuring this out. Jason proposed the use of pool_reclaim as a way to fix it.	2004-01-08 23:41:14 +00:00
atatat	5efc584023	Expose the buf_map symbol so that pmap(1) can find it. Split the sysctl setup routine into two routines, one for each "subtree". Perhaps it's a little pedantic, but it's cleaner. Also, assert that the "kern" and "vm" nodes exist.	2004-01-06 13:51:09 +00:00
pk	90cc172b86	bufpool_page_free: pass `buf_map' to uvm_km_free().	2004-01-04 16:17:13 +00:00
pk	dc6d5d0dd1	getnewbuf: return buffer locked.	2003-12-31 14:37:17 +00:00
thorpej	7e958083b1	Consistently use ANSI-style function decls.	2003-12-30 20:40:39 +00:00
pk	70f20a1217	Replace the traditional buffer memory management -- based on fixed per buffer virtual memory reservation and a private pool of memory pages -- by a scheme based on memory pools. This allows better utilization of memory because buffers can now be allocated with a granularity finer than the system's native page size (useful for filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation of virtual to physical memory mappings (due to the former fixed virtual address reservation) resulting in better utilization of MMU resources on some platforms. Finally, the scheme is more flexible by allowing run-time decisions on the amount of memory to be used for buffers. On the other hand, the effectiveness of the LRU queue for buffer recycling may be somewhat reduced compared to the traditional method since, due to the nature of the pool based memory allocation, the actual least recently used buffer may release its memory to a pool different from the one needed by a newly allocated buffer. However, this effect will kick in only if the system is under memory pressure.	2003-12-30 12:33:13 +00:00
dbj	076b9a1a1e	when ifdef DEBUG and debug_verify_freelist != 0 then perform an expensive search of the buffer freelists in brelse and bremfree to verify consistency	2003-12-02 04:18:19 +00:00

148 Commits