NetBSD

Author	SHA1	Message	Date
ad	a0d1fd8d0c	It's not a good idea for device drivers to modify b_flags, as they don't need to understand the locking around that field. Instead of setting B_ERROR, set b_error instead. b_error is 'owned' by whoever completes the I/O request.	2007-07-29 13:31:07 +00:00
rmind	20bbb87e34	Implementation of per-CPU work-queues support for workqueue(9) interface. WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue() to assign a CPU might be used. Notes: - For now, the list is used for workqueue_queue, which is non-optimal, and will be changed with array, where index would be CPU ID. - The data structures should be changed to be cache-friendly. Reviewed by: <yamt>, <tech-kern>	2007-07-12 20:39:56 +00:00
pooka	835b0326c5	Using POOL_INIT here makes no sense, since file systems always have an init method. So get rid of it and #ifdef _LKM and just always init in the init method. Give malloc types the same treatment. Makes file systems nicer to work with in linksetless environments and fixes a few LKM discrepancies.	2007-06-30 09:37:53 +00:00
perseant	9234ba6fd8	Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately. This prevents a problem in which processes could get stuck in "buffers" sleep forever.	2007-05-16 19:11:37 +00:00
perseant	9be0ebd9da	Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The default is "on", i.e., ignore lazy sync. Reduce the amount of polling/busy-waiting done by lfs_putpages(). To accomplish this, copied genfs_putpages() and modified it to indicate which page it was that caused it to return with EDEADLK. fsync()/fdatasync() should no longer ever fail with EAGAIN, and should not consume huge quantities of cpu. Also, try to make dirops less likely to be written as the result of a VOP_PUTPAGES(), while ensuring that they are written regularly.	2007-04-17 01:16:46 +00:00
christos	53524e44ef	Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.	2007-03-04 05:59:00 +00:00
perseant	d1d9b558a7	Reverse the order of searching the vnode list in lfs_writevnodes(). This should speed up e.g. "chown -R" on LFS filesystems; e.g. it shows a 100% increase in the 'seq_stat' column of bonnie++.	2007-02-23 23:16:03 +00:00
yamt	8bf7662829	merge yamt-splraiseipl branch. - finish implementing splraiseipl (and makeiplcookie). http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html - complete workqueue(9) and fix its ipl problem, which is reported to cause audio skipping. - fix netbt (at least compilation problems) for some ports. - fix PR/33218.	2006-12-21 15:55:21 +00:00
christos	168cd830d2	__unused removal on arguments; approved by core.	2006-11-16 01:32:37 +00:00
reinoud	0ce809091d	Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure, since all vnodes were synced and processed backwards. This meant that the last accessed node was processed first and the earliest last. An extra benefit is the removal of the ugly hack from the Berkeley days on LFS. In the process, I've also replaced the various hand-written loop variations with the TAILQ_FOREACH() macro.	2006-10-20 18:58:12 +00:00
christos	4d595fd7b1	- sprinkle __unused on function decls. - fix a couple of unused bugs - no more -Wno-unused for i386	2006-10-12 01:30:41 +00:00
christos	b64edcaded	fix empty if	2006-10-04 15:53:24 +00:00
perseant	2ac2813b6e	Use lockstatus instead of a homebrewed locking system to control LFCNWRAPSTOP and LFCNWRAPGO. Be less verbose about the various looping checks: use log() rather than printf(), and only log anything if we are really looping ("count = 2" is not an error condition). Allow dirops sleeping on available space to be interruptible.	2006-09-28 23:08:23 +00:00
christos	0dc26f6dcb	remove impossible test	2006-09-02 06:46:04 +00:00
perseant	437e855235	Changes to help the roll-forward agent, to wit: * Mark being-deleted files in the Ifile so we can finish deleting them at fs mount time. * Flag the Ifile with "cleaner must clean" when writers are waiting for the cleaner, rather than relying solely on the cleaner's estimation of whether it should clean or not. * Note partial segments written by a user agent (in particular, fsck_lfs) so that repeated rolls forward don't interfere with one another. * Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once, for better testing of the validity of checkpoints. * Keep track of the on-disk nlink count when cleaning, so that we don't partially complete directory operations while cleaning. * Ensure that every single Ifile inode write represents a consistent view of the filesystem. In particular, the accounting for the segment we are writing the inode into must be correct, and the accounting for the segment that inode used to reside in must be correct. Rather than just rewriting the inode if we wrote it wrong, rewrite the necessary ifile blocks before writing the inode so we never write it wrong. * Don't unmark any VDIROP vnodes if we haven't written them to disk, avoiding yet another problem with the "wait for the cleaner" error return from lfs_putpages(). Also, move the last callback to an aiodone call, so we no longer do any memory management from interrupt context.	2006-09-01 19:41:28 +00:00
perseant	20227e112e	Note partial segments that are written by the cleaner, to help out the roll-forward agent.	2006-07-20 23:16:50 +00:00
perseant	186ffd50ab	Loop on the check for lfs_nowrap, so we don't allow a process to squeeze by.	2006-07-20 23:15:39 +00:00
perseant	8c161d1081	Don't try to write all the vnodes, when the cleaner needs a vnode to be recycled.	2006-07-20 23:12:26 +00:00
perseant	b99e4c8268	Don't wake up the cleaner if the filesystem is unwrappable, and fix the compatibility fcntls. Also includes one-line fixes for an MP locking bug and a zero-length FINFO problem that manifested during testing.	2006-06-29 19:28:21 +00:00
perseant	1c57171fe3	Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in particular, the caller can now choose whether to wait for the condition to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes the descriptor, the filesystem is started again. Updated the ckckp regression test to use the new semantics. dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple other problems with dump_lfs that manifested themselves during testing.	2006-06-24 05:28:54 +00:00
yamt	e408053d1b	fix a simonb-timecounters regression. the precision of getnanotime() is not suitable for file timestamps. esp. when it's nfs-exported. - introduce vfs_timestamp(). (the name is from freebsd. currently merely a wrapper of nanotime()) - for ufs-like filesystems, use it rather than getnanotime(). XXX check other filesystems.	2006-06-23 14:13:02 +00:00
kardel	de4337ab21	merge FreeBSD timecounters from branch simonb-timecounters - struct timeval time is gone time.tv_sec -> time_second - struct timeval mono_time is gone mono_time.tv_sec -> time_uptime - access to time via {get,}{micro,nano,bin}time() get* versions are fast but less precise - support NTP nanokernel implementation (NTP API 4) - further reading: Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html	2006-06-07 22:33:33 +00:00
perseant	0e0bb04d7a	Fix a bug in which FINFOs were written with a version number of zero. Add assertions and add this to the DEBUG fip test in lfs_writeseg.	2006-05-20 01:10:18 +00:00
perseant	6e53d31f5c	Break out the finfo array manipulation code into two new functions, lfs_acquire_finfo() and lfs_release_finfo(). Add a debugging check for zero-length finfo arrays in the segment summary to avoid future regressions.	2006-05-18 23:15:09 +00:00
elad	fc9422c9d9	integrate kauth.	2006-05-14 21:31:52 +00:00
perseant	285f68c114	Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when many inodes are cleaned at once. Make sure that we write all the pages on vnodes that are being flushed, even if we don't think there's room; drain v_numoutput before lfs_vflush() completes. Also, don't allow a vnode that is in the process of being cleaned to be chosen by getnewvnode(); this avoids a segment accounting panic in the case that a large number of inodes are fed to lfs_markv() all at once.	2006-05-12 23:36:11 +00:00
perseant	8696fd25e2	Don't ever partially write dirops, even if we need the cleaner to run. This increases the chances of the "no clean segments" panic slightly, but allows us to run the ckckp regression test successfully to completion.	2006-05-01 19:47:29 +00:00
perseant	481da54fc1	Postpone the segment accounting changes coming from truncation until the inode that makes those changes valid is either written to disk by lfs_writeinode() or discarded by lfs_vfree(). A couple of locking fixes are also included.	2006-04-30 21:19:42 +00:00
perseant	7cd0266a27	Regression test improvements: Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0 is really about to commence, since this is what the test expects (and incidentally what a snapshotting utility wants as well). More correctly reconstruct the on-disk state at every checkpoint, rather than relying on the entire state at the point of wrapping to be accurate (that is only true the first time we wrap). Add a "make abort" target to make rerunning the test more convenient when it has failed and we're done analyzing the failure.	2006-04-22 00:10:54 +00:00
perseant	0268059112	Introduce two fcntl calls that freeze the filesystem right at the point where segment 0 is being considered for writing. This allows for automated checkpoint validity scanning, and could be used (in conjunction with the existing LFCNREWIND) for e.g. snapshot dumps as well. Include a regression test that does such scanning. When writing the Ifile, loop through the dirty block list three times to make sure that the checkpoint is always consistent (the first and second times the Ifile blocks can cross a segment boundary; not so the third time unless the segments are very small). Discovered by using the aforementioned regression test.	2006-04-17 20:02:34 +00:00
perseant	81ded5df65	Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING explicitly (especially since we didn't know about VFREEING at all before), but notice the EBUSY return from vget() instead. Fix some more MP locking protocol issues, most of which were pointed out by Christian Ehrhardt this morning on tech-kern.	2006-04-13 23:46:28 +00:00
perseant	7c22dcc8a6	Several minor bug fixes: * Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages. * Keep IN_MODIFIED set if we run out of avail in lfs_putpages. * Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found while running with an LFS root. * Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for something the pagedaemon is relying on.	2006-04-07 23:59:28 +00:00
perseant	dddf5c5171	Improvements to LFS's paging mechanism, to wit: * Acknowledge that sometimes there are more dirty pages to be written to disk than clean segments. When we reach the danger line, lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if it holds the segment lock, drops it and waits for the cleaner to make room before continuing. * Note and avoid a three-way deadlock in lfs_putpages (a writer holding a page busy blocks on the cleaner while the cleaner blocks on the segment lock while lfs_putpages blocks on the page).	2006-03-24 20:05:32 +00:00
tls	a67eab5ee4	From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org: We were returning the wrong value for free space. Now we're not.	2006-03-17 23:21:01 +00:00
yamt	690d424f28	- add simple functions to allocate/free a buffer for i/o. - make bufpool static.	2006-01-04 10:13:05 +00:00
christos	95e1ffb156	merge ktrace-lwp.	2005-12-11 12:16:03 +00:00
yamt	6138b82a56	always use nanotime rather than time. it's bad to mix nanotime and time because it sometimes makes timestamps go backwards.	2005-09-26 13:52:20 +00:00
christos	a12024da06	Use nanotime() to update the time fields in filesystems. Convert the code from macros to real functions. Original patch and review from chuq. Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not have enough precision for all fields, so this is not very useful for those two.	2005-09-12 16:24:41 +00:00
christos	50f8955b6e	64 bit inode changes.	2005-08-19 02:04:03 +00:00
christos	273df63602	- sprinkle const - avoid shadow variables.	2005-05-29 21:25:24 +00:00
perseant	2f695b5476	Provide a resize_lfs(8), including kernel and cleaner support. The current implementation requires the fs to be mounted while resizing. Tested in both directions, and everything appears to work happily, but ymmv.	2005-04-23 19:47:51 +00:00
perseant	f4a7694fc9	Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through lfs_balloc(), and use that to estimate the number of dirty pages belonging to LFS (subsystem or filesystem). This is almost certainly wrong for the case of a large mmap()ed region, but the accounting is tighter than what we had before, and performs much better in the typical case of pages dirtied through write().	2005-04-19 20:59:05 +00:00
perseant	f63fa194c2	Check the to-be-on-disk consistency of directories as well (correct a typo in an earlier commit).	2005-04-18 23:03:08 +00:00
perseant	2ee78c4fa9	Keep track of the highest block held by an LFS inode, so that we can be assured that the last byte of a file is always allocated. Previously a file extension could cause the filesystem to be flushed, writing an inconsistent inode to disk. Although this condition would be corrected the next time blocks were written to disk, an intervening crash would leave the filesystem in an inconsistent state, leaving fsck_lfs to complain of an inode "partially truncated".	2005-04-14 00:02:46 +00:00
perseant	1ebfc508b6	Protect various per-fs structures with fs->lfs_interlock simple_lock, to improve behavior in the multiprocessor case. Add debugging segment-lock assertion statements.	2005-04-01 21:59:46 +00:00
perseant	eefd94b8e2	Straighten out the maze of ifdefs. Instead, consolidate all the debugging stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular parts of the debugging reporting (if DEBUG is enabled). Re-enable the LFS statistics in sysctl, while I'm there. A bit of a rototill.	2005-03-08 00:18:19 +00:00
perry	bcfcddbac1	nuke trailing whitespace	2005-02-26 22:31:44 +00:00
perseant	25f49c3c91	Various minor LFS improvements: * Note when lfs_putpages(9) thinks it is not going to be writing any pages before calling genfs_putpages(9). This prevents a situation in which blocks can be queued for writing without a segment header. * Correct computation of NRESERVE(), though it is still a gross overestimate in most cases. Note that if NRESERVE() is too high, it may be impossible to create files on the filesystem. We catch this case on filesystem mount and refuse to mount r/w. * Allow filesystems to be mounted whose block size is == MAXBSIZE. * Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN entries in indirect blocks again, triggering a failed assertion "daddr <= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct this. * Add a high-water mark for the number of dirty pages any given LFS can hold before triggering a flush. This is settable by sysctl, but off (zero) by default. * Be more careful about the MAX_BYTES and MAX_BUFS computations so we shouldn't see "please increase to at least zero" messages. * Note that VBLK and VCHR vnodes can have nonzero values in di_db[0] even though their v_size == 0. Don't panic when we see this. * Change lfs_bfree to a signed quantity. The manner in which it is processed before being passed to the cleaner means that sometimes it may drop below zero, and the cleaner must be aware of this. * Never report bfree < 0 (or higher than lfs_dsize) through lfs_statvfs(9). This prevents df(1) from ever telling us that our full filesystems have 16TB free. * Account space allocated through lfs_balloc(9) that does not have associated buffer headers, so that the pagedaemon doesn't run us out of segments. * Return ENOSPC from lfs_balloc(9) when bfree drops to zero. * Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being unmounted. Because vfs_busy() is a shared lock, and lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be holding the lock that umount() is blocking on, then try to vfs_busy() again in getnewvnode().	2005-02-26 05:40:42 +00:00
yamt	22399b45d0	change some members of struct buf from long to int. ride on 2.0H.	2004-09-18 16:40:11 +00:00
mycroft	bc25b30608	Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike setting those flags, it does not cause the inode to be written in the periodic sync. This is used for writes to special files (devices and named pipes) and FIFOs. Do not preemptively sync updates to access times and modification times. They are now updated in the inode only opportunistically, or when the file or device is closed. (Really, it should be delayed beyond close, but this is enough to help substantially with device nodes.) And the most amusing part: Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be synced. In ext2fs, it was causing the metadata to not be synced. We now only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed that we do in fact trickle correctly now.	2004-08-14 01:08:02 +00:00

203 Commits