int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
Buffers run through copy-on-write are marked B_COWDONE. This condition
is valid until the buffer has run through bwrite() and gets cleared from
biodone().
Welcome to 4.99.39.
Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
(uint8_t instead of int8_t) - this prevents an ugly sign-extension
printing bug as well as formally undefined behavior when you mount an
unclean fs enough times.
From (my own) PR kern/28134; I've been carrying this patch for three
years, long enough to forget about it, and it's had no ill effects in
that time.
reviewed: pooka
group block buffer busy. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause deadlock.
From FreeBSD Rev. 1.117
- Instead of hooking the handler on the specdev of a mounted file system
hook directly on the `struct mount'.
- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'. Use
`mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.
- Replace the hand-made reader/writer lock with a krwlock.
- Keep `vn_cow_*' functions and mark as obsolete.
- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
introduce vrele2(), which allows to release vnodes the way lfs
sometimes wants it:
+ without calling inactive
+ inserting the vnode at the head of the freelist (this is a very
questionable optimization that isn't even enabled by default,
but I went along with the same semantics for now)
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
- Always call dqsync() with dq locked.
- Add some assertions to verify the lock held.
- Serialize quotaon()/quotaoff(), dqhashmtx becomes dqlock. From ad@
Reviewed by: Andrew Doran <ad@netbsd.org>
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
- Replace DQ_LOCK/DQ_WANT/sleep/wakeup with a mutex `dq_interlock'. Use this
mutex to protect all quota values and flags.
- Protect the hashtable with a mutex.
- Never update quotas for the quota files on the same file system. Prevents
a deadlock when dqsync() has to change the quota file's size (PR #13942).
Reviewed by: Andrew Doran <ad@netbsd.org>
Bill Stouder-Studenmund <wrstuden@netbsd.org>
WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue()
to assign a CPU might be used. Notes:
- For now, the list is used for workqueue_queue, which is non-optimal,
and will be changed with array, where index would be CPU ID.
- The data structures should be changed to be cache-friendly.
Reviewed by: <yamt>, <tech-kern>
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criterions, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitely calls getinoquota().
No objections on tech-kern@
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
the reference counter of the corresponding struct dquot will overflow.
Change the type of the reference counter from u_int16_t to u_int32_t and
add an assertion to check for overflow.
Observed and tested by Edgar Fuß.
Welcome to 4.99.21 (struct dquot and therefore struct inode changed layout)
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overrided
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
kern/36331 (MP deadlock between ufs_ihashget() and VOP_LOOKUP()) for ffs,
other file systems to follow. Reported by perseant@, debugged by Sverre
Froyen, patch posted/tested by Blair Sadewitz.
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS. The
default is "on", i.e., ignore lazy sync.
Reduce the amount of polling/busy-waiting done by lfs_putpages(). To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK. fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.
Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
corresponding flags.
Revert softdep_trackbufs() to its state before vn_start_write() was added.
Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.
Welcome to 4.99.17
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
Patch by Slava Semushin <slava.semushin@gmail.com>
Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
loops where vnodes can get removed or added during the loops. This could
lead to panic's on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count. This will force the update of
the parent directory's actual link to actually be scheduled. Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely. ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
[plus description about problems woth background fsck solved
by this; irrelevant to NetBSD]
For me, the good effect is at least that I'm getting less filesystem
inconsistencies after a crash.
Approved by christos quite a while ago.
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.
An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.
In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
LFCNWRAPSTOP and LFCNWRAPGO.
Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).
Allow dirops sleeping on available space to be interruptible.
instead of just vnode pointers. Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.
No objections on tech-kern.
if we ourselves hold the lock. This prevents e.g. mknod from hanging
indefinitely.
Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
* Mark being-deleted files in the Ifile so we can finish deleting them
at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
the cleaner, rather than relying solely on the cleaner's estimation of
whether it should clean or not.
* Note partial segments written by a user agent (in particular,
fsck_lfs) so that repeated rolls forward don't interfere with one
another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
view of the filesystem. In particular, the accounting for the segment
we are writing the inode into must be correct, and the accounting for
the segment that inode used to reside in must be correct. Rather than
just rewriting the inode if we wrote it wrong, rewrite the necessary
ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
avoiding yet another problem with the "wait for the cleaner" error
return from lfs_putpages().
Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.
This makes vfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.
Discussed on tech-kern, with lots of help from yamt (thanks!).
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again. Updated the ckckp
regression test to use the new semantics.
dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4). Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.
- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().
XXX check other filesystems.
posted for it even if the vnode is locked. This will deadlock with wmesg
"softgetdbuf" if it gets a BMSAFEMAP dependency as here we have "bp == nbp"
and try to get a buffer we already own.
Approved by: Frank van der Linden <fvdl@netbsd.org>
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
many inodes are cleaned at once. Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.
Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree". The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.
Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done. This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().
A couple of locking fixes are also included as well.
Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).
More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap). Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
where segment 0 is being considered for writing. This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.
Include a regression test that does such scanning.
When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small). Discovered by using the aforementioned
regression test.
My understanding is that the CLRSIG() is supposed to clear the signal
that was sent to the syncer process to prevent it from being delivered
to the syncer process in case unmounting fails, so that the syncer process
does not die while the filesystem is still mounted. The typical scenario
is, the syncher process is tsleep()ing in the kernel, and waking up when
it needs to do work. If someone sends a signal to it, eg. kill -TERM
the mfs process, then the kernel will try to unmount the mfs filesystem
before delivering the signal to the process. If that unmount fails, then
we should not really kill the process because that will hang the mount.
So we call CLRSIG() to stop the signal from being delivered.
So the first call to issignal() will return the signal number that was
sent to the syncer process (unless someone malicious was able to send
a lower numbered signal between the time tsleep() returned and we called
issignal()... something that is not really easy to do). But you are
right, we should not be calling it many times as a side effect of this
macro.
Rewrite CLRSIG() clear all the signals and call issignal() the correct
number of times.
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.
Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
instead of bytes for the index, and never search below fs->lfs_freehd.
Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).
Free the bitmap when we unmount the filesystem.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
something the pagedaemon is relying on.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages end up to allocate pages past eof unnecessarily.
* Acknowledge that sometimes there are more dirty pages to be written to
disk than clean segments. When we reach the danger line,
lfs_gop_write() now returns EAGAIN. The caller of VOP_PUTPAGES(), if
it holds the segment lock, drops it and waits for the cleaner to make
room before continuing.
* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
a page busy blocks on the cleaner while the cleaner blocks on the
segment lock while lfs_putpages blocks on the page).