* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.
Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.
Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86
Clean up some crappy pointer frobnication.
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)
Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.
Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)
And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
doing copy-on-write.
- Change VFS_SNAPSHOT() to return the snapshot vnode locked.
- Make the IO path for copy-on-write and snapshot-read more lightweight.
Avoids deadlocks where vn_rdwr(...READ...) has a shared lock and needs
to copy-on-write.
Avoids deadlocks/panics where to clean pages the copy-on-write needs
to allocate pages for its VOP_PUTPAGES().
L_COWINPROGRESS part approved by: Jason R. Thorpe <thorpej@netbsd.org>
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.
Welcome to 2.0F.
Approved by: Jason R. Thorpe <thorpej@netbsd.org>
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
an _LKM.
This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
no longer use and/or need it
- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argumet from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change
Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
there are now alternate non-kernel checks and fixes for this problem.
relevent prs include:
bin/17910 kern/21283 kern/21404 port-macppc/23925 port-macppc/23926
install/25138
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.
Convert struct session, ucred and lockf to pools.
enforces an unnecessary restriction that the superblock be in the
particular expected locations. Also, the compatibility case is
handled in ffs_oldfscompat_read.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
not being at 8k - causes all sorts of problems, in particular with
ffsv1 filessytems with 64k blocks, and disks that are reformatted from
ffsv1 to ffsv2 (and v.v.). see also PR kern/24809
VOP_STRATEGY(bp) is replaced by one of two new functions:
- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.
DEV_STRATEGY(bp) is used only for block-to-block device situations.
suspending.
Move vfs_write_suspend() and vfs_write_resume() from kern/vfs_vnops.c
to kern/vfs_subr.c.
Change vnode write gating in ufs/ffs/ffs_softdep.c (from FreeBSD).
When vnodes are throttled in softdep_trackbufs() check for
file system suspension every 10 msecs to avoid a deadlock.
it can be used to clear the work queue.
Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.
Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)
From FreeBSD.
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.
This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.
On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
ffs_isblock:
check if a block is available
returns true if all the correponding bits in the free map are 1
returns false if any corresponding bit in the free map is 0
ffs_isfreeblock:
check if a block is completely allocated
returns true if all the corresponding bits in the free map are 0
returns false if any corresponding bit in the free map is 1
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.
From Darrin B. Jewell.
* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes in more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h
* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.
These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
ffs_full_fsync(); while it is supposed to hint that the update of _file_
metadata (as in timestamps et al.) may be omitted it doesn't mean the
same for _filesystem_ metadata.
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.
From FreeBSD with slight modifications.
Approved by: Frank van der Linden <fvdl@netbsd.org>
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additonally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
This fixes a bug introduced in revision 1.120 of ffs_vfsops dated 2003/09/13
which results in fs_flags having a value of 0x7fffff00 when a superblock
is updated to use the new layout.
Discussed in http://mail-index.netbsd.org/tech-kern/2003/09/28/0003.html
genfs_getpages() can read in more blocks than it should due to faked filesize
of lfs_gop_size(). it's a security problem and it makes gcc3 "internal error"
to fix this,
- in genfs_getpages(), always calculate diskeof and memeof separately
so that filesystems (in this case, lfs) can use different strategies
for them.
- introduce GOP_SIZE_MEM flag and use it to request in-core filesize.
(it was an intention of GOP_SIZE_READ,
but after the above change _READ is not a straightforward name)
after this, no one uses GOP_SIZE_{READ,WRITE} anymore but leave them for now.
when an old ffs filesytem is first mounted (as a result, df reports disk
full on old ffs filesystem or mfs created by old binary). Problem first
noticed by onoe san.
http://mail-index.netbsd.org/tech-kern/2003/09/06/0001.htmlhttp://mail-index.netbsd.org/tech-kern/2003/09/06/0006.html
to avoid compat problems with old ffsv1 by reuse of the old FS_SWAPPED
value for FS_FLAGS_UPDATED, and use of new, larger fields:
- Don't use FS_FLAGS_UPDATED to see if we need to update new fields from
old fields in ffsv1 case.
- when writing back the superblock, copy back the flags to the old location
if only old flags are set (FS_FLAGS_UPDATED won't be set in this case)
in ffsv1 case.
panics in ffs_full_fsync because v_specmountpoint requires that the NULL
v_specinfo be followed.
Tidy up in the same order in all error paths so compiler can merge the
code sequences.
Fixes PR kern/22419
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:
* Include the fs ident in the filehandle; improve stale filehandle checks.
* Change definition of blksize() to use the on-dinode size instead of
the inode's i_size, so that fsck_lfs will work properly again.
* Use b_interlock in lfs_vtruncbuf.
* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.
* Don't loop in lfs_fsync(), just write everything and wait.
* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.
* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
64 bit block pointers, extended attribute storage, and a few
other things.
This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.
Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
of segments to mark. However, this may be much more than lfs_nseg.
Originally this wasn't a big problem, since only the structures in the
diskblock were changed, but nowadays there's a mirror of the segflags
in the in-core superblock. This problem caused the code to walk
way past the end of that allocated area, causing memory corruption
in other kernel structures. So, use lfs_nseg as the maximum, as it should be.
While here, simplify the loop; it had become an obfuscated piece of
code overtime.
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.
In lfs_update_single, assert that disk addresses are either negative, or
are still positive when converted to int32_t, to prevent recurrence of a
negative/positive block problem.
checking the memq.
Take greater care not to dirty the Ifile vnode when unmounting the filesystem.
This should fix a "(vp->v_flag & VONWORKLST) == 0" assertion panic in vgonel
that could occur when unmounting.
Do not allow the Ifile to be mapped for writing.
direct and indirect block pointers are not valid in the case of shortlinks.
while i'm here, move duplicated code in lfs_vget/fastvget into a new
function, lfs_vinit.
Note however that blocks can be added to the Ifile even when the segment
block is held because of inodes' atime. Do not panic with "dirty blocks"
if these blocks are present.
be expanded to cover other per-fs and subsystem-wide data as well.
Fix a case of IN_MODIFIED being set without updating lfs_uinodes, resulting
in a "lfs_uinodes < 0" panic.
Fix a deadlock in lfs_putpages arising from the need to busy all pages in a
block; unbusy any that had already been busied before starting over.
always true) and accompanying dead code.
- When constructing write clusters in lfs_writeseg, if the block we are
about to add is itself a cluster from GOP_WRITE, don't put a cluster
in a cluster, just write the GOP_WRITE cluster on its own. This seems
to represent a slight performance gain on my test machine.
- Charge someone's rusage for writes on LFSes. It's difficult to tell
who the "right" process to charge is; just charge whoever triggered
the write.
where the cleaner is trying to write, instead of tying up the "live"
buffers (or pages).
Fix a bug in the LFS_UBC case where oversized buffers would not be
checksummed correctly, causing uncleanable segments.
Make sure that wakeup(fs->lfs_iocount) is done if fs->lfs_iocount is 1
as well as 0, since we wait in some places for it to drop to 1.
Activate all pages that make it into lfs_gop_write without the segment
lock held, since they must have been dirtied very recently, even if
PG_DELWRI is not set.
actually happens.
Add a new fcntl call that will write the minimum necessary to checkpoint
(i.e., for on-disk directory structure to be consistent, not including
updates to file data) so that the cleaner can clean segments more quickly
without sacrificing three-way commit for cleaning.
either as a mysterious UVM error or as "panic: dirty bufs". Verify
maximum size in lfs_malloc.
Teach lfs_updatemeta and lfs_shellsort about oversized cluster blocks from
lfs_gop_write.
When unwiring pages in lfs_gop_write, deactivate them, under the theory
that the pagedaemon wanted to free them last we knew.
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:
* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.
* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.
* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.
And a few that are not strictly necessary:
* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."
* Unified GOP_ALLOC between FFS and LFS.
* Update LFS copyright headers to correct values.
* Actually cast to unsigned in lfs_shellsort, like the comment says.
* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
(backout parts of rev.1.40)
otherwise, directory structures can be corrupted because checkpoints can
occur via eg. lfs_vflush before parent directory is written.