in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().
Welcome to 5.99.34.
Discussed on tech-kern.
- remove unnessisary check that would prevent it from mounting newer nilfs
images. A field has been added in the segment summary.
- store blocks of files on their virtual block number
- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.
- VOP_ISLOCKED(vp): Remove the for some time unused return value
LK_EXCLOTHER. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.
Discussed on tech-kern.
accounting. Use wired memory (which can be limited) for meta-data, and
kmem(9) for string allocations.
Close PR/31944. Fix PR/38361 while here. OK ad@.
This makes the size the same on 64bit archs. Don't bother bumping
any version, since you'd have explicitly had to jump through some
hoops to use pathconf before.
to do two things:
1) properly set "recycle?" in inactive
2) easily check if we are renaming a removed vnode. without the
check, it was possible to enter a dirent in the file system for
a removed (and hence scheduled to be vcleaned) vnode. this would
lead to the succesful vget() of a clean vnode. the use of the
cleaned vnode was, however, less succesful, except for purposes
of crashing.
the value before the call (yea, changing relookup would probably
be smart, but other file systems already initialize vpp, so I'm
letting someone else experiment with tylenol od).
system drivers where it was missing from and fixes one buggy
implementation. The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
unlike event it did fail, the kernel would double lutz to doom
(in failure devvp now remains unmountable until reboot. fans
of complicated & untested error branches may attempt to gunk this
up. i'm not one of them).
* cosmetic surgery: cut extra ;
msdosfs and cd9660 are the only filesystems that verify the filesystem
type in the label. This is the wrong place, sanity checks should only
rely on the inner structure of the filesystem (like signatures or
magic numbers).
msdosfs also used the device type information from the label to
deduce a filesystem parameter heuristically for the gemdos variant.
If there is no information inside the filesystem data itself, this
should be an explicit mount option.
operations. This prevents kernel memory leaks (one of which happened
every time the file system was unmounted via PUFFSOP_UNMOUNT ...
and incidentally would've been trivially caught with the old
malloc(9) interface. I wonder if the message is to use a ton of
pools instead of regression-attractive kmem interface).
reference the puffs_node before sending the request to the file
server. This diminishes the window where the inode can be reclaimed
and be invalidated before it is accessed (but does not completely
eliminate the race, as that is a caller problem which we cannot
fix here).
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.
no functional change
to initiate self destruct, i.e. unmount(MNT_FORCE). This, however,
is a semi-controlled self-destruct, since all caches are flushed
before the (possibly) violent unmount takes place.
context. This fixes a long-standing but seldomly seen deadlock,
where the kernel was holding pages busy (due to e.g. readahead
request) while waiting for the server to respond, and the server
made a callback into the kernel asking to invalidate those pages.
... or, well, theoretically fixes, since I didn't have any reliable
way of repeating the deadlock and I think I saw it only twice.
* it depended on the biglock (in a very cruel way)
* it was attached to userspace transactions rather than logical
fs operations
(If someone wants to revisit it some day, most of the stuff can be
reused from cvs history)
a valid optimization, but that's long gone and once VOP_INACTIVE is
called and the file server says that the vnode is going to be recycled,
it really is going to be recycled extra references gained or not.
based on code taken from FreeBSD.
This stops truncation of files larger than 4GB by VOP_SETATTR() which e.g.
happened when copying large files "rump_smbfs". Kudos to Antti Kantee
for diagnosing the problem in smbfs_smb_setfsize().
Change cd9660_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers. Fixes erroneous "Invalid argument"
errors from mount(8) with -u against cd9660 root in the presence of
mfs or tmpfs /dev prepared after initial mountroot.
Tested on QEMU running cobalt Restore CD.
operations creating a node cannot be considered outgoing operations,
since after return from userspace they modify file system state
by creating a new node. if we do not protect the file system by
holding the directory lock, a lookup operation might race us into
the kernel and create the node earlier.
* remove pnode from hashlish before sending the reclaim faf off to
userspace. also, hold pmp_lock while frobbing the list.
system generate heaps of odd allocations since the end of write request was
overwritten by the start of the second resulting in another relocation.
Also added a full flush of the file on a VOP_CLOSE(). This automatically
flushes file tails to disc.
- pcinfo = kmem_zalloc(sizeof_puffs_cacheinfo) + runsize,
+ pcinfo = kmem_zalloc(sizeof(struct puffs_cacheinfo) + runsize,
in #ifdef'ed out code, per paired kmem_free() in the same function.
Closes PR kern/41840.
complaint in the "ntfs ubc_uiomove error" (ubc_uiomove error was
not coming from ntfs but instead the "to" file system) and PR
kern/38531 (well, I assume the submitter wanted cp(1) working on
ntfs instead of mangling ntfs the way the PR title suggests). Yes,
mmap works on ntfs like it always has.
*) well, um, and in other places too ... uuuh ... no comments.
but I guess this works as long as in-kernel ntfs doesn't grow write
support.
out before automatically.
However, when dealing with faulty discs that fail to mount, system nodes are
of course not written out and thus may still be marked dirty, if only due to
access. Especially on sequential media this gave rise to panics on reading
trackinfo since the write track section had not yet been initialised.
tested with a DEBUG+DIAGNOSTIC+LOCKDEBUG kernel. To summerise NiLFS, i'll
repeat my posting to tech-kern here:
NiLFS stands for New implementation of Logging File System; LFS done
right they claim :) It is at version 2 now and is being developed by NTT, the
Japanese telecom company and recently put into the linux source tree. See
http://www.nilfs.org. The on-disc format is not completely frozen and i expect
at least one minor revision to come in time.
The benefits of NiLFS are build-in fine-grained checkpointing, persistent
snapshots, multiple mounts and very large file and media support. Every
checkpoint can be transformed into a snapshot and v.v. It is said to perform
very well on flash media since it is not overwriting pieces apart from a
incidental update of the superblock, but that might change. It is accompanied
by a cleaner to clean up the segments and recover lost space.
My work is not a port of the linux code; its a new implementation. Porting the
code would be more work since its very linux oriented and never written to be
ported outside linux. The goal is to be fully interchangable. The code is non
intrusive to other parts of the kernel. It is also very light-weight.
The current state of the code is read-only access to both clean and dirty
NiLFS partitions. On mounting a dirty partition it rolls forward the log to
the last checkpoint. Full read-write support is however planned!
Just as the linux code, mount_nilfs allows for the `head' to be mounted
read/write and allows multiple read-only snapshots/checkpoint mounts next to
it.
By allowing the RW mount at a different snapshot for read-write it should be
possible eventually to revert back to a previous state; i.e. try to upgrade a
system and being able to revert to the exact state prior to the upgrade.
Compared to other FS's its pretty light-weight, suitable for embedded use and
on flash media. The read-only code is currently 17kb object code on
NetBSD/i386. I doubt the read-write code will surpass the 50 or 60. Compared
this to FFS being 156kb, UDF being 84 kb and NFS being 130kb. Run-time memory
usage is most likely not very different from other uses though maybe a bit
higher than FFS.
This variable was not actually used uninitialised, but some compilers
(e.g. gcc-4.3.3) warned that the variable might be used uninitialised.
Inspired by PR 41255 from Kurt Lidl.
would push out elements to fillup-read only when the time had come for them.
This could then trickle feed the read queue slowly, but fast enough to prevent
it from switching state.
Benefits are significant speed improvements on node creation/insertion while
keeping the lookup times low and still allowing sequential iteration over the
nodes.
member.
XXX No idea if this is the right solution to this problem, but it does
XXX at least allow thebuild to continue. The original committed should
XXX verify that this does what was intended!
(Hello again, Elad)
Be sure that no other active vnodes remains, before trying to release
the root one. Likewise, do not destroy the smbmount specific structure
if the umount will fail (busy conditions).
No objection from pooka@.
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr
All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.
XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
of free blocks on the device and when free blocks are getting tight it tries
to readjust/recalculate that value by syncing the FS.
Second stage will be resizing the data/metadata partitions.
the other routines of the same spirit.
Adjust file-system code to use it.
Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).
No objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call)).
Proposed with no objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html
The vnode is always expected to be locked, so no locking is done outside
the file-system code.
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call)).
Proposed with no objections on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html
The vnode is always expected to be locked, so no locking is done outside
the file-system code.
dirent is no longer cached in lookup and we do the lookup ourselves
in rename, we are most definitely not allowed to assert that it
matches the source vnode passed as an argument. In case the source
node does not exist or has been replaced, punt with ENOENT.
Also, nuke some misleading prehistoric comments which haven't been
valid in over a year.
Fixes PR kern/41128 by Nicolas Joly
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
allow CD-ROM/DVD-ROM/DB-ROM drives to read the media while still allowing them
to be appended later. It can also be seen as a way to make mountable
snapshots.
its currently still misbehaving. If supported later, it would allow one or a
series of sessions on a sequential recordable media to be ignored as if they
never were created.
Also fix a small comment: its not the direct but the bootstrap disc strategy
that we close down.