old somewhat naive selection scheme that didn't allow different allocation
settings for nodes, directory information (FIDs) and data.
Also fix some curious side-effects of atime updates on RMW devices.
it could result in theory result in descriptor trashing.
On the performance side, it would try to fixup *every* descriptor even if
it wasn't an internally allocated one. Performance loss wasn't that big but
every bit helps.
preliminary Metadata partition write support but its disabled still since
its not finished yet and not functioning correctly. All other formats are
checked and should work fine.
improvements of at least 4 times in untarring and roughly 100 to 500 times
on file creation in big directories. Lookup of files was O(n*n) and is now
O(1) even for file creation. Free spaces in the directory are kept in a
seperate list for fast file creation.
The postmark benchmark gives:
UDF old:
pm>set transactions 2000
pm>set number 3000
pm>run
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
1593 seconds total
681 seconds of transactions (2 per second)
Files:
3956 created (2 per second)
Creation alone: 3000 files (4 per second)
Mixed with transactions: 956 files (1 per second)
990 read (1 per second)
1010 appended (1 per second)
3956 deleted (2 per second)
Deletion alone: 2912 files (9 per second)
Mixed with transactions: 1044 files (1 per second)
Data:
5.26 megabytes read (3.38 kilobytes per second)
21.93 megabytes written (14.10 kilobytes per second)
pm>
UDF new:
pm>set transactions 2000
pm>set number 3000
pm>run
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
19 seconds total
3 seconds of transactions (666 per second)
Files:
3956 created (208 per second)
Creation alone: 3000 files (230 per second)
Mixed with transactions: 956 files (318 per second)
990 read (330 per second)
1010 appended (336 per second)
3956 deleted (208 per second)
Deletion alone: 2912 files (970 per second)
Mixed with transactions: 1044 files (348 per second)
Data:
5.26 megabytes read (283.66 kilobytes per second)
21.93 megabytes written (1.15 megabytes per second)
unlock the source directory again on exit. The stub that doesn't allow
cross directory renames for now jumped to the wrong exit point and thus
left a locked directory node that paniced on next locking.
heavily fragmented files.
Also fixing some (rare) allocation bugs and function name streamlining.
Tested on harddisc, CD-RW and CD-R i.e. all three basic backend classes.
corruptions that could take place when overwriting sparse files.
Still one rare corruption possible where blocks are accidentally marked
free, but the cause is not yet found and looking at the pattern it won't
happen in every day use.
the name ".." on a parent path component. To prevent other similar errors,
name length checking is not done but the passed name that shouldn't be
passed is ignored.
detect that the `last_node' variable will be set before used since it can't
parse the semantics of `TAILQ_EMPTY()' that is used as a guard first.
Thanks for H?rvard for finding and reporting it :)
run through copy-on-write. Call fscow_run() with valid data where possible.
The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.
- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.
- Always run copy-on-write on buffers returned from ffs_balloc().
- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.
Welcome to 4.99.63
Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
and DVD's behave like floppy discs. Writing is supported upto and including
version 2.01; version 2.50 and 2.60 will follow.
Also extending the UDF implementation to support symbolic links and
hardlinks.
Added are the mmcformat(8) tool to format rewritable CD/DVD discs and
newfs_udf(8).
Limitations:
all operations can be performed on the file system though the
sheduling is currently optimised for archiving workloads.
mv(1)/rename(2) is currently only implemented for non-directories.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.
As a consequence, most of the file systems can now be loaded as new style
modules.
Quick sanity check by ad@.
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
puffs_getvnode() was inserting vnodes into mnt_vnodelist without taking
a reference to the mount for each. When vnodes are scrubbed, refs to the
vnodes mount structure are dropped => boom.
The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:
- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
shutdown). There are still problems with device access and a PR will be
filed.
- Kill checkalias(). Allow multiple vnodes to reference a single device.
- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.
- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
zero - our implementation can't handle it (how sensible handling
a case like that would be is a whole other debate).
fixes panic reported by Jukka Salmi on current-users
the newly added space first. This significantly speeds up write speed for
msdosfs and making it at par with ffs wich already had this patched.
Speed increase measured on my IDE disc from 2Mb/sec to 32 Mb/sec
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
a bug: the homegrown version neglected to unlock vp
* don't reimplement eopnotsupp()
* init genfs_node earlier vget, protects against error paths in vget
from attempting to destroy a non-inited node
same file were renamed simultaneously, there was a window where
directory entry cached in the vnode during lookup would be replaced
before calling rename. This lead to one directory entry getting
renamed twice and the other one getting zero renames. Do a relookup
in rename to make sure we have the correct directory entry.
Thanks go to Greg Oster for reporting the problem, helping with
debugging and thoroughly testing the patch.
called with NOWAIT.
XXX: this is just a quick fix to stop the diagnostic panic. I
think ENOSPC should be treated elsewhere depending on how much
memory tmpfs claims.
thing and release locks before the userspace wait for operations
which release the lock before exit from the method in any case.
However, releasing the lock after inserting the request on the
operation queue gives us proper ordering possibilities in userspace
(at least if that bit were implemented, but I don't think there
any file system in userspace that depends on kernel locking and
probably there never should be one).
inspired by a conversation with Nacho Navarro
with the file server happen through puffs_msg_enqueue() and
puffs_msg_wait() instead of having a billion different routines.
Build the existing system upon these two. Most importantly though,
decouple insertation into the op queue from the actual wait. This
is useful for a number of reasons coming soon to a cvs repo near you.
unlocked vnode when trying to rename a directory. The fix was to
shuffle some bits around and #pray.
The rename routine actually needs a very very major wide-angle whopping:
* it takes locks out-of-order
* it deals with references from SAVESTART lookups in interesting ways
* I doubt there is any guarantee for correct operation if there
are multiple concurrent accesses
* the error branches might just as well call panic() directly
Rip the transport code completely out of puffs and generalize it
into an independent module which will be used for multiple purposes
in the future. This module is called the Pass-to-Userspace
Transporter (known as "putter" among friends).
This is very much work-in-progress and one dependency with puffs
remains: the request framing format.
The device name is still /dev/puffs, but that will change soon.
Users of puffs need the following in their kernel configs now:
pseudo-device putter
when vclean()ing. Pending an adventure to the genfs/vm labyrinth
to fix this properly, compensate here by not allowing unstrategic
(no pun) return values. They are always due to the userspace server
crashing anyway, so it's no big deal if we lie about the final
resting place of the pages.
userspace, since it doesn't contain any information yet. I should
still rework this more so this is just a quickie to get the read/write
style interface more up to speed with the ioctl version.
interacts with the userspace file server:
* since the kernel-user communication is not purely request-response
anymore (hasn't been since 2006), try to rename some "request" to
"message". more similar mangling will take place in the future.
* completely rework how messages are allocated. previously most of
them were borrowed from the stack (originally *all* of them),
but now always allocate dynamically. this makes the structure
of the code much cleaner. also makes it possible to fix a
locking order violation. it enables plenty of future enhancements.
* start generalizing the transport interface to be independent of puffs
* move transport interface to read/write instead of ioctl. the
old one had legacy design problems, and besides, ioctl's suck.
implement a very generic version for now; this will be
worked on later hopefully some day reaching "highly optimized".
* implement libpuffs support behind existing library request
interfaces. this will change eventually (I hate those interfaces)
userspace call, namely our private mount structure, in the activation
record. This avoids problems in situations where the userspace
file server happens to die during our upcall and the vnode is
forcibly reclaimed before we roll back to the current stack frame.
committed something, issue an abort. The abort is done through
the regular op channel, e.g. failed mkdir leads to regular rmdir,
inactive and reclaim. No internal interface is planned currently
for the one file system out of a million which would implement it
to benefit from the one case in a billion where kernel resource
allocation actually does fail and out of that one case in a trillion
where internal vs. external would make a difference.
to be vaild errno values
* include string describing error in PUFFS_ERR
* get rid of union in puffs_req, it's nothing but trouble
* pass pmp to async i/o callbacks
locked. Ideally the function should be rewritten to do things in
a different order, but this tries to keep changes minimal aiming
for a possible netbsd-4 pullup.
fixes PR kern/37034