(PR #11468). In the case of fragment allocation, check to see if enough space is available before extending a fragment already scheduled for writing. The locked_queue_* variables indicate the number of buffer headers and bytes, respectively, that are unavailable to getnewbuf() because they are locked up waiting for LFS to flush them; make sure that that is actually what we're counting, i.e., never count malloced buffers, and always use b_bufsize instead of b_bcount. If DEBUG is defined, the periodic calls to lfs_countlocked will now complain if either counter is incorrect. (In the future lfs_countlocked will not need to be called at all if DEBUG is not defined.)
# $NetBSD: TODO,v 1.4 2000/11/17 19:14:41 perseant Exp $

- If we put an LFS onto a striped disk, we want to be able to specify
  the segment size to be equal to the stripe size, regardless of whether
  this is a power of two; also, the first segment should just eat the
  label pad, like the segments eat the superblocks.  Then, we could
  neatly lay out the segments along stripe boundaries.  [v2]
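
The layout arithmetic in the item above might look roughly like the
following sketch.  Everything here is an assumption made for
illustration: the struct, its field names, and the choice of sectors as
the unit are hypothetical and are not taken from the LFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout parameters, all in sectors. */
struct seg_layout {
	uint64_t stripe_size;	/* segment size == stripe size; need not be 2^n */
	uint64_t label_pad;	/* sectors reserved for the disklabel */
};

/* Segment containing a given sector; a division, since no shift applies. */
static uint64_t
seg_of_sector(const struct seg_layout *l, uint64_t sector)
{
	assert(l->stripe_size > 0);
	return sector / l->stripe_size;
}

/* First usable sector of segment n: segment 0 "eats" the label pad. */
static uint64_t
seg_first_usable(const struct seg_layout *l, uint64_t n)
{
	assert(l->label_pad < l->stripe_size);
	return n == 0 ? l->label_pad : n * l->stripe_size;
}
```

With a stripe of 96 sectors and a 16-sector label pad, every segment
starts on a stripe boundary and only segment 0 loses space to the pad.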

- Working fsck_lfs.  (Have something that will verify, need something
  that will fix too.  Really, need a general-purpose external
  partial-segment writer.)

- Roll-forward agent, *at least* to verify the newer superblock's
  checkpoint (easy) but also to create a valid checkpoint for
  post-checkpoint writes (requires an external partial-segment writer).

- Inode blocks are currently the same size as the fs block size; but all
  the ones I've seen are mostly empty, and this will be especially true
  if atime information is kept in the ifile instead of the inode.  Could
  we shrink the inode block size to 512?  Or parametrize it at fs
  creation time?

- Get rid of DEV_BSIZE, pay attention to the media block size at mount time.

- More fs ops need to call lfs_imtime.  Which ones?  (Blackwell et al., 1995)

- lfs_vunref_head exists so that vnodes loaded solely for cleaning can
  be put back on the *head* of the vnode free list.  Make sure we
  actually do this, since we now take IN_CLEANING off during segment write.

- Investigate the "unlocked access" in lfs_bmapv, see if we could wait
  there most of the time?  Are we getting inconsistent data?

- Change the free_lock to be fs-specific, and change the dirvcount to be
  subsystem-wide.

- The cleaner could be enhanced to be controlled from other processes,
  and possibly perform additional tasks:

  - Backups.  At a minimum, turn the cleaner off and on to allow
    effective live backups.  More aggressively, the cleaner itself could
    be the backup agent, and dump_lfs would merely be a controller.

  - Cleaning time policies.  Be able to tweak the cleaner's thresholds
    to allow more thorough cleaning during policy-determined idle
    periods (regardless of actual idleness) or put off until later
    during short, intensive write periods.

  - File coalescing and placement.  During periods we expect to be idle,
    coalesce fragmented files into one place on disk for better read
    performance.  Ideally, move files that have not been accessed in a
    while to the extremes of the disk, thereby shortening seek times for
    files that are accessed more frequently (though how the cleaner
    should communicate "please put this near the beginning or end of the
    disk" to the kernel is a very good question; flags to lfs_markv?).

  - Versioning.  When it cleans a segment it could write data for files
    that were less than n versions old to tape or elsewhere.  Perhaps it
    could even write them back onto the disk, although that requires
    more thought (and kernel mods).

- Move lfs_countlocked() into vfs_bio.c, to replace count_locked_queue;
  perhaps keep the name, replace the function.  Could it count referenced
  vnodes as well, if it was in vfs_subr.c instead?
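
For reference, the counting rule that such a function has to preserve
(count only real buffer headers, and count b_bufsize rather than
b_bcount) can be sketched in user space as below.  This is only a model:
struct fake_buf, its malloced field, and count_locked() are hypothetical
stand-ins, not the kernel's struct buf or the existing lfs_countlocked().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a kernel buffer header. */
struct fake_buf {
	long b_bufsize;		/* allocated size of the buffer */
	long b_bcount;		/* valid bytes; deliberately not counted */
	int malloced;		/* nonzero if not taken from the buffer cache */
	struct fake_buf *next;	/* next buffer on the locked queue */
};

/* Walk the locked queue and recompute both counters from scratch. */
static void
count_locked(const struct fake_buf *head, int *count, long *bytes)
{
	const struct fake_buf *bp;

	assert(count != NULL && bytes != NULL);
	*count = 0;
	*bytes = 0;
	for (bp = head; bp != NULL; bp = bp->next) {
		if (bp->malloced)
			continue;	/* holds no buffer cache header */
		(*count)++;
		*bytes += bp->b_bufsize;
	}
}
```

A DEBUG-style check would compare these recomputed values against the
running counters and complain on a mismatch.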

- Why not delete the lfs_bmapv call, just mark everything dirty that
  isn't deleted/truncated?  Get some numbers about what percentage of
  the stuff that the cleaner thinks might be live is live.  If it's
  high, get rid of lfs_bmapv.

- There is a nasty problem in that it may take *more* room to write the
  data to clean a segment than is returned by the new segment because of
  indirect blocks in segment 2 being dirtied by the data being copied
  into the log from segment 1.  The suggested solution at this point is
  to detect it when we have no space left on the filesystem, write the
  extra data into the last segment (leaving no clean ones), make it a
  checkpoint and shut down the file system for fixing by a utility
  reading the raw partition.  The argument is that this should never
  happen and is practically impossible to fix, since the cleaner would
  theoretically have to build a model of the entire filesystem in memory
  to detect the condition occurring.  A file coalescing cleaner will help
  avoid the problem, and one that reads/writes from the raw disk could
  fix it.

- Overlap the version and nextfree fields in the IFILE.

- Change things so that we only search one sector of the inode block
  file for the inode, by using sector addresses in the ifile instead of
  logical disk addresses.

- Fix the use of the ifile version field to use the generation number instead.

- Need to keep vnode v_numoutput up to date for pending writes?

- If we delete a file that's being executed, the version number isn't
  updated, and fsck_lfs has to figure this out; the case is the same as
  if we have an inode that no directory references, so the file should
  be reattached into lost+found.

- Investigate: should the access time be part of the IFILE?
	pro: theoretically, saves disk writes
	con: caching inodes should obviate this advantage
	     the IFILE is already humongous

- Currently there's no notion of write error checking.
  + Failed data/inode writes should be rescheduled (kernel level bad blocking).
  + Failed superblock writes should cause selection of new superblock
    for checkpointing.

- Future fantasies:
  - unrm, versioning
  - transactions
  - extended cleaner policies (hot/cold data, data placement)

- Problem with the concept of multiple buffer headers referencing the segment:
  Positives:
    Don't lock down 1 segment per file system of physical memory.
    Don't copy from buffers to segment memory.
    Don't tie down the bus to transfer 1M.
    Works on controllers supporting less than large transfers.
    Disk can start writing immediately instead of waiting 1/2 rotation
    and the full transfer.
  Negatives:
    Have to do segment write then segment summary write, since the latter
    is what verifies that the segment is okay.  (Is there another way
    to do this?)
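
The ordering constraint under "Negatives" can be illustrated with a toy
model in which the summary carries a checksum of the data, so it can
only be produced once the data is known, and a reader trusts the segment
only when the two agree.  The names and the checksum below are
hypothetical; in particular, this is not the checksum LFS actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary record: a checksum over the segment's data. */
struct fake_summary {
	uint32_t data_sum;
};

/* Toy rolling checksum; a stand-in for whatever the fs really uses. */
static uint32_t
toy_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++)
		sum = sum * 31 + data[i];
	return sum;
}

/* Second step of the write: build the summary after the data is issued. */
static struct fake_summary
make_summary(const uint8_t *data, size_t len)
{
	struct fake_summary s = { toy_sum(data, len) };
	return s;
}

/* On recovery, the segment counts only if its summary matches the data. */
static int
segment_ok(const struct fake_summary *s, const uint8_t *data, size_t len)
{
	return s->data_sum == toy_sum(data, len);
}
```

If the data write is torn or never completes, segment_ok() fails for the
stale contents, which is why the summary has to go out last.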

- The algorithm for selecting the disk addresses of the super-blocks
  has to be available to the user program which checks the file system.