NetBSD/sys/ufs/lfs/TODO

#	from: @(#)TODO	8.1 (Berkeley) 6/11/93
#	$Id: TODO,v 1.1 1994/06/08 11:42:17 mycroft Exp $

NOTE: Changed the lookup on a page of inodes to search from the back
in case the same inode gets written twice on the same page.
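
A minimal sketch of the back-to-front lookup described above, assuming
inodes are appended to the block in write order so that the last match
is the newest copy.  The struct and the name sketch_ifind are
hypothetical, not taken from the LFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down on-disk inode: only the inode number matters here. */
struct sketch_dinode {
	uint32_t di_inum;	/* inode number */
	uint32_t di_size;	/* stand-in for the rest of the inode */
};

/*
 * Find inode `ino' in a block of `n' inodes, scanning from the last
 * slot toward the first.  If the same inode appears twice, the copy in
 * the higher slot (assumed to be the more recent one) is returned.
 */
static struct sketch_dinode *
sketch_ifind(struct sketch_dinode *blk, size_t n, uint32_t ino)
{
	size_t i;

	assert(blk != NULL);
	for (i = n; i-- > 0; )
		if (blk[i].di_inum == ino)
			return (&blk[i]);
	return (NULL);		/* not in this block */
}
```

A front-to-back scan over the same block would return the stale copy
whenever an inode was written twice.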

Make sure that if you are writing a file, but not all the blocks
make it into a single segment, that you do not write the inode in
that segment.

Keith:
	Why not delete the lfs_bmapv call, just mark everything dirty
		that isn't deleted/truncated?  Get some numbers about
		what percentage of the stuff that the cleaner thinks
		might be live is live.  If it's high, get rid of lfs_bmapv.

	There is a nasty problem in that it may take *more* room to write
	the data to clean a segment than is returned by the new segment
	because of indirect blocks in segment 2 being dirtied by the data
	being copied into the log from segment 1.  The suggested solution
	at this point is to detect it when we have no space left on the
	filesystem, write the extra data into the last segment (leaving
	no clean ones), make it a checkpoint and shut down the file system
	for fixing by a utility reading the raw partition.  Argument is
	that this should never happen and is practically impossible to fix
	since the cleaner would have to theoretically build a model of the
	entire filesystem in memory to detect the condition occurring.
	A file coalescing cleaner will help avoid the problem, and one
	that reads/writes from the raw disk could fix it.
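
	The space problem above can be put as simple arithmetic.  This is a
	simplified model, not the cleaner's accounting: it assumes the only
	costs are the live blocks copied out plus the indirect/inode blocks
	dirtied by the copy.  The name sketch_clean_gain is hypothetical.

```c
#include <assert.h>

/*
 * Net blocks gained by cleaning one segment (simplified model).
 *   seg_blocks:  blocks in a segment
 *   live_blocks: live blocks that must be copied out of the segment
 *   meta_blocks: indirect/inode blocks dirtied by the copy that must
 *                also be rewritten into the log
 * A negative result means cleaning used more log space than it returned.
 */
static long
sketch_clean_gain(long seg_blocks, long live_blocks, long meta_blocks)
{
	assert(seg_blocks > 0);
	assert(live_blocks >= 0 && live_blocks <= seg_blocks);
	assert(meta_blocks >= 0);
	return (seg_blocks - (live_blocks + meta_blocks));
}
```

	With 256-block segments, 250 live blocks and 20 dirtied indirect
	blocks give a gain of -14: the nearly full segment costs more to
	clean than it frees.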

DONE	Currently, inodes are being flushed to disk synchronously upon
		creation -- see ufs_makeinode.  However, only the inode
		is flushed, the directory "name" is written using VOP_BWRITE,
		so it's not synchronous.  Possible solutions: 1: get some
		ordering in the writes so that inode/directory entries get
		stuffed into the same segment.  2: do both synchronously
		3: add Mendel's information into the stream so we log
		creation/deletion of inodes.  4: do some form of partial
		segment when changing the inode (creation/deletion/rename).
DONE	Fix i_block increment for indirect blocks.
	If the file system is tar'd, extracted on top of another LFS, the
		IFILE ain't worth diddly.  Is the cleaner writing the IFILE?
		If not, let's make it read-only.
DONE	Delete unnecessary source from utils in main-line source tree.
DONE	Make sure that we're counting meta blocks in the inode i_block count.
	Overlap the version and nextfree fields in the IFILE
DONE	Vinvalbuf (Kirk):
		Why writing blocks that are no longer useful?
		Are the semantics of close such that blocks have to be flushed?
		How specify in the buf chain the blocks that don't need
		to be written?  (Different numbering of indirect blocks.)

Margo:
	Change so that only search one sector of inode block file for the
		inode by using sector addresses in the ifile instead of
		logical disk addresses.
	Fix the use of the ifile version field to use the generation
		number instead.
DONE	Unmount; not doing a bgetvp (VHOLD) in lfs_newbuf call.
DONE	Document in the README file where the checkpoint information is
		on disk.
	Variable block sizes (Margo/Keith).
	Switch the byte accounting to sector accounting.
DONE	Check lfs.h and make sure that the #defines/structures are all
		actually needed.
DONE	Add a check in lfs_segment.c so that if the segment is empty,
		we don't write it.
	Need to keep vnode v_numoutput up to date for pending writes?
DONE	USENIX paper (Carl/Margo).


Evelyn:
	lfsck:	If you delete a file that's being executed, the version number
		isn't updated, and lfsck has to figure this out; the case is
		the same as having an inode that no directory references,
		so the file should be reattached into lost+found.
	Recovery/fsck.

Carl:
	Investigate: clustering of reads (if blocks in the segment are ordered,
		should read them all) and writes (McVoy paper).
	Investigate: should the access time be part of the IFILE:
		pro: theoretically, saves disk writes
		con: caching inodes should obviate this advantage
		     the IFILE is already humongous
	Cleaner.
	Port to OSF/1 (Carl/Keith).
	Currently there's no notion of write error checking.
		+ Failed data/inode writes should be rescheduled (kernel level
		  bad blocking).
		+ Failed superblock writes should cause selection of new
		  superblock for checkpointing.
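
	One possible shape for the superblock item above: after a failed
	write, mark that copy bad and move to the next usable one.  A sketch
	only; the bad-copy map and the name sketch_sb_failover are
	assumptions, and nothing like this exists in the code yet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Called after a write to superblock copy `cur' has failed.  Marks
 * `cur' bad in the caller's map and returns the index of the next copy
 * not marked bad, searching circularly.  Returns -1 if no usable copy
 * remains.
 */
static int
sketch_sb_failover(unsigned char bad[], int nsb, int cur)
{
	int i, next;

	assert(bad != NULL && nsb > 0);
	assert(cur >= 0 && cur < nsb);
	bad[cur] = 1;
	for (i = 1; i < nsb; i++) {
		next = (cur + i) % nsb;
		if (!bad[next])
			return (next);	/* checkpoint here from now on */
	}
	return (-1);
}
```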

FUTURE FANTASIES: ============

+ unrm, versioning
+ transactions
+ extended cleaner policies (hot/cold data, data placement)

==============================
Problem with the concept of multiple buffer headers referencing the segment:
Positives:
	Don't lock down 1 segment per file system of physical memory.
	Don't copy from buffers to segment memory.
	Don't tie down the bus to transfer 1M.
	Works on controllers supporting less than large transfers.
	Disk can start writing immediately instead of waiting 1/2 rotation
	    and the full transfer.
Negatives:
	Have to do segment write then segment summary write, since the latter
	is what verifies that the segment is okay.  (Is there another way
	to do this?)
==============================
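
The ordering constraint under Negatives can be sketched as follows: the
summary is built and written only after the data, and a reader accepts
the segment only if the summary matches what is on disk.  The struct
layout, the toy checksum and all sketch_ names are hypothetical and are
not the on-disk LFS format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary record; a real segment summary holds more. */
struct sketch_summary {
	uint32_t len;		/* bytes of data described */
	uint32_t datasum;	/* checksum over those bytes */
};

/* Toy checksum, chosen only for illustration. */
static uint32_t
sketch_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++)
		sum = sum * 31 + data[i];
	return (sum);
}

/* Second step of the write: describe data that has already gone out. */
static struct sketch_summary
sketch_mksummary(const uint8_t *data, size_t len)
{
	struct sketch_summary ss;

	ss.len = (uint32_t)len;
	ss.datasum = sketch_cksum(data, len);
	return (ss);
}

/* Reader side: the segment is okay only if the summary matches the data. */
static int
sketch_segment_ok(const struct sketch_summary *ss, const uint8_t *data,
    size_t len)
{
	assert(ss != NULL);
	return (ss->len == len && ss->datasum == sketch_cksum(data, len));
}
```

If the summary went out first and the data write then failed part way,
the check would be the only thing standing between the reader and a
half-written segment, which is why the summary has to come last.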

The algorithm for selecting the disk addresses of the super-blocks
has to be available to the user program which checks the file system.

(Currently in newfs, becomes a common subroutine.)
Update to 4.4-Lite fs code, with local changes. 1994-06-08 15:41:58 +04:00