1999-03-15 03:46:47 +03:00
|
|
|
# $NetBSD: README,v 1.3 1999/03/15 00:46:47 perseant Exp $
|
1994-06-29 10:39:25 +04:00
|
|
|
|
|
|
|
# @(#)README 8.1 (Berkeley) 6/11/93
|
1994-06-08 15:41:58 +04:00
|
|
|
|
1999-03-15 03:46:47 +03:00
|
|
|
The file system is reasonably stable...I think.
|
1994-06-08 15:41:58 +04:00
|
|
|
|
1999-03-15 03:46:47 +03:00
|
|
|
For details on the implementation, performance and why garbage
|
|
|
|
collection always wins, see Dr. Margo Seltzer's thesis available for
|
|
|
|
anonymous ftp from toe.cs.berkeley.edu, in the directory
|
|
|
|
pub/personal/margo/thesis.ps.Z, or the January 1993 USENIX paper.
|
1994-06-08 15:41:58 +04:00
|
|
|
|
|
|
|
----------
|
|
|
|
The disk is laid out in segments. The first segment starts 8K into the
|
|
|
|
disk (the first 8K is used for boot information). Each segment is composed
|
|
|
|
of the following:
|
|
|
|
|
|
|
|
An optional super block
|
|
|
|
One or more groups of:
|
|
|
|
segment summary
|
|
|
|
0 or more data blocks
|
|
|
|
0 or more inode blocks
|
|
|
|
|
|
|
|
The segment summary and inode/data blocks start after the super block (if
|
|
|
|
present), and grow toward the end of the segment.
|
|
|
|
|
|
|
|
_______________________________________________
|
|
|
|
| | | | |
|
|
|
|
| summary | data/inode | summary | data/inode |
|
|
|
|
| block | blocks | block | blocks | ...
|
|
|
|
|_________|____________|_________|____________|
|
|
|
|
|
|
|
|
The data/inode blocks following a summary block are described by the
|
|
|
|
summary block. In order to permit the segment to be written in any order
|
|
|
|
and in a forward direction only, a checksum is calculated across the
|
|
|
|
blocks described by the summary. Additionally, the summary is checksummed
|
|
|
|
and timestamped. Both of these are intended for recovery; the former is
|
|
|
|
to make it easy to determine that it *is* a summary block and the latter
|
|
|
|
is to make it easy to determine when recovery is finished for partially
|
|
|
|
written segments. These checksums are also used by the cleaner.
|
|
|
|
|
|
|
|
Summary block (detail)
|
|
|
|
________________
|
|
|
|
| sum cksum |
|
|
|
|
| data cksum |
|
|
|
|
| next segment |
|
|
|
|
| timestamp |
|
|
|
|
| FINFO count |
|
|
|
|
| inode count |
|
|
|
|
| flags |
|
|
|
|
|______________|
|
|
|
|
| FINFO-1 | 0 or more file info structures, identifying the
|
|
|
|
| . | blocks in the segment.
|
|
|
|
| . |
|
|
|
|
| . |
|
|
|
|
| FINFO-N |
|
|
|
|
| inode-N |
|
|
|
|
| . |
|
|
|
|
| . |
|
|
|
|
| . | 0 or more inode daddr_t's, identifying the inode
|
|
|
|
| inode-1 | blocks in the segment.
|
|
|
|
|______________|
|
|
|
|
|
|
|
|
Inode blocks are blocks of on-disk inodes in the same format as those in
|
|
|
|
the FFS. However, spare[0] contains the inode number of the inode so we
|
|
|
|
can find a particular inode on a page. They are packed page_size /
|
|
|
|
sizeof(inode) to a block. Data blocks are exactly as in the FFS. Both
|
|
|
|
inodes and data blocks move around the file system at will.
|
|
|
|
|
|
|
|
The file system is described by a super-block which is replicated and
|
|
|
|
occurs as the first block of the first and other segments. (The maximum
|
|
|
|
number of super-blocks is MAXNUMSB). Each super-block maintains a list
|
|
|
|
of the disk addresses of all the super-blocks. The super-block maintains
|
|
|
|
a small amount of checkpoint information, essentially just enough to find
|
|
|
|
the inode for the IFILE (fs->lfs_idaddr).
|
|
|
|
|
|
|
|
The IFILE is visible in the file system, as inode number IFILE_INUM. It
|
|
|
|
contains information shared between the kernel and various user processes.
|
|
|
|
|
|
|
|
Ifile (detail)
|
|
|
|
________________
|
|
|
|
| cleaner info | Cleaner information per file system. (Page
|
|
|
|
| | granularity.)
|
|
|
|
|______________|
|
|
|
|
| segment | Space available and last modified times per
|
|
|
|
| usage table | segment. (Page granularity.)
|
|
|
|
|______________|
|
|
|
|
| IFILE-1 | Per inode status information: current version #,
|
|
|
|
| . | if currently allocated, last access time and
|
|
|
|
| . | current disk address of containing inode block.
|
|
|
|
| . | If current disk address is LFS_UNUSED_DADDR, the
|
|
|
|
| IFILE-N | inode is not in use, and it's on the free list.
|
|
|
|
|______________|
|
|
|
|
|
|
|
|
|
|
|
|
First Segment at Creation Time:
|
|
|
|
_____________________________________________________________
|
|
|
|
| | | | | | | |
|
|
|
|
| 8K pad | Super | summary | inode | ifile | root | l + f |
|
|
|
|
| | block | | block | | dir | dir |
|
|
|
|
|________|_______|_________|_______|_______|_______|_______|
|
|
|
|
^
|
|
|
|
Segment starts here.
|
|
|
|
|
|
|
|
Some differences from the Sprite LFS implementation.
|
|
|
|
|
|
|
|
1. The LFS implementation placed the ifile metadata and the super block
|
|
|
|
at fixed locations. This implementation replicates the super block
|
|
|
|
and puts each at a fixed location. The checkpoint data is divided into
|
|
|
|
two parts -- just enough information to find the IFILE is stored in
|
|
|
|
two of the super blocks, although it is not toggled between them as in
|
|
|
|
the Sprite implementation. (This was deliberate, to avoid a single
|
|
|
|
point of failure.) The remaining checkpoint information is treated as
|
|
|
|
a regular file, which means that the cleaner info, the segment usage
|
|
|
|
table and the ifile meta-data are stored in normal log segments.
|
|
|
|
(Tastes great, less filling...)
|
|
|
|
|
|
|
|
2. The segment layout is radically different in Sprite; this implementation
|
|
|
|
uses something a lot like network framing, where data/inode blocks are
|
|
|
|
written asynchronously, and a checksum is used to validate any set of
|
|
|
|
summary and data/inode blocks. Sprite writes summary blocks synchronously
|
|
|
|
after the data/inode blocks have been written and the existence of the
|
|
|
|
summary block validates the data/inode blocks. This permits us to write
|
|
|
|
everything contiguously, even partial segments and their summaries, whereas
|
|
|
|
Sprite is forced to seek (from the end of the data inode to the summary
|
|
|
|
which lives at the end of the segment). Additionally, writing the summary
|
|
|
|
synchronously should cost about 1/2 a rotation per summary.
|
|
|
|
|
|
|
|
3. Sprite LFS distinguishes between different types of blocks in the segment.
|
|
|
|
Other than inode blocks and data blocks, we don't.
|
|
|
|
|
|
|
|
4. Sprite LFS traverses the IFILE looking for free blocks. We maintain a
|
|
|
|
free list threaded through the IFILE entries.
|
|
|
|
|
|
|
|
5. The cleaner runs in user space, as opposed to kernel space. It shares
|
|
|
|
information with the kernel by reading/writing the IFILE and through
|
|
|
|
cleaner specific system calls.
|
|
|
|
|