Reasoning as before.
Note that I am not going through and checking for 64->32 truncations
in inode numbers; I'm sure there are quite a few, but that's a project
for later.
This prevents regressions in the ulfs code when switching to the new
accessors. Note that while adding byteswapping to the other accessors
is straightforward, I haven't done it yet; and that also is not enough
to make LFS_EI work, because there are places lying around that bypass
the accessors for one reason or another, and all of them need to be
updated. That is going to have to wait for a later day as LFS_EI is
not on the critical path right now.
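
A minimal sketch of what a byteswapping accessor could look like,
assuming a hypothetical in-memory structure with a flag that records
whether the volume's byte order differs from the host's. The names
struct fs_sketch, needswap, and sketch_get_magic/sketch_set_magic are
illustrative only and are not the real lfs accessors.

```c
#include <stdint.h>

/* Hypothetical in-memory filesystem state; not the real struct lfs. */
struct fs_sketch {
	int needswap;		/* nonzero if the volume's byte order is foreign */
	uint32_t dsk_magic;	/* field kept exactly as it appears on disk */
};

/* Portable 32-bit byte swap. */
static uint32_t
swap32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Getter: convert from disk order to host order on the way out. */
static uint32_t
sketch_get_magic(const struct fs_sketch *fs)
{
	return fs->needswap ? swap32_sketch(fs->dsk_magic) : fs->dsk_magic;
}

/* Setter: convert from host order to disk order on the way in. */
static void
sketch_set_magic(struct fs_sketch *fs, uint32_t val)
{
	fs->dsk_magic = fs->needswap ? swap32_sketch(val) : val;
}
```

Any code that reads dsk_magic directly instead of going through the
getter sees the unswapped value, which is why the places that bypass
the accessors have to be found and updated too.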
(Mostly.)
The ufs-derived ones are fake structure member macros, which are gross
and not very safe. Also, it seems that a lot of places in the lfs code
were using the ffsv1 branch of them unconditionally, and this way it's
guaranteed all those places have been updated.
Found while doing this: for non-devices, have getattr produce NODEV
in the rdev field instead of leaking the address of the first direct
block.
(This part changes the native lfs code; the ufs-derived code already
has 64 vs. 32 logic, but as aspects of it are unsafe, and don't
entirely interoperate cleanly with the lfs 64/32 stuff, pass 2 will be
rehashing that.)
other lfs64 on-disk inode numbers; I've been doing that since this is
a new format and we may as well take the opportunity. This does assume
that more than 4 billion files on a single volume will become desirable;
but for an average file size of 10K all that takes is a 40 TB volume,
and it's not that hard to make one of those these days if you want to
badly enough.
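
The arithmetic above can be checked with a few lines of C. This is
only a sketch: it reads "4 billion" as 2^32 and "10K" as 10 KiB (both
assumptions), and it ignores metadata overhead. The function name is
made up for the example.

```c
#include <stdint.h>

/* Bytes of file data needed to hold nfiles files of avg_bytes each. */
static uint64_t
volume_bytes_sketch(uint64_t nfiles, uint64_t avg_bytes)
{
	return nfiles * avg_bytes;
}
```

With 2^32 files of 10 KiB each the result is 40 * 2^40 bytes, i.e. the
40 TB figure quoted above.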
Keep the old BLOCK_INFO as BLOCK_INFO_70, and version the fcntls that
use it.
Note that BLOCK_INFO_70 has 64-bit padding issues so that it's
different on 32-bit and 64-bit machines. The new BLOCK_INFO fixes
this. However, BLOCK_INFO also contains a pointer, so compat32 stuff
for 32-on-64 is still needed and doesn't currently exist.
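
For illustration, a sketch of how explicit padding pins down a layout
across ABIs. The structure below is hypothetical and is not the real
BLOCK_INFO or BLOCK_INFO_70; it only shows the general technique.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record. Without pad0, the offset of the first 64-bit
 * member depends on how the ABI aligns int64_t inside a structure
 * (4 bytes on some 32-bit ABIs, 8 bytes on most 64-bit ones).
 */
struct bi_sketch {
	uint32_t inode;
	uint32_t pad0;	/* explicit padding: keeps lbn at offset 8 everywhere */
	int64_t lbn;
	int64_t daddr;
};

/* Compile-time checks that the layout is the intended one. */
_Static_assert(offsetof(struct bi_sketch, lbn) == 8, "lbn offset");
_Static_assert(sizeof(struct bi_sketch) == 24, "record size");
```

This does nothing for a pointer member, whose size itself differs
between 32-bit and 64-bit userland; that is the compat32 problem
mentioned above.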
(instead of the system call entry points)
Avoids duplication.
While touching these, pass the lwp around instead of the proc -- the
latter was there for no other reason than because once upon a time
struct proc was the first argument of all syscalls.
(For that matter, why not just use curlwp instead of passing it around
all over the place? The cost of passing it to every syscall probably
exceeds the cost of loading it from curcpu, even on machines where
it's not just kept in a register all the time.)
Also make note of a cleaner limitation: it seems that when it goes to
coalesce discontiguous files, it mallocs an array with one BLOCK_INFO
for every block in the file. Therefore, with 64-bit LFS, on a 32-bit
platform it will be possible to have files large enough to overflow
the cleaner's address space. Currently these will be skipped and cause
warnings via syslog.
At some point someone should rewrite the logic to coalesce files to
use chunks of some reasonable size, as discontinuity between such
chunks is immaterial and mallocing this much space is silly and
fragile. Also, the kernel only accepts up to 65536 blocks at a time
for bmapv and markv, so processing more than this at once probably
isn't useful and may not even work currently. I don't want to change
this around just now as it's not entirely trivial.
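
A rough sketch of the chunked approach suggested above, assuming a
hypothetical per-chunk callback. None of these names exist in the
cleaner; the only value taken from the text is the 65536-block limit.

```c
#include <stdint.h>

#define CHUNK_BLOCKS_SKETCH 65536u	/* per-call limit cited in the text */

/* Hypothetical per-chunk callback: handles blocks [first, first + count). */
typedef int (*chunk_fn_sketch)(uint64_t first, uint32_t count, void *arg);

/*
 * Walk nblocks blocks one chunk at a time. Returns 0 on success, or
 * the first nonzero value returned by the callback.
 */
static int
for_each_chunk_sketch(uint64_t nblocks, chunk_fn_sketch fn, void *arg)
{
	uint64_t first = 0;

	while (first < nblocks) {
		uint64_t left = nblocks - first;
		uint32_t count = left > CHUNK_BLOCKS_SKETCH ?
		    CHUNK_BLOCKS_SKETCH : (uint32_t)left;
		int error = fn(first, count, arg);

		if (error != 0)
			return error;
		first += count;
	}
	return 0;
}

/* Example callback: counts calls and tracks the largest chunk seen. */
struct chunk_stats_sketch {
	unsigned calls;
	uint32_t maxcount;
};

static int
count_chunk_sketch(uint64_t first, uint32_t count, void *arg)
{
	struct chunk_stats_sketch *st = arg;

	(void)first;
	st->calls++;
	if (count > st->maxcount)
		st->maxcount = count;
	return 0;
}
```

With this shape the caller only ever needs an array sized for one
chunk, regardless of how large the file is.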
Last year when I killed off some evil dirop-related macros, I added
these assertions because if the things they asserted weren't true we'd
be leaking vnodes. Well, it seems that the code at the time did leak
vnodes, so certain failure cases (e.g. mkdir with disk full) would
assert. Apparently nobody tripped on this in the past fourteen months,
until I broke balloc so it always failed (unrelatedly) while working
on some LFS64 changes.
However, the vnode leak has since been removed by hannken@ as part of
the vnode cache changes, so the assertions are now superfluous;
instead, just make sure *vpp gets nulled on failure, and don't worry
about whether or not VU_DIROP is set as it shouldn't matter any more.
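
The "null *vpp on failure" convention, sketched in userland C with a
hypothetical stand-in for a vnode. This is not the lfs code, only the
shape of the pattern.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode. */
struct node_sketch {
	int id;
};

/*
 * Creation routine in the style described: *vpp is cleared up front,
 * so every failure path leaves it NULL without further bookkeeping.
 */
static int
make_node_sketch(int simulate_full, struct node_sketch **vpp)
{
	struct node_sketch *np;

	*vpp = NULL;
	if (simulate_full)
		return ENOSPC;	/* e.g. block allocation failed: disk full */

	np = malloc(sizeof(*np));
	if (np == NULL)
		return ENOMEM;
	np->id = 1;
	*vpp = np;
	return 0;
}
```

A caller can then rely on *vpp being either a valid pointer (return
value 0) or NULL (any error), never a stale value.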
XXX: there's still a lot of gratuitous pointer aliasing in here that
should be tidied away.
Remove some hackish intentional 64->32 truncations next to the checks
using LFS_MAX_DADDR, and tackle the problem they handled in bmap
instead.
The problem: the magic block pointer value UNWRITTEN has magic value
-2, and if it's not handled specifically, uint32 -> uint64 promotion
turns it into 4294967294, which then causes consternation and
monkeyhouse downstream.
What's here is still kind of a hack, but it's a step forward.
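
To make the promotion problem concrete, a small sketch. The macro and
function names are illustrative; this is not the actual bmap change,
just a demonstration of why the magic value needs special handling.

```c
#include <stdint.h>

#define UNWRITTEN_SKETCH (-2)	/* magic value from the text; name is illustrative */

/* Naive widening: zero-extension turns the 32-bit -2 into 4294967294. */
static int64_t
widen_naive_sketch(uint32_t ondisk)
{
	return (int64_t)(uint64_t)ondisk;
}

/* Widening that recognizes the magic value before promoting. */
static int64_t
widen_checked_sketch(uint32_t ondisk)
{
	if (ondisk == (uint32_t)UNWRITTEN_SKETCH)
		return UNWRITTEN_SKETCH;
	return (int64_t)ondisk;
}
```

Ordinary block addresses pass through both functions unchanged; only
the magic value differs.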
Add pieces of support for using both superblock types where
convenient, and specifically to the superblock accessors, but don't
actually enable it anywhere.
First substantive step on PR 50000.
(lfs_curseg/lfs_nextseg in the superblock) using the wrong units.
These fields are for whatever reason the start addresses of segments
(measured in frags) rather than the segment numbers 0..n.
This apparently only affects dumping from a mounted fs; however, it
trashes the fs.
I would really, really like to have a static analysis tool that can
keep track of the units things are measured in, since fs code is full
of conversion macros and the macros are named inscrutable things like
"sntod" whose letters don't necessarily even correspond to the units
they convert. It is surprising that more of these are not wrong.
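
Short of a real static analysis tool, one cheap approximation is to
wrap each unit in its own structure type so the compiler refuses to
mix them. The sketch below assumes a made-up geometry (frags per
segment, start of segment 0); the type and function names are
hypothetical and are not the lfs conversion macros.

```c
#include <stdint.h>

/* Distinct wrapper types so the compiler rejects mixing the two units. */
typedef struct { uint32_t v; } segnum_sketch;	/* segment number 0..n */
typedef struct { int64_t v; } fragaddr_sketch;	/* address in frags */

/* Hypothetical geometry: frags per segment and address of segment 0. */
struct geom_sketch {
	int64_t frags_per_seg;
	int64_t seg0_start;
};

/* Segment number -> start address of that segment, in frags. */
static fragaddr_sketch
seg_to_start_sketch(const struct geom_sketch *g, segnum_sketch sn)
{
	fragaddr_sketch a = { g->seg0_start + (int64_t)sn.v * g->frags_per_seg };

	return a;
}

/* Address in frags -> number of the segment containing it. */
static segnum_sketch
addr_to_seg_sketch(const struct geom_sketch *g, fragaddr_sketch a)
{
	segnum_sketch sn = { (uint32_t)((a.v - g->seg0_start) / g->frags_per_seg) };

	return sn;
}
```

Passing a segnum_sketch where a fragaddr_sketch is expected is then a
compile error rather than a trashed filesystem.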
This contains all the accessor functions and macros out of lfs.h.
Add an include of lfs_accessors.h after all uses of lfs.h... except
for code that wants to define its own struct lfs-alike that the
accessors are supposed to play along with. For these, set STRUCT_LFS
and include lfs_accessors.h after the necessary structure has been
defined, so that lfs_accessors.h can emit functions in terms of it.
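
The STRUCT_LFS arrangement, squeezed into a single file as a sketch.
The structure, its field, and the accessor names are all hypothetical;
the part marked as the "header" only stands in for what
lfs_accessors.h might do, not what it actually contains.

```c
#include <stdint.h>

/* A private lfs-alike structure (hypothetical field). */
struct my_lfs_sketch {
	uint32_t bsize;
};

/* Tell the accessor "header" which structure to emit code for. */
#define STRUCT_LFS struct my_lfs_sketch

/* ---- sketch of the accessor header starts here ---- */
#ifndef STRUCT_LFS
#define STRUCT_LFS struct lfs	/* default when nobody overrides it */
#endif

static inline uint32_t
sketch_getbsize(const STRUCT_LFS *fs)
{
	return fs->bsize;
}

static inline void
sketch_setbsize(STRUCT_LFS *fs, uint32_t val)
{
	fs->bsize = val;
}
/* ---- sketch of the accessor header ends here ---- */
```

The ordering matters: the structure and STRUCT_LFS have to be defined
before the accessors are pulled in, since the accessors are emitted in
terms of them.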
(This changes the rest of the code over; all the accessors were
already added.)
The difference between this commit and the previous one is arbitrary,
but the previous one passed the regression tests on its own so I'm
keeping it separate to help with any bisections that might be needed
in the future.
superblock. This will allow switching between 32/64 bit forms on the
fly; it will also allow handling LFS_EI reasonably tidily. (That
currently doesn't work on the superblock.)
It also gets rid of cpp abuse in the form of fake structure member
macros.
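
A sketch of how an accessor can hide the choice between 32-bit and
64-bit superblock forms. Everything here (the two layouts, the is64
flag, the field and function names) is assumed for illustration and
does not reproduce the real structures.

```c
#include <stdint.h>

/* Two hypothetical on-disk superblock layouts with different widths. */
struct dsb32_sketch {
	uint32_t nfiles;
};
struct dsb64_sketch {
	uint64_t nfiles;
};

/* In-memory state: which layout is live, plus storage for either. */
struct mfs_sketch {
	int is64;
	union {
		struct dsb32_sketch u32;
		struct dsb64_sketch u64;
	} sb;
};

/* Getter: callers always see a 64-bit value, whatever is on disk. */
static uint64_t
sketch_sb_getnfiles(const struct mfs_sketch *fs)
{
	return fs->is64 ? fs->sb.u64.nfiles : fs->sb.u32.nfiles;
}

/* Setter: stores into whichever layout is live. */
static void
sketch_sb_setnfiles(struct mfs_sketch *fs, uint64_t val)
{
	if (fs->is64)
		fs->sb.u64.nfiles = val;
	else
		fs->sb.u32.nfiles = (uint32_t)val;	/* caller must ensure it fits */
}
```

Because callers never name the member directly, there is no need for
fake structure member macros, and a byte-order check could be added
inside the same two functions.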
Also, instead of doing sleep/wakeup on &lfs_avail and &lfs_nextseg
inside the on-disk superblock, add extra elements to the in-memory
struct lfs for this. (XXX: these should be changed to condvars, but
not right now)
XXX: this migrates a structure needed by the lfs code in libsa (struct
salfs) into lfs.h, where it doesn't belong, but for the time being
this is necessary in order to allow the accessors (and the various
lfs macros and other goop that relies on them) to compile.
- Change lfs_valloc() to return an inode number and version instead of
a vnode and move lfs_ialloc() and lfs_vcreate() to new lfs_init_vnode().
- Add lfs_valloc_fixed() to allocate a known inode, used by kernel
roll forward.
- Remove lfs_*ref(); these functions cannot coexist with vcache and
their commented behaviour is far away from their implementation.
- Add the cleaner lwp and blockinfo to struct ulfsmount so lfs_loadvnode()
may use hints from the cleaner.
- Remove vnode locks from ulfs_lookup() like we did with ufs_lookup().
BLOCK_INFO and vnode lock type instead of the inode disk address and
return the vnode locked.
Change lfs_markv() and lfs_bmapv() to work on locked vnodes.
installs from booting.
Catch the common case and warn about it, pointing to a web page describing
the issue - but allow mounting. In all other cases, print more details about
the inconsistency and fail the mount.
device is an AppleUFS FS, 0 otherwise.
This changes the behavior a bit: if the kernel cannot determine whether the
disk is an AppleUFS one or not, it now considers it as a normal UFS rather
than returning an error and not mounting/reloading it.
No particular comment on tech-kern@
These are vestigial from ufs_readwrite.c with wapbl. lfs does not
have a journal, but only the explicit wapbl calls got ripped out in
the transition to ulfs_readwrite.c; these flags did not.
Still don't understand why the fstrans_done must happen after the
vput, and that will cause trouble once we move responsibility for the
vrele and unlock outside the vop as it seems obvious we ought to do
-- it's the caller's reference, not the vop's.
Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.
Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.
New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.
I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.
No externally visible semantic change. All atf fs tests still pass.
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
- instead of always calling DPRINTF with __func__, put __func__ directly
in the macro
- ffs_mountfs(): rename fsblockloc -> fs_sblockloc, initialize fs_sbsize
to zero
No real functional change
- rename ext2fs_checksb() -> ext2fs_sbcheck(): more consistent
- in ext2fs_sbcheck(), add a check to ensure e2fs_inode_size!=0, to
avoid a division by zero
- add ext2fs_sbcompute(), to compute dynamic values of the superblock.
It is done twice in _reload() and _mountfs(), so put it in a function.
- reorder the code in charge of loading the superblock: now, read the
superblock, swap it directly, and *then* pass it to ext2fs_sbcheck().
It is similar to what ffs now does. It is better since the fields don't
need to be swapped on the fly in ext2fs_sbcheck().
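
The check-then-compute ordering from the list above, as a minimal
sketch. The structure and function names are hypothetical and much
smaller than the real ext2fs ones; the superblock is assumed to be in
host byte order already.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of a superblock, already in host byte order. */
struct sb_sketch {
	uint32_t block_size;
	uint16_t inode_size;
};

/* Validate first: reject values that would break later arithmetic. */
static int
sbcheck_sketch(const struct sb_sketch *sb)
{
	if (sb->inode_size == 0)
		return EINVAL;	/* would divide by zero below */
	if (sb->block_size < sb->inode_size)
		return EINVAL;
	return 0;
}

/* Compute derived values only after the check has passed. */
static int
sbcompute_sketch(const struct sb_sketch *sb, uint32_t *inodes_per_block)
{
	int error = sbcheck_sketch(sb);

	if (error != 0)
		return error;
	*inodes_per_block = sb->block_size / sb->inode_size;
	return 0;
}
```

Swapping before checking means the checker only ever deals with
host-order values, so it needs no byte-order logic of its own.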
Tested on amd64.
move the swap code inside the loop.
'fs->fs_sbsize' is swapped twice: the first time in order to get the
correct superblock size, and later when swapping the whole superblock
structure. As a result, we need to check 'fs->fs_sbsize' twice.
This:
- fixes my previous changes for swapped FSes
- allows the kernel to look for other superblock locations if the
current superblock is not validated
And now:
- ffs_superblock_validate() takes only one argument: the fs structure
- 'fs_bsize' is unused, so delete it
Add some comments to explain a bit what we are doing.