In v2.5, we introduced an optimization to avoid rereading data when
seeking inside the file cache. Unfortunately this used a slightly
wrong condition to check if the cache was "live", which meant seeks from
end-of-blocks could end up with invalid caches and wrong data. Not
great.
The problem is the nuance of when a file's cache is "live":
1. The file is marked as LFS_F_READING or LFS_F_WRITING.
But we can't reuse the cache when writing, so we only care about
LFS_F_READING.
2. file->off != lfs->cfg->block_size (end-of-block).
This is an optimization to avoid eagerly reading blocks we may not
actually care about.
We weren't checking for the end-of-block case, which meant if you seeked
_from_ the end of a block to a seemingly valid location in the file
cache, you could end up with an invalid cache.
Note that end-of-block may not be powers-of-two due to CTZ skip-list
pointers.
---
The fix is to check for the end-of-block case in lfs_file_seek. Note
this now matches the need-new-block logic in lfs_file_flushedread.
This logic change may also make lfs_file_seek call lfs_file_flush more
often, but only in cases where lfs_file_flush is a noop.
I've also extended the test_seek tests to cover a few more boundary-read
cases and prevent a regression in the future.
Found by wjl and lrodorigo
- before: lfs_file_open("missing/") => LFS_ERR_ISDIR
- after: lfs_file_open("missing/") => LFS_ERR_NOTDIR
As noted by bmcdonnell-fb, returning LFS_ERR_ISDIR here was inconsistent
with the case where the file exists:
case before after
lfs_file_open("dir_a") => LFS_ERR_ISDIR LFS_ERR_ISDIR
lfs_file_open("dir_a/") => LFS_ERR_ISDIR LFS_ERR_ISDIR
lfs_file_open("reg_a/") => LFS_ERR_NOTDIR LFS_ERR_NOTDIR
lfs_file_open("missing_a/") => LFS_ERR_ISDIR LFS_ERR_NOTDIR
Note this is consistent with the behavior of lfs_stat:
lfs_file_open("reg_a/") => LFS_ERR_NOTDIR
lfs_stat("reg_a/") => LFS_ERR_NOTDIR
And the only other function that can "create" files, lfs_rename:
lfs_file_open("missing_a/") => LFS_ERR_NOTDIR
lfs_rename("reg_a", "missing_a/") => LFS_ERR_NOTDIR
There is some ongoing discussion about if these should return NOTDIR,
ISDIR, or INVAL, but this is at least an improvement over the
rename/open mismatch.
- test_paths_noent_trailing_slashes
- test_paths_noent_trailing_dots
- test_paths_noent_trailing_dotdots
These managed to slip through our path testing but should be tested, if
anything just to know exactly what errors these return.
Before this, the empty path ("") was treated as an alias for the root.
This was unintentional and just a side-effect of how the path parser
worked.
Now, the empty path should always result in LFS_ERR_INVAL:
- before: lfs_stat("") => 0
- after: lfs_stat("") => LFS_ERR_INVAL
Unlike normal files, dots (".") should not change the depth when
attempting to skip dotdot ("..") entries.
A weird nuance in the path parser, but at least it had a relatively easy
fix.
Added test_paths_dot_dotdots to prevent a regression.
This changes the behavior of paths that attempt to navigate above root
to now return LFS_ERR_INVAL:
- before: lfs_stat("/../a") => 0
- after: lfs_stat("/../a") => LFS_ERR_INVAL
This is a bit of an opinionated change while making other path
resolution tweaks.
In terms of POSIX-compatibility, it's a bit unclear exactly what dotdots
above the root should do.
POSIX notes:
> As a special case, in the root directory, dot-dot may refer to the
> root directory itself.
But the word choice of "may" implies it is up to the implementation.
I originally implement this as a root-loop simply because that is what
my Linux machine does, but I now think that's not the best option. Since
we're making other path-related tweaks, we might as well try to adopt
behavior that is, in my opinion, safer and less... weird...
This should also help make paths more consistent with future theoretical
openat-list APIs, where saturating at the current directory is sort of
the least expected behavior.
- lfs_mkdir now accepts trailing slashes:
- before: lfs_mkdir("a/") => LFS_ERR_NOENT
- after: lfs_mkdir("a/") => 0
- lfs_stat, lfs_getattr, etc, now reject trailing slashes if the file is
not a directory:
- before: lfs_stat("reg_a/") => 0
- after: lfs_stat("reg_a/") => LFS_ERR_NOTDIR
Note trailing slashes are accepted if the file is a directory:
- before: lfs_stat("dir_a/") => 0
- after: lfs_stat("dir_a/") => 0
- lfs_file_open now returns LFS_ERR_NOTDIR if the file exists but the
path contains trailing slashes:
- before: lfs_file_open("reg_a/") => LFS_ERR_NOENT
- after: lfs_file_open("reg_a/") => LFS_ERR_NOTDIR
To make these work, the internal lfs_dir_find API required some
interesting changes:
- lfs_dir_find no longer sets id=0x3ff on not finding a parent entry in
the path. Instead, lfs_path_islast can be used to determine if the
modified path references a parent entry or child entry based on the
remainder of the path string.
Note this is only necessary for functions that create new entries
(lfs_mkdir, lfs_rename, lfs_file_open).
- Trailing slashes mean we can no longer rely on the modified path being
NULL-terminated. lfs_path_namelen provides an alternative to strlen
that stops at slash or NULL.
- lfs_path_isdir also tells you if the modified path must reference a
dir (contains trailing slashes). I considered handling this entirely
in lfs_dir_find, but the behavior of entry-creating functions is too
nuanced.
At least lfs_dir_find returns LFS_ERR_NOTDIR if the file exists on
disk.
Like strlen, lfs_path_namelen/islast/isdir are all O(n) where n is the
name length. This isn't great, but if you're using filenames large
enough for this to actually matter... uh... open an issue on GitHub and
we might improve this in the future.
---
There are a couple POSIX incompatibilities that I think are not
worth fixing:
- Root modifications return EINVAL instead of EBUSY:
- littlefs: remove("/") => EINVAL
- POSIX: remove("/") => EBUSY
Reason: This would be the only use of EBUSY in the system.
- We accept modifications of directories with trailing dots:
- littlefs: remove("a/.") => 0
- POSIX: remove("a/.") => EBUSY
Reason: Not worth implementing.
- We do not check for existence of directories followed by dotdots:
- littlefs: stat("a/missing/..") => 0
- POSIX: stat("a/missing/..") => ENOENT
Reason: Difficult to implement non-recursively.
- We accept modifications of directories with trailing dotdots:
- littlefs: rename("a/b/..", "c") => 0
- POSIX: rename("a/b/..", "c") => EBUSY
Reason: Not worth implementing.
These are at least now documented in tests/test_paths.toml, which isn't
the greatest location, but it's at least something until a better
document is created.
Note that these don't really belong in SPEC.md because path parsing is
a function of the driver and has no impact on disk.
As expected these are failing and will need some work to pass.
The issue with lfs_file_open allowing trailing slashes was found by
rob-zeno, and the issue with lfs_mkdir disallowing trailing slashes was
found by XinStellaris, PoppaChubby, pavel-kirienko, inf265, Xywzel,
steverpalmer, and likely others.
This should be a superset of the previous test_paths test suite, while
covering a couple more things (more APIs, more path synonyms, utf8,
non-printable ascii, non-utf8, etc).
Not yet tested are some corner cases with known bugs, mainly around
trailing slashes.
- Added METADATA_MAX to test_runner.
- Added METADATA_MAX to bench_runner.
- Added a simple metadata_max test to test_superblocks, for lack of
better location.
There have been several issues floating around related to metadata_max
and LFS_ERR_NOSPC which makes me think there's a bug in our metadata_max
logic.
metadata_max was a quick patch and is relatively untested, so an
undetected bug isn't too surprising. This commit adds at least some
testing over metadata_max.
Sure enough, the new test_superblocks_metadata_max test reveals a
curious LFS_ERR_NAMETOOLONG error that shouldn't be there.
More investigation needed.
The block allocator is an area where inferred block counts (when
cfg.block_count=0) are more likely to cause problems.
As is shown by the recent divide-by-zero-exhaustion issue.
The documentation does not match the implementation here. The intended
behavior of superblock expansion was to duplicate the current superblock
entry into the new superblock:
.--------. .--------.
.|littlefs|->|littlefs|
||bs=4096 | ||bs=4096 |
||bc=256 | ||bc=256 |
||crc32 | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
The main benefit is that we can rely on the magic string "littlefs"
always residing in blocks 0x{0,1}, even if the superblock chain has
multiple superblocks.
The downside is that earlier superblocks in the superblock chain may
contain out-of-date configuration. This is a bit annoying, and risks
hard-to-reach bugs, but in theory shouldn't break anything as long as
the filesystem is aware of this.
Unfortunately this was lost at some point during refactoring in the
early v2-alpha work. A lot of code was moving around in this stage, so
it's a bit hard to track down the change and if it was intentional. The
result is superblock expansion creates a valid linked-list of
superblocks, but only the last superblock contains a valid superblock
entry:
.--------. .--------.
.|crc32 |->|littlefs|
|| | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
What's interesting is this isn't invalid as far as lfs_mount is
concerned. lfs_mount is happy as long as a superblock entry exists
anywhere in the superblock chain. This is good for compat flexibility,
but is the main reason this has gone unnoticed for so long.
---
With the benefit of more time to think about the problem, it may have
been more preferable to copy only the "littlefs" magic string and NOT
the superblock entry:
.--------. .--------.
.|littlefs|->|littlefs|
||crc32c | ||bs=4096 |
|| | ||bc=256 |
|| | ||root dir|
|| | ||crc32 |
|'--------' |'--------'
'--------' '--------'
This would allow for simple "littlefs" magic string checks without the
risks associated with out-of-date superblock entries.
Unfortunately the current implementation errors if it finds a "littlefs"
magic string without an associated superblock entry, so such a change
would not be compatible with old drivers.
---
This commit tweaks superblock expansion to duplicate the superblock
entry instead of simply moving it to the new superblock. And adds tests
over the magic string "littlefs" both before and after superblock
expansion.
Found by rojer and Nikola Kosturski
Some forms of storage, mainly anything with an FTL, eMMC, SD, etc, do
not guarantee a strict write order for writes to different blocks. It
would be good to test that this doesn't break littlefs.
This adds LFS_EMUBD_POWERLOSS_OOO to lfs_emubd, which tells lfs_emubd to
try to break any order-dependent code on powerloss.
The behavior right now is a bit simple, but does result in test
breakage:
1. Save the state of the block on first write (erase really) after
sync/init.
2. On powerloss, revert the first write to its original state.
This might be a bit confusing when debugging, since the block will
appear to time-travel, but doing anything fancier would make emubd quite
a bit more complicated.
You could also get a bit fancier with which/how many blocks to revert,
but this should at least be sufficient to make sure bd sync calls are in
the right place.
Inlined files live in metadata and decrease storage requirements, but
may be limited to improve metadata-related performance. This is
especially important given the current plague of metadata performance.
Though decreasing inline_max may make metadata more dense and increase
block usage, so it's important to benchmark if optimizing for speed.
The underlying limits of inlined files haven't changed:
1. Inlined files need to fit in RAM, so <= cache_size
2. Inlined files need to fit in a single attr, so <= attr_max
3. Inlined files need to fit in 1/8 of a block to avoid metadata
overflow issues, this is after limiting by metadata_max,
so <= min(metadata_max, block_size)/8
By default, the largest possible inline_max is used. This preserves
backwards compatibility and is probably a good default for most use
cases.
This does have the awkward effect of requiring inline_max=-1 to
indicate disabled inlined files, but I don't think there's a good
way around this.
This extends lfs_fs_gc to now handle three things:
1. Calls mkconsistent if not already consistent
2. Compacts metadata > compact_thresh
3. Populates the block allocator
Which should be all of the janitorial work that can be done without
additional on-disk data structures.
Normally, metadata compaction occurs when an mdir is full, and results in
mdirs that are at most block_size/2.
Now, if you call lfs_fs_gc, littlefs will eagerly compact any mdirs that
exceed the compact_thresh configuration option. Because the resulting
mdirs are at most block_size/2, it only makes sense for compact_thresh to
be >= block_size/2 and <= block_size.
Additionally, there are some special values:
- compact_thresh=0 => defaults to ~88% block_size, may change
- compact_thresh=-1 => disables metadata compaction during lfs_fs_gc
Note that compact_thresh only affects lfs_fs_gc. Normal compactions
still only occur when full.
- Renamed lfs.free -> lfs.lookahead
- Renamed lfs.free.off -> lfs.lookahead.start
- Renamed lfs.free.i -> lfs.lookahead.next
- Renamed lfs.free.ack -> lfs.lookahead.ckpoint
- Renamed lfs_alloc_ack -> lfs_alloc_ckpoint
These have been named a bit confusingly, and I think the new names make
their relevant purposes a bit clearer.
At the very it's clear lfs.lookahead is related to the lookahead buffer.
(and doesn't imply a closed free-bitmap).
The idea is in the future this function may be extended to support other
block janitorial work. In such a case calling this lfs_fs_gc provides a
more general name that can include other operations.
This is currently just wishful thinking, however.
- Test that the code actually runs.
- Test that lfs_fs_findfreeblocks does not break block allocations.
- Test that lfs_fs_findfreeblocks does not error when no space is
available, it should only errors when the block is actually needed.
The initial implementation for this was provided by kaetemi, originally
as a mount flag. However, it has been modified here to be self-contained
in an explicit runtime function that can be called after mount.
The reasons for an explicit function:
1. lfs_mount stays a strictly readonly operation, and avoids pulling in
all of the write machinery.
2. filesystem-wide operations such as lfs_fs_grow can be a bit risky,
and irreversable. The action of growing the filesystem should be very
intentional.
---
One concern with this change is that this will be the first function
that changes metadata in the superblock. This might break tools that
expect the first valid superblock entry to contain the most recent
metadata, since only the last superblock in the superblock chain will
contain the updated metadata.
In separating the configuration of littlefs from the physical geometry
of the underlying device, we can no longer rely solely on lfs_config to
contain all of the information necessary for the simulated block devices
we use for testing.
This adds a new lfs_*bd_config struct for each of the block devices, and
new erase_size/erase_count fields. The erase_* name was chosen since
these reflect the (simulated) physical erase size and count of
erase-sized blocks, unlike the block_* variants which represent logical
block sizes used for littlefs's bookkeeping.
It may be worth adopting erase_size/erase_count in littlefs's config at
some point in the future, but at the moment doesn't seem necessary.
Changing the lfs_bd_config structs to be required is probably a good
idea anyways, as it moves us more towards separating the bds from
littlefs. Though we can't quite get rid of the lfs_config parameter
because of the block-device API in lfs_config. Eventually it would be
nice to get rid of it, but that would require API breakage.
The intention is to help interop with older minor versions of littlefs.
Unfortunately, since lfs2.0 drivers cannot mount lfs2.1 images, there are
situations where it would be useful to write to write strictly lfs2.0
compatible images. The solution here adds a "disk_version" configuration
option which determines the behavior of lfs2.1 dependent features.
Normally you would expect this to only change write behavior. But since the
main change in lfs2.1 increased validation of erased data, we also need to
skip this extra validation (fcrc) or see terrible slowdowns when writing.
In terms of ease-of-use, a user familiar with other filesystems expects
block_usage in fsinfo. But in terms of practicality, block_usage can be
expensive to find in littlefs, so if it's not needed in the resulting
fsinfo, that operation is wasteful.
It's not clear to me what the best course of action is, but since
block_usage can always be added to fsinfo later, but not removed without
breaking backwards compatibility, I'm leaving this out for now.
Block usage can still be found by explicitly calling lfs_fs_size.
Version are now returned with major/minor packed into 32-bits,
so 0x00020001 is the current disk version, for example.
1. This needed to change to use a disk_* prefix for consistency with the
defines that already exist for LFS_VERSION/LFS_DISK_VERSION.
2. Encoding the version this way has the nice side effect of making 0 an
invalid value. This is useful for adding a similar config option
that needs to have reasonable default behavior for backwards
compatibility.
In theory this uses more space, but in practice most other config/status
is 32-bits in littlefs. We would be wasting this space for alignment
anyways.
This function naturally doesn't exist in the previous version. We should
eventually add these calls when we can expect the previous version to
support this function, though it's a bit unclear when that should happen.
Or maybe not! Maybe this is testing more of the previous version than we
really care about.
LFS_VERSION -> LFS_DISK_VERSION
These tests shouldn't depend on LFS_VERSION. It's a bit subtle, but
LFS_VERSION versions the API, and LFS_DISK_VERSION versions the
on-disk format, which is what test_compat should be testing.
Currently this includes:
- minor_version - on-disk minor version
- block_usage - estimated number of in-use blocks
- name_max - configurable name limit
- file_max - configurable file limit
- attr_max - configurable attr limit
These are currently the only configuration operations that need to be
written to disk. Other configuration is either needed to mount, such as
block_size, or does not change the on-disk representation, such as
read/prog_size.
This also includes the current block usage, which is common in other
filesystems, though a more expensive to find in littlefs. I figure it's
not unreasonable to make lfs_fs_stat no worse than block allocation,
hopefully this isn't a mistake. It may be worth caching the current
usage after the most recent lookahead scan.
More configuration may be added to this struct in the future.
lfs_fs_mkconsistent allows running the internal consistency operations
(desuperblock/deorphan/demove) on demand and without any other
filesystem changes.
This can be useful for front-loading and persisting consistency operations
when you don't want to pay for this cost on the first write to the
filesystem.
Conveniently, this also offers a way to force the on-disk minor version
to bump, if that is wanted behavior.
Idea from kasper0
The underlying issue is that lfs_fs_deorphan did not updating gstate
correctly. The way it determined if there are any orphans remaining in
the filesystem was by subtracting the number of found orphans from an
internal counter.
This internal counter is a leftover from a previous implementation that
allowed leaving the lfs_fs_deorphan loop early if we know the number of
expected orphans. This can happen during recursive mdir relocations, but
with only a single bit in the gstate, can't happen during mount. If we
detect orphans during mount, we set this internal counter to 1, assuming
we will find at least one orphan.
But this presents a problem, what if we find _no_ orphans? If this happens
we never decrement the internal counter of orphans, so we would never
clear the bit in the gstate. This leads to a running lfs_fs_deorphan
on more-or-less every mutable operation in the filesystem, resulting in
an extreme performance hit.
The solution here is to not subtract the number of found orphans, but assume
that when our lfs_fs_deorphan loop finishes, we will have no orphans, because
that's the whole point of lfs_fs_deorphan.
Note that the early termination of lfs_fs_deorphan was dropped because
it would not actually change the runtime complexity of lfs_fs_deorphan,
adds code cost, and risks fragile corner cases such as this one.
---
Also added tests to assert we run lfs_fs_deorphan at most once.
Found by kasper0 and Ldd309
This just means a rewrite of the superblock entry with the new minor
version.
Though it's interesting to note, we don't need to rewrite the superblock
entry until the first write operation in the filesystem, an optimization
that is already in use for the fixing of orphans and in-flight moves.
To keep track of any outdated minor version found during lfs_mount, we
can carve out a bit from the reserved bits in our gstate. These are
currently used for a counter tracking the number of orphans in the
filesystem, but this is usually a very small number so this hopefully
won't be an issue.
In-device gstate tag:
[-- 32 --]
[1|- 11 -| 10 |1| 9 ]
^----^-----^--^--^-- 1-bit has orphans
'-----|--|--|-- 11-bit move type
'--|--|-- 10-bit move id
'--|-- 1-bit needs superblock
'-- 9-bit orphan count
This is a bit tricky since we need two different version of littlefs in
order to test for most compatibility concerns.
Fortunately we already have scripts/changeprefix.py for version-specific
symbols, so it's not that hard to link in the previous version of
littlefs in CI as a separate set of symbols, "lfsp_" in this case.
So that we can at least test the compatibility tests locally, I've added
an ifdef against the expected define "LFSP" to define a set of aliases
mapping "lfsp_" symbols to "lfs_" symbols. This is manual at the moment,
and a bit hacky, but gets the job done.
---
Also changed BUILDDIR creation to derive subdirectories from a few
Makefile variables. This makes the subdirectories less manual and more
flexible for things like LFSP. Note this wasn't possible until BUILDDIR
was changed to default to "." when omitted.
Removed the weird alignment requirement from the general truncate tests.
This explicitly hid off-by-one truncation errors.
These tests now reveal the same issue as the block-sized truncation test
while also testing for other potential off-by-one errors.
When truncation is done on a file to the block size, there seems to be
an error where it points to an incorrect block. Perform a write /
truncate / readback operation to verify this issue.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>