
lfs_file_seek() poor performance when seeking backward by one byte for no apparent reason #810

Closed
LuchkaOnTheMoon opened this issue Apr 26, 2023 · 15 comments

Comments

@LuchkaOnTheMoon

LuchkaOnTheMoon commented Apr 26, 2023

After experiencing poor performance in regular everyday use of LittleFS v2.5.1, apparently when working with large files (~100 kB), I decided to develop a small deterministic test to stress the filesystem and get more precise insight into what was going wrong when long wait times occurred (~40 seconds or worse).

This is how my test looks:
Log1
After mounting the filesystem (you can see every filesystem configuration detail at the beginning of the log), I open the "test.bin" file with the LFS_O_RDWR | LFS_O_CREAT | LFS_O_TRUNC options (white), then I write a chunk of 512 bytes (red), seek back to the beginning of the file (orange), write one "magic" byte (yellow) as the first byte, seek back again to the beginning of the file (green), read 512 bytes back from the file (cyan) and verify they match both the magic byte and the data written before, then finally close the file (purple).
I then repeat this test sequence again and again, but on each loop I multiply the read/write size by 2, so the first iteration writes and checks 512 bytes, the second 1024, the third 2048, etc., so the "test.bin" file grows exponentially in size during the test.
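For reference, a minimal sketch of one iteration of that test sequence in terms of the littlefs API (assumes an already-mounted lfs_t named lfs; the 0x55 fill pattern and MAGIC_BYTE value are just illustrative):

```c
#include <stdint.h>
#include <string.h>
#include "lfs.h"

#define MAGIC_BYTE 0xA5  // illustrative "magic" first byte

// one iteration: write `size` bytes, rewrite the first byte, read everything
// back and verify; both buffers must be at least `size` bytes large
static int test_iteration(lfs_t *lfs, uint8_t *wbuf, uint8_t *rbuf, lfs_size_t size) {
    lfs_file_t file;
    int err = lfs_file_open(lfs, &file, "test.bin",
                            LFS_O_RDWR | LFS_O_CREAT | LFS_O_TRUNC);
    if (err) {
        return err;
    }

    memset(wbuf, 0x55, size);
    lfs_file_write(lfs, &file, wbuf, size);          // write the chunk

    lfs_file_seek(lfs, &file, 0, LFS_SEEK_SET);      // seek back to the start
    uint8_t magic = MAGIC_BYTE;
    lfs_file_write(lfs, &file, &magic, 1);           // rewrite the first byte

    lfs_file_seek(lfs, &file, 0, LFS_SEEK_SET);      // seek back again
    lfs_file_read(lfs, &file, rbuf, size);           // read everything back

    wbuf[0] = MAGIC_BYTE;                            // expected file contents
    int ok = (memcmp(wbuf, rbuf, size) == 0);

    lfs_file_close(lfs, &file);
    return ok ? 0 : -1;
}

// the caller doubles `size` on every loop: 512, 1024, 2048, ...
```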

In the first column of all my logs, on the left, you can see the timestamp in milliseconds of each operation.

This is what happens, sporadically and for no apparent reason:
Log2
As you can see, after writing one byte at the beginning of the file, seeking the file back by just one byte takes ~2 seconds.

Here, later:
Log3
in the same situation, seeking back by one byte takes ~4 seconds.

I forgot to mention, this test is performed on an i.MX RT1052 evaluation board at 600 MHz, so it's running directly on the target. I can also share the call stack captured during these intervals in which the filesystem looks "frozen" during the seek:
CallStack
From the call stack, you can see lfs_file_seek() is invoking my own Filesystem_FlashErase() implementation, so it's erasing the FLASH. I have no idea why the filesystem is erasing during a seek, but that's it.

Thanks in advance for your attention.

@geky
Member

geky commented Apr 26, 2023

Hi @LuchkaOnTheMoon, thanks for creating an issue.

Performance is a complicated issue in littlefs, and in the general case, poor, with several pitfalls. It hasn't been a priority, and this shows, though it's something I'm looking into and hoping to improve in the future.

First thing to note:

I have no idea why the filesystem is erasing during a seek, but that's it.

It's not uncommon for filesystems to finish write operations during seek. In lfs_file_write, littlefs doesn't know if there will be more data coming, so it may just store data in the file's cache and wait for more data. But when lfs_file_seek is called, it's unlikely more data will be written at exactly that position, so littlefs goes ahead and writes out the pending data.

This helps keep the file state consistent, since the cache's contents depend on the file's internal file position.

I suppose it could be made a bit smarter by keeping a separate cache position, but since the file cache is used for both reads and writes, this would lead to lfs_file_read writing to disk which is probably even more unintuitive.
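To make the above concrete, a minimal sketch of the behavior just described (assumes an already-mounted lfs_t named lfs; the function name is illustrative):

```c
#include <stdint.h>
#include "lfs.h"

static void flush_on_seek_example(lfs_t *lfs) {
    lfs_file_t file;
    lfs_file_open(lfs, &file, "test.bin", LFS_O_RDWR | LFS_O_CREAT);

    uint8_t byte = 0xA5;
    lfs_file_write(lfs, &file, &byte, 1);        // may only land in the file's cache

    // moving the file position forces littlefs to write out the pending
    // cached data first, which is where the program/erase cost appears
    lfs_file_seek(lfs, &file, 0, LFS_SEEK_SET);

    lfs_file_close(lfs, &file);
}
```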

This is what happens, sporadically and for no apparent reason:

This looks suspiciously like the issue in #783

TLDR: Metadata is compacted on-demand when a metadata block fills up. This runs in $O(n^2)$ due to a poor assumption about the cost of block sizes and a lack of benchmarking. It's something I'm looking into but any fix will be non-trivial.

There are also some ideas around moving this cost to background work: #791

And one workaround is using the metadata_max config to artificially limit the size of metadata blocks. More info in the above thread: #783 (comment)
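For anyone trying this workaround, a sketch of where metadata_max lives in the configuration (the geometry values and callback names here are illustrative, not recommendations):

```c
#include "lfs.h"

// block device callbacks provided by the application (hypothetical names)
extern int bd_read(const struct lfs_config *c, lfs_block_t block,
                   lfs_off_t off, void *buffer, lfs_size_t size);
extern int bd_prog(const struct lfs_config *c, lfs_block_t block,
                   lfs_off_t off, const void *buffer, lfs_size_t size);
extern int bd_erase(const struct lfs_config *c, lfs_block_t block);
extern int bd_sync(const struct lfs_config *c);

const struct lfs_config cfg = {
    .read  = bd_read,
    .prog  = bd_prog,
    .erase = bd_erase,
    .sync  = bd_sync,

    .read_size      = 256,
    .prog_size      = 256,
    .block_size     = 4096,
    .block_count    = 1024,
    .cache_size     = 256,
    .lookahead_size = 16,
    .block_cycles   = 100,

    // cap how much of each metadata block is used before compaction,
    // trading some space for shorter worst-case compaction times
    .metadata_max   = 512,      // must be <= block_size
};
```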

@geky
Member

geky commented Apr 26, 2023

I have no idea why the filesystem is erasing during a seek, but that's it.

Oh, it's also worth noting this only happens when a file is open for writing. Read-only files will never write to disk.

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented Apr 27, 2023

Hi @geky, first of all thanks for your time and your comments.

It's not uncommon for filesystems to finish write operations during seek. In lfs_file_write, littlefs doesn't know if there will be more data coming, so it may just store data in the file's cache and wait for more data. But when lfs_file_seek is called, it's unlikely more data will be written at exactly that position, so littlefs goes ahead and writes out the pending data.

Ok, now this is clear and could be considered "logical", at least to me, so I tried to replace lfs_file_seek() with lfs_file_close() followed by lfs_file_open() - because in my test what I want to do is basically seek to the beginning of the file - and all the wait time has now moved to the lfs_file_close() operation instead of the seek, as expected. But what sounds strange to me is the following:

image

The first close() operation - marked in green - takes just 10 ms; so, expanding this into something more detailed: closing a file about 1 MB big, after writing about 1048576 bytes of valid data from the beginning, swapping the cache from RAM to FLASH, writing metadata, etc., takes only 10 ms.
The second close() operation - marked in red - takes about 4 seconds; again, expanding: closing the same file of about 1 MB, after just changing its first byte, takes 4 seconds?

@BenBE

BenBE commented Apr 27, 2023

Not sure how much work it would be to skip the cache invalidation on seek if the new seek position is still within the same block that the cache is pointing to anyway. Especially when just seeking backwards one byte, it's quite likely that subsequent operations will affect the same block that the cache currently holds.

@LuchkaOnTheMoon
Author

Not sure how much work it would be to skip the cache invalidation on seek if the new seek position is still within the same block that the cache is pointing to anyway. Especially when just seeking backwards one byte, it's quite likely that subsequent operations will affect the same block that the cache currently holds.

I don't know either, but it looks like it's not just a matter of cache invalidation, because it occurs when seeking back by one byte but doesn't occur when seeking back over the whole file.

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented Apr 27, 2023

EDIT 1:
Tried the same test with block_cycle=100 (in previous runs it was set to 1); nothing changes significantly. I'll avoid posting logs for this case, as they'd be pointless.

EDIT 2:
Tried the same test with block_cycle=1 but metadata_max=CACHE_SIZE=256 as per #783 but even in this case nothing seems to change significantly:
image
Investigating a little bit more, while lfs_file_close() is stuck, the call stack is the same as when lfs_file_seek() was stuck:
image
and stepping through I see the user-supplied Filesystem_FlashErase() is called repeatedly, erasing many adjacent blocks... and each block is 256 bytes, so the filesystem is cleaning up an area that is quite a bit larger than the cache or the metadata.

EDIT 3:
Unfortunately, due to how FLASH read/write/erase is implemented on the i.MX RT1052, which executes code from the same FLASH on which the filesystem is instantiated, the SysTick interrupt is masked during Filesystem_FlashErase() calls, so the timestamps in my previous screenshots do not take FLASH sector erase time into account... so, when I said "this operation took ~2 seconds", in reality it may have taken 5 seconds (i.e. 2 seconds spent in "code", and 3 seconds spent erasing 10 sectors at 300 ms each).

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented Apr 28, 2023

Today I experimented with rolling back from version 2.5.1 to 1.7.2, because some days ago I was checking out issue #214 and read that @diederikvdv obtained an improvement by switching to LFSv1 (even if I'm not sure from which version to which version he switched - maybe from 2.0.2 to 1.0.6?). But, even if the logs look different because version 1.7.2 doesn't include LFS_TRACE and I had to implement my own printf logging, I see no improvement in performance.

image

EDIT:
Tested also with version 1.0 (1.0.6 doesn't exist, so I cannot figure out what LFS1:0x00010006 refers to in #214...) ---> ~8 seconds spent in lfs_file_close().

@geky
Member

geky commented Apr 30, 2023

Ah, I was thinking too low-level. I think this is a symptom of #27. Not sure how I missed that.

littlefs's implementation of random writes could be described as "excruciatingly naive", and you could argue littlefs really doesn't support random writes on large files. The file data structure is append-only, so rewriting the beginning of a file ends up writing out the rest of the file contents.

This is something I'm looking into fixing, but it requires a new data-structure, which is going to take some time to introduce.

Unfortunately there's really not a way around this in the current design.


Not sure how much work it would be to skip the cache invalidation on seek if the new seek position is still within the same block that the cache is pointing to anyway. Especially when just seeking backwards one byte, it's quite likely that subsequent operations will affect the same block that the cache currently holds.

This is a good question; it's likely the caching can be made smarter. This would need separate pointers for the file and the cache, but that is likely not a big deal.

Though hopefully the above issue highlights why this isn't a high priority for me at the moment.

so I tried to replace lfs_file_seek() with lfs_file_close() followed by lfs_file_open()

Ah, you can use lfs_file_sync to do the same thing without the file open cost.

Today I experimented with rolling back from version 2.5.1 to 1.7.2, [...] I see no improvement in performance.

This makes sense. v2 mainly involved a redesign of the metadata layout, the data layout remained the same and has the same limitation.

I would still highly discourage using v1, as it had a number of fundamental problems.

@LuchkaOnTheMoon
Author

Unfortunately these days I'm busy with other tests not involving LittleFS; I'll be back as soon as possible for further investigation. Just to anticipate something: I plan to move the validation byte (1 byte written at the beginning of the file) to the end of the file, to see if this changes the performance benchmarks dramatically, according to this:

littlefs's implementation of random writes could be described as "excruciatingly naive", and you could argue littlefs really doesn't support random writes on large files. The file data structure is append-only, so rewriting the beginning of a file ends up writing out the rest of the file contents.

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented May 12, 2023

Hi @geky
This evening I had some time to perform other tests, following up on what we already discussed.
I can confirm that, regarding the problem mentioned in the issue title, it looks like I found a working solution, which seems to be the following:

image

Using this test sequence, shown above for a total file size of 2049 bytes (2048 bytes of data + 1 application-related validation byte at the end of the file), I don't see anomalous wait times anymore.
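A minimal sketch of that linear-write sequence (assumes an already-mounted lfs_t named lfs; the magic value and function name are illustrative):

```c
#include <stdint.h>
#include "lfs.h"

#define MAGIC_BYTE 0xA5  // illustrative validation byte

// write the payload strictly front-to-back, then append the validation byte
// at the end, so no earlier part of the file is ever rewritten
static int write_linear(lfs_t *lfs, const uint8_t *data, lfs_size_t size) {
    lfs_file_t file;
    int err = lfs_file_open(lfs, &file, "test.bin",
                            LFS_O_RDWR | LFS_O_CREAT | LFS_O_TRUNC);
    if (err) {
        return err;
    }

    lfs_file_write(lfs, &file, data, size);      // e.g. 2048 bytes of data

    uint8_t magic = MAGIC_BYTE;
    lfs_file_write(lfs, &file, &magic, 1);       // file size is now size + 1

    return lfs_file_close(lfs, &file);
}
```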
I just want to warn that the following sentence:

Ah, you can use lfs_file_sync to do the same thing without the file open cost.

appears to be false, at least when the file is opened in read-only mode (I didn't test write-only or read/write modes): the method succeeds but the file pointer is not rewound back to the start of the file.

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented May 12, 2023

I confirm that lfs_file_sync() just flushes cached data to flash; the current file position is preserved in read-only, write-only and read/write open modes alike.

I'm not saying this is not fine, I'm just saying that I replaced lfs_file_seek() with the lfs_file_close() / lfs_file_open() pair because my need was to rewind the file back to the start, and replacing it with lfs_file_sync() as suggested by @geky simply doesn't do that.

To replace lfs_file_seek(lfs, file, 0, LFS_SEEK_SET) there's lfs_file_rewind(lfs, file), but I don't like it very much since it's a pure wrapper around lfs_file_rawseek(), so at this point I personally prefer to continue using lfs_file_seek(lfs, file, 0, LFS_SEEK_SET) to rewind the file to the beginning.
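For completeness, the rewind options discussed in this thread side by side (a sketch only; error handling omitted, function name is illustrative):

```c
#include "lfs.h"

static void rewind_options(lfs_t *lfs, lfs_file_t *file) {
    // 1) lfs_file_sync flushes pending cached data to flash but does NOT
    //    move the file position
    lfs_file_sync(lfs, file);

    // 2) to actually rewind, seek to offset 0 (lfs_file_rewind is a thin
    //    wrapper over the same seek path)
    lfs_file_seek(lfs, file, 0, LFS_SEEK_SET);
    // or: lfs_file_rewind(lfs, file);

    // 3) close + reopen also flushes and resets the position, at the cost
    //    of re-opening the file
    lfs_file_close(lfs, file);
    lfs_file_open(lfs, file, "test.bin", LFS_O_RDWR);
}
```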

@robekras
Contributor

I just didn't read the entire thread.
Maybe this issue is related to this one: #632
The performance issue when doing a forward seek should have been fixed (although I didn't have time to check).
This was done by checking whether the new seek position is within the currently cached block data.
On a forward seek this can easily be done.
As far as I remember, on a backward seek it is not really easy to determine if the new seek position is within the currently cached block data. As far as I remember, that was due to the fact that the cached block data contains 'control' data at the beginning of the cached block, and the amount of 'control' data at the beginning of the data block is not always the same.
That means when doing a backward seek, the data block for the new seek position has to be searched again. Which means in succession you have to traverse (and read) a lot of data blocks (on large files) again and again, even though the related (backward) seek position might be within an already cached data block.

BTW, as far as I remember (it's over a year ago), searching for the next data block is a time-consuming process, as lfs has to read the blocks from the start of the file. This results in reading several data blocks just to determine the next block, even though this information might have been read previously (on a previous seek).

Example: For a larger file, lfs has to read blocks 1, 4, 9, 15, 19, 25 to determine the block for the current seek position.
If you then want to read the data from the previous (or the next) data block, at least blocks 1, 4, 9, 15, 19 have to be read again, just to determine the destination block (with the current seek position).
That means, to read the data block which is of interest (the one the new seek position points to), you have to read many more blocks. The larger the file is, the more blocks have to be read.
With the fix in #632 the forward seek should be much more performant. But the backward seek is not affected by this.

A small cache might be helpful: small entries that only contain the block number and the block chain data.
When searching for the block holding the next file data to read, take a look into the block chain cache, and use this data instead of reading the whole block from storage memory.

Those were my thoughts over a year ago. Of course, I don't know if they still apply.
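A purely hypothetical sketch of the kind of block-chain cache described above (none of these types or functions exist in littlefs); the idea is to remember which on-disk block backs which block index of the file, so a later seek can avoid re-walking the chain:

```c
#include <stdbool.h>
#include <stdint.h>

#define CHAIN_CACHE_SIZE 8

// one remembered link of a file's block chain
struct chain_entry {
    uint32_t file_block;   // block index within the file (0, 1, 2, ...)
    uint32_t disk_block;   // on-disk block address backing that index
    bool     valid;
};

struct chain_cache {
    struct chain_entry entries[CHAIN_CACHE_SIZE];
    unsigned next;         // simple round-robin eviction
};

// record a (file block -> disk block) mapping seen while traversing the chain
static void chain_cache_put(struct chain_cache *c,
                            uint32_t file_block, uint32_t disk_block) {
    c->entries[c->next] = (struct chain_entry){file_block, disk_block, true};
    c->next = (c->next + 1) % CHAIN_CACHE_SIZE;
}

// on a seek, check the cache first; a hit skips re-reading the chain blocks
static bool chain_cache_get(const struct chain_cache *c,
                            uint32_t file_block, uint32_t *disk_block) {
    for (unsigned i = 0; i < CHAIN_CACHE_SIZE; i++) {
        if (c->entries[i].valid && c->entries[i].file_block == file_block) {
            *disk_block = c->entries[i].disk_block;
            return true;
        }
    }
    return false;
}
```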

@LuchkaOnTheMoon
Author

LuchkaOnTheMoon commented May 15, 2023

Hi @robekras, thanks for your contribution. I think this issue is not related to #631 and related ones, because initially in my tests (please read before jumping to conclusions) I was seeking from the end of the file to the beginning (which no doubt requires the FS to read from Flash) and this operation was taking ~10 ms; then I was writing a single byte and seeking back to the beginning again, which was most probably not just refreshing the cache but re-writing the whole file to Flash, as mentioned by @geky here:

littlefs's implementation of random writes could be described as "excruciatingly naive", and you could argue littlefs really doesn't support random writes on large files. The file data structure is append-only, so rewriting the beginning of a file ends up writing out the rest of the file contents.

From my point of view, this issue could be closed. I'll continue to run other performance tests, but for this specific issue the solution is to avoid randomly writing a file, preferring linear writes instead.

@LuchkaOnTheMoon
Author

For this specific issue the solution is to avoid writing files at random positions, preferring linear writes instead (from beginning to end).

@geky
Member

geky commented May 23, 2023

for this specific issue the solution is to avoid randomly writing a file, preferring linear writes instead.

I think this is the best solution at the moment, for better or worse.

A small cache might be helpful: small entries that only contain the block number and the block chain data.
When searching for the block holding the next file data to read, take a look into the block chain cache, and use this data instead of reading the whole block from storage memory.

I haven't been able to look into more creative forms of caching. There's probably some low-hanging fruit to be found.

But there's an interesting thing to note about caching here: we can think of the CTZ skip-list as just a tree, since it must be tree-shaped if lookups are $O(\log n)$, and trying to cache tree lookups turns out to be much trickier than it first appears.

If we read a file, i.e., traverse each branch of the tree one after the other, a simple least-recently-used (LRU) cache will throw out the root of the tree unless there is enough cache to save the entire height of the tree. Apparently some database people have had success with a sort of most-recently-used (MRU) cache, but that seems like an easy way to get cached entries "stuck" without some sort of periodic eviction.

I looked into this a bit for another data structure, and I think if you go deeper you end up needing a full branch-predictor of sorts.

Also it's worth noting littlefs is a bit smarter than "reading the whole block". Internally each read has a cache hint so we only read the prefix of pointers at each block. Though if read_size = block_size reading the whole block does happen:

littlefs/lfs.c

Lines 2787 to 2789 in ec3ec86

int err = lfs_bd_read(lfs,
        pcache, rcache, sizeof(head),
        head, 4*skip, &head, sizeof(head));
