Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

The filesystem is broken - Read returns bad data under certain parameters combinations (cache not invalidated) #1057

Open
lrodorigo opened this issue Dec 18, 2024 · 5 comments
Labels

Comments

@lrodorigo
Copy link

lrodorigo commented Dec 18, 2024

Tested on v 2.10

We have discovered a critical issue with the lfs_file_read/lfs_file_seek functions where, under certain conditions (involving cache size, read data size, and read offset), the function returns wrong data (basically a shifted copy from the previous read).
image

This behavior leads to entirely incorrect data being read and seems to be caused by a lack of cache invalidation.
On the block device side, the reading of the block that actually contains data is skipped at all.
The returned data corresponds to the previous read operation and does not reflect the actual content at the specified offset. The issue can be workarounded by calling API functions that force cache invalidation (e.g., lfs_file_sync).

If the amount for previously read data changes, or the address of previously read data changes, or the cache size changes, the issue disappear, it seems to be related to a luckily combination of these factors.

We suspect this issue is related to #728, which appears unresolved.

Single file gists reproducing the issue, it can run on a Linux machine since it uses a temporary RAM block storage :
https://gist.github.com/lrodorigo/54c86e9cdcbd813aa6f4fc15c405427f

Many thanks for your work, I hope this will be addressed soon since it seems a critical lucky-dependent issue.

@lrodorigo lrodorigo changed the title Read returns bad data (cache not invalidated) The filesystem is broken - Read returns bad data under certain combination (cache not invalidated) Dec 18, 2024
@lrodorigo lrodorigo changed the title The filesystem is broken - Read returns bad data under certain combination (cache not invalidated) The filesystem is broken - Read returns bad data under certain parameers combination (cache not invalidated) Dec 18, 2024
@lrodorigo lrodorigo changed the title The filesystem is broken - Read returns bad data under certain parameers combination (cache not invalidated) The filesystem is broken - Read returns bad data under certain parameters combinations (cache not invalidated) Dec 18, 2024
@geky geky added the needs fix we know what is wrong label Dec 19, 2024
@geky
Copy link
Member

geky commented Dec 19, 2024

Hi @lrodorigo, thanks for creating an issue. And thanks for the reproducible case, it does look like this is an issue.

It looks like this was a mistake introduced by me in #632, which only happens if you're seek from the end of a block (note CTZ skip-list pointers make this wonky) to another location that appears in the invalid cache. You're right it is luck based, which is why I guess this hasn't hit more users. You probably need a random-read-heavy workload to run into this.

Unfortunately #728 flew under my radar. It was posted during a period I wasn't able to spend much time on littlefs and triaging has been difficult since many bugs end up porting mistakes.

Anyways, I've created a fix+tests for this, will create a PR shortly.

@geky
Copy link
Member

geky commented Dec 19, 2024

PR here: #1058

@lrodorigo
Copy link
Author

Fixed for me.
Many thanks.

Just as a side note, it was pretty easy for me to hit this bug, even without a reading-intensive applications .

I was using SQLite over LFS (by writing a custom virtual file system layer) with an STM32 microcontroller.
2048 bytes was both the cache size and also the SQLite page size.

The SQLite access pattern is exactly the one triggering this bug.

So, important fix for all SQLite users.

Many thanks

Ps can I ask you if there is some implementation example providing a caching at the block level?

@geky
Copy link
Member

geky commented Dec 19, 2024

Just as a side note, it was pretty easy for me to hit this bug, even without a reading-intensive applications .

I think then it may be the case that SQLite is not that common on MCUs. Usually the application state is simple enough to not need a full database.

But to be honest I don't have that much insight into how people are using littlefs when things aren't broken (maybe a poll would be interesting at some point?)

It's also likely that SQLite's read pattern depends on exactly what database operations you are doing, so more luck is involved...

Ps can I ask you if there is some implementation example providing a caching at the block level?

Sorry I don't currently have an example for this. The tricky part would be the data-structure mapping block address to cache entries (maybe via a hashmap?).

But it would be nice to add more examples/block device adapters at some point.

@lrodorigo
Copy link
Author

Fixed for me

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants