Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

mtd: Change API to return 0 on success #13896

Merged

Conversation

fabian18
Copy link
Contributor

Returning the number of bytes written/read could return a negative integer
because a uint32_t is expected for the length in read()/write() operations.

Contribution description

MTD drivers returned the number of bytes read/written as a plain integer but this could overflow because a uint32_t is expected as a function parameter, being the length to read/write.
So this PR changes the return value to be 0 on a successful read/write operation.

Testing procedure

cd tests/mtd_flashpage
make BOARD=bluepill PORT=/dev/ttyUSB0 flash term

2020-04-17 17:47:00,291 # Help: Press s to start test, r to print it is ready
s
2020-04-17 17:47:02,154 # START
2020-04-17 17:47:02,162 # main(): This is RIOT! (Version: 2020.04-devel-2062-gb4655-mtd-do-not-return-length-on-success)
2020-04-17 17:47:02,627 # ....
2020-04-17 17:47:02,627 # OK (4 tests)

cd tests/mtd_mapper
make BOARD=bluepill PORT=/dev/ttyUSB0 flash term
2020-04-17 17:48:19,200 # �Help: Press s to start test, r to print it is ready
s
2020-04-17 17:48:21,032 # START
2020-04-17 17:48:21,040 # main(): This is RIOT! (Version: 2020.04-devel-2062-gb4655-mtd-do-not-return-length-on-success)
2020-04-17 17:48:21,047 # ....
2020-04-17 17:48:21,048 # OK (4 tests)

cd tests/pkg_spiffs
make BOARD=nucleo-f767zi flash term

2020-04-17 19:01:04,472 # START
2020-04-17 19:01:04,480 # main(): This is RIOT! (Version: 2020.04-devel-2062-gb4655-mtd-do-not-return-length-on-success)
2020-04-17 19:01:04,503 # .........
2020-04-17 19:01:04,504 # OK (9 tests)

cd examples/filesystem
make BOARD=nucleo-f767zi flash term

> vfs r /const/hello-world
2020-04-17 19:04:17,346 #  vfs r /const/hello-world
2020-04-17 19:04:17,352 # 00000000: 4865 6c6c 6f20 576f 726c 6421 0a00       Hello World!..
2020-04-17 19:04:17,354 # -- EOF --

cd tests/unittests/
make tests-vfs BOARD=nucleo-f767zi flash term

2020-04-17 19:06:21,916 # main(): This is �main(): This is RIOT! (Version: 2020.04-devel-2062-gb4655-mtd-do-not-return-length-on-success)
2020-04-17 19:06:21,920 # Help: Press s to start test, r to print it is ready
s
2020-04-17 19:06:23,240 # START
2020-04-17 19:06:23,245 # .......................................
2020-04-17 19:06:23,247 # OK (39 tests)

Issues/PRs references

Came up in #13877

@fabian18
Copy link
Contributor Author

I hope i did not miss anything as the change is less than I thought it would be.

I also found other structs nvram_t and vfs_file_ops having a similar API and returning the number of bytes read/written.
nvram_t::int (*read)(struct nvram *dev, uint8_t *dst, uint32_t src, size_t size);
vfs_file_ops::ssize_t (*read) (vfs_file_t *filp, void *dest, size_t nbytes);

Conversion from mtd to vfs happens for example in
drivers/mtd/mtd-vfs: mtd_vfs_read()

I am not sure how far they can coexist.

@benpicco benpicco added Area: fs Area: File systems Area: sys Area: System Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. labels Apr 17, 2020
@maribu
Copy link
Member

maribu commented Apr 17, 2020

TL;DR: I think the return value should be of type ssize_t and the length of type size_t, while the offset should remain of type uint32_t.

Reasoning:

Returning a size as int certainly is not a good idea. It is really so out of place to use int for sizes, that programmers seeing the signature of such a function without reading the documentation become mislead to believing that a return value of 0 would indicate success.

However: The overflow issue cannot happen for any currently supported platform: The read function reads from storage and writes into RAM, the write functions reads from RAM. So in both cases the capacity of the RAM is the limiting factor for the length. And for all platforms supported currently, an int is big enough. However: While for the offset an uint32_t indeed makes sense (the storage can be much bigger than RAM and only the size of the storage is limiting the offset), the length should just be size_t like everywhere else.

Using an ssize_t as return value would IMHO be the best. This would match also the read() and write() functions of POSIX very well and feel natural to most RIOT users. In the end a design goal of RIOT is (optional) POSIX compatibility, and there having the number of bytes read/written needs to be returned.

@benpicco
Copy link
Contributor

There is the case to be made that returning 0 on success makes the code cleaner.

@fabian18
Copy link
Contributor Author

There is the case to be made that returning 0 on success makes the code cleaner.

If the function returns ssize_t, it means it can fail, or it returns a number of bytes between 0 and some variable length. For example mtd_vfs_read() does not necessarily return the full length on success. Where mtd either fails or writes the full given length. At least this is the difference I see.

@fabian18
Copy link
Contributor Author

fabian18 commented Apr 18, 2020

But I see another argument for ssize_t. Because of equal semantics, it is unlikely that something breaks.

@maribu
Copy link
Member

maribu commented Apr 18, 2020

There is the case to be made that returning 0 on success makes the code cleaner.

Indeed. And likely this translates into a few bytes less ROM as well.

I tried to come up with a convincing example where having the number of bytes actually written before an error occurred would be useful. However, this is something so unlikely to happen with so little benefit of the additional knowledge, that investing the ROM space to handle this is kind of hard to sell. Also, this error handling code likely ends up poorly tested (as those errors won't pop up frequently in practice) and likely turns out to be dysfunctional when (after years of waiting) the error handling actually gets triggered in the wild for somebody.

Consider me convinced that returning zero makes more sense. However, I would like to additionally use size_t for the length (but not for the offset) parameter. But maybe it makes sense to have this in a follow up PR, so that the discussion here can focus on the change of the return type.

@fabian18 fabian18 requested a review from MrKevinWeiss as a code owner April 19, 2020 11:23
@fabian18
Copy link
Contributor Author

I found some more lines to be changed.

More test output:

cd tests/unittests
make tests-mtd BOARD=bluepill flash term PORT=/dev/ttyUSB0

2020-04-19 12:51:35,096 # main(): This is RIOT! (Version: 2020.04-devel-2070-g2f611-mtd-do-not-return-length-on-success)
2020-04-19 12:51:35,100 # Help: Press s to start test, r to print it is ready
s
2020-04-19 12:51:36,690 # START
2020-04-19 12:51:36,692 # .....
2020-04-19 12:51:36,693 # OK (5 tests)

cd examples/filesystem
USEMODULE=esp_spiffs make BOARD=esp32-wroom-32 flash term PORT=/dev/ttyUSB1

2020-04-19 12:54:02,118 # main(): This is RIOT! (Version: 2020.04-devel-2070-g2f611-mtd-do-not-return-length-on-success)
2020-04-19 12:54:02,119 # constfs mounted successfully
vfs r /const/hello-world
2020-04-19 12:54:19,940 #  vfs r /const/hello-world
2020-04-19 12:54:19,950 # 00000000: 4865 6c6c 6f20 576f 726c 6421 0a00       Hello World!..
2020-04-19 12:54:19,950 # -- EOF --


cd examples/pkg_littlefs
make BOARD=bluepill flash term PORT=/dev/ttyUSB0

2020-04-19 13:06:50,897 # Help: Press s to start test, r to print it is ready
s
2020-04-19 13:06:52,760 # START
2020-04-19 13:06:52,768 # main(): This is RIOT! (Version: 2020.04-devel-2070-g2f611-mtd-do-not-return-length-on-success)
2020-04-19 13:06:52,806 # ........
2020-04-19 13:06:52,807 # OK (8 tests)

cd examples/pkg_littlefs
make BOARD=bluepill flash term PORT=/dev/ttyUSB0

2020-04-19 13:08:06,251 # Help: Press s to start test, r to print it is ready
s
2020-04-19 13:08:07,914 # START
2020-04-19 13:08:07,923 # main(): This is RIOT! (Version: 2020.04-devel-2070-g2f611-mtd-do-not-return-length-on-success)
2020-04-19 13:08:07,988 # ........
2020-04-19 13:08:07,988 # OK (8 tests)

@fabian18
Copy link
Contributor Author

However, I would like to additionally use size_t for the length (but not for the offset) parameter. But maybe it makes sense to have this in a follow up PR, so that the discussion here can focus on the change of the return type.

agree

@fabian18 fabian18 force-pushed the mtd-do-not-return-length-on-success branch from c628d4d to 1aa3231 Compare April 22, 2020 08:17
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 30, 2020
@benpicco
Copy link
Contributor

benpicco commented Apr 30, 2020

I'm very much in favor of this change.

And while we discuss the MTD API, why does it expose page_size, sector_count and pages_per_sector? Aren't those details that the API is supposed to hide?
(Almost?) all of the time they are just used to get the total size: mtd->page_size * mtd->sector_count * mtd->pages_per_sector.

Also those values are hard-coded instead of being queried from the underlying driver.
In the case of SPI Flash and SD Cards it is a very valid use case to detect those at run-time.

nvm those are actually used by file systems.
And mtd_sdcard_init() will take care of setting the right parameters

@benpicco benpicco added the Process: needs >1 ACK Integration Process: This PR requires more than one ACK label Apr 30, 2020
@benpicco benpicco requested a review from bergzand April 30, 2020 19:59
@maribu maribu requested a review from vincent-d April 30, 2020 21:15
@maribu
Copy link
Member

maribu commented Apr 30, 2020

@vincent-d: As one of the authors of this API, it would be good to have your opinion on this, too. I would like to have this in, but I have zero prior experience with the MTD API. Thus, I would like to avoid being the person giving the second ACK.

@benpicco
Copy link
Contributor

benpicco commented May 2, 2020

While discussing the MTD API, I noticed another problem.
.read(), .write() and .erase() all take a uint32_t addr - this will break for any address > 4GiB. It's pretty hard to get SD cards smaller than that these days…

We could just make that 64 bit or change it into write(sector, offset) - the latter one would match how this is used by the file systems.

@maribu
Copy link
Member

maribu commented May 3, 2020

We could just make that 64 bit or change it into write(sector, offset) - the latter one would match how this is used by the file systems.

If the latter would not only avoid 64 bit arithmetic but also make the file system code cleaner, we should go for that :-)

@vincent-d
Copy link
Member

@maribu I'm OK with these changes.

ACK

@maribu
Copy link
Member

maribu commented May 6, 2020

The error remains - but I can't reproduce it locally. Also not when rebasing to master.

Don't forget that the CI will always rebase the PR on top of current master before running the tests to insure that no breaking change has been merged in the meantime. I guess this is what has happened here.

Sorry, I'm too tired to concentrate.

@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

For me the error does not appear locally either:

2020-05-06 17:14:47,660 # START
2020-05-06 17:14:47,663 # main(): This is RIOT! (Version: 2020.04-devel-2063-g1aa32-mtd-do-not-return-length-on-success)
2020-05-06 17:14:57,850 # ........
2020-05-06 17:14:57,851 # OK (8 tests)

@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

Don't forget that the CI will always rebase the PR on top of current master before running the tests to insure that no breaking change has been merged in the meantime. I guess this is what has happened here.

But that's what I'm doing too.

@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

I am going to rebase and push again.

@fabian18 fabian18 force-pushed the mtd-do-not-return-length-on-success branch from 1aa3231 to b6a772f Compare May 6, 2020 15:39
@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

tests/mtd_at25xxx will need an update too now - and then just revert 8a2a936 and 6ca7ac9
(tests/pkg_littlefs still works on my esp32)

@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

Jup, tests/mtd_at25xxx still works too.

@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

So i squash

@fabian18 fabian18 force-pushed the mtd-do-not-return-length-on-success branch from d484033 to 1c29ed4 Compare May 6, 2020 16:19
@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

Hm I'm beginning to think the esp on CI simply has a broken flash.
It's faililng other PRs as well - just with a different test (and here a hard fault, but as it uses the flash also to store the code, that would be consistent).
But those are two different workers…

@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

I'm willing to just merge this now and ignore the CI test - I tested it locally and it works, CI build fails for other PRs too but with a different test - that can still explained by broken flash.

Just one more thing: there will be a merge conflict with #11081 which touches tests-mtd/tests-mtd.c .

Should I merge that first so you can quickly rebase again? :)

@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

Should I merge that first so you can quickly rebase again? :)

Ok

@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

There you go 😉

@benpicco benpicco removed the CI: run tests If set, CI server will run tests on hardware for the labeled PR label May 6, 2020
@fabian18 fabian18 force-pushed the mtd-do-not-return-length-on-success branch from 1c29ed4 to e14c6a8 Compare May 6, 2020 18:02
@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

I guess I can squash directly?

fabian18 added 3 commits May 6, 2020 20:24
Returning the number of bytes written/read could return a negative integer
because a uint32_t is expected for the length in read()/write() operations.
@fabian18 fabian18 force-pushed the mtd-do-not-return-length-on-success branch from e14c6a8 to d52c716 Compare May 6, 2020 18:25
@benpicco benpicco merged commit ed95944 into RIOT-OS:master May 6, 2020
@fabian18
Copy link
Contributor Author

fabian18 commented May 6, 2020

Thanks

@benpicco
Copy link
Contributor

benpicco commented May 6, 2020

Thank you for the cleanup!

@benpicco
Copy link
Contributor

benpicco commented May 8, 2020

I do realize now that it does make sense for the low-level component to return the number of bytes written if the upper (MTD) layer handles this.

E.g. if write() would write beyond a page, it should only write as many bytes as it could and report that to the MTD layer which would then write the next chunk on the new page.

But instead it would return -EOVERFLOW - that is not helpful.

I'm now re-introducing an API to return the actual number of written bytes, but also use page-wise addressing to being able to write to media > 4GiB in size.

@maribu
Copy link
Member

maribu commented May 9, 2020

E.g. if write() would write beyond a page, it should only write as many bytes as it could and report that to the MTD layer which would then write the next chunk on the new page.

Should the upper layer make sure that only valid parameters are passed to the lower layer? Why would that upper layer call the lower one with writes across pages, unless the upper layer is sure that this is supported?

If you indent to add an intermediate layer on top of the drivers anyway, why not introduce two flavors of backends: One that supports arbitrary writes and reads, and one that only works on a page level. E.g. flash drivers will very naturally map to the page-wise access, while e.g. EEPROM or the native file based MTD map very naturally to arbitrary reads and writes. A union could be used that the driver struct does not get to big and the intermediate layer would make sure that implementation details are transparent to the MTD users.

@benpicco
Copy link
Contributor

benpicco commented May 9, 2020

Should the upper layer make sure that only valid parameters are passed to the lower layer? Why would that upper layer call the lower one with writes across pages, unless the upper layer is sure that this is supported?

You are right, some file systems will already align their writes accordingly.
However, there is not much overhead in adding handling to this to the MTD layer: If the buffer fits into one low-level write(), it will exit early.

Adding two code paths would be a much bigger overhead.

EEPROM or the native file based MTD map very naturally to arbitrary reads and writes.

EEPROMs still can't write over page boundaries. If a single write transaction goes beyond the current page, it will wrap around instead of writing to the next page.

@maribu
Copy link
Member

maribu commented May 9, 2020

If a single write transaction goes beyond the current page, it will wrap around instead of writing to the next page.

Ah, yes. So there is actually no real hardware that can do random access read/writes anyway (except for the one emulated by native).

It would still be nice for hardware that can only do page-wise operations (e.g. not starting a read / write with given offset) to not lot the driver manage random access within the page. Instead, implementing random access on top once for all drivers in mtd.c could safe a lot of code duplication.

Maybe random access within a page could just be assumed to be supported unless a certian submodule of mtd is used. Drivers not supporting random access within a page have to depend on that submodule. That way code taking care of turning an write(offset = 42, data = foo, size = bar) into an write(offset = 0, data = read(offset = 0, size = 42) + foo, size = 42 + bar) would only be compiled if at least on driver needs this. And in addition it seems to make sense to have a flag in case both types of hardware is used, so that the work around is only used when really needed.

@miri64 miri64 added this to the Release 2020.07 milestone Jun 24, 2020
@miri64 miri64 added the Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation label Jul 16, 2020
@fabian18 fabian18 deleted the mtd-do-not-return-length-on-success branch January 11, 2025 15:36
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Area: fs Area: File systems Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Process: needs >1 ACK Integration Process: This PR requires more than one ACK Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants