-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Stop using mem::zeroed
for FFI
#136737
Comments
I don't think But for bigger structs I do agree. |
Afaik in the linux kernel zero-initialization for structs is the default as a security measure. And some APIs explicitly require it, e.g. cmsghdr. |
Fair point. I just wanted a very clear example of the pattern. Edit: Also, I think using |
I don't think this is necessary in our case. For one, there is an FFI and linkage unit boundary in our case, meaning the optimizer can't rely introduce any optimizations that rely on the initialization state of the structure. Also, I guess Linux probably does this to make sure that sensitive data is not accidentally leaked when copying out kernel memory to userspace. This isn't relevant for us as we assume the environment to be trusted.
Obviously we should evaluate the correctness individually for each instance of the pattern. But for e.g. |
I think it makes sense to do for potentially big structs, but I think it should be carefully evaluated on a case-by-case basis, as using MaybeUninit makes the code more complicated. |
This kind of argument worries me. C programmers have been proven wrong about it many times as LTO has become more commonplace. It’s generally true today with libcs written in C and compiled to machine code before Rust even enters the picture, but this can and quite possibly should change. There’s cross-language LTO, LLVM libc has the explicit goal of supporting LTO of libc code + application code, and there are even (highly experimental) projects implementing libc interfaces in Rust. |
And the wobblyness of an uninit struct backed by MADV_FREE'd memory would leak through an FFI barrier. I.e. |
I'd argue that the additional complexity is a good thing: Having to write
Exactly. The question is how much we trust the C code to be correct. I'd argue that in nearly all of the cases inside |
This actually bit us on linux #115868. The API guarantees that it's non-zero but older kernels had a bug where it failed to uphold that guarantee. On the C side that's probably just a correctness issue, not UB. We elevated it UB by trusting the API and feeding it into nonzero.
Not just the code but also the API. When talking about OS APIs we play a game of telephone from the internal interfaces to the public API to the libc/windows-sys translations.
On the OS side it's usually not UB though afaik. Passing a zero-initialized struct to an OS api will just result in an error return, not UB because they don't have any non-zero types and check the pointers because they don't trust userspace. So |
We do have to trust the C API to some extent. The number of unsound things that it can do (or provoke us to do) is infinite but even if you restrict it to the set of things that can be done by accident then it's still a large surface area that we don't want to be defensively programming against. For example, writing to memory that's supposed to only be read can happen purely by accident. Or scribbling over our stack due to an offset bug. We can react to known bugs but we can't reasonably defend against everything. That's not to say we should 100% trust the C API but there is a trade-off and we do have to draw a line somewhere. |
I'm not saying we must zero everything. Imo it's just a conservative choice that might lessen the impact of some subset of bugs. So I don't like framing this as "stop using mem::zeroed". |
Yes, fair enough.
This isn't about undefined behaviour, but rather about easier correctness. As the system APIs don't return |
Using an |
Hi. Is there any prerequisite knowledge or background required for me to try my hand at this issue? This is my first attempt at this project. What do I need to do besides checking out the development guide? |
Not much, really, besides checking your code for the relevant targets with As for which instance of the pattern to address, I'd recommend cases where either |
@rustbot claim |
Hey. I'm not sure I found your id correctly inside zulip, so I came here to ask a question.
And I found these uses for mem::zeroed():
|
MaybeUninit::zeroed seems to have the same performance overhead as mem::zeroed, should I use MaybeUminit::uinit instead of MaybeUninit::zeroed? @joboet |
If I read it correctly:
|
Yes, that's right. |
We should never assume that kernel/OS has any "common sense" behavior. And of course, we should not base our safety guarantees on OS correctness. For example, let's consider Linux's Update 2025-02-14: this affects raw Linux syscall, but not glibc's |
Please, close this issue without changing anything! (Text below will discuss both Linux and Windows syscalls.) For many interfaces it is recommended or even required to zero structures before using. For example, let's talk about Linux's
Moreover, zero-initializing structs is so easy and natural in C, that documentation writers sometimes do not even bother explicitly saying you that you should zero-initialize everything. For example, here is article Brauner wrote about newly introduced Linux syscalls: https://brauner.io/2024/12/16/list-all-mounts.html . As well as I understand, Brauner also is author of actual implementations of these syscalls, so he definitely knows what he writing. Notice this line in his example code: Okay, why Brauner wrote this? Simply out of habit? Or because this is actually required? I don't know. Even if this is requirement, I think it is very easy for Brauner to forget to say this explicitly. Because zero-initializing is so common in C that people usually don't mention explicitly, that it is required. Now let's talk about Windows. Here is Windows tutorial: https://learn.microsoft.com/en-us/windows/win32/learnwin32/creating-a-window . Notice line In WinAPI people usually zero-initialize everything. Unfortunately I cannot find a link, which will explain this. But I remember that in good old Delphi days it was commonplace to zero-initialize everything when you call WinAPI. Delphi used Pascal as its language. And yes, zero-initializing was totally unidiomatic and wordy in Pascal, similarly to how Okay, so. Zero-initializing is good default. Sometimes it is requirement. And if you don't know whether it is required, then, please, zero-initialize. It is safer option. This is harmless. This is faster than syscall overhead. Mass-replacing Even if you really want to replace something, then this should be some function, which doesn't do any syscall (and thus you can actually save some time by replacing So, again: just close this. If you don't want to close, then at least mark the issue as "hard", not as "easy". (Note: "E-hard" description doesn't say "rustc contributing experience", it just says "experience", so I assume general programming experience. This issue definitely needs a lot of system programming experience.) @rustbot label: -E-easy @rustbot label -E-easy @rustbot label: +E-hard (rustbot often ignores me, I don't know whether these commands will work.) |
I'm not advocating to change cases where structs are filled by |
Replace mem::zeroed with mem::MaybeUninit::uninit for large struct in Unix As discussion in rust-lang#136737. - Replace `mem::zeroed()` with `MaybeUninit::uninit()` for `sockaddr_storage` in `accept()` and `recvfrom()` since these functions fill in the address structure - Replace `mem::zeroed()` with `MaybeUninit::uninit()` for `pthread_attr_t` in thread-related functions since `pthread_attr_init()` initializes the structure - Add references to man pages to document this behavior
Rollup merge of rust-lang#136826 - xizheyin:issue-136737, r=thomcc Replace mem::zeroed with mem::MaybeUninit::uninit for large struct in Unix As discussion in rust-lang#136737. - Replace `mem::zeroed()` with `MaybeUninit::uninit()` for `sockaddr_storage` in `accept()` and `recvfrom()` since these functions fill in the address structure - Replace `mem::zeroed()` with `MaybeUninit::uninit()` for `pthread_attr_t` in thread-related functions since `pthread_attr_init()` initializes the structure - Add references to man pages to document this behavior
Quite a lot of
std
s FFI code usesmem::zeroed
to create empty structures that are to be filled by FFI. E.g.:rust/library/std/src/sys/pal/windows/time.rs
Lines 68 to 72 in d2f335d
This is unnecessary, since the C code does not require the structures to be initialized (you wouldn't zero out structures in C either). Thus, this pattern just reduces performance, as it results in the initialization of potentially very large structures such as
sockaddr_storage
. We should get rid of this pattern and replace it with proper handling of uninitialized data throughMaybeUninit
.Edit (after discussion below): In some instances one might decide to keep the zero-initialization behaviour, but I think this should still go through
MaybeUninit::zeroed
instead ofmem::zeroed
to make the point of initialization explicit (by introducing.assume_init()
calls in the right places.I'll probably do the network code myself (I want to clean some things up there anyway), but I'm happy to mentor you if you'd like to help with other instances such as the filesystem code (
library/std/src/sys/pal/*/fs.rs
). Just contact me here or on Zulip. The best way to find the pattern is probably by searching formem::zeroed
inlibrary/std/src/sys
.The text was updated successfully, but these errors were encountered: