.pop_front() causing memory corruption (macOS) #57
Comments
Hi @aldanor, thank you for the report and sorry for the slow reply (I'm travelling). Could you share a full snippet that reproduces the issue? It's ok if it depends on Also it would be nice to know whether this happens with the latest version of the library or some older version (could you try with |
@gnzlbg I've managed to isolate this. Here's an example that always fails:
    #[test]
    fn test_deque() {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Foo {
            a: i64,
            b: Option<(bool, i64)>,
        }
        use slice_deque::SliceDeque;
        // `Rng` must be in scope for `rng.sample(..)`.
        use rand::{Rng, StdRng, SeedableRng, distributions::Uniform};
        let mut rng: StdRng = SeedableRng::seed_from_u64(0);
        let mut deque = SliceDeque::new();
        loop {
            let n = rng.sample(Uniform::new_inclusive(0, 1000));
            for _ in 0..n {
                deque.push_front(Foo { a: 42, b: None });
            }
            let n = rng.sample(Uniform::new_inclusive(1, deque.len()));
            for _ in 0..n {
                assert_eq!(deque.pop_front(), Some(Foo { a: 42, b: None }));
                if !deque.is_empty() {
                    // this assertion fails (becomes corrupt after pop_front())
                    assert_eq!(unsafe { *deque.get_unchecked(deque.len() - 1) },
                               Foo { a: 42, b: None });
                }
            }
        }
    }
fails like so:
Note that the same example works just fine if you replace the struct with an int. Could it have anything to do with alignment? (Haven't tried master/unix_sysv yet, will do next.) |
@gnzlbg Just checked the master branch and |
This is indeed a bug somewhere in
    const C: [i16; 3] = [42; 3];
    let mut deque = SliceDeque::new();
    for _ in 0..918 {
        deque.push_front(C);
    }
    for _ in 0..237 {
        assert_eq!(deque.pop_front(), Some(C));
        assert!(!deque.is_empty());
        assert_eq!(*deque.back().unwrap(), C); // fails B != C
    }
|
Indeed, your example is even more minimal. Does this only occur on macOS? |
I have only tested this on macOS, working on a fix. I suspect the bug is platform independent, but I can't say for sure yet. |
Just checked on 64-bit Linux:
|
Another thing to note re: your example, if you replace |
It appears that this was caused by a "by |
Could you test if that branch solves the problem for you? |
Those are usually the nastiest kind of errors.
I've checked the branch on 64-bit Linux, seems to work fine so far, I think that fixed it. |
I thought about alignment too at first, but in the end the problem had nothing to do with that. When the For simplicity, the whole implementation assumes that if all elements fit in a single mirrored region, they are always in the first one. The job of the code with the bug is to make sure that this is the case. It worked in many cases, but it was missing all cases in which the elements of the second memory region lay exactly on the boundary between both regions. This introduced a memory error that allows safe Rust code to read uninitialized memory if the deque was put in the state described above. |
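To make the boundary case concrete, here is a minimal sketch of wrapping a head index back into the first region. It is not the crate's code: `normalize_head`, `normalize_head_off_by_one`, and their parameters are hypothetical names, and the `>` versus `>=` slip is only an assumed illustration of the class of bug described above, not the actual line that was fixed.

```rust
/// Hypothetical helper (not part of slice_deque): maps a head index in
/// `0..2 * capacity` back into the first region `0..capacity`, where
/// `capacity` is the number of elements that fit in one region.
fn normalize_head(head: usize, capacity: usize) -> usize {
    debug_assert!(capacity > 0 && head < 2 * capacity);
    // `>=` matters: a head sitting exactly on the boundary
    // (`head == capacity`) already belongs to the mirrored region
    // and must wrap back to 0.
    if head >= capacity {
        head - capacity
    } else {
        head
    }
}

/// The same helper with the boundary case missed, for comparison only.
fn normalize_head_off_by_one(head: usize, capacity: usize) -> usize {
    if head > capacity {
        head - capacity
    } else {
        head
    }
}
```

With a capacity of 4, the first helper maps a head of 4 to 0, while the second leaves it at 4, so the elements stay in the mirrored region and the "always in the first region" assumption no longer holds.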
Version 0.1.16 has been released with the fix. Sorry that it took so long to get to the bottom of this, I was travelling for the last couple of days. |
@gnzlbg Thanks a lot for the quick fix! Will have to switch from vecdeque back to slicedeque... again :) |
I've tried to make that as painless as possible by providing
|
Reads from uninitialized memory can be exploited to obtain secret data, bypass exploit mitigations or even execute arbitrary code. Please add this issue to the Rust security advisory database so that anyone depending on the crate has a way to check whether they depend on a vulnerable version. |
I still seem to be hitting this problem, even with 0.1.16. I cannot say it's the same bug, but the behavior is quite similar. At some point the memory of the items in the deque gets corrupted. If I turn on |
@zimond Could you try isolating a minimal example? |
Seems like #59 would be helpful to have after all. |
@zimond do the tests pass on your system? which system are you on? |
I'm on a 17 MacBook Pro. The tests pass. It's hard to isolate the problem, as I run into it in a quite complex system I'm building. I created roughly 1000x |
By the way I can confirm this is related to |
@zimond I haven't been able to reproduce this yet on macOS. It would be extremely helpful if you could come up with a minimal (or not so minimal, e.g. point me to a GitHub repo where this fails) working example. If your code is not available online, maybe you could provide a version where everything that doesn't have much to do with the deques is stripped out, at least as long as it's reasonable to do so. In the meantime, I'm going to start fuzzing the library and see if that finds it. |
Ok I'll try... I will update in this thread once I get something |
Thanks! In the meantime I've set up fuzzing (see #59 (comment)) but it hasn't found any issues yet :/ |
Have you checked |
So Until now, the majority of bugs have been due to failure to update the head and tail of the deque properly. I don't see anything wrong with that code, and there are a lot of I didn't ask about this before, but I suppose that you are able to reproduce the memory corruption in debug builds with debug assertions turned on, and you do not get a panic coming up from any of the asserts, right? |
Just tried debug build and you are right, the assertions did not catch this. It's so hard to reproduce this and I checked my code once again, I created a custom Iterator based on the slice returned by |
I managed to narrow the bug down to
So I think maybe in certain situation, (note: after pop, |
This is basically what was happening in this bug. That's very suspicious. @zimond is your project on GitHub? Or could you send me a reduced test by email if it isn't? |
Not on github. It's really hard to create a reduced test on this. So I just updated several replies here. I will try again this weekend. Sorry for the long wait. |
Don't be sorry, I am! I want to fix this, but without a program to reproduce it I really can't :/ |
I haven't forgotten about this, a reproducer would still be appreciated. |
@gnzlbg Hey I tried several times but it's still hard to reproduce. But today I reviewed the code and decided to replace |
I'm also encountering a memory corruption issue when using SliceDeque. It always happens after a call to |
I've implemented an optimization on |
I think I have a working example for you. In my latest project using slice_deque I am getting a segmentation fault within 60 seconds of startup. Refactoring to use vecdeque instead of slice_deque makes the problem go away. When I run Yesterday instead of a segmentation fault I was getting a weird error about "overflow while adding Duration". I'm guessing the VecDeque was still corrupting memory, but it belonged to the application so the OS didn't notice. You can find the code here: https://gitlab.com/szaver/mate3/tree/master. I would recommend testing rev NOTE: this is a separate project from the one I mentioned in #64. This time I can reproduce the crash on my MacBook. |
I found an even simpler reproduction! 🎉 Published it here: https://gitlab.com/szaver/slicedeque-crash
    use slice_deque::SliceDeque;

    fn main() {
        let mut deque = SliceDeque::new();
        loop {
            deque.push_back(String::from("test"));
            if deque.len() == 8 {
                deque.pop_front();
            }
        }
    }
On my computer it will crash within milliseconds of running with the following error:
I hope that helps! EDIT: It even reproduces inside GitLab CI. See here: https://gitlab.com/szaver/slicedeque-crash/-/jobs/199158446. I think it's safe to say that this is not a macOS-specific bug. |
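The reproduction loops forever, so it cannot run as a regression test as written. Below is a rough sketch of a bounded variant that also checks the surviving elements after every step. To keep the block self-contained it uses `std::collections::VecDeque` as a stand-in; the function name `churn` and the step counts are made up for illustration, and swapping in `SliceDeque` is assumed (not verified here) to need only a change of type.

```rust
use std::collections::VecDeque;

/// Bounded variant of the push_back / pop_front loop.
/// Returns the number of pops performed so a caller can sanity-check it.
/// `VecDeque` is a stand-in so the sketch compiles without extra crates.
fn churn(steps: usize) -> usize {
    let mut deque: VecDeque<String> = VecDeque::new();
    let mut pops = 0;
    for _ in 0..steps {
        deque.push_back(String::from("test"));
        if deque.len() == 8 {
            // The popped element must be the value that was pushed.
            assert_eq!(deque.pop_front().as_deref(), Some("test"));
            pops += 1;
        }
        // Every element still in the deque must be intact.
        assert!(deque.iter().all(|s| s.as_str() == "test"));
    }
    pops
}
```

The first pop happens on the eighth push and every later push triggers one more, so a run of `steps >= 8` performs `steps - 7` pops.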
Wow thank you so much, I work on MacOS, will dig into this tomorrow!! |
wow I'm so curious ! Waiting for the answer now 😂 |
… indices.

This commit refactors the implementation to use an internal slice instead of indices, which significantly simplifies the implementation and closes #57.

The problem with #57 was that using a pair of indices to keep track of where the head and the tail of the slice are located only works correctly for `allocation_granularity() % mem::size_of::<T>() == 0`, because in that case there is a unique map from indices to elements in memory. Consider a T that's 3 bytes wide, for which `allocation_granularity() % mem::size_of::<T>() != 0`; then we can have:

```
// first region                           mirrored region
// [T02, T10, T11, T12, -, -, T00, T01] | [T02, T10, T11, T12, -, -, T00, T01]
```

such that popping the first element leaves

```
// first region                     mirrored region
// [-, T10, T11, T12, -, -, -, -] | [-, T10, T11, T12, -, -, -, -]
```

An integer indexing scheme in multiples of `size_of::<T>()` that starts at the beginning of the first memory region cannot cope with this (e.g. we'd need to say that the head of the slice starts at index 0.5). One way to work around that would be to use indices to bytes instead (or equivalently, pointers). This PR does that, by changing the layout from a pair of indices to Ts, to a pointer to the first T, and a length (that is, a slice).
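As a rough way to check which element types are affected, the sketch below computes how many bytes are left over when one region is packed with values of a given type. It reads the condition as "the region size in bytes is a whole multiple of the element size"; the helper name `leftover_bytes` is hypothetical, and the region size is passed in explicitly rather than queried from the OS.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes left over at the end of a region of
/// `region_bytes` bytes when it is packed with values of type `T`.
/// Zero means no element ever straddles the boundary between the
/// first region and its mirror.
fn leftover_bytes<T>(region_bytes: usize) -> usize {
    // Zero-sized types have no meaningful packing; reject them up front.
    assert!(size_of::<T>() != 0, "zero-sized types are not handled here");
    region_bytes % size_of::<T>()
}
```

For the 3-byte element in an 8-byte region described above, `leftover_bytes::<[u8; 3]>(8)` is 2. With an assumed 4096-byte region, a 24-byte element such as `[u64; 3]` leaves 16 bytes and the 6-byte `[i16; 3]` from the earlier test leaves 4, while `u64` leaves none.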
I'm sorry that it took a while, but #66 should fix this for good. The problem with the previous implementation is described in 8074e61. Basically, the previous implementation was only guaranteed to work properly if
In a nutshell, before, a pair of indices was used to track the head and the tail of the slice within the ring buffer, e.g., suppose that the allocation granularity is 8, the size of
where 0, 1, 2, 3 are indices that can be used to refer to the memory of an element within the ring buffer. If
Now consider what happens if the above does not hold, and we put 3-byte wide Ts in that memory:
Note that only two 3-byte wide elements fit in the 8 bytes that we have, so we have two 1-byte holes at the end of the physical memory region. The problem is that this indexing scheme is only valid as long as we don't wrap around the deque. For example, suppose that we pop_front T0 and push_back T2, then we get:
The index 1 still indexes T1 properly, and if we were to index T2 with index 2, that would also work due to how memory is mirrored, but the issue is that we can't index T2 with index 0. This case also used to work, because before we did not try to access T2 with index 0, but if we break it a little bit more, then everything breaks. For example, let's pop front T1, push back T3, and pop front T2. Then we get:
Now there is no real way to access T3 via or
This is basically what the example provided by @whmountains was doing. It was pushing a String into a deque, which is 3 pointers wide, or 1 pointer and 2 usizes, it doesn't matter. What matters is that when the allocation granularity is 4096 bytes, String is 24 bytes on 64-bit architectures, and then
To fix this, one could keep the
Instead of doing that, I decided to switch to "byte"-based indexing, where the indices point to bytes in memory, and not to
So that's basically why it took me so long to fix this. I allocated half an hour to look into this, only to discover that it would take me a bit more to fully understand the issue. Then a couple of days later I spent another hour on this, found the issue, and toyed with different ways to solve it. And then I decided that the refactor was the right thing to do, but I couldn't allocate 2 hours for this till today =/ |
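The wrap-around can be modelled without any real memory mapping. The toy below is only a sketch: `Mirrored` and `head_byte_offset` are hypothetical names, the mirror is simulated with a modulo on a plain `Vec<u8>`, and nothing here reflects the crate's actual layout. It shows that after a few pops of 3-byte elements in an 8-byte region the head lands on a byte offset that is not a multiple of 3, which an element index cannot express but a byte offset (or a pointer) can.

```rust
/// Toy model of a mirrored allocation: `region` physical bytes that are
/// visible twice, at virtual offsets `0..region` and `region..2 * region`.
struct Mirrored {
    bytes: Vec<u8>,
}

impl Mirrored {
    fn new(region: usize) -> Self {
        Mirrored { bytes: vec![0; region] }
    }

    /// Writes `elem` starting at virtual byte offset `at`; bytes that run
    /// past the end of the first region land at the start of the
    /// physical memory, as they would through a mirrored mapping.
    fn write(&mut self, at: usize, elem: &[u8]) {
        let region = self.bytes.len();
        for (i, b) in elem.iter().enumerate() {
            self.bytes[(at + i) % region] = *b;
        }
    }

    /// Reads `len` bytes starting at virtual byte offset `at`.
    fn read(&self, at: usize, len: usize) -> Vec<u8> {
        let region = self.bytes.len();
        (0..len).map(|i| self.bytes[(at + i) % region]).collect()
    }
}

/// Byte offset of the head after popping `popped` elements of
/// `elem_size` bytes, reported inside the first region.
fn head_byte_offset(popped: usize, elem_size: usize, region: usize) -> usize {
    (popped * elem_size) % region
}
```

Writing a 3-byte element at offset 6 of an 8-byte region puts its last byte at physical offset 0, and after three pops the head sits at byte offset 1. With 2-byte elements the head always stays on a multiple of 2, which is why such types never hit the problem.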
Uh, I forgot the most important part of the post. Please try the PR branch, and see if that fixes your issues, and report here with the results! |
slice-deque 0.2 has been released with this hopefully fixed for everybody, thank you all involved! |
Since this looks like a memory corruption and could be a security issue, could you file an advisory at https://github.com/RustSec/advisory-db so that anyone interested could check their dependencies for versions with this bug? Also, if the fixed version is 0.2, does that mean that there's no semver-compatible version with the fix? |
Yes. The fix required a semver API-breaking change to
Done: rustsec/advisory-db#95 |
Thanks for all the hard work. I'll try 0.2 soon and report back. |
This took me a long while to figure out, but I narrowed it down to just a single line:
The elements are simple Clone/Copy structs; the first println outputs the queue in its normal state, whereas in the second all values are like 4294999990 or 123145302343606. This only happens in a quickcheck-like test suite where the queue is used super intensively, and pushes/pops are done thousands of times (it consistently fails at the same spot though).
I could try running it through valgrind if it helps, not sure how else I could help. Is this a known issue perhaps?