
Make Input::new guard against incorrect AsRef implementations #1154

Merged
merged 1 commit into from
Jan 21, 2024


SkiFire13
Contributor

Currently, Input::new calls haystack.as_ref() twice: once to get the actual haystack slice and a second time to get its length. It assumes that the second call returns the same slice, but malicious implementations of AsRef can return different slices, and thus different lengths. This matters because there is unsafe code relying on the Input's span being in bounds with respect to the haystack; if the second call to .as_ref() returns a bigger slice, that no longer holds.

For example, this snippet causes Miri to report UB on an unchecked slice access in find_fwd_imp (when run normally it also panics later, but by then the UB has already happened):

```rust
use regex_automata::{Input, meta::{Builder, Config}};
use std::cell::Cell;

struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}

fn main() {
    let bad = Bad(Cell::new(true));
    let input = Input::new(&bad);
    let regex = Builder::new()
        // Not setting this causes some checked access to occur before
        // the unchecked ones, avoiding the UB
        .configure(Config::new().auto_prefilter(false))
        .build("a+")
        .unwrap();
    regex.find(input);
}
```
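The same mismatch can be reproduced without regex-automata. The sketch below is illustrative only: SketchInput and new_two_calls are hypothetical names, not regex-automata's actual Input internals. It reuses the Bad type from the snippet above and shows that a constructor which calls as_ref() twice can end up with a span end larger than the slice it actually stored.

```rust
use std::cell::Cell;

// Same deliberately inconsistent `AsRef` impl as in the snippet above:
// the first call returns an empty slice, every later call returns 1000 bytes.
struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}

// Hypothetical haystack + span pair (NOT regex-automata's `Input`).
struct SketchInput<'h> {
    haystack: &'h [u8],
    end: usize,
}

// Hypothetical constructor showing the problematic pattern: the slice and
// its length come from two separate `as_ref()` calls.
fn new_two_calls<'h, H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> SketchInput<'h> {
    let slice = haystack.as_ref(); // with `Bad`: the empty slice
    let end = haystack.as_ref().len(); // with `Bad`: 1000
    SketchInput { haystack: slice, end }
}

fn main() {
    let bad = Bad(Cell::new(true));
    let input = new_two_calls(&bad);
    // The span claims 1000 bytes, but the stored slice is empty, so any
    // unchecked indexing up to `end` would read out of bounds.
    println!("slice len = {}, span end = {}", input.haystack.len(), input.end);
    // Checked access detects the mismatch instead of causing UB.
    assert!(input.haystack.get(..input.end).is_none());
}
```

Note that Bad is written entirely in safe Rust, which is why the constructor, rather than the caller, has to uphold the "span is in bounds" invariant.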

The proposed fix is to just call .as_ref() once and use the length of the returned slice as the span's end value. A regression test has also been added.
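A minimal sketch of the single-call pattern, again assuming a hypothetical SketchInput type rather than regex-automata's real Input: the slice is fetched once and the span end is derived from that same slice, so end <= haystack.len() holds however the AsRef impl behaves. Growing is a made-up, intentionally incorrect impl whose slice grows on each call, and count is a hypothetical helper showing the kind of unchecked access that can rely on this invariant.

```rust
use std::cell::Cell;

// Hypothetical, intentionally incorrect `AsRef` impl: each call returns
// one more byte than the previous call (capped at the buffer size).
struct Growing {
    calls: Cell<usize>,
    buf: [u8; 16],
}

impl AsRef<[u8]> for Growing {
    fn as_ref(&self) -> &[u8] {
        let n = self.calls.get();
        self.calls.set(n + 1);
        &self.buf[..n.min(self.buf.len())]
    }
}

// Hypothetical haystack + span pair (NOT regex-automata's `Input`).
struct SketchInput<'h> {
    haystack: &'h [u8],
    start: usize,
    end: usize,
}

impl<'h> SketchInput<'h> {
    // The fix: call `as_ref()` exactly once and take the span end from
    // the very slice that gets stored.
    fn new<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> SketchInput<'h> {
        let haystack = haystack.as_ref();
        SketchInput { haystack, start: 0, end: haystack.len() }
    }

    // Counts occurrences of `byte` inside the span using unchecked indexing.
    fn count(&self, byte: u8) -> usize {
        let mut n = 0;
        for i in self.start..self.end {
            // SAFETY: `new` sets `start = 0` and `end = haystack.len()`
            // from the same slice, so `i < self.haystack.len()`.
            if unsafe { *self.haystack.get_unchecked(i) } == byte {
                n += 1;
            }
        }
        n
    }
}

fn main() {
    let g = Growing { calls: Cell::new(1), buf: [b'a'; 16] };
    let first = SketchInput::new(&g); // as_ref() returns 1 byte
    let second = SketchInput::new(&g); // as_ref() returns 2 bytes
    println!("{} {}", first.count(b'a'), second.count(b'a'));
}
```

Each constructed value is internally consistent even though the two SketchInput::new calls observe different lengths; the constructor simply never trusts a second call to agree with the first.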

Member

BurntSushi left a comment


Nice fix! Out of curiosity, how did you find this?

I think this overall looks good, but I'd like to find a word other than "malicious." The issue here really isn't "malicious" per se, because the threat model here doesn't really involve some bad actor doing something sneaky (if a bad actor can insert a malicious AsRef impl, then they can do a whole bunch of other stuff without need for such things). Perhaps "guarding against incorrect AsRef impls" is a better way to phrase it.

SkiFire13
Contributor Author

> Nice fix! Out of curiosity, how did you find this?

I happened to take a quick look at Input::new's source code, and the two calls to .as_ref() reminded me of rust-lang/rust#80335, so I quickly checked whether any unsafe code relied on the span's end. Some did.

SkiFire13 changed the title Make Input::new robust against malicious AsRef implementations to Make Input::new guard against incorrect AsRef implementations Jan 20, 2024
Before this commit, Input::new called haystack.as_ref() twice, once to
get the actual haystack slice and a second time to get its length. It
assumed that the second call would return the same slice, but malicious
implementations of AsRef can return different slices and thus
different lengths. This is important because there's unsafe code
relying on the Input's span being in bounds with respect to the
haystack, but if the second call to .as_ref() returns a bigger slice
this won't be true.

For example, this snippet causes Miri to report UB on an unchecked
slice access in find_fwd_imp (when run normally it also panics later,
but by then the UB has already happened):

    use regex_automata::{Input, meta::{Builder, Config}};
    use std::cell::Cell;

    struct Bad(Cell<bool>);

    impl AsRef<[u8]> for Bad {
        fn as_ref(&self) -> &[u8] {
            if self.0.replace(false) {
                &[]
            } else {
                &[0; 1000]
            }
        }
    }

    let bad = Bad(Cell::new(true));
    let input = Input::new(&bad);
    let regex = Builder::new()
        // Not setting this causes some checked access to occur before
        // the unchecked ones, avoiding the UB
        .configure(Config::new().auto_prefilter(false))
        .build("a+")
        .unwrap();
    regex.find(input);

This commit fixes the problem by calling .as_ref() only once and using
the length of the returned slice as the span's end value. A regression
test has also been added.

Closes rust-lang#1154
BurntSushi force-pushed the fix-unsound-input-new branch from 1c2aa52 to 07246d4 January 21, 2024 13:15
BurntSushi merged commit fbd2537 into rust-lang:master Jan 21, 2024
16 checks passed
BurntSushi
Member

This PR is on crates.io in regex 1.10.3.

SkiFire13 deleted the fix-unsound-input-new branch January 21, 2024 14:16