char boundary byte indexing panic when using Regex::split #417

Closed
frewsxcv opened this issue Nov 24, 2017 · 3 comments

@frewsxcv
Member

frewsxcv commented Nov 24, 2017

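extern crate regex;

fn main() {
    // Pattern: `\B(?-u)|0` (a negated word boundary, then Unicode mode switched off).
    let a = std::str::from_utf8(b"\\B(?-u)|0").unwrap();
    // Haystack: "\n" followed by U+0346, which is encoded as the two bytes 0xCD 0x86.
    let b = std::str::from_utf8(b"\n\xcd\x86").unwrap();
    let c = regex::Regex::new(a).unwrap();
    // Panics here on affected versions: split slices the haystack at byte index 2.
    c.split(b).collect::<Vec<_>>();
}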
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside '͆' (bytes 1..3) of `
͆`', src/libcore/str/mod.rs:2232:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.
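
For anyone reading the panic message above: it comes from slicing a `&str` at a byte offset that falls inside a multi-byte character. The sketch below reproduces that condition with only the standard library, using the same haystack bytes as the report. `checked_slice` is a hypothetical helper written for this illustration; it is not part of the regex crate and is not how the crate was fixed.

```rust
/// Hypothetical helper for illustration: returns the sub-slice only when both
/// ends fall on char boundaries, instead of panicking like `&s[start..end]`.
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // Same haystack as in the report: "\n" followed by U+0346 (bytes 0xCD 0x86).
    let haystack = std::str::from_utf8(b"\n\xcd\x86").unwrap();
    assert_eq!(haystack.len(), 3);

    // Byte 1 starts the two-byte character; byte 2 is in the middle of it.
    assert!(haystack.is_char_boundary(1));
    assert!(!haystack.is_char_boundary(2));

    // Ending a slice at byte 2 is invalid, so the checked form yields None.
    assert_eq!(checked_slice(haystack, 1, 2), None);
    // Covering the whole character (bytes 1..3) is fine.
    assert_eq!(checked_slice(haystack, 1, 3), Some("\u{346}"));

    // By contrast, `&haystack[1..2]` would panic with a
    // "byte index 2 is not a char boundary" message like the one above.
}
```

A match offset of 2 is what a byte-oriented match can produce on this input, which is why the offsets a regex reports on a `&str` must always land on char boundaries.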

line where the panic happens

found via afl.rs using this fuzz target

@frewsxcv changed the title from "char boundary byte indexing panic when using Regex::spit" to "char boundary byte indexing panic when using Regex::split" on Nov 24, 2017
@BurntSushi
Member

BurntSushi commented Nov 24, 2017 via email

@frewsxcv
Member Author

@BurntSushi yep, just confirmed it happens in master too. here's a backtrace

@BurntSushi
Member

This was fixed in regex 0.2.7. In particular, negated word boundaries can match invalid UTF-8, which the new regex-syntax crate now detects correctly. Previously, it didn't.
