char boundary byte indexing panic when using Regex::split #417

frewsxcv · 2017-11-24T16:03:54Z

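```rust
extern crate regex;

fn main() {
    // Pattern and haystack found by the fuzzer.
    let a = std::str::from_utf8(b"\\B(?-u)|0").unwrap();
    // "\n" followed by U+0346, a two-byte code point (0xCD 0x86).
    let b = std::str::from_utf8(b"\n\xcd\x86").unwrap();
    let c = regex::Regex::new(a).unwrap();
    // Collecting the split pieces triggers the panic shown below.
    let _pieces: Vec<&str> = c.split(b).collect();
}
```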

```text
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside '͆' (bytes 1..3) of `
͆`', src/libcore/str/mod.rs:2232:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.
```
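The panic comes from slicing a `&str` at a byte offset that falls inside a multi-byte character. A minimal std-only sketch of that boundary arithmetic for the haystack in this report (the helper `char_boundaries` is hypothetical, not part of the regex crate):

```rust
/// Hypothetical helper: every byte offset at which `s` can be sliced without panicking.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    // Same bytes as the reproducer's haystack: b"\n\xcd\x86".
    // U+0346 is a two-byte code point (0xCD 0x86) occupying bytes 1..3.
    let text = "\n\u{346}";
    assert_eq!(text.as_bytes(), b"\n\xcd\x86");

    // Byte 2 falls inside U+0346, so it is not a valid slice point.
    println!("{:?}", char_boundaries(text)); // [0, 1, 3]

    // `&text[..2]` would panic with "byte index 2 is not a char boundary";
    // the checked `str::get` returns None instead of panicking.
    assert!(text.get(..2).is_none());
    assert_eq!(text.get(..1), Some("\n"));
}
```

Any empty match reported at offset 2 is therefore unusable as a `&str` index, which is exactly the situation in the panic message above.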

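Line where the panic happens: https://github.com/rust-lang/regex/blob/d504c82275101d016b125beaf21d64e44bfe099f/src/re_unicode.rs#L834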

Found via afl.rs (https://github.com/rust-fuzz/afl.rs) using this fuzz target.


BurntSushi · 2017-11-24T21:33:47Z

I'm away for a bit without access to a computer. Does this bug happen in master?


frewsxcv · 2017-11-24T23:42:31Z

@BurntSushi Yep, just confirmed it happens in master too. Here's a backtrace:

BurntSushi · 2018-04-28T14:58:02Z

This was fixed in regex 0.2.7. In particular, negated word boundaries can match invalid UTF-8, which the new regex-syntax crate now detects correctly. Previously it didn't.
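To make "negated word boundaries can match invalid UTF-8" concrete: an ASCII (non-Unicode, `(?-u)`) `\B` only checks whether the neighbouring bytes are ASCII word bytes, so it is satisfied between the two bytes of a UTF-8-encoded code point. The sketch below hand-rolls that rule under that assumption; it is an illustration of the semantics, not the regex crate's implementation, and both function names are hypothetical.

```rust
/// Hypothetical helper: is `b` an ASCII word byte ([0-9A-Za-z_])?
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical helper: byte offsets where an ASCII-only `\B` would match.
/// Out-of-range neighbours count as non-word bytes.
fn ascii_not_word_boundary_positions(haystack: &[u8]) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&i| {
            let before = i > 0 && is_ascii_word_byte(haystack[i - 1]);
            let after = i < haystack.len() && is_ascii_word_byte(haystack[i]);
            before == after
        })
        .collect()
}

fn main() {
    let text = "\n\u{346}"; // bytes: 0x0A 0xCD 0x86
    let positions = ascii_not_word_boundary_positions(text.as_bytes());
    // Every offset matches: neither 0xCD nor 0x86 is an ASCII word byte.
    println!("{:?}", positions); // [0, 1, 2, 3]

    // Offset 2 splits U+0346, so an empty match there is not a valid &str index.
    let splitting: Vec<usize> = positions
        .into_iter()
        .filter(|&i| !text.is_char_boundary(i))
        .collect();
    println!("{:?}", splitting); // [2]
}
```

A `&str`-based regex that allows such a pattern can report an empty match at offset 2, and slicing the haystack there panics; rejecting the pattern up front avoids that.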

frewsxcv changed the title ~~char boundary byte indexing panic when using Regex::spit~~ char boundary byte indexing panic when using Regex::split Nov 24, 2017

frewsxcv mentioned this issue Dec 5, 2017: apply AFL to regex #203 (Closed)

BurntSushi closed this as completed Apr 28, 2018
