Runtime stack overflow when lexing certain strings #424

Open
Philogy opened this issue Sep 22, 2024 · 4 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), question (Further information is requested)


Philogy commented Sep 22, 2024

I was trying to get C-style multiline comments working in logos. The state machine for them is quite simple, but I can't seem to express it in logos. The regex below seemed to work, but it causes a stack overflow at runtime when lexing certain inputs.

Code to reproduce:

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f\r]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("#define")]
    Define,

    #[token("macro")]
    Macro,

    #[regex("//[^\n]*\n?", logos::skip)]
    LineComment,

    #[regex("/\\*([^\\*]*(\\*[^/])?)*\\*/", logos::skip)]
    MultiLineComment,

    // Or regular expressions.
    #[regex("0x[0-9a-fA-F]+")]
    HexLiteral,

    #[regex("[a-zA-Z_]\\w*:")]
    Label,

    #[regex("[a-zA-Z_]\\w*")]
    Ident,
}

fn main() {
    let src = "
/*wow amazing!!!!!*** /*  **/
// wow very nice
#define macro hi: very nice

";

    let mut lexer = Token::lexer(src);
    while let Some(token) = lexer.next() {
        println!("{:?} {}", token, lexer.slice());
    }
}

The error I'm getting:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1]    5609 abort      cargo run
@Philogy changed the title from "Runtime panic when parsing lexing" to "Runtime panic when lexing based on certain regex" on Sep 22, 2024
@Philogy changed the title from "Runtime panic when lexing based on certain regex" to "Runtime stack overflow when lexing certain strings" on Sep 22, 2024
@jeertmans added the question and bug labels on Sep 26, 2024
@jeertmans
Collaborator

Hello @Philogy, I don't have much time to invest in this issue right now, but I would recommend the same thing I do for all comment-style lexing: create a token that matches the start of a comment, and then process the rest of the comment with a callback. This usually works much better, because comments can contain almost any characters, like an escaped /, which makes it very hard to write a regex that handles every case.

See #421 (comment) for an example on XML comments, which is very similar to multiline strings.
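
As a rough illustration of the callback approach described above, here is a minimal sketch using only the standard library. The names `block_comment_len` and `strip_block_comments` are hypothetical and not part of logos; the second function is only a stand-in for the lexer so the example runs on its own. In logos, the first function's logic would presumably live in a callback attached to `#[token("/*", ...)]`, reading `lex.remainder()` and calling `lex.bump(n)`; the exact callback return type depends on the logos version, so that wiring is left as a comment.

```rust
/// Given the input that follows an opening `/*`, returns how many bytes to
/// skip so that the closing `*/` is consumed too, or `None` if the comment
/// is never closed. No regex is involved, so there is nothing to backtrack.
fn block_comment_len(remainder: &str) -> Option<usize> {
    remainder.find("*/").map(|end| end + "*/".len())
}

// Assumed logos wiring (not compiled here, details vary by version):
//   #[token("/*", my_callback)] on the comment variant, where the callback
//   computes `block_comment_len(lex.remainder())`, then calls `lex.bump(n)`
//   and skips the token, or reports an error for an unterminated comment.

/// Hypothetical stand-in for the lexer: removes every block comment from
/// `src`. Returns `None` if a comment is left unterminated. It does not
/// treat `//` line comments specially.
fn strip_block_comments(src: &str) -> Option<String> {
    let mut out = String::new();
    let mut rest = src;
    while let Some(start) = rest.find("/*") {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + "/*".len()..];
        let len = block_comment_len(after_open)?;
        rest = &after_open[len..];
    }
    out.push_str(rest);
    Some(out)
}
```

Because the scan is a single forward search for `*/`, inputs such as the `/*wow amazing!!!!!*** /*  **/` line from the report are handled without any nested repetition.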

@jeertmans
Collaborator

jeertmans commented Sep 26, 2024

Looks like this is a duplicate of #400, so closing this anyway :-)

@conradludgate

conradludgate commented Oct 22, 2024

> Looks like this is a duplicate of #400, so closing this anyway :-)

Looks different to me. #400 seems to error in the derive, whereas this errors at runtime.

For what it's worth, I also encountered a stack overflow/infinite loop at runtime with a small test case:

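use logos::Logos; // brings the derive macro into scope at the crate root

#[derive(Logos, Debug, PartialEq)]
enum TestToken {
    // The loop body `a*b?` can match the empty string.
    #[regex("c(a*b?)*c")]
    Token,
}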

#[cfg(test)]
mod logos_test {
    use logos::Logos;

    use crate::TestToken;

    #[test]
    fn overflow() {
        let _ = TestToken::lexer("c").next();
    }
}

@jeertmans reopened this on Nov 16, 2024
@jeertmans added the help wanted label on Nov 16, 2024
@Melyodas

I toyed a bit with adding a Mermaid output from the lexer.

Looking at the graph for the test above, the issue seems to be that the graph does not handle a loop between nodes on the miss path (`--x`): for example, node 3 misses to node 6, and node 6 misses back to node 3.

flowchart LR
1("::Token")
3("rope#3")
  3 -- "a" --> 5
  3 --x 6
4("rope#4")
  4 -- "b" --> 3
  4 --x 3
5("rope#5")
  5 -- "a" --> 5
  5 --x 4
6("fork#6")
  6 -- "b" --> 3
  6 -- "c" --> 1
  6 --x 3
8("Start")
  8 -- "c" --> 3
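
To make the loop in the flowchart concrete, the following sketch walks a hand-transcribed copy of that graph over an input. It rests on two assumptions that are mine, not confirmed by the thread: a miss edge (`--x`) is followed without consuming any input, and reaching node 1 (`::Token`) means the token is accepted. The names `step`, `walk`, and `Outcome` are hypothetical and not part of logos.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    Rejected,
    /// A node was revisited without any input being consumed in between.
    MissCycle(usize),
}

/// One transition of the graph. Returns the next node and whether a byte
/// was consumed; `None` means there is no edge at all (not even a miss).
fn step(node: usize, byte: Option<u8>) -> Option<(usize, bool)> {
    match (node, byte) {
        (8, Some(b'c')) => Some((3, true)),
        (3, Some(b'a')) => Some((5, true)),
        (3, _) => Some((6, false)), // miss
        (4, Some(b'b')) => Some((3, true)),
        (4, _) => Some((3, false)), // miss
        (5, Some(b'a')) => Some((5, true)),
        (5, _) => Some((4, false)), // miss
        (6, Some(b'b')) => Some((3, true)),
        (6, Some(b'c')) => Some((1, true)),
        (6, _) => Some((3, false)), // miss
        _ => None,
    }
}

/// Walks the graph from `Start` (node 8), reporting a cycle as soon as a
/// node repeats since the last consumed byte.
fn walk(input: &str) -> Outcome {
    let bytes = input.as_bytes();
    let mut node: usize = 8;
    let mut pos: usize = 0;
    let mut seen = HashSet::new(); // nodes visited since the last consumed byte
    loop {
        if node == 1 {
            return Outcome::Accepted;
        }
        if !seen.insert(node) {
            return Outcome::MissCycle(node);
        }
        match step(node, bytes.get(pos).copied()) {
            Some((next, consumed)) => {
                if consumed {
                    pos += 1;
                    seen.clear();
                }
                node = next;
            }
            None => return Outcome::Rejected,
        }
    }
}
```

Under these assumptions, `walk("c")` ends in `Outcome::MissCycle(3)` (the path is 8, 3, 6, then 3 again with no input consumed), while `walk("cc")` and `walk("cabc")` are accepted. If each transition in the generated lexer is a function call, which is again an assumption, a cycle like this would be consistent with the reported stack overflow.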

4 participants