Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Stack overflow at ~20,000 characters #384

Open
rscarson opened this issue Mar 29, 2024 · 5 comments
Open

Stack overflow at ~20,000 characters #384

rscarson opened this issue Mar 29, 2024 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@rscarson
Copy link

rscarson commented Mar 29, 2024

I have a StringLiteral token with the following expressions on it:

    #[regex(r#"(?:/(?:\\.|[^\\/])+/[a-zA-Z]*)"#)] // Regex literal
    #[regex(r#"(?:"(?:(?:[^"\\])|(?:\\.))*")"#)] // " string literal "
    #[regex(r#"(?:'(?:(?:[^'\\])|(?:\\.))*')"#)] // ' string literal '

And encountering a string literal above ~20,000 characters stack overflows within logos generated code
Is there an error in one of these expressions?

@maciejhirsz
Copy link
Owner

maciejhirsz commented Mar 31, 2024

Groups are never capturing in logos so you don't need ?:, and you don't need groups around |, so string literal can be rewritten as #[regex(r#""([^"\\]|\\.)*""#)]. If that still overflows try #[regex(r#""([^"\\]+|\\.)*""#)]. I believe it loops internally by doing tail call recursion so the greedily looping all non-escaped non-quote characters inside the group could help (but I have not tested it).

This is one of those cases that really needs the codegen rewrite (#291) into a loop with enum branching working as faux-goto instead of function calls.

@rscarson
Copy link
Author

I had already rewritten it as #[regex(r#""([^"\\]|\\.)*""#)] with no luck but #[regex(r#""([^"\\]+|\\.)*""#)] did the trick!
Thanks!

@rscarson
Copy link
Author

rscarson commented Mar 31, 2024

it still overflows at some point though, at a larger number

Should I leave this issue open for tracking?

@maciejhirsz
Copy link
Owner

Should I leave this issue open for tracking?

Ye, let's do that.

@maciejhirsz maciejhirsz reopened this Mar 31, 2024
@jeertmans jeertmans added bug Something isn't working help wanted Extra attention is needed labels Apr 8, 2024
mikkeldenker added a commit to StractOrg/stract that referenced this issue Jul 22, 2024
this should fix a reported stack overflow (might be related to maciejhirsz/logos#384) and should also make it easier to add additional scripts besides latin in the future
@0x2a-42
Copy link
Contributor

0x2a-42 commented Oct 17, 2024

You can also avoid a stack overflow by compiling with optimizations. Apparently opt-level 1 is enough for tail call optimization to occur.

[profile.dev]
opt-level = 1

There currently seems to be no way to selectively optimize the generated lex function. Maye rust-lang/rust#54882 can be used, if it is stabilized before the logos code generator is rewritten.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants