Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix memory leak while parsing improperly terminated inherit-expressions #31

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Sep 19, 2021

First of all, big thanks to @fufexan who helped me to reliably reproduce
this.

Originally discovered in rnix-lsp[1], but I confirmed that
nixpkgs-fmt is also affected.

Basically, when having an expression such as

let
  inherit

the parser would wait for a TOKEN_SEMICOLON indefinitely. The actual
problem however is that self.parse_val() always detects the SAME
syntax-error, i.e. "unexpected EOF". This will be written indefinetely
into self.errors. However, errors is of type Vec<ParseError> and a
vector in Rust grows in an amortized fashion[2] which means that if an
entry is pushed and the vector exceeds the currently allocated size, it
will be ~doubled (though the exact growth-factor isn't constant).

This essentially means that the buffer is growing exponentially pretty fast
and - according to KDE heaptrack - my system allocated ~9.5GB after 20s
while running some tests.

I added an exit-condition to the loop traversing through
inherit-subexpressions to avoid that. Checking for an "unexpected EOF"
is actually sufficient here:

  • There's either a ; later in the expression causing the loop to
    terminate and causing an actual "unexpected token" error then.

  • Otherwise, parse_val will go through the tokens until a matching
    semicolon is found (which is not the case) and then reach the end of
    the file. In that case, unexpected EOF is returned by parse_val.

[1] nix-community/rnix-lsp#33
[2] https://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec20-amortized/amortized.htm

…ions

First of all, big thanks to @fufexan who helped me to reliably reproduce
this.

Originally discovered in `rnix-lsp`[1], but I confirmed that
`nixpkgs-fmt` is also affected.

Basically, when having an expression such as

    let
      inherit

the parser would wait for a `TOKEN_SEMICOLON` indefinitely. The actual
problem however is that `self.parse_val()` always detects the SAME
syntax-error, i.e. "unexpected EOF". This will be written indefinetely
into `self.errors`. However, `errors` is of type `Vec<ParseError>` and a
vector in Rust grows in an amortized fashion[2] which means that if an
entry is pushed and the vector exceeds the currently allocated size, it
will be ~doubled (though the exact growth-factor isn't constant).

This essentially means that the buffer is growing exponentially pretty fast
and - according to KDE heaptrack - my system allocated ~9.5GB after 20s
while running some tests.

I added an exit-condition to the loop traversing through
`inherit`-subexpressions to avoid that. Checking for an "unexpected EOF"
is actually sufficient here:

* There's either a `;` later in the expression causing the loop to
  terminate and causing an actual "unexpected token" error then.

* Otherwise, `parse_val` will go through the tokens until a matching
  semicolon is found (which is not the case) and then reach the end of
  the file. In that case, `unexpected EOF` is returned by `parse_val`.

[1] nix-community/rnix-lsp#33
[2] https://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec20-amortized/amortized.htm
@Ma27
Copy link
Member Author

Ma27 commented Sep 19, 2021

@fufexan btw would you mind building your rnix-lsp with this branch of rnix-parser and check for a few days if the issue is really gone for you? :)

@fufexan
Copy link

fufexan commented Sep 19, 2021

@Ma27 I would, I'm just not sure how to build it, as I haven't worked with rust/cargo before, so I've got no clue how to replace the default rnix package with the branch.

@Ma27
Copy link
Member Author

Ma27 commented Sep 19, 2021

Oh right, sorry!

The thing is, naersk doesn't seem to support Git repositories as crate overrides, so I took a bit more manual way myself:

Copy link
Member

@aaronjanse aaronjanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! Thank you so much @Ma27 @fufexan!

Everything here LGTM, although I haven't yet had a chance to run the code. Tomorrow morning I'll try to write a flake to make it easy to build rnix-lsp with a custom rnix-parser

@aaronjanse
Copy link
Member

This should build rnix-parser, using this PR:

nix build git+https://gist.github.com/aaronjanse/0b6145784fcb1aa6ee1419248ae53872

@Ma27
Copy link
Member Author

Ma27 commented Sep 22, 2021

@fufexan did you have a chance to test this? :)

@fufexan
Copy link

fufexan commented Sep 22, 2021

@Ma27 yes, all works fine. Haven't had a memleak since switching :)

Copy link
Member

@aaronjanse aaronjanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and the code looks good to me. Thank you @Ma27!

@Ma27 Ma27 merged commit 61f19a9 into master Sep 23, 2021
@Ma27 Ma27 deleted the fix-memleak branch September 23, 2021 10:51
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants