Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

High number of memory allocations when parsing timeago dates #1037

Open
FireMasterK opened this issue Mar 6, 2023 · 2 comments
Open

High number of memory allocations when parsing timeago dates #1037

FireMasterK opened this issue Mar 6, 2023 · 2 comments
Labels
enhancement New feature or request youtube service, https://www.youtube.com/

Comments

@FireMasterK
Copy link
Member

The cause is this line:

.anyMatch(agoPhrase -> textualDateMatches(textualDate, agoPhrase)))

It looks like compiling a pattern makes a lot of memory allocations, and we should avoid it if possible.

Screenshot from profiler:
image

@FireMasterK FireMasterK added enhancement New feature or request youtube service, https://www.youtube.com/ labels Mar 6, 2023
@lrusso96
Copy link
Contributor

lrusso96 commented Jul 2, 2023

Also pinging @AudricV since he recently discussed about refactoring the parser in #1068.

@FireMasterK the code is highly inefficient. The pattern is re-compiled every time one wants to check for a match (even more than 20 times per match!)
The easy fix would be to cache the Pattern and call multiple times the matches method on it.
However, since the pattern in this case is very simple, regex could be an overkill. Take a look at my fork where I implemented the regex-free (yet locale-aware) parsing. Seems like we save 10x memory allocations.
The source code is not yet very clean, but it is sufficient for tests :)

Some minor notes

  • I tried to not cheat and further optimize things just to make the comparison legit. Indeed, the code in the fork is right now semantically equivalent to the one in the dev branch
  • Note that I haven't modified parseTimeAgoAmount, which however relies on a regex behind the scenes. Right now, textualDate is parsed multiple times, but one could just parse it once and get the necessary information.

@lrusso96
Copy link
Contributor

@FireMasterK did you have time to take a look at it?
Also, recent changes in #1082 should share similar memory allocation issues.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request youtube service, https://www.youtube.com/
Projects
None yet
Development

No branches or pull requests

2 participants