Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

unnecessary PREG_OFFSET_CAPTURE usage (?) #110

Open
ViliusS opened this issue Apr 18, 2024 · 3 comments
Open

unnecessary PREG_OFFSET_CAPTURE usage (?) #110

ViliusS opened this issue Apr 18, 2024 · 3 comments

Comments

@ViliusS
Copy link

ViliusS commented Apr 18, 2024

Just wondering is PREG_OFFSET_CAPTURE is needed here

$count = preg_match_all($this->lexerRegex, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

I don't see offset used anywhere in the following code however during tests I found that it uses up memory pretty heavily. If I did my testing right, in one case preg_match_all() uses up 80 MB with PREG_OFFSET_CAPTURE and 23 MB without.

@thunderer
Copy link
Owner

Interesting, it looks like a leftover from the time I actually used the offset, now replaced with calculation via mb_strlen(). Would you be interested in preparing a PR for this?

@ViliusS
Copy link
Author

ViliusS commented Apr 18, 2024

I've actually tried that, but the logic in the following switch statement is a little bit over my head. What I could share is why I have investigated this, and minimal test cases which shows memory issues with that preg_match_all() statement https://1drv.ms/f/s!AgnMn-haWyFrkcN50beuEO4A0m6PQw?e=GjwcEV . Ideally I wanted to find out why these content cases uses so much memory, but other cases don't. PREG_OFFSET_CAPTURE looked like a low hanging fruit, but there could be other issues still.

@thunderer
Copy link
Owner

thunderer commented Dec 20, 2024

@ViliusS @funkjedi the PR is merged now, and I also got another idea to make it even better using PCRE MARK verb. Unfortunately this will work in PHP 7+ only, so I'll merge it after figuring out the strategy to bump the required PHP version and modernize the codebase. My plan so far is v0.8.0 for PHP 5.3+, v1.0.0 for PHP 7.1+ and v2.0.0 for PHP 8.1+, with dedicated branches for each major version - master stays for 0.x, then v1.x, and v2.x respectively. PR is here: #116

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants