gsub lookahead cannot allocate memory #2354
TL;DR: (1) The example you gave (using "(<=u)") involves a look-behind RE. Using it, I was not able to reproduce the problem on macOS, though it's interesting that gojq doesn't like the RE at all and prints it out in a curious way. Details are below. (2) Details for look-behind:
Output:
@pkoppstein thank you for your findings. I'd like to add that the same regex works as expected in tools using another regex flavor, e.g.
… uniq(stream) The primary purpose of this commit (which supersedes PR jqlang#2624) is to rectify most problems with `gsub` (and also `sub` with the "g" option), in particular jqlang#1425 ('\b'), jqlang#2354 (lookahead), and jqlang#2532 (regex == "^(?!cd ).*$|^cd "). This commit also partly resolves jqlang#2148 and jqlang#1206 in that `gsub` no longer loops infinitely; however, because the new `gsub` depends critically on match(_;"g"), the behavior when regex == "" is sometimes non-standard. [*1]

Since the new sub/3 relies on uniq/1, that has been added as well [*2].

The documentation has been updated to reflect the fact that `sub` and `gsub` are intended to be regular in the second argument. [*3]

Also, _nwise/1 has been tweaked to take advantage of TCO.

Footnotes:

[*1] Using the new gsub, `"a" | gsub(""; "a")` emits "aa" rather than "aaa", as would be standard. This is nevertheless better than the infinite-loop behavior of jq 1.6 in this case. With one exception (as explained in [*2]), the new gsub is implemented as though match/2 behavior is correct. That is, bugs in `gsub` behavior will most likely have their origin in `match/2`.

[*2] `uniq/1` adopts the Unix/Linux name and semantics. It is needed for the following test case (program, input, expected output): `gsub("(?=u)"; "u")`, `"qux"`, `"quux"`. Without this functionality:

Test jqlang#23: 'gsub("(?=u)"; "u")' at line number 100
*** Expected "quux", but got "quuux" for test at line number 102: gsub("(?=u)"; "u")

The root of the problem here is `match`: if `match` is fixed, then gsub would not need `untie`. The addition of `uniq` as a top-level function should be a non-issue relative to general concern about builtin.jq bloat. The line count of the new builtin.jq is significantly reduced overall, and the number of defs is actually reduced by 1 (from 111, ignoring a redundant def, to 110).

[*3] See e.g. jqlang#513 (comment).
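Footnote [*2] traces the "quuux" result back to `match` and says `uniq/1` papers over it. One plausible reading, assumed here and not stated in the commit, is that `match` reports the same zero-width match more than once. The sketch below models that idea in Python, not jq. `splice` and `uniq` are illustrative names, not jq's builtin.jq definitions. Python's `re` stands in for Oniguruma, and the duplicated match list is made up to mimic the bug.

```python
import re
from typing import List, Tuple

# One regex match, represented as (offset, length) in characters.
Match = Tuple[int, int]


def uniq(matches: List[Match]) -> List[Match]:
    """Drop adjacent duplicates, in the spirit of Unix `uniq`."""
    out: List[Match] = []
    for m in matches:
        if not out or out[-1] != m:
            out.append(m)
    return out


def splice(s: str, matches: List[Match], replacement: str) -> str:
    """Replace each (offset, length) span of s; matches must be sorted."""
    pieces: List[str] = []
    prev_end = 0
    for offset, length in matches:
        pieces.append(s[prev_end:offset])  # unmatched text before this match
        pieces.append(replacement)
        prev_end = offset + length
    pieces.append(s[prev_end:])  # unmatched text after the last match
    return "".join(pieces)


# Correct match list: the lookahead matches once, at offset 1 of "qux".
real = [(m.start(), m.end() - m.start()) for m in re.finditer("(?=u)", "qux")]

# Hypothetical faulty match list: the same zero-width match reported twice.
buggy = [(1, 0), (1, 0)]

print(splice("qux", real, "u"))         # quux
print(splice("qux", buggy, "u"))        # quuux
print(splice("qux", uniq(buggy), "u"))  # quux
```

Deduplicating with a `uniq`-style pass only works because repeated matches end up adjacent in a sorted match list. That is an assumption of this sketch, and it says nothing about how jq orders the output of `match`.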
The primary purpose of this commit is to rectify most problems with `gsub` (and also `sub` with the `g` option), in particular fix #1425 ('\b'), fix #2354 (lookahead), and fix #2532 (regex == `"^(?!cd ).*$|^cd "`). This commit also partly resolves #2148 and resolves #1206 in that `gsub` no longer loops infinitely; however, because the new `gsub` depends critically on `match/2`, the behavior when regex == `""` is sometimes non-standard. The documentation has been updated to reflect the fact that `sub` and `gsub` are intended to be regular in the second argument. Also, `_nwise/1` has been tweaked to take advantage of TCO.
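The commit message calls "aaa" the standard result of `"a" | gsub(""; "a")`. The Python sketch below shows the usual rule that produces it. After a zero-width match, the scanner steps forward one character before searching again, so an empty pattern matches at offsets 0 and 1 of "a" and the loop always terminates. `all_matches` and `gsub_sketch` are hypothetical helpers, and Python's `re` stands in for Oniguruma. This is not how jq's new `gsub` works, since it is built on `match(_; "g")` instead.

```python
import re
from typing import List, Tuple


def all_matches(pattern: str, s: str) -> List[Tuple[int, int]]:
    """Return (offset, length) for each successive match of pattern in s.

    After a zero-width match the scan position moves forward by one
    character, so the loop terminates even for "" or a bare lookahead.
    """
    rx = re.compile(pattern)
    found: List[Tuple[int, int]] = []
    pos = 0
    while pos <= len(s):
        m = rx.search(s, pos)
        if m is None:
            break
        found.append((m.start(), m.end() - m.start()))
        # Without the "+ 1", an empty match would be found again at the
        # same position forever, which is a runaway loop.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return found


def gsub_sketch(pattern: str, s: str, replacement: str) -> str:
    """Splice replacement into s at every match found by all_matches."""
    pieces: List[str] = []
    prev_end = 0
    for offset, length in all_matches(pattern, s):
        pieces.append(s[prev_end:offset])  # unmatched text before the match
        pieces.append(replacement)
        prev_end = offset + length
    pieces.append(s[prev_end:])  # unmatched tail
    return "".join(pieces)


print(all_matches("", "a"))              # [(0, 0), (1, 0)]
print(gsub_sketch("", "a", "a"))         # aaa
print(re.sub("", "a", "a"))              # aaa
print(gsub_sketch("(?=u)", "qux", "u"))  # quux
```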
Describe the bug
When using regex lookahead, jq freezes, eats up all system memory, prints an error message, and eventually aborts.
To Reproduce
Expected behavior
jq outputs the result and exits.
Environment
Additional context
I'm aware that this may be an issue in Oniguruma, but I wasn't able to reproduce it there due to my lack of experience with C.