Attempt to fix gsub with zero length matches
Problem: If a regex given to gsub can match a zero-length string, gsub
never terminates and eventually crashes. This is because after matching
once it keeps checking for a match at the same location and finding the
same zero-length match again.
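
The loop described above can be reproduced outside jq. The following is a minimal Python sketch, not the jq source: `naive_gsub` is a hypothetical name, Python's `re` stands in for Oniguruma, the replacement is a plain string rather than a jq filter, and `max_steps` is an invented guard so the demonstration stops instead of hanging.

```python
import re

def naive_gsub(pattern, replacement, text, max_steps=50):
    """Replace the first match, then repeat on the remainder (the old strategy)."""
    out = ""
    rest = text
    for _ in range(max_steps):
        m = re.search(pattern, rest)
        if m is None:
            return out + rest
        out += rest[:m.start()] + replacement
        # A zero-length match at offset 0 leaves `rest` unchanged, so the
        # next search finds exactly the same match again.
        rest = rest[m.end():]
        if rest == "":
            return out
    raise RuntimeError("no progress: the same zero-length match keeps recurring")

print(naive_gsub("b", "X", "abc"))  # an ordinary pattern terminates normally
try:
    naive_gsub("", "_", "abc")      # an empty pattern never consumes any input
except RuntimeError as err:
    print("stuck:", err)
```

Without the guard, the second call would loop until memory or stack is exhausted, which corresponds to the crash described above.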

Solution: We can't simply skip ahead, because there might be a
non-zero-length match at the same location. Instead, keep track of
whether the last match was zero length, and when it was, ignore any
zero-length match at the same offset (offset 0 of the remaining input).
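
As a rough illustration of this idea, here is a Python sketch under the same assumptions as before (Python's `re` in place of Oniguruma, a plain replacement string, hypothetical function names). It is not the jq implementation and does not reproduce the jq `match` behaviour mentioned below.

```python
import re

def first_new_match(pattern, text, last_zero):
    """Return the first match, skipping a repeated zero-length match at offset 0."""
    for m in re.finditer(pattern, text):
        if last_zero and m.start() == 0 and m.end() == 0:
            continue  # same empty match as last time: ignore it
        return m
    return None

def gsub(pattern, replacement, text):
    out = []
    rest = text
    last_zero = False
    while True:
        m = first_new_match(pattern, rest, last_zero)
        if m is None:
            return "".join(out) + rest
        out.append(rest[:m.start()] + replacement)
        last_zero = m.start() == m.end()  # remember whether this match was empty
        rest = rest[m.end():]

print(gsub("", "_", "abc"))
print(gsub(r"\b", "X", "one two three"))
```

Every iteration either consumes input or sets `last_zero`, and a set `last_zero` forces the next match to end past offset 0, so the loop always terminates. A non-empty match at offset 0 is still accepted after an empty one, which is why skipping ahead by one character is not needed.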

Remaining problem: This fails one of the new tests I added. For some
reason, match doesn't seem to correctly return (some) zero-length
matches at the end of the input string. I can't see why it behaves like
that, so I'm going to report it as a separate bug.

Ref: jqlang#2148
weeble committed Mar 30, 2023
1 parent cff5336 commit 66cba01
Showing 2 changed files with 23 additions and 5 deletions.
20 changes: 15 additions & 5 deletions src/builtin.jq
@@ -132,20 +132,30 @@ def sub($re; s; flags):
   def subg: [explode[] | select(. != 103)] | implode;
   # "fla" should be flags with all occurrences of g removed; gs should be non-nil if flags has a g
   def sub1(fla; gs):
-    def mysub:
+    def find_first_new_match($lastzero):
+      [label $out|
+        match($re; fla + "g")|
+        if $lastzero then
+          # After a zero length match, don't return the same match again!
+          select(.offset != 0 or .length != 0)
+        else . end|
+        ., break $out
+      ];
+    def mysub($lastzero):
       . as $in
-      | [match($re; fla)]
+      | find_first_new_match($lastzero)
       | if length == 0 then $in
-        else .[0] as $edit
+        else (.[0]) as $edit
+        | ($edit | .length == 0) as $zerolength
         | ($edit | .offset + .length) as $len
         # create the "capture" object:
         | reduce ( $edit | .captures | .[] | select(.name != null) | { (.name) : .string } ) as $pair
             ({}; . + $pair)
         | $in[0:$edit.offset]
           + s
-          + ($in[$len:] | if length > 0 and gs then mysub else . end)
+          + ($in[$len:] | if gs then mysub($zerolength) else . end)
         end ;
-    mysub ;
+    mysub(false) ;
     (flags | index("g")) as $gs
     | (flags | if $gs then subg else . end) as $fla
     | sub1($fla; $gs);
8 changes: 8 additions & 0 deletions tests/onig.test
@@ -75,6 +75,14 @@ gsub( "(.*)"; ""; "x")
 ""
 ""
 
+gsub(""; "_")
+"abc"
+"_a_b_c_"
+
+gsub("\\b"; "X")
+"one two three"
+"XoneX XtwoX XthreeX"
+
 [.[] | scan(", ")]
 ["a,b, c, d, e,f",", a,b, c, d, e,f, "]
 [", ",", ",", ",", ",", ",", ",", ",", "]
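
For comparison only, the expected outputs in the two added test cases match what Python's `re.sub` produces for the same inputs. This is a cross-check against a different regex engine, not a statement about Oniguruma or about which jq test passes; `expected_cases` is a hypothetical name.

```python
import re

# (pattern, replacement, input, expected output taken from the added tests)
expected_cases = [
    ("", "_", "abc", "_a_b_c_"),
    (r"\b", "X", "one two three", "XoneX XtwoX XthreeX"),
]

for pattern, replacement, text, expected in expected_cases:
    result = re.sub(pattern, replacement, text)
    print(repr(result), "matches expected:", result == expected)
```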