gsub does not work when a lookahead alternative is combined with a normal pattern #2532

Closed
fish2bird opened this issue Jan 30, 2023 · 6 comments · Fixed by #2641

@fish2bird

fish2bird commented Jan 30, 2023

Describe the bug

I want to output the captured part of a line if the pattern matches, and an empty string otherwise.
Example input:

not a cd cmd
cd /ok

I want to extract /ok, which follows cd, and replace non-matching lines with an empty string. Expected output:

""
"/ok"

My attempt is jq -nR 'inputs|gsub("^(?!cd ).*$|^cd "; "")'
^(?!cd ).*$ works on its own, and ^cd also works, but the result is wrong when the two are combined.

To Reproduce


[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|gsub("^cd "; "")'
"not a cd cmd"
"/ok"
[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|gsub("^(?!cd ).*$"; "")'
""
"cd /ok"
[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|gsub("^(?!cd ).*$|^cd "; "")'
""
""

Expected behavior

jq should work like perl:

[root build]#(echo "not a cd cmd"; echo "cd /ok")
not a cd cmd
cd /ok
[root build]#(echo "not a cd cmd"; echo "cd /ok") | perl -pe 's/^(?!cd ).*$|^cd //g'

/ok
[root build]#

Environment (please complete the following information):

  • OS: CentOS 7
  • jq version: jq-1.6


@pkoppstein
Contributor

Use sub, not gsub. gsub interprets ^ naively.
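One way to picture "interprets ^ naively" is a global substitution that re-runs the search on the leftover tail after each match, so that every tail looks like a fresh string whose first character satisfies ^. The sketch below is only a model of that idea, written in Python with the `re` module standing in for jq's regex engine; `naive_gsub` is a hypothetical helper, not jq's actual implementation.

```python
import re

PATTERN = r"^(?!cd ).*$|^cd "

def naive_gsub(pattern: str, repl: str, text: str) -> str:
    """Replace matches by re-running the search on the leftover tail.

    Each tail is treated as a brand-new string, so ^ matches at its start
    even though that position lies in the middle of the original line.
    """
    out = []
    rest = text
    while rest:
        m = re.search(pattern, rest)
        if m is None or m.end() == 0:
            # No match, or an empty match at offset 0: stop instead of looping forever.
            break
        out.append(rest[:m.start()] + repl)
        rest = rest[m.end():]
    return "".join(out) + rest

for line in ["not a cd cmd", "cd /ok"]:
    # Left: the engine's own global substitution. Right: the naive model.
    print(repr(re.sub(PATTERN, "", line)), repr(naive_gsub(PATTERN, "", line)))
```

For "cd /ok", `re.sub` leaves "/ok" (matching the perl result in the report), while the naive model first strips "cd " and then matches the remaining "/ok" with the lookahead branch, leaving an empty string as in the jq 1.6 transcript.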

@fish2bird
Author

Use sub, not gsub. gsub interprets ^ naively.

[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|sub("^(?!cd ).*$|^cd "; ""; "")'
""
"/ok"

However, my final aim is to replace multiple occurrences in a line,
for example to turn cd asdf; cd bbbb; cd ddd into asdf;bbbb;ddd

I also tried sub(pattern; tostring; "g"), which fails in the same way as gsub:

[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|sub("^cd "; ""; "g")'
"not a cd cmd"
"/ok"
[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|sub("^(?!cd ).*$"; ""; "g")'
""
"cd /ok"
[root build]#(echo "not a cd cmd"; echo "cd /ok") | ./jq -nR 'inputs|sub("^(?!cd ).*$|^cd "; ""; "g")'
""
""

@pkoppstein
Contributor

pkoppstein commented Jan 31, 2023

@chencang1980 wrote:

My final aim is to replace multi occurrences in a line

So if you want to use jq, you'll either have to write your own gsub or choose a different parsing strategy. How about splitting on the ';'? Or if you want a purely-regex approach:

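# Recursive helper; this zero-argument def is distinct from the builtin sub/2 and sub/3.
def sub:
  if . == null then empty            # no further ";"-separated segment
  elif length == 0 then .
  else capture("^( *cd  *)?(?<dir>[^;]*)(;(?<etc>.*))?")
  | .dir, (.etc|sub)                 # emit this directory, then recurse on the rest
  end;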
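The "split on ';'" suggestion can be sketched as follows. This is a Python illustration of the strategy rather than jq code, and `strip_cd` is a hypothetical name; the point is that each segment is anchored independently, so no global substitution with ^ is needed.

```python
import re

def strip_cd(line: str) -> str:
    """Split on ';', drop a leading 'cd ' from each segment, and rejoin."""
    parts = line.split(";")
    # ^ is applied to each segment separately, so it is always a true start.
    dirs = [re.sub(r"^\s*cd\s+", "", part).strip() for part in parts]
    return ";".join(dirs)

print(strip_cd("cd asdf; cd bbbb; cd ddd"))  # asdf;bbbb;ddd
```

Segments that do not start with cd are passed through unchanged (apart from trimming surrounding whitespace).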

@fish2bird
Author

you'll either have to write your own gsub or choose a different parsing strategy

Thank you. My final workaround is capture + sed:

make_to_compile_commands_json ()
{
    make --always-make --dry-run \
        | grep -wE '^(cd .* && )?/?(\w+/)*(gcc|g\+\+|cc|c\+\+) .*-c ' \
        | ${JQ_PATH:-jq} -nR '[inputs|capture("^(cd (?<directory>.*) && )?(?<command>.* (?<file>[^ \\t]+))$")]' \
        | sed -e 's/: null,$/: ".",/g' \
        > compile_commands.json
}
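For readers who want to see what the capture step produces, here is a rough Python equivalent of the same regular expression. It assumes Python's `(?P<name>...)` spelling for named groups in place of `(?<name>...)`, and the names `ENTRY` and `to_entry` are hypothetical. The missing directory is defaulted to "." in code, which is the job the sed step does in the shell function above.

```python
import re

# Same shape as the capture() pattern: an optional "cd DIR && " prefix,
# then the full command, whose last whitespace-free token is the file.
ENTRY = re.compile(
    r"^(cd (?P<directory>.*) && )?(?P<command>.* (?P<file>[^ \t]+))$"
)

def to_entry(line: str):
    """Return a compile_commands-style dict for one make line, or None."""
    m = ENTRY.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # An absent "cd DIR && " prefix leaves directory unset; default it to ".".
    entry["directory"] = entry["directory"] or "."
    return entry

print(to_entry("cd /src && gcc -O2 -c foo.c"))
print(to_entry("gcc -c bar.c"))
```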

There may be many traps in sub/gsub, whose documentation only says that they use PCRE.
Will the documentation or the code be changed?

@itchyny
Contributor

itchyny commented Jun 5, 2023

Inaccurate documentation about PCRE was raised by #2439 before. No PR fixing this was created though.

@pkoppstein
Contributor

pkoppstein commented Jun 20, 2023

@itchyny - This issue (specifically jq -nR 'inputs|gsub("^(?!cd ).*$|^cd "; "")') is resolved by PR #2624

pkoppstein added a commit to pkoppstein/jq that referenced this issue Jun 29, 2023
… uniq(stream)

The primary purpose of this commit (which supersedes PR
jqlang#2624) is to rectify most problems
with `gsub` (and also `sub` with the "g" option), in particular jqlang#1425
('\b'), jqlang#2354 (lookahead), and jqlang#2532 (regex == "^(?!cd ).*$|^cd ";"")).

This commit also partly resolves jqlang#2148 and jqlang#1206 in that `gsub` no
longer loops infinitely; however, because the new `gsub` depends
critically on match(_;"g"), the behavior when regex == "" is sometimes
non-standard. [*1]

Since the new sub/3 relies on uniq/1, that has been added as well [*2].

The documentation has been updated to reflect the fact that `sub` and
`gsub` are intended to be regular in the second argument. [*3]

Also, _nwise/1 has been tweaked to take advantage of TCO.

Footnotes:

[*1] Using the new gsub, '"a" | gsub( ""; "a")' emits "aa" rather than
"aaa" as would be standard.  This is nevertheless better than the
infinite loop behavior of jq 1.6 in this case.

With one exception (as explained in [*2]), the new gsub is implemented
as though match/2 behavior is correct.  That is, bugs in `gsub`
behavior will most likely have their origin in `match/2`.

[*2] `uniq/1` adopts the Unix/Linux name and semantics; it is needed for the following test case:

gsub("(?=u)"; "u")
"qux"
"quux"

Without this functionality:

Test jqlang#23: 'gsub("(?=u)"; "u")' at line number 100
*** Expected "quux", but got "quuux" for test at line number 102: gsub("(?=u)"; "u")

The root of the problem here is `match`: if `match` is fixed, then gsub would not need `untie`.

The addition of `uniq` as a top-level function should be a non-issue
relative to general concern about builtins.jq bloat: the line count of
the new builtin.jq is significantly reduced overall, and the number of
defs is actually reduced by 1 (from 111 (ignoring a redundant def) to 110).

[*3] See e.g. jqlang#513 (comment)
@itchyny itchyny added this to the 1.7 release milestone Jul 2, 2023
itchyny pushed a commit that referenced this issue Jul 3, 2023
The primary purpose of this commit is to rectify most problems with
`gsub` (and also `sub` with the `g` option), in particular fix #1425 ('\b'),
fix #2354 (lookahead), and fix #2532 (regex == `"^(?!cd ).*$|^cd "`).

This commit also partly resolves #2148 and resolves #1206 in that
`gsub` no longer loops infinitely; however, because the new `gsub`
depends critically on `match/2`, the behavior when regex == `""` is
sometimes non-standard.

The documentation has been updated to reflect the fact that `sub`
and `gsub` are intended to be regular in the second argument.

Also, `_nwise/1` has been tweaked to take advantage of TCO.
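The commit messages describe a `gsub` that is built on a global match list instead of repeated re-matching. A minimal Python sketch of that general idea follows, using `re.finditer` as a stand-in for a global match; `gsub_from_matches` is a hypothetical helper and does not reproduce the code in jq's builtin definitions. Because every match is found against the original string, ^ keeps its real meaning.

```python
import re

def gsub_from_matches(pattern: str, repl: str, text: str) -> str:
    """Collect all matches against the original string, then stitch the result."""
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        out.append(text[last:m.start()])  # untouched text before this match
        out.append(repl)                  # the replacement itself
        last = m.end()
    out.append(text[last:])               # untouched text after the last match
    return "".join(out)

print(gsub_from_matches(r"^(?!cd ).*$|^cd ", "", "cd /ok"))  # /ok
print(gsub_from_matches(r"(?=u)", "u", "qux"))               # quux
print(gsub_from_matches(r"", "a", "a"))                      # aaa
```

With Python's engine the empty-pattern case yields "aaa", the result the first commit message calls standard, and the lookahead test case yields "quux".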