Unexpected behavior with duplicate attributes #1795

Closed
asarkar opened this issue Jan 23, 2019 · 16 comments

Comments

@asarkar

asarkar commented Jan 23, 2019

Actual behavior:

$ echo '{"a" : "b", "a" : "c" }' | jq '.a'
"c"

Duplicate keys are valid, but discouraged, in JSON. https://dzone.com/articles/duplicate-keys-in-json-objects

Expected behavior:
Both b and c are output.

Environment:

  • OS and Version: macOS 10.14.2
  • jq version: 1.5
@nicowilliams
Contributor

nicowilliams commented Jan 23, 2019

jq uses a hash table to represent objects, and does not allow duplicates.

You can actually handle duplicates in input using the --stream command-line option, in which case you'll get all the duplicates.
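
For the example in the original post, a minimal sketch of this approach could look like the following. It assumes a flat top-level object, and the shell function name `values_for_a` is made up for illustration. It keeps only the streamed events whose path is exactly `["a"]` and prints their values:

```shell
# Hypothetical helper: print every value stored under the top-level key "a",
# duplicates included. Streamed events have the form [path, value]; closing
# events contain only a path, so `length == 2` filters them out.
values_for_a() {
  jq -c --stream 'select(length == 2 and .[0] == ["a"]) | .[1]'
}

echo '{"a" : "b", "a" : "c" }' | values_for_a
# "b"
# "c"
```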

@pkoppstein
Contributor

@asarkar - As you say:

Duplicate keys are valid, but discouraged, in JSON.

And indeed jq does allow them (no error message is generated). One reading of the JSON spec's discouragement of duplicate keys is that JSON parsers are free to interpret them as they see fit.

None of this is to say that the jq documentation could not be improved!

@asarkar
Author

asarkar commented Jan 23, 2019

@nicowilliams I'm not a jq Ninja, can you show me a working command using the example in my post?

@asarkar
Author

asarkar commented Jan 23, 2019

@pkoppstein Discouragement != Liberty to go crazy. Unless the JSON spec explicitly says the parser is free to do whatever it feels like, or that it's ok for the behavior to be undefined, this is a bug.

I stand corrected. Section 4:

When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

@pkoppstein
Contributor

asarkar wrote:

I'm not a jq Ninja, ...

From the jq FAQ:

𝑸: Can jq process objects with duplicate keys? Can jq help convert objects with duplicate keys to an alternative format so that no information is lost?

A: The JSON syntax formally allows objects with duplicate keys, and jq can accordingly read them, but the regular jq parser effectively ignores all but the last occurrence of each key within any given object.
jq's streaming parser, however, can be used to convert a JSON object with duplicate keys to an alternative format so that none of the values are lost. This is illustrated at https://stackoverflow.com/questions/36956590/json-fields-have-the-same-name.
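
As a rough illustration of one such alternative format (not necessarily the one used in the linked answer), the sketch below turns a flat object into a list of key/value entries. The function name `to_entries_with_dups` is hypothetical, and the program assumes a single flat top-level object:

```shell
# Hypothetical helper: list every key/value pair of a flat object so that
# repeated keys survive. -n plus `inputs` lets one program see all streamed
# events; events of length 2 carry [path, value].
to_entries_with_dups() {
  jq -cn --stream '[inputs | select(length == 2) | {key: .[0][0], value: .[1]}]'
}

echo '{"a" : "b", "a" : "c" }' | to_entries_with_dups
# [{"key":"a","value":"b"},{"key":"a","value":"c"}]
```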

@nicowilliams
Contributor

@asarkar sure, check this out:

$ echo '{"a" : "b", "a" : "c" } {"a": "b"} {"a": "c"}' | jq -c --stream .
[["a"],"b"]
[["a"],"c"]
[["a"]]
[["a"],"b"]
[["a"]]
[["a"],"c"]
[["a"]]
$ 

In streaming mode you get a representation of the input as path + value tuples. "Closing" of arrays/objects is denoted as a tuple of just a path, which allows you to disambiguate inputs like the above example's. This means you can have as many duplicate keys as you'd like, and you'll see each and every one, but the price you pay is that for some tasks it's a bit harder to work with this representation. There are a few builtin utility functions for dealing with streamed JSON data (fromstream, tostream, and truncate_stream) that you can read about in the manual.
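
To make the event format concrete, here is a small sketch using two of those builtins. The inputs are illustrative. Note that plain `fromstream` rebuilds values by setting paths, so on the duplicate-key input it ends up with the last value only, much like the regular parser:

```shell
# tostream emits the same [path, value] and [path] events as --stream,
# but starting from an already-parsed value.
echo '{"a":[1,2]}' | jq -c 'tostream'
# [["a",0],1]
# [["a",1],2]
# [["a",1]]
# [["a"]]

# fromstream rebuilds values from events. Fed the duplicate-key input,
# the second "a" overwrites the first.
echo '{"a" : "b", "a" : "c" }' | jq -cn --stream 'fromstream(inputs)'
# {"a":"c"}
```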

Without knowing what you're trying to accomplish, I can only show you what jq can do, not how to apply it to your problem.

@asarkar
Author

asarkar commented Jan 23, 2019

@nicowilliams Thanks, I'm simply looking for something like { "a": ["b", "c"] }. Any other way of representing multiple values associated with the same key is fine too.

@nicowilliams
Contributor

It should be possible to write a version of fromstream that collects the values of keys into arrays. And another variant could unwrap values of arrays of one value (but if that value is itself an array, then you'll have an ambiguity).
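
For the simplest case, a single flat object with scalar values, the collecting idea can be sketched without a full fromstream variant. The function name `collect_dups` is hypothetical, and nested input is deliberately not handled:

```shell
# Hypothetical helper: collect the values of each top-level key into an array.
# Only [path, value] events (length 2) are used; $e[0][0] is the key and
# $e[1] the value. Adding to a missing key works because null + [x] is [x].
collect_dups() {
  jq -cn --stream '
    reduce (inputs | select(length == 2)) as $e
      ({}; .[$e[0][0]] += [$e[1]])'
}

echo '{"a" : "b", "a" : "c" }' | collect_dups
# {"a":["b","c"]}
```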

@nicowilliams
Contributor

nicowilliams commented Jan 23, 2019

@asarkar here:

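$ printf '{"a":0,"a":1,"b":2}\n'|jq -cn --stream 'def fromstream_with_dups(i):
  # State is [value being built, last completed value].
  foreach i as $i (
    [null, null];

    # Update step, run once per streamed event.
    if ($i | length) == 2 then
      # [path, value] event.
      if ($i[0] | length) == 0 then .
      elif $i[0][-1]|type == "string" then
        # Object key: append the value to the array collected for this key.
        [ ( .[0] | setpath($i[0]; getpath($i[0]) + [$i[1]]) ), .[1] ]
      else [ ( .[0] | setpath($i[0]; $i[1]) ), .[1] ]
      end
    # Closing event of a top-level value: move it to the output slot.
    elif ($i[0] | length) == 1 then [ null, .[0] ]
    else .
    end;

    # Extract step: emit only when a top-level value is complete.
    if ($i | length) == 1 then
      if ($i[0] | length) == 1 then .[1]
      else empty
      end
    # Top-level scalar or empty container: emit it as is.
    elif ($i[0] | length) == 0 then $i[1]
    else empty
    end
  );
  fromstream_with_dups(inputs)'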
{"a":[0,1],"b":[2]}
$ 

Now you may want to unwrap some of the arrays:

$ printf '{"a":0,"a":1,"b":2}\n'|jq -cn --stream 'def fromstream_with_dups(i; fix):
  foreach i as $i (
    [null, null];

    if ($i | length) == 2 then
      if ($i[0] | length) == 0 then .
      elif $i[0][-1]|type == "string" then
        [ ( .[0] | setpath($i[0]; getpath($i[0]) + [$i[1]]) ), .[1] ]
      else [ ( .[0] | setpath($i[0]; $i[1]) ), .[1] ]
      end
    elif ($i[0] | length) == 1 then [ null, .[0] ]
    else .
    end;

    if ($i | length) == 1 then
      if ($i[0] | length) == 1 then .[1] | fix
      else empty
      end
    elif ($i[0] | length) == 0 then $i[1]
    else empty
    end
  );
def fix:
  if type != "object" then .
  else
    reduce keys_unsorted[] as $key (.;
      if .[$key]|length == 1 then
        .[$key] |= .[0]
      else
        .
      end)
  end;
fromstream_with_dups(inputs; fix)'
{"a":[0,1],"b":2}
$ 
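
Unwrapping one-element arrays loses information whenever a value is itself an array, which is the ambiguity noted earlier in the thread. The sketch below shows this on hand-written inputs. It assumes a collected form in which every key maps to the list of values seen for it (arrays included), and the function name `unwrap_singletons` is hypothetical:

```shell
# Hypothetical helper: replace each one-element array by its only element.
unwrap_singletons() {
  jq -c 'map_values(if length == 1 then .[0] else . end)'
}

# "b" seen twice, with the values 1 and 2:
echo '{"b":[1,2]}' | unwrap_singletons
# {"b":[1,2]}

# "b" seen once, with the array [1,2] as its value:
echo '{"b":[[1,2]]}' | unwrap_singletons
# {"b":[1,2]}
```

Both inputs end up identical, so a consumer cannot tell the two cases apart after unwrapping.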

@nicowilliams
Contributor

We might want to add fromstream_with_dups/1 and fromstream_with_dups/2 to the builtin library.

@nicowilliams
Contributor

@asarkar did that work for you? Anyways, I'll integrate this into the next release -- it's a pretty nifty feature, so thanks for asking for it.

@asarkar
Author

asarkar commented Jan 26, 2019

@nicowilliams I didn't get around to trying it. As you can probably tell, the actual JSON is more complicated than the puny one I'd posted, and I didn't feel like jostling with jq on top of the already-quite-complicated code that you'd kindly provided. I ended up using a Groovy parser in Groovy Console.
Thanks though, I appreciate you taking the time.

@nicowilliams
Contributor

Fair enough. The complicated code I pasted is really just a small variation on code already inside jq, fyi :)

@asarkar
Author

asarkar commented Jan 27, 2019

We might want to add fromstream_with_dups/1 and fromstream_with_dups/2 to the builtin library.

@nicowilliams I thought you were going to leave this ticket open for ^^^

nicowilliams added a commit to nicowilliams/jq that referenced this issue Jan 27, 2019
This commit adds a pair of built-ins for dealing with streamed JSON
texts that have duplicate keys.

Eventually we may need a `tostream_with_dups` as well that implements a
convention that array values in objects are to be presented as duplicate
keys with array elements as values.  There are ambiguity issues, like
how should one stream `{"a":[]}`.
nicowilliams added a commit to nicowilliams/jq that referenced this issue Feb 6, 2019
This commit adds a pair of built-ins for dealing with streamed JSON
texts that have duplicate keys.

Eventually we may need a `tostream_with_dups` as well that implements a
convention that array values in objects are to be presented as duplicate
keys with array elements as values.  There are ambiguity issues, like
how should one stream `{"a":[]}`.
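
For reference, this is how the existing `tostream` handles that input: the empty array is emitted as a leaf value. Under the duplicate-key convention described in the commit message, an empty array would presumably correspond to zero occurrences of the key, leaving nothing to emit, which is one way to read the ambiguity mentioned there:

```shell
# Plain tostream treats an empty array as a leaf, followed by the
# closing event of the enclosing object.
echo '{"a":[]}' | jq -c 'tostream'
# [["a"],[]]
# [["a"]]
```
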
@lafkpages

We might want to add fromstream_with_dups/1 and fromstream_with_dups/2 to the builtin library.

Did this ever happen?

@nicowilliams
Contributor

We might want to add fromstream_with_dups/1 and fromstream_with_dups/2 to the builtin library.

Did this ever happen?

There's an open PR for that, but IIRC it's not ready. You can write such utility functions yourself in your jq programs.
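
As a minimal sketch of what that could look like, a definition can be kept in a file and loaded with `-f`. The file name and the function `collect_dups` are hypothetical, not jq builtins, and this simplified version assumes every input is a non-empty flat object with scalar values:

```shell
dir=$(mktemp -d)
cat > "$dir/collect_dups.jq" <<'EOF'
# collect_dups is a hypothetical name, not a jq builtin.
# Assumes each input is a non-empty flat object with scalar values.
def collect_dups(events):
  foreach events as $e ({acc: {}, out: null};
    if ($e | length) == 2
    then .out = null | .acc[$e[0][0]] += [$e[1]]  # [path, value] event
    else {acc: {}, out: .acc}                     # closing event: object done
    end;
    .out // empty);

collect_dups(inputs)
EOF

echo '{"a" : "b", "a" : "c" } {"a": "b"}' | jq -cn --stream -f "$dir/collect_dups.jq"
# {"a":["b","c"]}
# {"a":["b"]}
```

Unlike a single `reduce` over all events, the `foreach` form emits one result per top-level object, so several JSON texts on the same input are kept separate.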
