
JSON: Parsing and serializing numbers, often undesired E notation #1583

Open
ChristianGruen opened this issue Nov 18, 2024 · 6 comments
Labels
Enhancement (A change or improvement to an existing feature) · Serialization (An issue related to the Serialization spec)

Comments

@ChristianGruen
Contributor

If JSON numbers are parsed with parse-json() and then serialized as JSON again, it is confusing to end up with E notation for large numbers. An example:

'100000000000000000000'
=> parse-json()
=> serialize(map { 'method': 'json' })

Obviously, lossless round-tripping is not possible (1e20 is also a valid JSON number, and after parsing it cannot be distinguished from 100000000000000000000). But as E notation is much less common than plain integers, maybe we could return more numbers in their integer representation whenever the result would be numerically equivalent?
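A minimal Python sketch of the equivalence argument, assuming the parsed value is an IEEE 754 double (as xs:double is). Both spellings parse to the same double, and an integral double can be written back without E notation while still denoting the same value. The helper `as_plain_integer` is hypothetical, not part of any spec or processor.

```python
import json
import math
from decimal import Decimal
from typing import Optional


def as_plain_integer(x: float) -> Optional[str]:
    """Hypothetical helper: write an integral double without E notation.

    Uses Python's repr (shortest round-trip digits), so 1e23 becomes
    '100000000000000000000000' rather than its exact binary value.
    Returns None if x is not a finite integral value.
    """
    if not (math.isfinite(x) and x.is_integer()):
        return None
    text = format(Decimal(repr(x)).normalize(), "f")
    assert float(text) == x  # the plain form parses back to the same double
    return text


# Both lexical forms parse to the same double, so the original spelling is lost.
# parse_int=float approximates the default of parsing every JSON number as xs:double.
a = json.loads("1e20")
b = json.loads("100000000000000000000", parse_int=float)
print(a == b)               # True
print(as_plain_integer(a))  # 100000000000000000000
```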

Related: #1445

ChristianGruen added the Enhancement and Serialization labels Nov 18, 2024
@ChristianGruen
Contributor Author

ChristianGruen commented Nov 18, 2024

I see that the Serialization spec states (https://qt4cg.org/specifications/xslt-xquery-serialization-40/Overview.html#json-output):

Implementations may serialize the numeric value using any lexical representation of a JSON number defined in [RFC 7159].

Ideally, we could define a representation that is not implementation-dependent.
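To show how much freedom "any lexical representation" leaves, here is a small Python check against the RFC 7159 number grammar (`number = [ minus ] int [ frac ] [ exp ]`). The regular expression is our own transcription of that ABNF, not text taken from the RFC or the Serialization spec.

```python
import re

# Transcription of RFC 7159, section 6: optional minus, int without leading
# zeros, optional fraction with at least one digit, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    return JSON_NUMBER.fullmatch(text) is not None


# All of these are legal spellings, and all denote the same value.
forms = ["100000000000000000000", "1e20", "1E+20", "1.0e20", "10e19"]
print(all(is_json_number(f) for f in forms))          # True
print(len({float(f) for f in forms}))                 # 1
print(is_json_number("1.e20"), is_json_number("+1"))  # False False
```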

@ChristianGruen
Contributor Author

We could use ECMAScript's Number::toString (section 6.1.6.1.20 of ECMA-262) as a guideline, while preserving the existing rules for NaN and Infinity.
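As a rough illustration of what adopting Number::toString would mean, here is a Python sketch of its radix-10 case. It assumes that Python's repr() yields the same shortest round-trip digits the ECMAScript algorithm selects, and it simply rejects NaN and infinities, leaving them to the existing serialization rules. The variable names n, k and s follow the ECMA-262 description; the function name is hypothetical.

```python
import math
from decimal import Decimal


def ecma_number_to_string(x: float) -> str:
    """Sketch of ECMA-262 Number::toString(x) with radix 10."""
    if not math.isfinite(x):
        raise ValueError("NaN and Infinity are left to the existing rules")
    if x == 0:
        return "0"  # +0 and -0 both become "0"
    if x < 0:
        return "-" + ecma_number_to_string(-x)

    # Shortest digit string s and exponent such that s * 10**exp rounds to x;
    # trailing zeros are stripped and folded into the exponent.
    _, digit_tuple, exp = Decimal(repr(x)).as_tuple()
    s = "".join(map(str, digit_tuple)).rstrip("0")
    exp += len(digit_tuple) - len(s)
    k = len(s)   # number of significant digits
    n = exp + k  # the decimal point sits after the n-th digit

    if k <= n <= 21:
        return s + "0" * (n - k)       # integer: pad with zeros
    if 0 < n <= 21:
        return s[:n] + "." + s[n:]     # point falls inside the digits
    if -6 < n <= 0:
        return "0." + "0" * (-n) + s   # small value: leading zeros
    e = n - 1                          # otherwise: exponential notation
    mantissa = s if k == 1 else s[0] + "." + s[1:]
    return mantissa + "e" + ("+" if e >= 0 else "-") + str(abs(e))


for v in (1e20, 1e21, 0.000001, 1e-7, 123.456):
    print(ecma_number_to_string(v))
# 100000000000000000000
# 1e+21
# 0.000001
# 1e-7
# 123.456
```

With this rule, the example from the issue would come back as 100000000000000000000; E notation only starts at 1e+21.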

@ChristianGruen
Contributor Author

There is an asymmetry between parsing and serializing JSON:

  • By default, JSON numbers are parsed to xs:double items (unless the number-parser option is used).
  • All numeric types can be serialized to JSON numbers (doubles, integers, decimals, etc.).
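The parsing side of this asymmetry can be imitated with the `parse_int` and `parse_float` hooks of Python's `json.loads`. This is only an analogy: the hooks stand in for the default xs:double conversion and for a number-parser that keeps decimal precision. They are not the XPath option itself.

```python
import json
from decimal import Decimal

text = "[1, 0.1, 12345678901234567890.1]"

# Like the default: every JSON number becomes a double.
as_doubles = json.loads(text, parse_int=float)

# Like a number-parser that keeps the full lexical value as a decimal.
as_decimals = json.loads(text, parse_int=Decimal, parse_float=Decimal)

print(as_doubles[0], as_decimals[0])             # 1.0 1
print(Decimal(as_doubles[2]) == as_decimals[2])  # False: a double cannot hold every digit
```

Once everything has become a double, the serializer no longer knows whether the input was written as an integer, a decimal, or with an exponent.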

Again, the behavior depends on the implementation. For example, Saxon, eXist and BaseX serialize the xs:decimal 1.00000000000000000000000000001 unchanged, while XMLPrime returns 1:

1.00000000000000000000000000001
=> serialize(map { 'method': 'json' })
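The divergence comes down to whether the value is written out as a decimal or first converted to a double. A short Python illustration, using `Decimal` and `float` as stand-ins for xs:decimal and xs:double:

```python
from decimal import Decimal

d = Decimal("1.00000000000000000000000000001")

# Written out as a decimal, every digit survives.
print(format(d, "f"))   # 1.00000000000000000000000000001

# Converted to a double first, the value collapses to 1.
print(float(d) == 1.0)  # True
```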

Is it advantageous to serialize each numeric type differently, or should we rather serialize all of them as doubles?

@michaelhkay
Contributor

Clearly, adding the number-parser option to parse-json() was an attempt to solve this problem while retaining backwards compatibility. It seems you want to do something a bit more aggressive that affects the default behaviour in a way that might not retain backwards compatibility. One option clearly is for parse-json to deliver an integer, decimal, or double depending on the lexical form of the number, in the same way that we do for numeric literals. That could be overridden (to reinstate the 3.1 behaviour) by setting number-parser=xs:double#1.
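A sketch of that option in Python, assuming the same classification as XPath numeric literals: digits only give an integer, a fraction without an exponent gives a decimal, and any exponent gives a double. `parse_by_lexical_form` is a hypothetical name; the mapping relies on Python's `json` calling `parse_float` for every number that has a fraction or an exponent.

```python
import json
from decimal import Decimal


def parse_by_lexical_form(text: str):
    """Hypothetical: type JSON numbers like XPath numeric literals.

    Integer form -> int (xs:integer); '.' but no exponent -> Decimal
    (xs:decimal); any exponent -> float (xs:double).
    """
    def number(lexical: str):
        if "e" in lexical or "E" in lexical:
            return float(lexical)
        return Decimal(lexical)

    # parse_int keeps its default (int); parse_float receives the raw lexical form.
    return json.loads(text, parse_float=number)


values = parse_by_lexical_form("[100000000000000000000, 1.5, 1e20]")
print([type(v).__name__ for v in values])  # ['int', 'Decimal', 'float']
```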

I'm a bit reluctant to change the serialization rules. If we change them to be more prescriptive, then some implementations will need to change, and users may not like the change. I'm also reluctant to add serialization parameters to give users more control. Saxon's rule, incidentally, is: (a) for xs:decimal, never use exponential notation; (b) for xs:double, use exponential notation only outside the range 1e-18 to 1e+18. That seems to be good enough for most people.
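The rule described above can be sketched in Python as follows. This is not Saxon's code, and whether the bounds are inclusive is our assumption; outside the range, the sketch falls back to Python's repr, which is also a valid JSON number.

```python
import math
from decimal import Decimal

# Thresholds from the comment above; inclusive lower / exclusive upper is assumed.
LOWER, UPPER = 1e-18, 1e18


def serialize_decimal(d: Decimal) -> str:
    """(a) xs:decimal: never use exponential notation."""
    return format(d, "f")


def serialize_double(x: float) -> str:
    """(b) xs:double: exponential notation only outside the assumed range."""
    if not math.isfinite(x):
        raise ValueError("NaN and INF have no JSON number representation")
    if x == 0:
        return "0"
    if LOWER <= abs(x) < UPPER:
        # Shortest round-trip digits, written out in plain positional form.
        return format(Decimal(repr(x)).normalize(), "f")
    return repr(x)  # Python's shortest form, e.g. '1e+20'


print(serialize_double(1e17))  # 100000000000000000
print(serialize_double(1e20))  # 1e+20
print(serialize_decimal(Decimal("1.00000000000000000000000000001")))
```

Under this rule the issue's example (1e20) still comes out in E notation, whereas the ECMA rule would only switch at 1e+21.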

@ChristianGruen
Contributor Author

> Clearly, adding the number-parser option to parse-json() was an attempt to solve this problem while retaining backwards compatibility. It seems you want to do something a bit more aggressive that affects the default behaviour in a way that might not retain backwards compatibility. One option clearly is for parse-json to deliver an integer, decimal, or double depending on the lexical form of the number, in the same way that we do for numeric literals. That could be overridden (to reinstate the 3.1 behaviour) by setting number-parser=xs:double#1.

Yes, an alternative would be to change the parsing. However, I believe this approach would be more invasive than changing serialization.

> I'm a bit reluctant to change the serialization rules. If we change them to be more prescriptive, then some implementations will need to change, and users may not like the change. I'm also reluctant to add serialization parameters to give users more control. Saxon's rule, incidentally, is: (a) for xs:decimal, never use exponential notation; (b) for xs:double, use exponential notation only outside the range 1e-18 to 1e+18. That seems to be good enough for most people.

What is the reason for choosing 1e+18, and why does it differ from what ECMA does (see the link above)?

I believe most users would appreciate it if all implementations behaved similarly. The problem is pretty similar to our fn:xml-to-json challenge.

@michaelhkay
Contributor

> What is the reason for choosing 1e+18, and why does it differ from what ECMA does (see the link above)?

I guess I was unaware that ECMA had chosen 1e+21 as the threshold.

My thinking was primarily to ensure that all integers that had "accidentally" been converted to floating point would be output as integers, and I think 18 is sufficient for that (in fact, 15 probably is).
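The "15 probably is" remark is consistent with the 53-bit significand of an IEEE 754 double: every integer with at most 15 digits lies below 2**53 and therefore survives conversion to xs:double unchanged, while some larger integers do not. A quick Python check, with a hypothetical helper name:

```python
def survives_double(n: int) -> bool:
    """True if integer n is unchanged by a round trip through an IEEE 754 double."""
    return int(float(n)) == n


# Every integer with at most 15 digits is below 2**53, so it is exact as a double.
print(10**15 - 1 < 2**53)                    # True
print(survives_double(999_999_999_999_999))  # True

# Just beyond 2**53, integers start to be rounded.
print(survives_double(2**53 + 1))            # False
```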

Projects
None yet
Development

No branches or pull requests

2 participants