Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Incorrect serialization of float subclass with custom __str__ method #57

Closed
ryanmarquardt opened this issue Jul 31, 2022 · 19 comments
Closed
Assignees
Labels
available Indicates which bugs have been triaged and reproduced, rather than just filed bug

Comments

@ryanmarquardt
Copy link

I have a float subclass with a customized str method, and its str output is appearing in the results of json5.dumps. Here's a simplified example:

>>> import json5
>>> class MyFloat(float):
...     def __str__(self):
...         return f"<{super().__str__()}>"

>>> print(json5.dumps(MyFloat(1)))
<1.0>

>>> json5.loads(json5.dumps(MyFloat(1))
Traceback (most recent call last):
...
ValueError: <string>:1 Unexpected "<" at column 1
@dpranke
Copy link
Owner

dpranke commented Jul 31, 2022

Thanks for reporting the issue. I have reproduced it locally (thanks to the easy repro!). I am a bit surprised that this hasn't been reported before.

It looks like json.dumps(MyFloat(1)) returns 1.0 instead of <1.0>. I'm not immediately sure why that is; it looks like the json module is calling repr(MyFloat(1)) instead of str(MyFloat(1)) (and json5 is doing the latter).

My first reaction is that json5 is right and json is wrong, i.e., we should be calling str here, not repr.

That said, I view matching json's behavior to be much more important than trying to say which one is "right". So, I'll definitely change it, but I'll also poke into this a bit more and see if there's something that explains why json might be correct to use repr.

Also, I'll point out that you shouldn't count on loads(dumps())he to always work in the presence of custom classes, at least not if you want to preserve the fact that the object is a MyFloat and not a str, at least not unless you're using a custom parse_float method. But you probably already know that :).

@dpranke dpranke added bug available Indicates which bugs have been triaged and reproduced, rather than just filed labels Jul 31, 2022
@ryanmarquardt
Copy link
Author

Thanks for taking a look at this so quick. It looks like the same issue crops up for int subclasses too.

I did some searching, and found that str and repr used to be different for float before around python 3.2, according to https://bugs.python.org/issue9337. repr was more precise, which is probably why it was used when json was originally written. Either way, the methods are identical now, so it doesn't make any difference.

Is there any reason why using float.__str__(MyFloat(1)) directly to sidestep any customization would cause problems?

@dpranke
Copy link
Owner

dpranke commented Jul 31, 2022

I did some searching ...

Thanks for those leads! :).

Is there any reason why using float.__str__(MyFloat(1)) directly to sidestep any customization would cause problems?

I have no idea. Changing the default behavior of any public function can usually cause problems. More importantly, reaching deep into a class's private methods doesn't feel great to me.

I think I can probably get equivalent behavior by actually calling JSONEncoder().encode(MyFloat(1)) and that would provide a stronger guarantee that json5's behavior matches json's. I'll probably code that up and see if it flies.

@dpranke
Copy link
Owner

dpranke commented Aug 1, 2022

I forgot that back in 10e4e82 I removed (the last) dependency on json.
I don't remember why I didn't want the dependency, exactly: it might have been just for aesthetics, or it might
have been by request from someone.

At any rate, in order to not reintroduce the dependency on json now, I'll call float.repr and int.repr directly
as you suggested instead.

dpranke added a commit that referenced this issue Aug 1, 2022
[GitHub issue #57](#57)
Fixed serialization for objects that subclass `int` or `float`:
Previously we would use the objects __str__ implementation, but
that might result in an illegal JSON5 value if the object had
customized __str__ to return something illegal. Instead,
we follow the lead of the `JSON` module and call `int.__repr__`
or `float.__repr__` directly.

While I was at it, I added tests for dumps(-inf) and dumps(nan)
when those were supposed to be disallowed by `allow_nan=False`.
@dpranke
Copy link
Owner

dpranke commented Aug 1, 2022

Okay, I've fixed this in 1f7f806 and pushed that to PyPI as v0.9.9.

Please let me know if you find any issues, and thanks again!

@dpranke dpranke closed this as completed Aug 1, 2022
@L-A-F-B
Copy link

L-A-F-B commented Nov 20, 2024

I have a list of integers that I want to serialize in hex format. This fix now makes that impossible.

With an older version of the json5 package, I was able to do this:

import json5
class hexint(int):
... def str(self):
... return hex(self)
...
mylist = [10,20,30]
mylist = [hexint(x) for x in mylist]
json5.dumps(mylist)
'[0xa, 0x14, 0x1e]'

Now this does not work. I get '[10, 20, 30]' instead. How do I accomplish what I want now?

@dpranke
Copy link
Owner

dpranke commented Nov 20, 2024

Good question. You could use json.dumps(mylist, cls=CustomEncoder) to get something close, but if you wanted the output to be json5 in a good style (i.e., keys as identifiers instead of strings), you'd basically have to reimplement all of the logic in json5.dumps(), at which point you might as well not use dumps() at all and just use your own code.

I would not want to revert the bugfix, because matching json's behavior by default is a primary goal.

However, currently the cls argument to dumps() isn't supported, and in thinking about it, I don't think there's a particularly good reason not to support it. I think I could refactor the code in dumps() to use a class that was easily subclassed, the way json does. So, I'll look into that.

[ This is unlike loads(), where I don't think there's an easy way to restructure the code to support a subclassable-class, as the parsing algorithm is totally different and not easily amenable to custom parsing. ]

Would supporting a custom encoder where you'd have to write something like:

class HexintEncoder(json5.JSON5Encoder):
    def encode(self, o):
        if isinstance(o, hexint):
            return format(o, '#x')
        return super().encode(o)

json5.dumps(mylist, cls=HexintDecoder)

work for you? There could possibly be a way to do this w/ a custom encoder function (as opposed to a class) as well, but I'd have to add a new keyword arg to specify the function, and I'd prefer to avoid that if possible.

@L-A-F-B
Copy link

L-A-F-B commented Nov 20, 2024 via email

@dpranke
Copy link
Owner

dpranke commented Nov 20, 2024

Yes, that would be the idea.

@dpranke dpranke reopened this Nov 21, 2024
@dpranke dpranke self-assigned this Nov 21, 2024
@dpranke
Copy link
Owner

dpranke commented Nov 21, 2024

I've started working on this and have an initial version that I think should work. I still have to add tests, and I'm debating
whether I want to add an iterencode() method to mirror the json.JSONEncoder implementation.

@dpranke
Copy link
Owner

dpranke commented Nov 26, 2024

Okay, that took substantially more work than I thought it would, but I think I have everything ready to go. See #88 and, if you have a chance take a look and see if looks like it'll work for you. I'll likely merge it in tomorrow.

@dpranke
Copy link
Owner

dpranke commented Nov 26, 2024

(Also, I did not add an iterencode method.)

@L-A-F-B
Copy link

L-A-F-B commented Nov 26, 2024 via email

@dpranke
Copy link
Owner

dpranke commented Nov 26, 2024

By "try it out tomorrow" did you mean you want to look at the changes and/or try them before I merge the PR? Or did you want me to (or not care if) I merged things first?

@L-A-F-B
Copy link

L-A-F-B commented Nov 26, 2024 via email

@dpranke
Copy link
Owner

dpranke commented Nov 26, 2024

Fixed in #88, tagged as v0.10.0, and released to PyPI.

@dpranke dpranke closed this as completed Nov 26, 2024
@L-A-F-B
Copy link

L-A-F-B commented Dec 11, 2024 via email

@L-A-F-B
Copy link

L-A-F-B commented Dec 11, 2024 via email

@dpranke
Copy link
Owner

dpranke commented Dec 12, 2024

Thanks for confirming. You had me worried for a second :).

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
available Indicates which bugs have been triaged and reproduced, rather than just filed bug
Projects
None yet
Development

No branches or pull requests

3 participants