Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add configurable limit for the maximum number of bytes or chars that we will parse in a number #815

Closed
pjfanning opened this issue Sep 21, 2022 · 6 comments
Labels
processing-limits Issues related to limiting aspects of input/output that can be processed without exception
Milestone

Comments

@pjfanning
Copy link
Member

Number parsing can be expensive and is not O(n) - a string with twice the number of digits as another string takes more than twice as long to parse.

A malicious actor might craft dangerous a JSON payload with very long numbers.

Idea would be to have a default limit - eg 1000 chars/bytes and to allow users to configure a bigger limit. A setting maybe on JsonFactory class.

Relates to #813 - @cowtowncoder @plokhotnyuk this might be a more achievable v2.14 work item than the other new issues I raised.

@cowtowncoder
Copy link
Member

I think this would fall under #637 as another case?

@pjfanning
Copy link
Member Author

pjfanning commented Sep 23, 2022

I guess this is point 4 in #637 but in practise, it would be easier to treat 637 as an epic and to have subtasks to implement the individual tasks.

I guess this will need to be done within the various JSON parser classes (I think there are 4) because you have access to the JsonParser config there and the earlier the issue is caught the better. Would JsonFactory be a good place to put the config?

@cowtowncoder
Copy link
Member

Right. And not just JSON parsers, many limits should apply to other formats too.

As to configuration, I think there's probably need for base class (for likely shared limits, across most formats) and probably format-specific subtypes.

Or perhaps it'd be more practical NOT to create general limit system and only start with JSON.

Either way, yes, JsonFactory/TokenStreamFactory would be the place. And limit class should be immutable, builder-style.

As to Epic/tasks; yes, I guess. Could have one setup task for initial wiring, although that is likely to be done as part of implementing first limits.

Woodstox has something similar for XML, for what that is worth. Not sure how useful it is, but for limits it imposes I found that the easiest places to add checks tended to be places where buffer boundary is reached (need to increase buffer). This avoids having to do check for each and every character for example.

@pjfanning
Copy link
Member Author

pjfanning commented Oct 22, 2022

@cowtowncoder I had a look at adding a setting to TokenStreamFactory (inherited by JsonFactory) but accessing the factory settings in the JsonParser instance will require quite a bit of plumbing. I think getting the maxNumLen value into the TextBuffer would allow for some useful checks there.

JsonFactory has this

    protected JsonParser _createParser(InputStream in, IOContext ctxt) throws IOException {
        try {
            return new ByteSourceJsonBootstrapper(ctxt, in).constructParser(_parserFeatures,
                    _objectCodec, _byteSymbolCanonicalizer, _rootCharSymbols, _factoryFeatures);

It's sort of a pity that constructParser doesn't just take the JsonFactory as a param as opposed to passing through all those instance variables.

  • I could add an extra param to constructParser and deprecate the existing version of constructParser
  • Or I could add a setMaxNumLen to JsonParser and call that setter after constructParser but before returning the parser instance.
  • Or I could modify ObjectCodec or one the existing to set the maxNumLen there so constructParser could pass on the maxNumLen that way

Which of these approaches make the most sense to you? Or would you have an alternative approach that could be used?

I have a very rough draft in #827

@cowtowncoder
Copy link
Member

cowtowncoder commented Oct 23, 2022

Ok, I'll add same note(s) as in PR: I think we should have immutable configuration value objects (like StreamReadConfig, StreamWriteConfig?), constructed using builder() (defaulting to some sane values). This would then be passed to parser/generator (accessed from builder). This would reduce number of changes when we add new settings, avoid API changes wrt plumbing.
These types should also allow sub-classing on per-format basis. Probably should use "self-type" approach for builders but not values.
Finally, parser/generator should not have setters, unless 2.x absolutely requires one. Could expose getter, might be useful.

I can help with the implementation.

Note that it's perfectly fine to require use of Builder-style construction for new functionality like this (for 2.15), just cannot retrofit older configurability. But we may -- if we want to -- also expose setter(s) for JsonFactory if that makes sense. But try to keep parser/generator instances as immutable as possible.

@cowtowncoder
Copy link
Member

Fixed via #827, need to slightly modify release notes to point to this issue.

@cowtowncoder cowtowncoder added the processing-limits Issues related to limiting aspects of input/output that can be processed without exception label Jun 12, 2023
@cowtowncoder cowtowncoder added this to the 2.15.0 milestone Jun 12, 2023
@cowtowncoder cowtowncoder changed the title add configurable limit for the number of bytes or chars that we will parse in a number Add configurable limit for the maxium number of bytes or chars that we will parse in a number Jun 12, 2023
@cowtowncoder cowtowncoder changed the title Add configurable limit for the maxium number of bytes or chars that we will parse in a number Add configurable limit for the maximum number of bytes or chars that we will parse in a number Jun 12, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
processing-limits Issues related to limiting aspects of input/output that can be processed without exception
Projects
None yet
Development

No branches or pull requests

2 participants