Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

writeRaw binary content #914

Open
rashtao opened this issue Jul 14, 2022 · 6 comments
Open

writeRaw binary content #914

rashtao opened this issue Jul 14, 2022 · 6 comments

Comments

@rashtao
Copy link

rashtao commented Jul 14, 2022

What is the recommended way to write raw binary content? As far as I can see com.fasterxml.jackson.core.JsonGenerator#writeRaw() variants only support String/char[]/SerializableString arguments, which are not suitable for binary formats.

Is there any plan to extend it and accept binary arguments, i.e. byte[]/InputStream?

@cowtowncoder
Copy link
Member

cowtowncoder commented Jul 15, 2022

JsonGenerator.writeBinary()? It only uses Base64 encoding if the underlying format does not have native binary values and binary formats typically do.

I do not recommend writeRaw() for any use as that requires caller to know a lot about underlying format (ditto for writeBinaryValue() that is slightly less dangerous); especially so with binary formats.

Also note that the answer may depend on specific format backend, at least when using last-effort methods like writeRaw().

@rashtao
Copy link
Author

rashtao commented Jul 15, 2022

I mean specifically writeRaw(), in order to append an already properly encoded byte sequence. I would like to achieve the same behavior of writeRaw(String|char[]|SerializableString) for binary dataformats. Of course not all format backends can support this, but at least Smile could support this.

Currently, if I have an already encoded valid byte array (e.g. in Smile format), in order to append it, I need first to parse it (i.e. with JsonParser#readValueAsTree() and then write it (i.e. with JsonGenerator#writeTree()), which consumes unneeded cpu and memory resources.

@cowtowncoder
Copy link
Member

Ok. That makes sense then. At this points, PRs would be welcome and/or specific per-format tickets to denote where exactly functionality is missing.

As a work-around one can usually hold on to OutputStream used (and use generator.flush()) but support for writeRaw() would definitely be cleaner as it can auto-flush.

@rashtao
Copy link
Author

rashtao commented Jul 18, 2022

Thanks!

I think it would be even better adding new methods to com.fasterxml.jackson.core.JsonGenerator, e.g. writeRaw((byte[]|InputStream) data).

This would allow further enhancements at databind level by adding support for byte[] value to com.fasterxml.jackson.databind.util.RawValue and will ultimately enable the capability to map binary raw values in databind API, i.e.:

  • in all usages of raw values in com.fasterxml.jackson.databind.node (eg. com.fasterxml.jackson.databind.node.ObjectNode#putRawValue())
  • supporting byte[] property value for @JsonRawValue annotation
  • serializing RawValue fields in binary data formats backends

What is your opinion about it?

@cowtowncoder
Copy link
Member

From API perspective it is sort of tricky, given that now we have set of methods that work/don't-work on different backends (text vs binary). I specifically would be against supporting binary-content - to -text-format (which some users would no doubt want :) ) because it could or could not work based on backend (not for Writer, yes for OutputStream).

But then again without this, writeRaw() variants are useless for binary content and one has to "merge" content with lower level constructs.

I think my initial thinking is that I would want to defer variant(s) that take streams, and start with just byte[] (with possible addition of ByteBuffer). API could be added in 2.14, with default implementation in JsonGenerator that throws an exception indicating backend does not support operation; but with implementation for at least one binary backend to verify usage.
We would want both writeRaw(byte[]) and writeRawValue(byte[]).

We might also need to either use an existing StreamWriteCapability (CAN_WRITE_BINARY_NATIVELY) or add new one(s) in case code needs to decide on text-or-binary handling for some raw values.

It might make sense to divide into first supporting low-level (JsonGenerator) functionality, test that, and then once merged, figure out how to make things work via POJOs and JsonNode. Looks like RawValue would need improvements; and existing RawSerializer wouldn't quite work as-is either.

I probably won't have tons of time to work on this myself but if you or anyone else has time, I would find time to code review and help with work.

@rashtao
Copy link
Author

rashtao commented Jul 19, 2022

Sounds good to me, thanks for sharing! I will draft a PR as I find time.
Should we move the discussion to jackson-core project in order to offer better visibility and gather further feedback?

@cowtowncoder cowtowncoder removed the 2.14 label Feb 4, 2023
@cowtowncoder cowtowncoder transferred this issue from FasterXML/jackson-dataformats-binary Feb 4, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants