Standard MIME content-type #19

Open
pavelnikolov opened this issue Jul 25, 2016 · 27 comments

@pavelnikolov

What do you think about adding a new HTTP content-type for jsonlines data?
What about application/jsonl?

@jbaehr

jbaehr commented Aug 9, 2019

I'd rather prefer application/json-lines otherwise it may look like a typo ;-)

In addition to the Media Type, a registered structured suffix may be interesting. In my eyes even more useful, to create media types like application/vnd.my-company.some-thing+json-lines.

See also:
https://www.iana.org/assignments/media-types/
https://www.iana.org/assignments/media-type-structured-suffix/

@wardi have you considered filing a registration for a json-lines Media Type and structured suffix at IANA?

@karmakaze

karmakaze commented Sep 1, 2019

There is an IETF RFC 7464 for JSON Text Sequences that uses mime type: application/json-seq

It prefixes each JSON record with the <RS> control character and requires ending each JSON record with <LF>.

Also see: https://en.wikipedia.org/wiki/JSON_streaming
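For illustration, here is a minimal sketch of the framing difference between the two formats, assuming records that `json.dumps` can serialize. The function names are hypothetical and this is not a complete RFC 7464 implementation (it does not attempt truncation detection).

```python
import json

RS = "\x1e"  # ASCII Record Separator (0x1E), the prefix used by RFC 7464


def encode_json_seq(records):
    """Frame each record as RS + JSON text + LF (json-seq style)."""
    return "".join(RS + json.dumps(r) + "\n" for r in records)


def encode_json_lines(records):
    """Frame each record as JSON text + LF, with no leading RS (JSON Lines style)."""
    return "".join(json.dumps(r) + "\n" for r in records)


def decode_json_seq(text):
    """Split on RS and parse every non-empty chunk; the trailing LF is plain whitespace."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


records = [{"id": 1}, {"id": 2}]
seq = encode_json_seq(records)
lines = encode_json_lines(records)
```

Dropping the RS characters from the json-seq output gives exactly the JSON Lines output in this sketch, which is the only framing difference it models.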

@jbaehr

jbaehr commented Nov 7, 2019

This seems like a duplicate of #9. The whole purpose of the Content-Type header is to communicate the media type.

@whlavina

The lack of a definitive IANA Media Type for JSON Lines causes some difficulty for those of us using the format. In the interest of pushing the issue, I took the liberty of starting a conversation:
https://mailarchive.ietf.org/arch/msg/json/dWMWD0JDa2HiUYjWjLjrQExeIx4/

Perhaps someone here would like to join that thread?

Disclaimer: I am in no way affiliated with the IANA/IETF. I am merely interested in using the format, correctly.

@sp4ce
Collaborator

sp4ce commented Dec 19, 2022

@whlavina the response from Tim Bray was the most helpful, and it looks like nothing has happened since then. I'll copy the interesting bit here for reference:

> to register a media type you need to link to a stable specification. The contents of https://jsonlines.org/ probably don’t qualify, so the conventional thing would be to write an Internet-Draft which AFAICT would be the same as json-seq only without the leading "ASCII Record Separator (0x1E)" but retaining the trailing \n.

@sp4ce
Collaborator

sp4ce commented Apr 3, 2023

I am linking the relevant RFC to suggest new MIME type for standardisation:

https://www.rfc-editor.org/rfc/rfc6838.html

I propose working on adding the mime type application/jsonl into the standard tree (section 3.1). Adding to the standard tree seems the most convoluted, but also, I think this is where it would fit the best.

Among the two ways they list to get it added to the standard tree:

  1. in the case of registrations associated with IETF specifications,
    approved directly by the IESG, or

  2. registered by a recognized standards-related organization using
    the "Specification Required" IANA registration policy [RFC5226]
    (which implies Expert Review).

I think the second one is the most relevant, which leads to https://www.rfc-editor.org/rfc/rfc5226

https://www.iana.org/form/media-types

@sp4ce sp4ce self-assigned this Apr 3, 2023
@sp4ce sp4ce changed the title New content-type convention suggestion Official MIME content-type support Apr 11, 2023
@sp4ce sp4ce changed the title Official MIME content-type support Standard MIME content-type Apr 11, 2023
@frederikb

Hi @sp4ce, good to see that someone is leading the way to an actual RFC!

I've noticed that AWS is (apparently) using JSON Lines for one of their products. I haven't seen a description of the actual output to know whether or not it is compatible with JSON Lines. In any case they are using the mime type application/jsonlines. Thoughts on application/jsonl vs. that one?

@tim-hitchins-ekkosense

AWS claims it's compatible with JSON Lines - it links to the JSON Lines homepage

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/S3DataExport.Output.html

dennisreimann added a commit to dennisreimann/btcpayserver that referenced this issue Nov 19, 2023
There's an [ongoing discussion](wardi/jsonlines#19) about what the MIME type for [JSONL](https://jsonlines.org/) files should be. Making it `application/jsonl` leads to the file being downloaded according to my testing, which prevents browsers from opening them in a new window and parsing them as JSON, which fixes btcpayserver#5488.
NicolasDorier pushed a commit to btcpayserver/btcpayserver that referenced this issue Nov 20, 2023
There's an [ongoing discussion](wardi/jsonlines#19) about what the MIME type for [JSONL](https://jsonlines.org/) files should be. Making it `application/jsonl` leads to the file being downloaded according to my testing, which prevents browsers from opening them in a new window and parsing them as JSON, which fixes #5488.
@dwaite

dwaite commented Feb 25, 2024

If there's still interest in doing this, I would recommend an informational track internet-draft (I-D) to describe the jsonlines specification, with an IANA considerations section registering the media type. The idea is that drafts work towards RFCs, which in turn work towards standards, on a long evolutionary track from internet draft to RFC and potentially to internet standard.

IETF wants to deal with immutable and permanently available documents, so you will likely need to represent the encoding and parsing requirements authoritatively within the I-D itself, using IETF nomenclature. There are a lot of references for this available, and the JSON Text Sequences RFC is likely an excellent example.

I suspect there will be feedback that some areas are not needed. For example, your UTF-8 encoding rule does not have much left to it once you reference the JSON RFC. That RFC already mandates UTF-8 for everything other than closed ecosystems. At that point, you have to decide whether the application "advice" that they might want to escape the string to work on ASCII transports becomes something you might want to represent as an application note on the jsonlines site, and a discussion you have with the IETF more broadly - after all, it would also affect JSON and json sequence data over such transports.

Conversely, you may want to be quite a bit more specific for the sake of interoperability, such as whether applications MUST be able to consume \r\n line separators, and what application behavior is mandated/desired if invalid JSON text (including things like lines of just whitespace) is encountered within a stream. Variance in behaviors has led to a lot of security issues - imagine if your security compliance or logging components stopped reading a JSON lines sequence at a blank line, while your application logic ignored the blank line and kept going.

@finwo

finwo commented Feb 26, 2024

What's wrong with what ndjson is trying to implement? Their current standard is application/x-ndjson, which will likely move to application/ndjson in the future when there's more adoption.

https://bugzilla.mozilla.org/show_bug.cgi?id=1603986

@dwaite

dwaite commented Feb 26, 2024

The x- prefix on a subtype is intended only for private use, e.g. for types with no expectation of interoperability between implementations. In that sense, your application/x-ndjson may conflict with other people's application/x-ndjson, such as presence or absence of a leading [ or of trailing ,, or even someone deciding they might as well send it in Big5 rather than UTF-8.

The lack of an immutable standard (like a RFC with a number) means that ndjson three years from now may make changes along lines like these for robustness, but implementations do not have a clear way to explain what they are compatible with.

There are plenty of commercial products which use vendor and x-prefixed media types, and which do not attempt to define fixed/robust/interoperable behavior. It is a matter of what this project is going for, which is why my first words were "If there's still interest in doing this".

In terms of ramifications, most SDOs (standard defining organizations) won't touch dependencies which do not have these and other formalisms, and may use things like publication in another SDO (like IETF) as a sign of that. That means ndjson/jsonlines may be used in public facing API, but a large category of interoperable standards work either wouldn't touch it, or will standardize their own similar effort.

@tim-hitchins-ekkosense

> which will likely move to application/ndjson in the future when there's more adoption

Well that's the problem, it might happen, at some point in the future. Given the usage of JSON lines in various commercial products, we're suggesting we do that formalisation now - or at least start the process very soon!

@wardi
Owner

wardi commented Feb 26, 2024

I'd love to see this.

So do we copy-paste JSON-SEQ https://datatracker.ietf.org/doc/html/rfc7464 without the "ASCII Record Separator (0x1E)"? JSON-SEQ discusses detecting truncated records and continuing a fair bit, all of that could be removed in a new RFC.

> Conversely, you may want to be quite a bit more specific for the sake of interoperability, such as whether applications MUST be able to consume \r\n line separators, and what application behavior is mandated/desired if invalid JSON text (including things like lines of just whitespace) is encountered within a stream. Variance in behaviors has led to a lot of security issues - imagine if your security compliance or logging components stopped reading a JSON lines sequence at a blank line, while your application logic ignored the blank line and kept going.

Rule 3 in https://jsonlines.org/ mentions that a compliant parser will be able to consume \r\n because \r is ignored as surrounding whitespace by a json parser. Doesn't hurt to repeat it though.

Lines of only whitespace are already invalid by rule 2 in https://jsonlines.org/ , but again it doesn't hurt to make this clear.

To be specific let's say that any line that doesn't parse as valid JSON should be treated as an invalid record but still counts as a record for the purpose of numbering the lines.
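A minimal sketch of that behavior, assuming Python's standard `json` module as the JSON parser. The function name and the `(line_number, value, error)` tuple shape are hypothetical choices for illustration, not part of any specification.

```python
import json


def parse_json_lines(text):
    """Yield (line_number, value, error) for every line of a JSON Lines document.

    Lines that fail to parse are reported as invalid records but still
    consume a line number, so numbering stays aligned with the input.
    """
    # Split on "\n" only; a trailing "\r" from "\r\n" is left in place and
    # is ignored by the JSON parser as surrounding whitespace.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline ends the last record; it is not an extra line
    for number, line in enumerate(lines, start=1):
        try:
            yield number, json.loads(line), None
        except json.JSONDecodeError as exc:
            # Whitespace-only and empty lines end up here as invalid records.
            yield number, None, str(exc)


results = list(parse_json_lines('{"a": 1}\r\n   \n[2]\n'))
```

In this sketch the whitespace-only second line is reported as invalid, and the third line keeps line number 3.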

@GabenGar

Should it count as a record? The whole point of something called JSON Lines is that it stores lines of a well defined format called JSON, not arbitrary character sequences. Depending on the nature of the malformed data in a line, it might as well make all other lines after it invalid and blow up logs with parsing error noise when the offender is a single line (a whole file).

@timtjtim

> So do we copy-paste JSON-SEQ

I think RFCs are copyrighted so to copy paste you would need permission of the original author

@whlavina

I'm glad to see continued discussion and forward movement. It's interesting to see that YAML just recently (this month) gained IANA media type registration... 22 years after the format was first created. If YAML can do it, JSON Lines can, too! If there's any need for help with the process, maybe we could ask the folks who pushed the YAML RFC?

@tim-hitchins-ekkosense

Here are the guidelines on how to write an Internet Draft

https://authors.ietf.org/en/home

@darrelmiller

@whlavina You folks are welcome to come join the HTTPAPI mailing list https://datatracker.ietf.org/wg/httpapi/about/ and we can chat about a path to registering this media type. This is where the YAML media type registration RFC was created and we are working towards the OpenAPI one also.

There is ongoing discussion about allowing media type registrations to happen in the standards tree without necessarily going through the process of writing an RFC for the format. https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-00.html Although, this format might be simple enough that an RFC would be straightforward.

@ferdnyc

ferdnyc commented Jul 18, 2024

> There is ongoing discussion about allowing media type registrations to happen in the standards tree without necessarily going through the process of writing an RFC for the format. https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-00.html Although, this format might be simple enough that an RFC would be straightforward.

As of last month, that (expired) draft is replaced by https://www.ietf.org/archive/id/draft-ietf-mediaman-standards-tree-01.html

@finom

finom commented Feb 16, 2025

JSON lines is a perfect solution for JSON streaming, which is a common task today for features that use LLMs. I attempted to use application/jsonl but, unfortunately, if I open an endpoint URL with the browser, the content of the endpoint is downloaded instead of showing up as text (as application/json does). I've spent a significant amount of time figuring that out, and for a while the best solution was to use the text/plain content-type with a custom header, like x-format=jsonlines.

I didn't enjoy this solution because I wanted the content-type to represent the actual content type of the response, so I came up with the idea of using text/plain with a custom parameter, format.

text/plain; format=jsonlines

In this case, the response is interpreted as text and isn't downloaded, but the actual type of the content can still be read from the content-type header.

It looks kinda right and kinda wrong at the same time, but since we don't have any standard header yet, this can be used as a solid temporary solution. Let me know what you think.

@GabenGar

format is not a valid Content-Type parameter, simple as.

@finom

finom commented Feb 16, 2025

@GabenGar I agree, but what would be a good alternative for self-described content type of JSON lines that can be read by any client that doesn't support the proposed one? Maybe the parameter should be prefixed with x-?

text/plain; x-format=jsonlines

@GabenGar

For what purpose do you want to violate the http spec?

@finom

finom commented Feb 16, 2025

> For what purpose do you want to violate the http spec?

Do I? Is it directly forbidden to use custom params?

@GabenGar

You didn't answer the question.

@timtjtim

timtjtim commented Feb 16, 2025

> You didn't answer the question.

The question makes no sense. Clearly @finom doesn't want to violate any HTTP specifications - which their suggestion doesn't.

See the actual specification, rather than just the developer reference: https://httpwg.org/specs/rfc9110.html#field.content-type

There's no exhaustive list of valid parameters, so there's no rule against using format, and no requirement for any parameter that isn't explicitly documented to be prefixed with x-.
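As a rough illustration of the syntax only: Content-Type parameters are generic name=value pairs, so a general-purpose header parser returns an unfamiliar parameter like any other. The sketch below uses Python's standard `email.message.Message` for the parsing; the helper name `parse_content_type` is hypothetical, and nothing here says whether a given client will act on the parameter.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type value into (media_type, {parameter: value})."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the media type first, followed by the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


media_type, params = parse_content_type("text/plain; format=jsonlines")
```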

@dwaite

dwaite commented Feb 16, 2025

> JSON lines is a perfect solution for JSON streaming, which is a common task today for features that use LLMs. I attempted to use application/jsonl but, unfortunately, if I open an endpoint URL with the browser, the content of the endpoint is downloaded instead of showing up as text (as application/json does). I've spent a significant amount of time figuring that out, and for a while the best solution was to use the text/plain content-type with a custom header, like x-format=jsonlines.
>
> I didn't enjoy this solution because I wanted the content-type to represent the actual content type of the response, so I came up with the idea of using text/plain with a custom parameter, format.
>
> text/plain; format=jsonlines
>
> In this case, the response is interpreted as text and isn't downloaded, but the actual type of the content can still be read from the content-type header.
>
> It looks kinda right and kinda wrong at the same time, but since we don't have any standard header yet, this can be used as a solid temporary solution. Let me know what you think.

You really should not use the text top level type unless your intention is that an end user (not developer) would be able to read the data without tooling, for this reason (you get end-user display of raw data by browsers when there is no system registered tooling)

If this is an API endpoint, I’d recommend leveraging the Accept header instead. If the HTTP client does not prefer application/jsonlines over text/plain, they get a plaintext version with the text/plain media type. You can then disable the text/plain Accept handler in production.
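A minimal sketch of that negotiation, under some loud assumptions: `application/jsonlines` is not a registered media type, the function names and the `allow_plain_text` switch are hypothetical, and the Accept parsing is simplified (it reads q-values and wildcards only, and does not handle the 406 Not Acceptable case).

```python
JSONL_TYPE = "application/jsonlines"  # assumption: unregistered, used for illustration


def parse_accept(header):
    """Parse an Accept header into {media_range: q}, ignoring other parameters."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_range] = q
    return prefs


def quality(prefs, media_type):
    """Return the q-value for media_type, honouring type/* and */* wildcards."""
    main = media_type.split("/")[0]
    for candidate in (media_type, main + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def choose_content_type(accept_header, allow_plain_text=True):
    """Serve text/plain unless the client prefers the JSON Lines type."""
    prefs = parse_accept(accept_header or "*/*")
    jsonl_q = quality(prefs, JSONL_TYPE)
    plain_q = quality(prefs, "text/plain")
    if allow_plain_text and jsonl_q <= plain_q:
        return "text/plain"
    return JSONL_TYPE


BROWSER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
```

With `allow_plain_text=False`, which stands in for disabling the plaintext handler in production, every request gets the JSON Lines type in this sketch.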

> format is not a valid Content-Type parameter, simple as.

Format is a valid media type parameter for text/plain (RFC 2646), but it dictates things like newline interpretation. Using it to say “no I really mean this other media type” is certainly not its intended use or how media types are intended to work. Having an X-format doesn’t really make that better - X is really meant for indicating a parameter or type may have conflicts in independent usage due to not being registered, not “I want to do something entirely different than the thing with this unprefixed name”. The latter is creating something knowingly confusing, even before you get to this idea being counter to the purpose of media types


17 participants