jtoh stands for JSON To Human. It makes it easier to analyze long streams of JSON objects. The main use case is analyzing structured logs from Kubernetes and GCP Stackdriver, but it works with any long list or stream of JSON objects.
There are some good tools for parsing JSON, like jq, which I usually use. But my problem involved processing long lists of JSON documents, like this (but much bigger):
[
{"Name": "Ed", "Text": "Knock knock."},
{"Name": "Sam", "Text": "Who's there?"},
{"Name": "Ed", "Text": "Go fmt."},
{"Name": "Sam", "Text": "Go fmt who?"},
{"Name": "Ed", "Text": "Go fmt yourself!"}
]
jq does no stream processing by default, and its stream mode is not exactly what I want, as can be seen in the docs and in this post. To be honest, I can't even understand the documentation on how jq streaming works, so even if it is useful for some scenarios, it is beyond me to understand it properly (and what I read in the blog post does not sound like fun).
The behavior I wanted is the one you get with Go's json.Decoder: a JSON list is handled by incrementally decoding each JSON document inside it, in a streaming fashion. Hence this tool was built (and using Go =P). But it is NOT a replacement for jq with streaming capabilities, because it focuses on just projecting a few fields from JSON documents in a newline-oriented fashion. There is no filtering and there are no advanced features, and it probably won't handle complex scenarios well. It is meant for long lists or long streams of JSON objects.
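To make the idea of incremental decoding concrete, here is a minimal sketch of reading a JSON list one element at a time with encoding/json. It is not jtoh's actual code: the Message type and the decodeList and collect helpers are names made up for this illustration, and the input is the joke list shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// Message mirrors the objects in the example list (hypothetical type).
type Message struct {
	Name string
	Text string
}

// decodeList reads a JSON array from r one element at a time and calls
// handle for each decoded element, so the whole list is never held in memory.
func decodeList(r io.Reader, handle func(Message)) error {
	dec := json.NewDecoder(r)

	// Consume the opening '[' of the list.
	if _, err := dec.Token(); err != nil {
		return err
	}
	// Decode elements for as long as the array has more of them.
	for dec.More() {
		var m Message
		if err := dec.Decode(&m); err != nil {
			return err
		}
		handle(m)
	}
	// Consume the closing ']'.
	_, err := dec.Token()
	return err
}

// collect gathers formatted lines for the demo below; a real tool would
// write each line out as soon as it is decoded instead of buffering.
func collect(input string) ([]string, error) {
	var lines []string
	err := decodeList(strings.NewReader(input), func(m Message) {
		lines = append(lines, m.Name+": "+m.Text)
	})
	return lines, err
}

func main() {
	lines, err := collect(`[
		{"Name": "Ed", "Text": "Knock knock."},
		{"Name": "Sam", "Text": "Who's there?"}
	]`)
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

The handle callback runs as soon as each element is decoded, which is what keeps memory usage flat on very long lists.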
To install it you will need Go >= 1.13. You can clone the repository and run:
make install
Or you can just run:
go install github.com/madlambda/jtoh/cmd/jtoh@latest
jtoh produces one line of output for each JSON document found in the list/stream. It accepts a selector string as a parameter, which indicates which fields are used to compose each line and what the separator between fields is:
<source of JSON list> | jtoh "<sep>field1<sep>field2<sep>field3.name"
Where <sep> is the first character of the selector and is taken as the separator. It separates the different field selectors and is also used as the separator in the output. So this:
<source of JSON list> | jtoh ":field1:field2"
Will generate a stream of output like this:
data1:data2
data1:data2
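The selector convention above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not how jtoh is implemented: parseSelector, lookup and project are hypothetical helpers, nested fields are assumed to be addressed with dots (as in field3.name), and the "<missing>" placeholder for absent fields is an invention of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf8"
)

// parseSelector splits a selector such as ":timestamp:resource.type" into
// its separator (the first character) and the list of field paths.
func parseSelector(selector string) (sep string, fields []string, err error) {
	r, size := utf8.DecodeRuneInString(selector)
	if len(selector) <= size {
		return "", nil, fmt.Errorf("invalid selector %q", selector)
	}
	sep = string(r)
	return sep, strings.Split(selector[size:], sep), nil
}

// lookup walks a dot-separated path through nested JSON objects.
func lookup(doc map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// project renders the selected fields of doc as a single line.
func project(doc map[string]interface{}, sep string, fields []string) string {
	values := make([]string, len(fields))
	for i, f := range fields {
		if v, ok := lookup(doc, f); ok {
			values[i] = fmt.Sprint(v)
		} else {
			values[i] = "<missing>" // placeholder chosen for this sketch only
		}
	}
	return strings.Join(values, sep)
}

func main() {
	const raw = `{"severity":"ERROR","textPayload":"cool log message","resource":{"type":"k8s_container"}}`

	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}
	sep, fields, err := parseSelector(":severity:resource.type:textPayload")
	if err != nil {
		panic(err)
	}
	fmt.Println(project(doc, sep, fields)) // prints "ERROR:k8s_container:cool log message"
}
```

Taking the separator from the selector itself means you can pick one that does not clash with your data, for example "|" when the fields contain colons.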
A more hands-on example: let's say you are getting the logs for a specific application on GCP like this:
gcloud logging read --format=json --project <your project> "severity>=WARNING AND resource.labels.container_name=myapp"
You will probably have a long list of something like this:
{
"insertId": "h3wh26neb0mcbkeou",
"labels": {
"k8s-pod/app": "myapp",
"k8s-pod/pod-template-hash": "56d4fdf46d"
},
"logName": "projects/a2b-exp/logs/stderr",
"receiveTimestamp": "2020-07-14T13:18:40.681669783Z",
"resource": {
"labels": {
"cluster_name": "k8s-cluster",
"container_name": "myapp",
"location": "europe-west3-a",
"namespace_name": "default",
"pod_name": "kraken-56d4fdf46d-f9trn",
"project_id": "someproject"
},
"type": "k8s_container"
},
"severity": "ERROR",
"textPayload": "cool log message",
"timestamp": "2020-07-14T13:18:38.741851348Z"
}
In this case the application does no JSON structured logging. There is a lot of data around the actual application log message that can be useful for filtering, but once the filtering is done it is pure cognitive noise.
Using jtoh like this:
gcloud logging read --format=json --project <your project> "severity>=WARNING AND resource.labels.container_name=myapp" | jtoh :timestamp:textPayload
You now get a stream of lines like this:
2020-07-14T13:18:38.741851348Z:cool log message
The exact same thing is possible with the stream of JSON objects you get when the application structures its log entries as JSON and you get the logs directly from Kubernetes using kubectl, like this:
TODO: Kubernetes examples :-)
One thing that makes jtoh very different from the usual JSON parsing tools is how it handles errors. Anything that is not JSON is just echoed back, and jtoh keeps trying to parse the rest of the data.
The idea is to cover scenarios where applications have hybrid logs, where sometimes an entry is JSON and sometimes it is just a stack trace or something else. These scenarios are not ideal and the software should be fixed, but life is not ideal, so if you are in this situation jtoh may help you analyze the logs :-) (and hopefully in time you will also fix the logs so they become uniform/consistent).
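The echo-on-failure idea can be illustrated with a small sketch. It is a simplification and not jtoh's actual implementation: it assumes one log entry per line, always projects the timestamp and textPayload fields, and humanize and humanizeString are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// humanize copies r to w line by line. Lines holding a JSON object are
// reduced to "timestamp:textPayload"; anything else is echoed unchanged.
// Note: bufio.Scanner rejects lines longer than 64KB by default.
func humanize(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()

		var doc map[string]interface{}
		// A nil map after a successful Unmarshal means the line was "null".
		if err := json.Unmarshal([]byte(line), &doc); err != nil || doc == nil {
			fmt.Fprintln(w, line) // not a JSON object: echo it back as-is
			continue
		}
		fmt.Fprintf(w, "%v:%v\n", doc["timestamp"], doc["textPayload"])
	}
	return scanner.Err()
}

// humanizeString is a convenience wrapper for in-memory input.
func humanizeString(input string) (string, error) {
	var out strings.Builder
	err := humanize(strings.NewReader(input), &out)
	return out.String(), err
}

func main() {
	const logs = `{"timestamp":"2020-07-14T13:18:38Z","textPayload":"cool log message"}
panic: something went wrong
goroutine 1 [running]:
{"timestamp":"2020-07-14T13:18:39Z","textPayload":"back to JSON"}`

	out, err := humanizeString(logs)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The stack trace lines pass through untouched while the JSON entries around them are still projected, so no information is lost when the stream is inconsistent.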