
badly formatted logs on stderr #896

Closed
tmszdmsk opened this issue Jun 29, 2018 · 11 comments

@tmszdmsk
Contributor

Requirement - what kind of business use case are you trying to solve?

Logs printed on stderr are not properly formatted. This slows down debugging problems with jaeger-collector.

Problem - what in Jaeger blocks you from solving the requirement?

Our jaeger-collector is deployed on Kubernetes. Logs from stdout/stderr are automatically collected and pushed into an ELK stack. Because of the badly formatted JSON, Logstash is unable to parse the logs.

Example (slightly anonymised) kubectl logs output from the jaeger-collector pod:

{\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"1e9c810f2c768990\",\"flags\":1,\"operationName\”:\”***\”,\”references\":[{\"refType\":\"CHILD_OF\",\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"11a4ea9bddefe82\"}],\"startTime\":1530235009818889,\"duration\":884,\"tags\":[{\"key\":\"span.kind\",\"type\":\"string\",\"value\":\"client\"},{\"key\”:\”***\”,\”type\":\"string\",\"value\”:\”***\”},{\”key\":\"backend.name\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.method\",\"type\":\"string\",\"value\":\"GET\"},{\"key\":\"http.url\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.host\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.status_code\",\"type\":\"int64\",\"value\":\"200\"}],\"logs\":[],\"processID\":\"\",\"process\":{\"serviceName\":\"***\",\"tags\":[{\"key\":\"jaeger.version\",\"type\":\"string\",\"value\":\"Go-2.9.0\"},{\"key\":\"hostname\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"ip\",\"type\":\"string\",\"value\":\"***\"}]},\"warnings\":null,\"startTimeMillis\":1530235009818}\n

As you can see, the JSON is escaped, so it is not interpreted as JSON.

jaeger-collector v1.5.0

Proposal - what do you suggest to solve the problem or improve the existing situation?

Properly formatted JSON on stderr

Any open questions to address

N/A

@jpkrohling
Contributor

@tmszdmsk is that the output of kubectl logs? I did a quick run here with minikube and I seem to get proper JSON from kubectl logs:

$ kubectl logs -f jaeger-deployment-8648cb84d5-t82zv
{"level":"info","ts":1530272508.2622733,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1530272508.262668,"caller":"memory/factory.go:55","msg":"Memory storage configuration","configuration":{"MaxTraces":0}}
{"level":"info","ts":1530272508.264795,"caller":"tchannel/builder.go:89","msg":"Enabling service discovery","service":"jaeger-collector"}
{"level":"info","ts":1530272508.2648525,"caller":"peerlistmgr/peer_list_mgr.go:111","msg":"Registering active peer","peer":"127.0.0.1:14267"}
{"level":"info","ts":1530272508.2654924,"caller":"standalone/main.go:187","msg":"Starting agent"}
{"level":"info","ts":1530272508.2661352,"caller":"standalone/main.go:227","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1530272508.266198,"caller":"standalone/main.go:237","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1530272508.2684326,"caller":"standalone/main.go:298","msg":"Registering metrics handler with jaeger-query HTTP server","route":"/metrics"}
{"level":"info","ts":1530272508.2684631,"caller":"standalone/main.go:304","msg":"Starting jaeger-query HTTP server","port":16686}
{"level":"info","ts":1530272508.2684717,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1530272508.2973397,"caller":"standalone/main.go:257","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}
{"level":"info","ts":1530272509.265548,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1530272509.2657406,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"127.0.0.1:14267"}
{"level":"info","ts":1530272509.267476,"caller":"peerlistmgr/peer_list_mgr.go:176","msg":"Connected to peer","host:port":"[::]:14267"}

@jpkrohling jpkrohling added the bug label Jun 29, 2018
@jpkrohling
Contributor

By the way: this would probably belong in https://github.com/jaegertracing/jaeger-kubernetes instead.

@tmszdmsk
Contributor Author

Yep, it is the raw output from kubectl logs.
After the pod is restarted, it shows output similar to yours (properly formatted):

{"level":"info","ts":1530272219.8962371,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}

It looks like this is a problem with how problems are logged on stderr, then.

@jpkrohling
Contributor

I just noticed that your logs are showing the contents of a span. Are you able to get more context for that log entry?

@tmszdmsk
Contributor Author


Ok, I think I understand what's going on. Looks like this is one "uber long" (:trollface:) log entry that was split by Logstash into multiple logs because of some length limit. What is more interesting, the number of logs increases over time:

[chart: logs increasing over time]

This probably means that the log contents are somehow accumulated by the collector, and each time it logs the message it is split into more and more pieces (because its length grows).

So it looks like the problem is the length of the JSON document that is printed on stderr. I don't know if there is any "sane limit" for log length in the industry, so you can either close this issue or try to introduce one.
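If a limit were introduced, it could be as simple as truncating the oversized field before handing it to the logger. A minimal Go sketch (the helper name and the 16 KiB cap are made up for illustration; this is not existing Jaeger code):

// truncateForLog caps a string field so that a single log entry cannot grow
// without bound; the limit here is an arbitrary example value.
func truncateForLog(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return s[:limit] + "...(truncated)"
}

// hypothetical usage when building the log entry:
//   zap.String("requests", truncateForLog(requests, 16*1024))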

@jpkrohling
Contributor

Given that the contents of a span are being logged, I would guess, without looking at the code, that the Elasticsearch storage plugin is logging out its buffer in case of failures. Are you able to dig up the first such message?

Bonus points for the uber pun :-)

@tmszdmsk
Contributor Author

I've been able to find something like this:

github.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient.func2
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/config/config.go:90
github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*bulkWorker).commit
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/bulk_processor.go:506
github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*bulkWorker).work
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/bulk_processor.go:455

which suggests #779, but in the context of this issue we should maybe consider limiting the size of the output (or close this one and fix the original issue).
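For context, those frames point at the After hook registered on olivere/elastic's BulkProcessor. A rough, self-contained sketch of that wiring (assuming the gopkg.in/olivere/elastic.v5 API; the function name and details are illustrative, not the exact code at config.go:90):

package main

import (
	"context"

	"go.uber.org/zap"
	elastic "gopkg.in/olivere/elastic.v5"
)

// newBulkProcessor wires an After callback that fires once per committed
// bulk request; on failure it logs the error and the ES response.
// The real Jaeger handler additionally attaches the serialized pending
// requests as a string field, which is the field that grows so large.
func newBulkProcessor(ctx context.Context, client *elastic.Client, logger *zap.Logger) (*elastic.BulkProcessor, error) {
	return client.BulkProcessor().
		After(func(id int64, requests []elastic.BulkableRequest, response *elastic.BulkResponse, err error) {
			if err != nil {
				logger.Error("Elasticsearch could not process bulk request",
					zap.Error(err), zap.Any("response", response))
			}
		}).
		Do(ctx)
}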

@yurishkuro
Member

I don't see what the issue is. The logs are in JSON format by default, containing fields like timestamp, level, and caller. When an ES error happens, it is logged with:

logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
    zap.Any("response", response), zap.String("requests", buffer.String()))

Here, since the request and response are themselves JSON strings, they have to be escaped in order to fit into the overall log format.
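To make the escaping concrete, here is a minimal standalone zap example (not Jaeger code) that reproduces the effect when a string field already contains JSON:

package main

import "go.uber.org/zap"

func main() {
	// zap's production config writes JSON-encoded entries to stderr.
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// The value is itself a JSON document, but zap treats it as an opaque
	// string field, so every inner quote is escaped in the output.
	requests := `{"traceID":"11a4ea9bddefe82","spanID":"1e9c810f2c768990"}`
	logger.Error("Elasticsearch could not process bulk request",
		zap.String("requests", requests))

	// Output (abridged, one line on stderr):
	// {"level":"error",...,"msg":"Elasticsearch could not process bulk request",
	//  "requests":"{\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"1e9c810f2c768990\"}"}
}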

@yurishkuro yurishkuro removed the bug label Jun 29, 2018
@tmszdmsk
Contributor Author

@yurishkuro yep, the issue is different from what I originally described. It turned out that the Elasticsearch backend logs the whole buffer of pending messages. The log is properly formatted, but one of the fields is so enormous that commonly used logging infrastructure (ELK) splits the message into many smaller, unparsable logs.

The question in the context of this bug is: should we change how this particular error is logged so that the whole buffer isn't pushed to stderr? If not, then feel free to close this issue.

@yurishkuro
Member

I think it would be sufficient to only log the ES response, assuming it won't include the input data. Would you like to create a pull request?
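A rough sketch of that change at the call site quoted above (illustrative only; the actual fix would go in as a pull request):

// Before: the entire pending bulk buffer is attached as a string field.
logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
	zap.Any("response", response), zap.String("requests", buffer.String()))

// After (sketch): log only the error and the ES response, which should be
// enough to diagnose the failure without dumping the input data.
logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
	zap.Any("response", response))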

@tmszdmsk
Contributor Author

Sure, I was just setting up a dev env ;)
