
badly formatted logs on stderr #896

Closed
tmszdmsk opened this issue Jun 29, 2018 · 11 comments

@tmszdmsk
Contributor

Requirement - what kind of business use case are you trying to solve?

Logs printed on stderr are not properly formatted. This slows down debugging problems with jaeger-collector.

Problem - what in Jaeger blocks you from solving the requirement?

Our jaeger-collector is deployed on Kubernetes. Logs from stdout/stderr are automatically collected and pushed into an ELK stack. Because of the badly formatted JSON, Logstash is unable to parse the logs.

Example (slightly anonymised) kubectl logs output from the jaeger-collector pod:

{\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"1e9c810f2c768990\",\"flags\":1,\"operationName\”:\”***\”,\”references\":[{\"refType\":\"CHILD_OF\",\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"11a4ea9bddefe82\"}],\"startTime\":1530235009818889,\"duration\":884,\"tags\":[{\"key\":\"span.kind\",\"type\":\"string\",\"value\":\"client\"},{\"key\”:\”***\”,\”type\":\"string\",\"value\”:\”***\”},{\”key\":\"backend.name\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.method\",\"type\":\"string\",\"value\":\"GET\"},{\"key\":\"http.url\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.host\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"http.status_code\",\"type\":\"int64\",\"value\":\"200\"}],\"logs\":[],\"processID\":\"\",\"process\":{\"serviceName\":\"***\",\"tags\":[{\"key\":\"jaeger.version\",\"type\":\"string\",\"value\":\"Go-2.9.0\"},{\"key\":\"hostname\",\"type\":\"string\",\"value\":\"***\"},{\"key\":\"ip\",\"type\":\"string\",\"value\":\"***\"}]},\"warnings\":null,\"startTimeMillis\":1530235009818}\n

As you can see, the JSON is escaped, so it is not interpreted as JSON.

jaeger-collector v1.5.0

Proposal - what do you suggest to solve the problem or improve the existing situation?

Properly formatted JSON on stderr

Any open questions to address

N/A

@jpkrohling
Contributor

@tmszdmsk is that the output of kubectl logs? I did a quick run here with minikube and I seem to get proper JSON from kubectl logs:

$ kubectl logs -f jaeger-deployment-8648cb84d5-t82zv
{"level":"info","ts":1530272508.2622733,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1530272508.262668,"caller":"memory/factory.go:55","msg":"Memory storage configuration","configuration":{"MaxTraces":0}}
{"level":"info","ts":1530272508.264795,"caller":"tchannel/builder.go:89","msg":"Enabling service discovery","service":"jaeger-collector"}
{"level":"info","ts":1530272508.2648525,"caller":"peerlistmgr/peer_list_mgr.go:111","msg":"Registering active peer","peer":"127.0.0.1:14267"}
{"level":"info","ts":1530272508.2654924,"caller":"standalone/main.go:187","msg":"Starting agent"}
{"level":"info","ts":1530272508.2661352,"caller":"standalone/main.go:227","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1530272508.266198,"caller":"standalone/main.go:237","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1530272508.2684326,"caller":"standalone/main.go:298","msg":"Registering metrics handler with jaeger-query HTTP server","route":"/metrics"}
{"level":"info","ts":1530272508.2684631,"caller":"standalone/main.go:304","msg":"Starting jaeger-query HTTP server","port":16686}
{"level":"info","ts":1530272508.2684717,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1530272508.2973397,"caller":"standalone/main.go:257","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}
{"level":"info","ts":1530272509.265548,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1530272509.2657406,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"127.0.0.1:14267"}
{"level":"info","ts":1530272509.267476,"caller":"peerlistmgr/peer_list_mgr.go:176","msg":"Connected to peer","host:port":"[::]:14267"}

@jpkrohling jpkrohling added the bug label Jun 29, 2018
@jpkrohling
Contributor

By the way: this would probably belong in https://github.com/jaegertracing/jaeger-kubernetes instead.

@tmszdmsk
Contributor Author

Yep, it is the raw output from kubectl logs.
After the pod is restarted, it shows output similar to yours (properly formatted):

{"level":"info","ts":1530272219.8962371,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}

It looks like this is a problem with how problems are logged on stderr, then.

@jpkrohling
Contributor

I just noticed that your logs are showing the contents of a span. Are you able to get more context for that log entry?

@tmszdmsk
Contributor Author


Ok, I think I understand what's going on. Looks like this is one "uber long" (:trollface:) log entry that was split by Logstash into multiple logs because of some length limit. What is more interesting, the number of logs increases over time:

[chart: logs increasing over time]

This probably means that the log contents are somehow accumulated by the collector, and each time it logs the message it is split into more and more pieces (because its length grows).

So it looks like the problem is the length of the JSON document that is printed on stderr. I don't know if there is any "sane limit" for log length in the industry, so you can either close this issue or try to introduce one.
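If a limit were introduced, it could be as simple as truncating the oversized field before handing it to the logger. A minimal Go sketch (the helper name and the 16 KiB cap are made up for illustration; this is not existing Jaeger code):

// truncateForLog caps a string field so that a single log entry cannot grow
// without bound; the limit here is an arbitrary example value.
func truncateForLog(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return s[:limit] + "...(truncated)"
}

// hypothetical usage when building the log entry:
//   zap.String("requests", truncateForLog(requests, 16*1024))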

@jpkrohling
Contributor

Given that the contents of a span are being logged, I would guess, without looking at the code, that the Elasticsearch storage plugin is logging out its buffer in case of failures. Are you able to dig up the first such message?

Bonus points for the uber pun :-)

@tmszdmsk
Contributor Author

I've been able to find something like this:

github.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient.func2
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/config/config.go:90
github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*bulkWorker).commit
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/bulk_processor.go:506
github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*bulkWorker).work
	/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/bulk_processor.go:455

which suggests #779, but in the context of this issue we should maybe consider limiting the size of the output (or close this one and fix the original issue).
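For context, those frames point at the After hook registered on olivere/elastic's BulkProcessor. A rough, self-contained sketch of that wiring (assuming the gopkg.in/olivere/elastic.v5 API; the function name and details are illustrative, not the exact code at config.go:90):

package main

import (
	"context"

	"go.uber.org/zap"
	elastic "gopkg.in/olivere/elastic.v5"
)

// newBulkProcessor wires an After callback that fires once per committed
// bulk request; on failure it logs the error and the ES response.
// The real Jaeger handler additionally attaches the serialized pending
// requests as a string field, which is the field that grows so large.
func newBulkProcessor(ctx context.Context, client *elastic.Client, logger *zap.Logger) (*elastic.BulkProcessor, error) {
	return client.BulkProcessor().
		After(func(id int64, requests []elastic.BulkableRequest, response *elastic.BulkResponse, err error) {
			if err != nil {
				logger.Error("Elasticsearch could not process bulk request",
					zap.Error(err), zap.Any("response", response))
			}
		}).
		Do(ctx)
}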

@yurishkuro
Member

I don't see what the issue is. The logs are in JSON format by default, containing fields like timestamp, level, and caller. When an ES error happens, it is logged with:

logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
    zap.Any("response", response), zap.String("requests", buffer.String()))

Here, since the request and response are themselves JSON strings, they have to be escaped in order to fit into the overall log format.
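To make the escaping concrete, here is a minimal standalone zap example (not Jaeger code) that reproduces the effect when a string field already contains JSON:

package main

import "go.uber.org/zap"

func main() {
	// zap's production config writes JSON-encoded entries to stderr.
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// The value is itself a JSON document, but zap treats it as an opaque
	// string field, so every inner quote is escaped in the output.
	requests := `{"traceID":"11a4ea9bddefe82","spanID":"1e9c810f2c768990"}`
	logger.Error("Elasticsearch could not process bulk request",
		zap.String("requests", requests))

	// Output (abridged, one line on stderr):
	// {"level":"error",...,"msg":"Elasticsearch could not process bulk request",
	//  "requests":"{\"traceID\":\"11a4ea9bddefe82\",\"spanID\":\"1e9c810f2c768990\"}"}
}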

@yurishkuro yurishkuro removed the bug label Jun 29, 2018
@tmszdmsk
Contributor Author

@yurishkuro yep, the issue is different from what I originally described. It turned out that the Elasticsearch backend logs the whole buffer of pending messages. The log is properly formatted, but one of the fields is so enormous that commonly used logging infrastructure (ELK) splits the message into many smaller, unparsable logs.

The question in the context of this bug is: should we change how this particular error is logged so that the whole buffer isn't pushed to stderr? If not, then feel free to close this issue.

@yurishkuro
Member

I think it would be sufficient to only log the ES response, assuming it won't include the input data. Would you like to create a pull request?
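A rough sketch of that change at the call site quoted above (illustrative only; the actual fix would go in as a pull request):

// Before: the entire pending bulk buffer is attached as a string field.
logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
	zap.Any("response", response), zap.String("requests", buffer.String()))

// After (sketch): log only the error and the ES response, which should be
// enough to diagnose the failure without dumping the input data.
logger.Error("Elasticsearch could not process bulk request", zap.Error(err),
	zap.Any("response", response))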

@tmszdmsk
Contributor Author

Sure, I was just setting up a dev env ;)
