Commit
**What changed?**

I upgraded our zap version from v1.24.0 to v1.26.0, which contains support for pkg/errors. See [this issue](uber-go/zap#303) and [this commit](uber-go/zap@5fc2db7).

<img width="448" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/7d15d91a-27d3-45f6-9628-30bdcac38771">

**Why?**

Before this change, our logs contained only the stack trace from the point where the logger itself was invoked, not from where the error was generated or wrapped. That trace says little about the actual cause of the error.

**How did you test it?**

I ran a custom [build of server](https://gist.github.com/MichaelSnowden/c649dfd1efeb92f10bc72a040a792a8d) that overrides some code deep in history to return an error. I then set up docker-compose to ship logs through Promtail and Loki to Grafana, and queried Grafana to verify that the error log contained an "errorVerbose" field with the stack trace from where my error was generated. As the images below show, the stack trace does appear under this field. If you turn on JSON parsing and newline escaping, the trace renders correctly and can be copy-pasted.

<img width="983" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/cb9b0c83-b146-4060-9eac-3ccf9b807657">

<img width="683" alt="image" src="https://github.com/temporalio/temporal/assets/5942963/6fa28f2f-5d04-47ae-b8a3-7512c3dd85e7">

The stack trace from Grafana:

```
oopsie woopsie
main.(*faultyShardEngine).StartWorkflowExecution
	/Users/mikey/src/temporalio/temporal/.scratches/main.go:39
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
	/Users/mikey/src/temporalio/temporal/service/history/handler.go:595
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
	/Users/mikey/src/temporalio/temporal/api/historyservice/v1/service.pb.go:1300
...
```

**Potential risks**

The stack traces are quite deep because of all our gRPC interceptors. We can fix that later, if we want, by filtering the `pkg/errors.StackTrace`. I'd rather do that in a follow-up once this initial change has landed.

**Is hotfix candidate?**

No.