
fix: Use defaultCallback in LoggingBunyan class #601

Merged

merged 2 commits into main on Mar 9, 2022

Conversation

@losalex (Contributor) commented Mar 9, 2022

Error handling with a default global callback has an issue: the bunyan library always uses its own default callback, so LoggingBunyan needs to override it with the user-provided defaultCallback when one is given.

Fixes #595 🦕
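As a rough sketch of the intended behavior (not the PR's actual diff), the user-supplied defaultCallback is passed at construction time and receives write errors; the callback shape below is an assumption based on node-style error callbacks:

// Hypothetical usage sketch (TypeScript): route write failures to a
// user-supplied defaultCallback instead of the built-in default.
import {LoggingBunyan} from '@google-cloud/logging-bunyan';

const loggingBunyan = new LoggingBunyan({
  projectId: 'your-project-id', // placeholder
  // Assumed node-style callback: invoked with an error when a write fails.
  defaultCallback: err => {
    if (err) {
      console.error('Cloud Logging write failed:', err);
    }
  },
});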

@losalex losalex requested review from a team as code owners March 9, 2022 04:25
@product-auto-label product-auto-label bot added the size: xs label (Pull request size is extra small) Mar 9, 2022
@product-auto-label product-auto-label bot added the api: logging label (Issues related to the googleapis/nodejs-logging-bunyan API) Mar 9, 2022
@losalex losalex assigned losalex and unassigned arbrown Mar 9, 2022
@losalex losalex merged commit f4c01ab into main Mar 9, 2022
@losalex losalex deleted the losalex/fix-595-correction branch March 9, 2022 16:46
@losalex losalex added the priority: p2 (Moderately-important priority; fix may not be included in next release) and type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns) labels Mar 9, 2022
@Taaanos commented Mar 14, 2022

Even after upgrading to v3.2.2 the defaultCallback is not respected and the app crashes; interestingly, it sometimes works. I observed that we get these errors at the same timestamp across our services.
[screenshot: error logs]

@losalex (Contributor, Author) commented Mar 14, 2022

> Even after upgrading to v3.2.2 the defaultCallback is not respected and the app crashes; interestingly, it sometimes works. I observed that we get these errors at the same timestamp across our services.

Thanks @Taaanos for trying this out! Is there any chance you can figure out which cases are causing the crash? Based on what you said, it seems the defaultCallback sometimes helps; some details on what is causing the crash would help find the root cause.

@Taaanos commented Mar 14, 2022

@losalex It's not clear what's causing the crash; "code 4" is all I get. This is the bigger picture.
[screenshot: error details]

The following screenshot is from a different service; notice the timestamp across the screenshots I posted ("2022-03-14 16:28:34.263 CET"). We get the error at the same time (we log quite often).
[screenshot: logs from a second service]
If it matters, we get these on GKE and Cloud Run across multiple services.

@losalex (Contributor, Author) commented Mar 15, 2022

Thank you @Taaanos! Do you have at least a rough idea of the rate of crashes vs. handled errors?

@Taaanos commented Mar 15, 2022

@losalex Actually, the error is handled on only 1 of our 4 services, with about 75% of errors handled there. The other 3 are consistently crashing. An important point: in those 3 services our logger is built in a separate module that is used as an external dependency. I can see in the yarn.lock of those 3 services that v3.2.2 is used.

@Taaanos commented Mar 16, 2022

@losalex The inconsistency in the handling turned out to be caused by two loggers being used, only one of which had a defaultCallback. I will write again soon with results.

@Taaanos commented Mar 16, 2022

There are two very important observations regarding v3.2.2.

  • Our logs were severely reduced (90% down)
  • We start noticing "Heap out of memory" (see screenshot with errors: "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory")

We released v3.2.2 on March 14th around 16:30 CET.

[screenshots: log volume graph and "heap out of memory" errors]

We have now reverted back to v3.2.1. I will report back.

@losalex (Contributor, Author) commented Mar 17, 2022

Thank you @Taaanos for the update, and I'm really sorry for the issue you are having with version 3.2.2.
While I look into it, you can handle timeout errors with the LoggingBunyan error-handling mechanism by adding a logger.on event handler, as shown in the example below; this also works with 3.2.1:

// Imports the Google Cloud client library for Bunyan
const bunyan = require('bunyan');
const {LoggingBunyan} = require('@google-cloud/logging-bunyan');

// Creates a client
const loggingBunyan = new LoggingBunyan({
  projectId: 'your-project-id',
  keyFilename: '/path/to/key.json',
});

// Creates a Bunyan logger that streams to Cloud Logging
const logger = bunyan.createLogger({
  name: 'my-service',
  streams: [loggingBunyan.stream('info')],
});

// Add an event handler for errors so failed writes are reported, not thrown
logger.on('error', err => console.log('Some error ' + err + ' !!!'));

@Taaanos commented Mar 17, 2022

@losalex thank you. It seems that the defaultCallback in v3.2.1 works for us. We log at the rate we used to, and no memory leaks have been observed.

@losalex (Contributor, Author) commented Mar 17, 2022

@Taaanos, thanks for the update! I am sorry for all the inconvenience, but I doubt that defaultCallback in v3.2.1 is working properly; that is also why I provided a sample earlier that leverages LoggingBunyan error handling.
Please let me know if you see timeout errors handled correctly in v3.2.1 when using defaultCallback; if not, consider adopting the LoggingBunyan error handling. I am working now to resolve the issue with defaultCallback.

@losalex (Contributor, Author) commented Mar 18, 2022

@Taaanos, I believe that the 4.2.0 release is going to fix the issue with defaultCallback. Sorry again for any inconvenience.

@Taaanos commented Mar 20, 2022

@losalex, thanks for your comments. It's been 3 days on v3.2.1 and our services look good: errors are handled (still with "code 4") and no memory leak is observed. Is v4.2.0 out already?

@losalex (Contributor, Author) commented Mar 22, 2022

@Taaanos, sorry for the confusion: the version with the latest fix is 3.3.0, and it is ready. Please let me know if you encounter any issues; I really appreciate your feedback!

@gnardini commented Jul 31, 2023

Leaving this in case anyone in the future runs into this error.

I see occasional "GoogleError: Total timeout of API google.logging.v2.LoggingServiceV2 exceeded 60000 milliseconds before any response was received." errors, which have become a bit more common lately (maybe I'm just logging more, not sure) and which were crashing my app.

I added the defaultCallback, but it didn't help; it just caused the error to be logged again before crashing.

I added the event handler which worked :) Now I just see the log but no crash.

This is my final setup:

import { LoggingBunyan } from "@google-cloud/logging-bunyan";
import * as Logger from "bunyan";

const streams: Logger.Stream[] = [];

// ... code omitted for brevity

const loggingBunyan = new LoggingBunyan({
  logName: "backend",
  projectId: "aram-zone",
  keyFilename: "serviceAccountKey.json",
});
const { level, type, stream } = loggingBunyan.stream("info");
streams.push({
  level,
  type,
  stream: stream.on("error", (err) => console.error("Logger error", err)),
});

export const logger = Logger.createLogger({
  name: "aram-zone",
  streams,
  serializers: Logger.stdSerializers,
});

This is what error logs look like:

caught error GoogleError: Total timeout of API google.logging.v2.LoggingServiceV2 exceeded 60000 milliseconds before any response was received.
    at repeat (/home/bitnami/aram-zone/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/home/bitnami/aram-zone/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7) {
  code: 4
}
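
For context, the numeric code: 4 in this GoogleError is the gRPC status DEADLINE_EXCEEDED, which google-gax raises when its total retry timeout elapses before any response. A minimal sketch of recognizing it in an error handler; the isDeadlineExceeded helper is hypothetical, while Status is google-gax's exported status enum:

import {Status} from 'google-gax';

// Hypothetical helper: classify the timeout case shown above.
// GoogleError.code is a gRPC status code; Status.DEADLINE_EXCEEDED === 4.
function isDeadlineExceeded(err: {code?: number}): boolean {
  return err.code === Status.DEADLINE_EXCEEDED;
}

// For example, in the stream error handler from the setup above:
// stream.on('error', err =>
//   console.error(isDeadlineExceeded(err) ? 'Write timed out' : 'Logger error', err));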
