-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: Add the MetricsCollector for client side metrics #1566
Conversation
separate file
*/ | ||
onOperationComplete?( | ||
metrics: OnOperationCompleteMetrics, | ||
attributes: OnOperationCompleteAttributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is much cleaner! Although I'm still a little unsure why we need to divide the data up between OnOperationCompleteAttributes and OnOperationCompleteMetrics though, since it seems like they're all used in the same places. Could it be simplified a bit more if you combine them? Or am I missing something?
* @param {string} projectId The id of the project. | ||
* @param {OperationOnlyAttributes} info Information about the completed operation. | ||
*/ | ||
onOperationComplete(projectId: string, info: OperationOnlyAttributes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as onAttemptComplete: don't we just need the status code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could definitely pass in the streamingOperation parameter when we create the metrics collector instead so we don't need to pass it in here. projectId is a different story, because the projectId property on the client is only populated by calling auth just before the grpc call is made so it's not available when the metrics collector gets created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok streamingOperation has been removed.
* Called when status information is received. Extracts zone and cluster information. | ||
* @param {object} status The received status information. | ||
*/ | ||
onStatusReceived(status: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused why onStatusReceived is different from onMetadataReceived. Aren't they both parsing metadata fields? (I noticed the argument is called metadata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you create a stream in Node and make a grpc call, the stream receives status
and metadata
events. The shape of the status event received matches the first argument of onStatusReceived method and the shape of the metadata event received matches the first argument of onMetadataReceived.
* @param {string} projectId The id of the project. | ||
* @param {OnAttemptCompleteInfo} info Information about the completed attempt. | ||
*/ | ||
onAttemptComplete(projectId: string, info: OnAttemptCompleteInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does a StreamingState need to be passed here? and project id? shouldn't they be known in advance?
It seems to me that the only new information here would be the status code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could definitely pass in the streamingOperation parameter when we create the metrics collector instead so we don't need to pass it in here. projectId is a different story, because the projectId property on the client is only populated by calling auth just before the grpc call is made so it's not available when the metrics collector gets created.
} | ||
|
||
/** | ||
* An interface representing a tabular API surface, such as a Bigtable table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this exist here? Isn't there already a class for the TabularAPISurface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An ITabularApiSurface
parameter is provided when the metrics collector is created. We don't want to require this parameter to be a whole TabularApiSurface
object because only a small subset of the properties are actually used and we don't want to restrict the OperationMetricsCollector
further than we have to. In particular, this simplifies the test for the metrics collector where we create a simple FakeTable
object which would otherwise have to be a whole TabularApiSurface
object.
/** | ||
* An interface representing a Date-like object. Provides a `getTime` method | ||
* for retrieving the time value in milliseconds. Used for abstracting time | ||
* in tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really necessary? There's not a way to mock a stdlib Date without this extra interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer needed since we removed the date provider.
* This information is used for recording metrics. | ||
*/ | ||
interface OnAttemptCompleteInfo { | ||
connectivityErrorCount: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this connectivityErrorCount supposed to be here? I see it's also in OnAttemptCompleteMetrics, which seems more natural. Is this a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This is a typo and OnAttemptCompleteInfo is gone now because after other corrections it only contains one property so that property is just passed directly.
/** | ||
* Whether the operation is a streaming operation or not. | ||
*/ | ||
streamingOperation: StreamingState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this part of OnAttemptComplete? Isn't it known in advance?
My understanding is the streaming
flag is basically just to check if the user called the public read_row
vs read_rows
method, which should be known at call time? I'm not sure if that even applies to node though, since I'm not seeing a point read method.
How were you planning on populating this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is the streaming flag is basically just to check if the user called the public read_row vs read_rows method, which should be known at call time?
That's my understanding too. This is now populated when the metrics collector gets created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment to consider about de-duplicating the attempt/operation data. But overall looks good!
clientName: string; | ||
attemptStatus: grpc.status; | ||
streamingOperation: StreamingState; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking a lot better!
One more comment though: there's a lot of duplicated fields between OnOperationCompleteData and OnAttemptCompleteData. You should consider deduplicating the data so that each field exists in just one place (some fields belong at the operation level, and some are related to the specific attempt, and it could make sense to keep them separate)
In Python, both the Attempt and its associated Operation are passed in for on_attempt_complete. And the Operation holds a list of associated attempts, which is used to calculate retry_count
But that may be less applicable here since you're building these objects on the fly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had something similar before with StandardAttributes
when the metrics and attributes were separate so I decided to de-duplicate the data with a new StandardData interface and then extend it.
Summary:
This PR adds the
MetricsCollector
which the rest of the client library will use to collect client side metrics to send to GCP backend for analysis. None of the methods will useMetricsCollector
yet so this PR doesn't change client library behaviour in any way. It is just in the stage of building infrastructure to support client side metrics.Background:
Below is a diagram of the new data structures that will be added.
Changes:
Files that will contain utilities for unit tests and system tests:
common/client-side-metrics-attributes.ts:
Contains interfaces for various client side metrics attributes.common/logger.ts:
Provides abstract classes for other test classes so that those test classes can easily log test results.common/test-date-provider:
: Exports a DateProvider class used in metrics collector tests that provides a predictable series of dates so that the metrics collector class used in the tests behaves consistently.common/test-metrics-handler.ts:
Exports a metrics handler for testing that simply collects results the metrics collector records instead of sending the recorded results to the GCP backend.Files directly pertaining to the metrics collector:
src/client-side-metrics/metrics-collector.ts:
Contains the new MetricsCollector class which will be created to collect metrics each time a method is called.src/client-side-metrics/metrics-handler.ts:
The metrics handler interface that the metrics collector can use. Users can also supply their own metrics handler for the metrics collector to use.test/metrics-collector/metrics-collector.ts
: A test file for the metrics collector that introduces a fake table with a fake method that records metrics using the metrics collector and ensures the right metrics from the method are getting recorded.test/metrics-collector/typical-method-call.ts
: The test intest/metrics-collector/metrics-collector.ts
compares this file with the logger output of the metrics collector and ensures that these two series of logs are exactly the same.test/metrics-collector/metrics-handler-fixture.ts
: A file containing a constant that is compared against the data the metrics handler collects in the test.Other changes:
package.json:
Add new open telemetry packagesSuggestions for how to review this PR:
It may be a good idea to review
test/metrics-tracer/typical-method-call.txt
first because it contains a plain-text log of what the metrics tracer is actually recording for a method call. Looking at this file may reveal details regarding when the client is collecting certain metrics without having to understand the client metrics collector architecture itself.Next Steps:
metrics-handler-fixture.ts
constant into it and ensure it writes to the metrics dashboard with a system test.