This repository was archived by the owner on Nov 1, 2022. It is now read-only.

Record Sync telemetry in Glean pings #3092

Merged: 2 commits, Jun 11, 2019

linabutler
Contributor

@linabutler linabutler commented May 22, 2019

This PR adds two Glean pings to the Storage-Sync browser component:
one for history, and one for bookmarks. Both pings record the same set
of base metrics, including incoming and outgoing counts, sync duration,
the hashed FxA UID, and, most importantly, the failure reason if the
store fails to sync. The bookmarks ping records additional validation
data.

The Glean schema is a flattened version of the Sync ping that's
currently sent on Firefox Desktop and Firefox for iOS. Instead of
sending a single ping with multiple syncs, each having multiple
engines, we send one Glean ping per engine per sync.

The pings are recorded directly in the component, so any app that
consumes it should get pings for free.
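
As a rough, language-neutral illustration of the flattening described above (not the Kotlin code in this PR), the sketch below turns one nested Sync ping into one flat record per engine per sync. The function name `flatten_sync_ping` and all field names are assumptions made for the example; the real definitions live in the component's metrics.yaml and pings.yaml.

```python
from typing import Any, Dict, List


def flatten_sync_ping(sync_ping: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one nested Sync ping into one flat record per engine per sync.

    Field names here are hypothetical and only illustrate the shape change.
    """
    flat = []
    for sync in sync_ping.get("syncs", []):
        for engine in sync.get("engines", []):
            flat.append({
                "ping": engine["name"],           # e.g. "history" or "bookmarks"
                "uid": sync_ping.get("uid"),      # hashed FxA UID, repeated in every record
                "incoming": engine.get("incoming", 0),
                "outgoing": engine.get("outgoing", 0),
                "duration_ms": engine.get("took", 0),
                "failure_reason": engine.get("failureReason"),  # None when the sync succeeded
            })
    return flat


# One sync that ran two engines becomes two independent records.
nested = {
    "uid": "hashed-uid",
    "syncs": [
        {"engines": [
            {"name": "history", "incoming": 3, "outgoing": 1, "took": 120},
            {"name": "bookmarks", "incoming": 0, "outgoing": 2, "took": 80,
             "failureReason": "example error"},
        ]},
    ],
}
records = flatten_sync_ping(nested)
```

Each record is self-contained, which is why the shared values (such as the hashed UID) are repeated in every one.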

This depends on a not-yet-released version of a-s, so the tests won't pass until then.

Pull Request checklist

  • Quality: This PR builds and passes detekt/ktlint checks (A pre-push hook is recommended)
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or does not need one
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features

Contributor

@Dexterp37 Dexterp37 left a comment

I think the overall approach is good. As highlighted in mozilla-mobile/fenix#2749, I think most of the code from that PR should live here, with this component shipping its own pings.yaml and metrics.yaml files.

@linabutler linabutler force-pushed the telemetry branch 2 times, most recently from 3c3ed6e to 1a93d32 Compare May 31, 2019 02:59
@linabutler linabutler changed the title WIP: Pass Sync telemetry from a-s through a-c to consumers WIP: Record Sync telemetry in Glean pings May 31, 2019
Contributor

@Dexterp37 Dexterp37 left a comment

This looks nice, thanks Lina for bearing with us on this. Let me know if you'd like a detailed review on this by one of us.

@linabutler linabutler marked this pull request as ready for review June 1, 2019 00:04
@linabutler linabutler requested a review from a team as a code owner June 1, 2019 00:04
@linabutler
Contributor Author

OK, I think this is ready for a first round of review! Thanks for all your help with this, @Dexterp37! I can't assign reviewers, so paging @travis79 and @liuche, too. ☎️

For more context, the a-s PR is mozilla/application-services#1112. The existing Sync ping is also documented here; this pulls those fields into Glean pings.

@linabutler linabutler changed the title WIP: Record Sync telemetry in Glean pings Record Sync telemetry in Glean pings Jun 1, 2019
Contributor

@Dexterp37 Dexterp37 left a comment

This looks nice! I'd take another pass after you change the metrics.yaml to share metric definitions across the different pings.

Contributor

@Dexterp37 Dexterp37 left a comment

Thank you Lina for following up on this. I think I have an actionable proposal to make testing code a bit simpler and not rely on Glean SDK exposing more than required. Let me know what you think about that (I couldn't test it locally: is this PR requiring some other change somewhere else?).

@travis79
Member

travis79 commented Jun 5, 2019

> Thank you Lina for following up on this. I think I have an actionable proposal to make testing code a bit simpler and not rely on Glean SDK exposing more than required. Let me know what you think about that (I couldn't test it locally: is this PR requiring some other change somewhere else?).

After looking at @Dexterp37's suggestion, I have to say that I agree with him that mocking would be a much better solution for what you are trying to do.
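
For readers unfamiliar with the approach, here is a minimal, language-neutral sketch of what testing through a mock can look like. It is not the test code from this PR: the class `HistorySyncTelemetry`, the `submit_ping` callable, and the ping name `"history-sync"` are all hypothetical. The idea is that the test replaces the ping-submission dependency with a mock and asserts on what was sent, so the telemetry library does not need to expose extra internals.

```python
from unittest.mock import Mock


class HistorySyncTelemetry:
    """Hypothetical component that builds a payload and submits a ping."""

    def __init__(self, submit_ping):
        # The submission function is injected, so a test can swap in a mock.
        self._submit_ping = submit_ping

    def on_sync_finished(self, incoming, outgoing, failure_reason=None):
        payload = {
            "incoming": incoming,
            "outgoing": outgoing,
            "failure_reason": failure_reason,
        }
        self._submit_ping("history-sync", payload)


# In a test, the real submitter is replaced by a Mock that records its calls.
submit = Mock()
telemetry = HistorySyncTelemetry(submit)
telemetry.on_sync_finished(incoming=5, outgoing=2)

# Verify that exactly one ping was submitted with the expected contents.
submit.assert_called_once_with(
    "history-sync",
    {"incoming": 5, "outgoing": 2, "failure_reason": None},
)
```

The test only checks the interaction at the boundary, which keeps it independent of how the telemetry library stores or uploads data.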

@linabutler
Contributor Author

> I couldn't test it locally: is this PR requiring some other change somewhere else?

This depends on a yet-to-be-published version of application-services...I think you'll have to build and publish a-s to your local Maven repo if you want to test locally now. Once we cut an a-s release with telemetry support, I'll drop b713ef3, and we can point directly to that.

Contributor

@Dexterp37 Dexterp37 left a comment

I'd like to request one last change before the final r+ on this. I think we should do without the refactoring on the Glean side, not even for the resetGlean. From a testing perspective, this should work just fine without it. See my comments below.

Contributor

@mdboom mdboom left a comment

Thanks for bearing with us -- once again, @Dexterp37 has the better grasp on external testing, and I agree with what he suggests (even though it goes against what I first suggested).

I hope to make this work part of a documentation example so that folks in the future doing this sort of thing have a template to follow.

Contributor

@mdboom mdboom left a comment

Approved, pending tests passing on CI.

Thanks again for your patience with us as a trailblazer!

Contributor

@Dexterp37 Dexterp37 left a comment

This looks great from a Glean POV, thanks @linacambridge ! I dropped a few non-blocking nits below :) Other passes won't require new reviews from us ;)

@linabutler
Contributor Author

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?

We would like to measure the performance and correctness of our Rust sync implementation. This includes collecting the time taken to sync each data type (currently history and bookmarks), incoming and outgoing record counts, any errors that occur (reporting sanitized error messages in a string field), and, for bookmarks, tree structure problem counts.

With the exception of the error string, which does not contain PII, we're submitting timings and counts only.

  2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?

We need to understand how our new Sync implementation behaves in the wild. The Sync ping in Firefox Desktop and Firefox for iOS already exists, and has been extremely valuable in identifying and diagnosing Sync issues.

This pull request collects the same information as the Sync ping, but ports its structure to Glean.

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

Server-side metrics are not sufficient to understand Sync performance (especially for each step of an engine sync), given that the bulk of the work happens on clients. Validation data can only be collected on the client, since Sync records are encrypted and opaque to the server. The Sync ping for Desktop and iOS provides some stats about those platforms, but since all three platforms still use different Sync implementations, those stats can't be extrapolated to Fenix.

  4. Can current instrumentation answer these questions?

Android Components consumers, including Fenix, do not currently report any Sync telemetry.

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.

Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

| Measurement Description | Data Collection Category | Tracking Bug # |
| --- | --- | --- |
| Timings, counts, and failure reasons for history syncs | Interaction data | #3092 |
| Timings, counts, and failure reasons for bookmark syncs | Interaction data | #3092 |

  6. How long will this data be collected? Choose one of the following:

I want to permanently monitor this data. (Lina Cambridge)

  7. What populations will you measure?

All users with Sync enabled.

The data is not correlated to the client_id; instead, we send a hash of the user's Firefox account ID (uid). This does not expose new identifiers, as these are already submitted in the Sync ping on other platforms.

  • Which release channels?

All.

  • Which countries?

All.

  • Which locales?

All.

  • Any other filters? Please describe in detail below.

No.

  8. If this data collection is default on, what is the opt-out mechanism for users?

Users can opt out by disabling telemetry or by signing out of Sync.

  9. Please provide a general description of how you will analyze this data.

We will create queries in re:dash to monitor the engines, adding to our existing sync engine error/success (https://sql.telemetry.mozilla.org/dashboard/sync-leif-status-dashboard-wip) dashboards and engine error analysis notebook for Desktop (https://gist.github.com/mhammond/66684669e1478d65bd60446cf150c244).

  10. Where do you intend to share the results of your analysis?

See above.

  11. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?

No.

Contributor

@liuche liuche left a comment

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, documented in metrics.yaml

  2. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, whatever data controls the consumer provides

  3. If the request is for permanent data collection, is there someone who will monitor the data over time?

Permanent data collection for Sync, but there are automated tests.

  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Type 2, sync behavior of various pings

  5. Is the data collection request for default-on or default-off?

Based on whatever consumers set

  6. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

Hashed Firefox Account ID

  7. Is the data collection covered by the existing Firefox privacy notice?
    Yes

  8. Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)
    No, has automated tests

  9. Does the data collection use a third-party collection tool? If yes, escalate to legal.
    No

Contributor

@grigoryk grigoryk left a comment

This is looking good!

@grigoryk grigoryk requested a review from travis79 June 11, 2019 02:30
@grigoryk
Contributor

Not sure why GitHub isn't letting me land this ("Merging can be performed automatically once the requested changes are addressed."), but AFAIK it's all good to go (and I don't see any unresolved change requests).

@pocmo could you land this, please, with your super powers?

@pocmo pocmo merged commit ca58aa1 into mozilla-mobile:master Jun 11, 2019
@pocmo
Contributor

pocmo commented Jun 11, 2019

🛬

Member

@travis79 travis79 left a comment

I realize this has already been merged, but congrats Lina on being the first to integrate Glean into another component and thanks for helping us improve the docs and the process! Cheers!

@linabutler linabutler deleted the telemetry branch June 11, 2019 15:08
@linabutler
Contributor Author

Thanks for all your help getting this working! ❤️
