Analytics accuracy check #2045
Comments
From @KennaW: We maintain a dynamic exclusion list of known robots and crawlers at https://github.com/atmire/COUNTER-Robots. All COUNTER-compliant entities use this list to eliminate bots and crawlers. I hope it helps. Do let us know if you find any bots or crawlers not on this list; our Robots and Crawlers working group will review and update the list accordingly.
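A minimal sketch of how an exclusion list like this could be applied to logged hits, assuming the entries are matched against the User-Agent string as case-insensitive regular expressions. The patterns below are illustrative stand-ins, not copied from the COUNTER-Robots list, and the `user_agent` field name is hypothetical.

```python
import re

# Illustrative patterns only (assumption); the maintained list lives in the
# COUNTER-Robots repository and is much longer.
SAMPLE_PATTERNS = ["bot", "crawl", "spider"]


def compile_patterns(patterns):
    """Compile each pattern as a case-insensitive regular expression."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_robot(user_agent, compiled):
    """Return True if any exclusion pattern matches the User-Agent string."""
    return any(rx.search(user_agent) for rx in compiled)


def filter_human_hits(hits, compiled):
    """Keep only hits whose 'user_agent' (hypothetical field) matches no pattern."""
    return [h for h in hits if not is_robot(h["user_agent"], compiled)]
```

Hits that survive `filter_human_hits` are only "not on the list", so unknown bots would still be counted.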
I talked with the university's Google Analytics contact (Kelly Holcomb :) ), and she recommended trying to route the 'real' traffic through a custom URL campaign: https://support.google.com/analytics/answer/1033863?hl=en
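For reference, a custom campaign URL is just a normal link with `utm_*` query parameters appended. Here is a rough sketch of tagging a link that way; the helper name, the example.org URL, and the campaign values are all made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign_params(url, source, medium, campaign):
    """Append utm_source, utm_medium and utm_campaign to a URL, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical values, for illustration only.
tagged = add_campaign_params(
    "https://example.org/works/abc?locale=en", "newsletter", "email", "spring"
)
```

Traffic arriving through tagged links could then be segmented in reports, though this only helps for links we control.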
With the recent change from Google Analytics 3 to GA4, we've been looking at the views and downloads for SA again. Reliable usage statistics are still important to creators. "Reliable" and "accurate" mean real humans viewing and downloading SA content. Some artificially high counts may be caused by counting thumbnail hits, which would be resolved with #1889. The main culprit seems to be bot traffic, and we should leverage GA4's improved bot filtering. @KennaW and @CGillen did some exploratory work and likely have more to say.
After doing a little more exploring. It looks like for This seems within reason of being accurate for raw download visits. Not sure on 'reliability.' Regular page visits are way off. Again, log parsing is imperfect and is likely over-reporting because of clear bot traffic, but excluding downloads (and thumbnails), admin/dashboard, edit/new interfaces, we got around 1m page visits for
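Roughly, the exclusion step in that log parsing could look like the sketch below. The path prefixes and suffixes are assumptions about how the routes are laid out, not the actual rules used for the numbers above, and thumbnails are assumed to be served under the downloads path.

```python
from urllib.parse import urlsplit

# Assumed route shapes (hypothetical); adjust to the application's real routes.
EXCLUDED_PREFIXES = ("/downloads/", "/admin", "/dashboard")
EXCLUDED_SUFFIXES = ("/edit", "/new")


def is_page_visit(request_path):
    """Return True if a logged request path counts as a regular page visit."""
    path = urlsplit(request_path).path
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    if path.rstrip("/").endswith(EXCLUDED_SUFFIXES):
        return False
    return True


def count_page_visits(request_paths):
    """Count the request paths that pass the exclusion rules."""
    return sum(1 for p in request_paths if is_page_visit(p))
```

This does nothing about bots on its own; it only narrows the count to landing-page style requests.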
Still not sure why
Yes, I think that would improve understandability of our stats. Thanks!
Ok, I'm seeing analytics in this kind of breakdown: GA4: Previous GA: To me, this seems pretty reasonably accurate now
@CGillen I think there's been solid improvement. I agree these numbers seem reasonable, or at least I don't have any data to say otherwise. I think Clara's original concern for this ticket was about download numbers being much higher than page views, which we're still seeing at this macro level with 2K page views vs 10K downloads daily, and this sort of doesn't agree with how library folks expect users to navigate to works -- search, arrive at landing page (+1 page view), and then decide to download (+1 download) -- or sometimes decide not to download, which would result in overall more views than downloads. I think at least one of these things is happening:
...Or is it something else entirely? Do we have any way to know?
We can revisit after public-facing stats are restored to the site (#2602)
Descriptive summary
@clarallebot noticed some high download numbers for three recent deposits and wonders how accurate they are. The download counts per day are similar or identical across the three (85 for each on 2/20), while the pageviews for the corresponding records are in single digits.
Analytics are important to creators; our numbers need to be as accurate as possible.
I looked at some other Public items that were recently deposited and they have very high downloads:
1144 downloads for this thesis but only 12 pageviews
928 downloads with 15 pageviews
Expected behavior
Analytics should exclude bots and crawlers.
Actual behavior
Peer Review of Research Data Submissions v692td402
Went live 2/18/20
Analytics
2/18: 43 downloads
2/19: 86
2/20: 85
Record pageviews: 2
Remediation Data Management Plans vm40xz548
Went live 2/18/20
Analytics
2/18: 47 downloads
2/19: 86
2/20: 85
Record pageviews: 4
Give Them What They Want 9593v2274
Went live 2/19/20, mid-day
Analytics
2/19: 52 downloads
2/20: 85
Record pageviews: 4
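As a quick sanity check on the figures above, here is a small sketch that computes downloads per pageview for each record and lists the days on which more than one record reports the same download count. The `max_ratio` threshold of 5 is an arbitrary assumption, not an agreed cutoff.

```python
from collections import Counter

# Figures copied from the three records listed above.
REPORTED = {
    "v692td402": {"downloads": {"2/18": 43, "2/19": 86, "2/20": 85}, "pageviews": 2},
    "vm40xz548": {"downloads": {"2/18": 47, "2/19": 86, "2/20": 85}, "pageviews": 4},
    "9593v2274": {"downloads": {"2/19": 52, "2/20": 85}, "pageviews": 4},
}


def downloads_per_pageview(item):
    """Total downloads divided by pageviews (treating 0 pageviews as 1)."""
    return sum(item["downloads"].values()) / max(item["pageviews"], 1)


def suspicious_items(items, max_ratio=5.0):
    """Record ids whose download/pageview ratio exceeds max_ratio (assumed threshold)."""
    return sorted(k for k, v in items.items() if downloads_per_pageview(v) > max_ratio)


def shared_daily_counts(items):
    """Map each day to a download count reported by two or more records."""
    shared = {}
    days = {d for v in items.values() for d in v["downloads"]}
    for day in sorted(days):
        counts = Counter(v["downloads"][day] for v in items.values() if day in v["downloads"])
        value, n = counts.most_common(1)[0]
        if n > 1:
            shared[day] = value
    return shared
```

Identical counts across unrelated records on the same day fit a crawler working through recent deposits better than they fit independent human readers, though the numbers alone cannot prove it.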