This repository has been archived by the owner on Jun 8, 2018. It is now read-only.

Is there evidence for referral tracking by CDNs? #99

Closed
Gitoffthelawn opened this issue May 30, 2016 · 5 comments

@Gitoffthelawn
Contributor

Is there any evidence, one way or the other, that CDNs are recording HTTP referrers (technically "referers") from billions of hits?

For those who aren't familiar, HTTP referrers are strings that are often transmitted as part of an HTTP request for a resource. That string tells the server hosting the resource which page requested it. Historically, most browsers sent them with almost every request, which posed massive privacy and security issues. That's been clamped down a little as of late, but they are still in wide use (and Google appears to use them for some surprising things).

It is trivial to strip out those headers programmatically or via an extension, but most people do neither.
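
To illustrate the "programmatically" part, here is a minimal Python sketch that drops the header from a set of request headers before they are sent. The function name `strip_referer` and the example host names are made up for illustration; a real setup would hook something like this into a proxy or an HTTP client.

```python
def strip_referer(headers):
    """Return a copy of the request headers without any Referer entry."""
    # HTTP header names are case-insensitive, so compare in lower case.
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "referer"
    }


# Hypothetical outgoing request to a CDN.
request_headers = {
    "Host": "cdn.example.com",
    "User-Agent": "ExampleBrowser/1.0",
    "Referer": "https://private.example.org/account/settings",
}
cleaned = strip_referer(request_headers)
print(cleaned)
```

The original dictionary is left untouched, so the caller can still decide per request whether to send the full or the cleaned set.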

By the way, here is a Mozilla development page regarding this issue: https://wiki.mozilla.org/Privacy/Features/Shortened_HTTP_Referer_header

So, is there any evidence, one way or another, regarding the collection of this data by CDNs?

@TriMoon

TriMoon commented Jun 6, 2016

So, is there any evidence, one way or another, regarding the collection of this data by CDNs?

Until a CDN explicitly states in its public privacy documentation that it does NOT collect or use referer header data in any way,
you (as a user) are advised to assume that it DOES collect and use that info.
Because the only party that benefits from keeping that info from leaving your browser is you...

The WWW has a DNT (Do Not Track) header, but as many know, it is not honored by those who WANT to track you for their own benefit...
The only protection we have is a personal proxy that strips out those headers, plain and simple... (or even replaces them with random data to pollute the collected data)
If sites break because they rely on referer then you're better off not using them anyway 😉
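
The "replace with random data" idea could look roughly like the Python sketch below. It is only an illustration: `pollute_referer` is a made-up name, and the fake value uses the reserved `.invalid` top-level domain so that it can never point at a real site.

```python
import secrets


def pollute_referer(headers):
    """Return a copy of the headers with the Referer replaced by random data."""
    # Drop whatever Referer the browser supplied (names are case-insensitive).
    polluted = {
        name: value
        for name, value in headers.items()
        if name.lower() != "referer"
    }
    # Assumption: a random host under the reserved .invalid TLD is good enough
    # as noise; it is unguessable and cannot resolve to a real website.
    polluted["Referer"] = f"https://{secrets.token_hex(8)}.example.invalid/"
    return polluted


original = {"Host": "cdn.example.com", "Referer": "https://private.example.org/"}
print(pollute_referer(original)["Referer"])
```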

@austinhartzheim

I noticed that the MathJax CDN is actually being run behind CloudFlare, which means that CloudFlare's privacy policy applies.

From the CloudFlare privacy policy:

As visitors browse our website, or our users’ websites if they are protected by CloudFlare, we normally log these visitors’ interactions in order to provide better services to our users (e.g., using visitor log data in order to detect new threats and malicious third parties).
[...]
CloudFlare may aggregate data we acquire about our users and the visitors to their websites. For example, we may assemble data to determine how Web crawlers index the Internet and whether they are engaged in malicious activity. If we assemble this sort of data and provide it to external parties, our users’ personal information will never be attached to or included in such aggregated data. Please note, data that our users provide to us, such as log files of their site’s visitors, may be included in the aggregate data, reports, and statistics.

From a privacy perspective, it is also noteworthy that CloudFlare uses cookies to uniquely identify your device across networks. It is not clear if this information is logged or used outside of identifying your computer as "trusted" when you change networks:

As part of our services, CloudFlare may also place cookies on the computers of visitors to your CloudFlare-protected website. We do this to in order to identify malicious visitors, to reduce the chance of blocking legitimate users, and to provide customized services.

While I am not a lawyer, it seems possible that a referer header could be considered part of a visitor interaction and therefore can/would be logged.

@TriMoon

TriMoon commented Jun 14, 2016

For example, we may assemble data to determine how Web crawlers index the Internet.

Can anyone explain how a web server would be able to tell whether its visitor is a web crawler or a human if that crawler doesn't identify itself as a crawler via headers?
(Don't try to answer lol)

What they describe in the part I quoted is that they will use ANY data they can get hold of to assemble (i.e. combine) data to determine the connection paths between websites on the internet.

  • To do that:
    • They need to know which page access is done by which browser (human or web crawler).
    • They need to know what the previous page was.
  • To get that info they can combine:
    • Access time and IP address from the server logs.
    • Their own cookies in the user's browser.
    • The referer header.
    • The ping attribute of links.
    • Zero-size pixels/images loaded from their own server.
    • etc...
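
As a rough sketch of how such combining could work with server logs alone, the Python below groups log entries by IP address and user agent and lists which page referred each request. It assumes the logs are in the common "combined" access log format; the log lines, addresses, and site names are invented for illustration, and nothing here is known to be what any CDN actually does.

```python
import re
from collections import defaultdict

# Combined Log Format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_trails(lines):
    """Group (time, referring page, requested resource) by IP and user agent."""
    trails = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        visitor = (match["ip"], match["agent"])
        trails[visitor].append((match["time"], match["referer"], match["path"]))
    return dict(trails)


# Hypothetical log lines: one visitor loads the same CDN file from two sites.
sample_log = [
    '203.0.113.7 - - [14/Jun/2016:10:00:01 +0000] "GET /lib/a.js HTTP/1.1" '
    '200 512 "https://news.example/story" "ExampleBrowser/1.0"',
    '203.0.113.7 - - [14/Jun/2016:10:02:30 +0000] "GET /lib/a.js HTTP/1.1" '
    '200 512 "https://shop.example/cart" "ExampleBrowser/1.0"',
]
trails = referer_trails(sample_log)
for visitor, hits in trails.items():
    print(visitor, [referer for _, referer, _ in hits])
```

Even this crude grouping shows the same visitor moving from one site to another, simply because both sites load a file from the same server.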

@stewie

stewie commented Jun 17, 2016

TriMoon asked "how". Individual headers are sent as label: value pairs within an HTTP request, and the order in which they appear varies among user agents. That order serves as a bit of a fingerprint (and extremely few bots, or headless browsers, bother to spoof this detail).
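
A minimal Python sketch of the header-order idea: hash the sequence of header names so that two clients sending the same headers in a different order end up with different fingerprints. The function name and both header orders are hypothetical and not taken from any real browser or bot.

```python
import hashlib


def header_order_fingerprint(header_names):
    """Hash the order in which a client sent its header names."""
    # Lower-case the names so only the ordering matters, not the capitalisation.
    ordered = ",".join(name.lower() for name in header_names)
    return hashlib.sha256(ordered.encode("ascii")).hexdigest()[:16]


# Same five headers, sent in two different (made-up) orders.
browser_like = ["Host", "User-Agent", "Accept", "Accept-Language", "Referer"]
script_like = ["User-Agent", "Accept", "Host", "Referer", "Accept-Language"]

print(header_order_fingerprint(browser_like))
print(header_order_fingerprint(script_like))
```

A server could compare such a fingerprint with the one it expects for the browser named in the User-Agent header and flag mismatches.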

For reference, find a downloadable copy of this 47-page PDF:
"HTTP Header Analysis"
Aug 31, 2015
author: Roland Zegers
University of Amsterdam

Also, Google's crawlers (and those operated by other search indexing entities) hail from a known/published set of IP addresses, or rather netranges. Many web servers employ "web application firewalls" (WAFs) which contain netrange lookup lists, so a WAF will detect (and/or react) if your request is a spoof, based on e.g. "no, Googlebot would NEVER hail from that IP address".
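
The netrange lookup can be sketched in Python with the standard `ipaddress` module. The ranges below are documentation-only networks standing in for a published crawler list; they are NOT real Googlebot ranges, and `is_spoofed_crawler` is a made-up name.

```python
import ipaddress

# Hypothetical stand-ins for a published crawler netrange list.
# These are reserved documentation networks, not real Googlebot ranges.
CRAWLER_NETRANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spoofed_crawler(user_agent, remote_ip):
    """Return True if the client claims to be Googlebot from an unlisted IP."""
    if "googlebot" not in user_agent.lower():
        return False  # makes no crawler claim, so there is nothing to verify
    address = ipaddress.ip_address(remote_ip)
    return not any(address in network for network in CRAWLER_NETRANGES)


claimed_agent = "Mozilla/5.0 (compatible; Googlebot/2.1)"
print(is_spoofed_crawler(claimed_agent, "203.0.113.9"))
print(is_spoofed_crawler(claimed_agent, "192.0.2.44"))
```

The first call reports a spoof because the address is outside the listed ranges; the second does not, because the address falls inside one of them.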

@Synzvato changed the title from "[discussion] Referrers (referers) and CDN's" to "Is there evidence for referral tracking by CDNs?" on Jun 18, 2016
@Atavic

Atavic commented Feb 19, 2017

The Cookie Matching Service enables a buyer to associate two types of cookies:

  1. One that identifies a user within the buyer domain.
  2. A doubleclick.net cookie that identifies a Google user.
    We share a buyer-specific encrypted user ID for buyers to match on.

https://developers.google.com/ad-exchange/rtb/cookie-guide

Tracking GA users' ID

See: Cookie Syncing
