Added support for detecting bots through client-hints #7316

Open
wants to merge 2 commits into
base: master


sanchezzzhak
Collaborator

@sanchezzzhak sanchezzzhak commented Dec 30, 2022

For Matomo, a new HTTP_X_CLIENT key will need to be added here:
https://github.com/matomo-org/matomo/blob/d1eaaca1b7abdddea15ff8d1d8e2075b6f92c672/core/Http.php#L996-L1012

[!] This PR is worth revisiting once more bots identify themselves through the xClient header.

@sanchezzzhak sanchezzzhak linked an issue Dec 30, 2022 that may be closed by this pull request
@liviuconcioiu
Collaborator

@sanchezzzhak I have something to add:

  1. Can you also include the HTTP_FROM header? According to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/From, this header is a must for any crawler. So if the header is present, the request should be detected as a bot, for example as "Generic Bot".
  2. Here are some HTTP_FROM values that should be added, to be detected as specific bots:
googlebot(at)googlebot.com
bingbot(at)microsoft.com
support@search.yandex.ru
the.knowledge.ai@gmail.com
crawler@alexa.com
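
The two points above can be sketched roughly as follows. This is only an illustration in Python, not the project's implementation: the `KNOWN_FROM_BOTS` table, the bot labels, and the function names are hypothetical, and the table covers only some of the addresses quoted here.

```python
from typing import Mapping, Optional

# Hypothetical lookup table built from addresses quoted in this thread.
# The bot labels are illustrative, not entries from the project's bot list.
KNOWN_FROM_BOTS = {
    "googlebot@googlebot.com": "Googlebot",
    "bingbot@microsoft.com": "BingBot",
    "support@search.yandex.ru": "Yandex Bot",
    "crawler@alexa.com": "Alexa Crawler",
}


def normalize_from(value: str) -> str:
    """Lower-case the value and turn the obfuscated "(at)" into "@"."""
    return value.strip().lower().replace("(at)", "@")


def detect_bot_from_header(server: Mapping[str, str]) -> Optional[str]:
    """Return a bot label if HTTP_FROM is present, otherwise None.

    Known addresses map to a specific label; any other non-empty
    value falls back to "Generic Bot", as suggested in point 1.
    """
    raw = server.get("HTTP_FROM")
    if raw is None or not raw.strip():
        return None
    return KNOWN_FROM_BOTS.get(normalize_from(raw), "Generic Bot")
```

With this sketch, `{"HTTP_FROM": "googlebot(at)googlebot.com"}` resolves to the specific label, an unknown address resolves to "Generic Bot", and a request without the header is left for the other detection steps.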

@sanchezzzhak
Collaborator Author

@liviuconcioiu Perhaps the implementation of HTTP_FROM should be added separately, since this PR is in limbo.
It would be good to have approximate statistics before implementing new features.

@sanchezzzhak
Collaborator Author

ChatGPT's crawler (GPTBot) also sends the HTTP_FROM header:

From: gptbot(at)openai.com
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

@liviuconcioiu
Collaborator

Here is a complete list of what I have so far:

bingbot(at)microsoft.com
googlebot(at)googlebot.com
gptbot(at)openai.com
robot@seokicks.de
support@search.yandex.ru
tech@babbar.tech
TGVnaXRpbWF0ZSBsaW5rIHRyYWNrZXI=
the.knowledge.ai@gmail.com
wc@verisign.com
crawler@alexa.com
pigafetta-bot(at)visual-seo.com
oai-searchbot@openai.com
"<?=print(9347655345-4954366);?>"
root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.oast.site
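
Not every value in this list looks like an address, so a detector might want to screen values before matching them. A rough Python sketch, assuming a deliberately loose pattern; `ADDRESS_SHAPED` and both function names are hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Loose, illustrative pattern: local part, "@" or "(at)", then a dotted domain.
# It is not a full RFC 5322 address parser.
ADDRESS_SHAPED = re.compile(
    r"[A-Za-z0-9._%+-]+(?:@|\(at\))[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def is_address_shaped(value: str) -> bool:
    """True if the whole value looks like user@host or user(at)host."""
    return ADDRESS_SHAPED.fullmatch(value.strip()) is not None


def split_from_values(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition observed HTTP_FROM values into address-shaped and other."""
    accepted: List[str] = []
    other: List[str] = []
    for value in values:
        (accepted if is_address_shaped(value) else other).append(value)
    return accepted, other
```

Under this pattern the Base64-looking entry and the quoted PHP snippet land in the second group, while the `oast.site` entry still passes because it is shaped like an address. A shape check alone therefore does not decide which values belong in a bot list.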

Development

Successfully merging this pull request may close these issues.

Improvements to the ClientHints - HTTP_X_CLIENT
2 participants