Adds detection for BBC bots
liviuconcioiu committed Dec 25, 2023
1 parent e4b14f6 commit a6fc878
Showing 2 changed files with 34 additions and 0 deletions.
18 changes: 18 additions & 0 deletions Tests/fixtures/bots.yml
@@ -5899,3 +5899,21 @@
    producer:
      name: Open Search Foundation e.V.
      url: https://openwebsearch.eu/
-
  user_agent: Page Monitor (https://confluence.dev.bbc.co.uk/display/men/Page+Monitor)
  bot:
    name: BBC Page Monitor
    category: Site Monitor
    url: https://confluence.dev.bbc.co.uk/display/men/Page+Monitor
    producer:
      name: BBC
      url: https://www.bbc.com/
-
  user_agent: BBC-Forge-URL-Monitor-Twisted
  bot:
    name: BBC Forge URL Monitor
    category: Site Monitor
    url: https://www.bbc.com/
    producer:
      name: BBC
      url: https://www.bbc.com/
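Each fixture pairs a user agent with the bot it is expected to resolve to. Below is a minimal Python sketch of that check, using the two patterns this commit adds to regexes/bots.yml. It is a simplification, not device-detector's own parser: it assumes case-insensitive matching and a plain search anywhere in the string, and `BBC_RULES` and `detect_bbc_bot` are hypothetical names.

```python
import re
from typing import Optional

# The two patterns added to regexes/bots.yml, paired with the bot name each yields.
# BBC_RULES is a hypothetical name used only for this sketch.
BBC_RULES = [
    (r"bbc.co.uk/display/men/Page\+Monitor", "BBC Page Monitor"),
    (r"BBC-Forge-URL-Monitor-Twisted", "BBC Forge URL Monitor"),
]


def detect_bbc_bot(user_agent: str) -> Optional[str]:
    """Return the name of the first rule whose pattern occurs in user_agent, else None.

    Assumption: a plain case-insensitive search. The real parser may wrap each
    pattern with extra anchoring before matching.
    """
    for pattern, name in BBC_RULES:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None


print(detect_bbc_bot("BBC-Forge-URL-Monitor-Twisted"))
```

The dots in `bbc.co.uk` are unescaped, so each matches any character; that is harmless for these fixtures, and `bbc\.co\.uk` would be the stricter form. The `\+` is required, since an unescaped `+` would be a quantifier rather than the literal plus in `Page+Monitor`.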
16 changes: 16 additions & 0 deletions regexes/bots.yml
@@ -3516,6 +3516,22 @@
    name: 'Open Search Foundation e.V.'
    url: 'https://openwebsearch.eu/'

- regex: 'bbc.co.uk/display/men/Page\+Monitor'
  name: 'BBC Page Monitor'
  category: 'Site Monitor'
  url: 'https://confluence.dev.bbc.co.uk/display/men/Page+Monitor'
  producer:
    name: 'BBC'
    url: 'https://www.bbc.com/'

- regex: 'BBC-Forge-URL-Monitor-Twisted'
  name: 'BBC Forge URL Monitor'
  category: 'Site Monitor'
  url: 'https://www.bbc.com/'
  producer:
    name: 'BBC'
    url: 'https://www.bbc.com/'

# Generic detections
- regex: '[a-z0-9\-_]*((?<!cu|power[ _]|m[ _])bot(?![ _]TAB|[ _]?5[0-9]|[ _]Senior|[ _]Junior)|crawler|crawl|checker|archiver|transcoder|spider|^firefox$|^chrome$)([^a-z]|$)'
  name: 'Generic Bot'
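Neither BBC user agent contains `bot`, `crawl`, `spider` or any other keyword in the catch-all "Generic Bot" pattern, so that rule alone would presumably not classify them, which would explain the explicit entries. The sketch below ports the generic pattern to Python to check this. Assumptions: matching is case-insensitive, and `GENERIC_BOT` and `is_generic_bot` are hypothetical names. PCRE accepts top-level lookbehind alternatives of different fixed lengths, but Python's `re` does not, so `(?<!cu|power[ _]|m[ _])` is rewritten as three consecutive negative lookbehinds, which is equivalent.

```python
import re

# Generic bot pattern from regexes/bots.yml, ported to Python's `re`.
# (?<!cu|power[ _]|m[ _]) becomes three separate negative lookbehinds because
# Python rejects variable-width alternation inside a single lookbehind.
GENERIC_BOT = re.compile(
    r"[a-z0-9\-_]*((?<!cu)(?<!power[ _])(?<!m[ _])bot"
    r"(?![ _]TAB|[ _]?5[0-9]|[ _]Senior|[ _]Junior)"
    r"|crawler|crawl|checker|archiver|transcoder|spider|^firefox$|^chrome$)([^a-z]|$)",
    re.IGNORECASE,  # assumption: the parser matches case-insensitively
)


def is_generic_bot(user_agent: str) -> bool:
    """Return True if the catch-all 'Generic Bot' pattern matches user_agent."""
    return GENERIC_BOT.search(user_agent) is not None


for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1)", "BBC-Forge-URL-Monitor-Twisted"):
    print(ua, is_generic_bot(ua))
```

Note how the anchored alternatives behave: a user agent that is exactly `Firefox` counts as a generic bot, while a normal browser string that merely contains `Chrome/...` does not. The `cu` lookbehind likewise keeps a string such as `CUBOT X19` from matching on its `bot` substring.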
