Does Fess support crawl rate limiting in robots.txt? #1835

Open
farooqsheikhpk opened this issue Sep 11, 2018 · 1 comment

Comments

@farooqsheikhpk

Hi @marevol
I have checked that Fess respects Disallow in robots.txt, but I am unable to verify Crawl-delay and Request-rate. Can you please confirm whether they are implemented?

https://www.promptcloud.com/blog/how-to-read-and-respect-robots-file

  1. Crawl rate limiting

Crawl-delay: 11

This is used to keep crawlers from hitting the site too frequently. Because frequent hits by crawlers can place unwanted stress on the server and make the site slow for human visitors, many sites add this line to their robots file. In this case, the site can be crawled with a delay of 11 seconds between requests.
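For illustration only, here is a minimal sketch of how a crawler could read this directive using Python's standard `urllib.robotparser`. It is not Fess code; the robots.txt content, the `example-bot` user agent, and the `crawl_delay_for` helper are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The User-agent line is needed so the
# parser knows which group the Crawl-delay belongs to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 11
"""


def crawl_delay_for(robots_txt: str, user_agent: str) -> float:
    """Return the Crawl-delay in seconds for user_agent, or 0.0 if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else 0.0


delay = crawl_delay_for(ROBOTS_TXT, "example-bot")
# A polite crawler would then wait `delay` seconds (for example with
# time.sleep) between consecutive requests to the same host.
```

Returning 0.0 when the directive is absent is a design choice of this sketch; a real crawler would more likely fall back to its own configured default interval.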

  2. Visit time

Visit-time: 0400-0845

This tells the crawlers about hours when crawling is allowed. In this example, the site can be crawled between 04:00 and 08:45 UTC. Sites do this to avoid load from bots during their peak hours.
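Visit-time is a non-standard directive, so the following is only a sketch of how the `HHMM-HHMM` value could be interpreted, assuming the window is in UTC as described above. The function names `parse_visit_time` and `is_visit_allowed` are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import time
from typing import Tuple


def parse_visit_time(value: str) -> Tuple[time, time]:
    """Parse a value such as '0400-0845' into (start, end) times."""
    start, end = value.strip().split("-")
    return (
        time(int(start[:2]), int(start[2:])),
        time(int(end[:2]), int(end[2:])),
    )


def is_visit_allowed(value: str, now_utc: time) -> bool:
    """Return True if now_utc falls inside the Visit-time window (inclusive)."""
    start, end = parse_visit_time(value)
    if start <= end:
        return start <= now_utc <= end
    # Window wraps past midnight, e.g. '2200-0300'.
    return now_utc >= start or now_utc <= end
```

A crawler scheduler could call `is_visit_allowed("0400-0845", datetime.now(timezone.utc).time())` before starting a crawl job and postpone the job when it returns False.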

  3. Request rate

Request-rate: 1/10

Some websites do not entertain bots trying to fetch multiple pages simultaneously. Request rate is used to limit this behavior. 1/10 as the value means the site allows crawlers to request one page every 10 seconds.
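As a rough sketch (again, not Fess code), Python's standard `urllib.robotparser` exposes this directive through `request_rate()`, and the value can be converted into a minimum interval between requests. The robots.txt content, the `example-bot` user agent, and the `min_interval_for` helper are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content using the Request-rate directive.
ROBOTS_TXT = """\
User-agent: *
Request-rate: 1/10
"""


def min_interval_for(robots_txt: str, user_agent: str) -> float:
    """Return the minimum number of seconds between requests, or 0.0 if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    rate = parser.request_rate(user_agent)  # named tuple (requests, seconds)
    if rate is None or rate.requests == 0:
        return 0.0
    return rate.seconds / rate.requests


interval = min_interval_for(ROBOTS_TXT, "example-bot")
```

With `1/10` this gives one request per 10 seconds, which matches the reading in the text. If a site sets both Crawl-delay and Request-rate, taking the larger of the two intervals would be the conservative choice.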

@marevol
Contributor

marevol commented Sep 11, 2018

Fess does not support Crawl-delay or Request-rate.
