Cache warm-up #15
Things to keep in mind:
Points 1. and 2. inherently require some priority management (a queue of some sort). I would argue that individual URLs can be grouped as follows (sorted by descending priority):
Groups 4 and 5 furthermore contain subgroups:
Subgroups should have priority too, although no obvious default sorting exists here: in the case of a blog website, blog posts would have higher priority than custom post types; in the case of a portfolio website with a news section, the portfolio custom post type would have higher priority than blog posts. Finally, items within every (sub)group should have priority too. I assume this can (most of the time) be a sort order based on particular database columns: date in the case of blog posts, menu order in the case of pages, etc.

Summary

Given all the above, the crawling queue can be treated as a static list. To get the next set of URLs to crawl, one would only need to maintain a pointer of some kind, like "the Nth static page should be cached next" (a sketch follows below under Implementation notes). This pointer would then be reset every time the cache is cleared. Since individual cache entries are only removed manually (see point 3. above), I would argue that they can be scheduled for immediate recrawl. Alternatively, the interface could provide an option for that (like "Remove and recrawl").

Implementation notes:
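A minimal sketch of the pointer-based queue described above. The option name `cache_warmup_pointer`, the batch size, and the simplified URL gathering (front page, then pages by menu order, then posts by date) are all assumptions for illustration, not a definitive implementation:

```php
<?php
// Sketch only: a static, priority-ordered URL list plus a persistent pointer.

function cache_warmup_get_url_list(): array {
	$urls = array( home_url( '/' ) ); // front page first

	// Pages sorted by menu order, then posts by date (descending priority).
	foreach ( get_pages( array( 'sort_column' => 'menu_order' ) ) as $page ) {
		$urls[] = get_permalink( $page );
	}
	foreach ( get_posts( array( 'numberposts' => -1, 'orderby' => 'date' ) ) as $post ) {
		$urls[] = get_permalink( $post );
	}
	return $urls;
}

function cache_warmup_next_batch( int $batch_size = 10 ): array {
	$urls    = cache_warmup_get_url_list();
	$pointer = (int) get_option( 'cache_warmup_pointer', 0 );
	$batch   = array_slice( $urls, $pointer, $batch_size );

	update_option( 'cache_warmup_pointer', $pointer + count( $batch ) );
	return $batch;
}

// Reset the pointer whenever the whole cache is cleared.
function cache_warmup_reset_pointer(): void {
	update_option( 'cache_warmup_pointer', 0 );
}
```

Because the list is static and ordered, clearing the cache only needs to reset the pointer; no queue state has to be rebuilt.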
Open questions:
Another thing to consider: some pages (URLs in general) are excluded from caching (like the WooCommerce cart page). The crawler should handle such URLs gracefully.
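One possible shape for this, assuming a filterable exclusion list (the `cache_warmup_excluded_urls` filter name is hypothetical):

```php
<?php
// Sketch: skip URLs that are excluded from caching.

function cache_warmup_is_excluded( string $url ): bool {
	$excluded = array();

	// WooCommerce cart page, if WooCommerce is active.
	if ( function_exists( 'wc_get_cart_url' ) ) {
		$excluded[] = wc_get_cart_url();
	}

	// Let site owners and other plugins add their own exclusions.
	$excluded = apply_filters( 'cache_warmup_excluded_urls', $excluded );

	return in_array( $url, $excluded, true );
}
```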
Seems to be a large enough feature to warrant a major version bump.
There are third-party solutions to this problem, probably worth further investigation:
One elegant solution that fits these criteria is to integrate cache warming with a statistics plugin like Statify and only warm up the cache with the M most-visited URLs from the last N days (where M and N are the respective Statify options).
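A rough sketch of such a query. It assumes Statify's default table layout (a `{$wpdb->prefix}statify` table with `created` and `target` columns); verify the schema before relying on this:

```php
<?php
// Sketch: fetch the M most-visited targets from the last N days
// out of Statify's tracking table.

function cache_warmup_most_visited( int $m, int $n ): array {
	global $wpdb;

	// Targets are assumed to be relative paths; prepend home_url() before crawling.
	return $wpdb->get_col( $wpdb->prepare(
		"SELECT target
		   FROM {$wpdb->prefix}statify
		  WHERE created >= DATE_SUB(CURDATE(), INTERVAL %d DAY)
		  GROUP BY target
		  ORDER BY COUNT(*) DESC
		  LIMIT %d",
		$n,
		$m
	) );
}
```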
Check the response of the HTTP request and, if it is anything other than 200, push the item back to the feeder for a later recrawl. Fix the handling of request variants (use key instead of name). See #15.
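A minimal sketch of that check, using WordPress's HTTP API (the `$feeder` object and its `requeue()` method are hypothetical):

```php
<?php
// Sketch: crawl one URL and push it back for a later recrawl
// unless the response code is 200.

function cache_warmup_crawl( string $url, $feeder ): void {
	$response = wp_remote_get( $url );

	if ( is_wp_error( $response ) || 200 !== wp_remote_retrieve_response_code( $response ) ) {
		$feeder->requeue( $url ); // hypothetical requeue method
	}
}
```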
Let the internal crawling mechanism trigger cache file creation instead of site visitors.
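For example, the crawl could be driven by WP-Cron rather than visitor traffic. A sketch, assuming the batch helper above (the `cache_warmup_crawl_batch` hook name is hypothetical):

```php
<?php
// Sketch: let WP-Cron, not site visitors, trigger cache file creation.

if ( ! wp_next_scheduled( 'cache_warmup_crawl_batch' ) ) {
	wp_schedule_event( time(), 'hourly', 'cache_warmup_crawl_batch' );
}

add_action( 'cache_warmup_crawl_batch', function () {
	foreach ( cache_warmup_next_batch() as $url ) {
		wp_remote_get( $url ); // visiting the URL lets the cache plugin write the file
	}
} );
```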