Cache warm-up #15

chesio · 2019-01-30T19:09:03Z

Let internal crawling mechanism trigger cache files creation instead of site visitors.

chesio · 2020-02-03T10:47:47Z

Things to keep in mind:

1. Crawling should not overload the web server.
2. Some URLs are more important than others.
3. Individual cache entries are never removed automatically; they can only be removed "by hand" via the Cache Viewer interface, a WP-CLI command or deletion from disk.
4. Request variants should be supported as much as possible.

Points 1. and 2. inherently require some priority management (a queue of some sort). I would argue that individual URLs can be grouped as follows (sorted by descending priority):

1. Front-page - is_front_page()
2. Blog-page - is_home()
3. Static pages - is_page()
4. Posts (including CPT, excluding attachments) - is_single()
5. Taxonomy pages - is_category(), is_tag(), is_tax() (is_tax() alone only matches custom taxonomies)

Groups 4 and 5 furthermore contain subgroups:

in case of group 4: built-in posts and (optional) custom post types
in case of group 5: built-in category and tag taxonomies and (optional) custom taxonomies

Subgroups should have priority too, although here no obvious default sorting exists. In the case of a blog website, blog posts would have higher priority than custom post types; in the case of a portfolio website with a news section, the portfolio custom post type would have higher priority than blog posts.

Finally, items within every (sub)group should have priority too. I assume it could be (most of the time) a sort order based on particular database columns: date in the case of blog posts, menu order in the case of pages, etc.
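
The three-level ordering described above (group, then subgroup, then item) can be sketched as a single sort key. This is only an illustration in Python rather than plugin code: `CrawlItem`, `GROUP_PRIORITY`, `build_queue` and the sample URLs are hypothetical names, and the ranks are assumed to have been computed elsewhere (for example from post date or menu order).

```python
from dataclasses import dataclass

# Hypothetical group priorities mirroring the list above (lower sorts first).
GROUP_PRIORITY = {"front_page": 0, "blog_page": 1, "page": 2, "post": 3, "taxonomy": 4}


@dataclass(frozen=True)
class CrawlItem:
    url: str
    group: str          # one of the GROUP_PRIORITY keys
    subgroup_rank: int  # e.g. 0 for blog posts, 1 for a custom post type
    item_rank: int      # e.g. menu order, or position when sorted by date


def build_queue(items):
    """Return items sorted by group, then subgroup, then item priority.

    The URL is used as a final tie-breaker so that the ordering is strict.
    """
    return sorted(
        items,
        key=lambda i: (GROUP_PRIORITY[i.group], i.subgroup_rank, i.item_rank, i.url),
    )


items = [
    CrawlItem("/category/news/", "taxonomy", 0, 0),
    CrawlItem("/hello-world/", "post", 0, 0),
    CrawlItem("/portfolio/logo/", "post", 1, 0),
    CrawlItem("/about/", "page", 0, 1),
    CrawlItem("/", "front_page", 0, 0),
]
queue = build_queue(items)
print([item.url for item in queue])
# ['/', '/about/', '/hello-world/', '/portfolio/logo/', '/category/news/']
```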

Summary

Given all the above, the crawling queue can be treated as a static list. To get the next set of URLs to crawl, one would only need to maintain a pointer of some kind, like "Nth static page should be cached next". This pointer would then be reset every time the cache is cleared.
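
A minimal sketch of the pointer idea, assuming the ordered URL list already exists. `WarmUpQueue` and its method names are hypothetical; a real implementation would also have to persist the pointer between requests, which is left out here. Taking small batches is one way to keep the crawl from overloading the server.

```python
class WarmUpQueue:
    """Static list of URLs plus a pointer to the next one to crawl."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._pointer = 0  # index of the next URL to crawl

    def next_batch(self, size):
        """Return up to `size` URLs and advance the pointer past them."""
        batch = self._urls[self._pointer:self._pointer + size]
        self._pointer += len(batch)
        return batch

    def reset(self):
        """Start over from the beginning, e.g. after the cache is cleared."""
        self._pointer = 0


queue = WarmUpQueue(["/", "/blog/", "/about/", "/contact/", "/hello-world/"])
first = queue.next_batch(2)        # ['/', '/blog/']
second = queue.next_batch(2)       # ['/about/', '/contact/']
queue.reset()                      # cache was cleared
after_reset = queue.next_batch(1)  # ['/']
```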

Since individual cache entries are only removed manually (see point 3. above), I would argue that they can be set for immediate recrawl. Alternatively the interface could provide an option for that (like "Remove and recrawl").

Implementation notes:

The ordering of groups, subgroups and items must be strict.
The ordering of groups, subgroups and items should be filterable.
It should be possible to exclude a certain group, subgroup or item.
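
To illustrate the notes above, here is a rough Python sketch in which the group order is plain data that a site could override, and groups or individual URLs can be excluded. In the plugin this would presumably be exposed through WordPress filters; no hook names are specified in this thread, so `order_queue` and its parameters are purely hypothetical.

```python
DEFAULT_GROUP_ORDER = ["front_page", "blog_page", "page", "post", "taxonomy"]


def order_queue(entries, group_order=DEFAULT_GROUP_ORDER,
                excluded_groups=(), excluded_urls=()):
    """Order (group, url) entries by group, dropping excluded ones.

    sorted() is stable, so entries keep their relative order within a group.
    """
    rank = {group: index for index, group in enumerate(group_order)}
    kept = [
        (group, url)
        for group, url in entries
        if group in rank and group not in excluded_groups and url not in excluded_urls
    ]
    return sorted(kept, key=lambda entry: rank[entry[0]])


entries = [
    ("page", "/about/"),
    ("page", "/cart/"),
    ("post", "/hello-world/"),
    ("taxonomy", "/category/news/"),
    ("front_page", "/"),
]

# A site that wants posts before pages, no taxonomy archives and no cart page.
custom = order_queue(
    entries,
    group_order=["front_page", "blog_page", "post", "page", "taxonomy"],
    excluded_groups={"taxonomy"},
    excluded_urls={"/cart/"},
)
print([url for _, url in custom])
# ['/', '/hello-world/', '/about/']
```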

Open questions:

Should there be some indication of crawling progress? Is this feasible?
How to proceed with request variants? Does wp_remote_get() support cookies?

chesio · 2020-05-22T09:16:35Z

Another thing to consider: some pages (URLs in general) are excluded from caching (like the WooCommerce cart page). The crawler should handle such URLs gracefully.
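
One possible reading of "gracefully" is that excluded URLs are simply skipped before any request is made, rather than being crawled and then treated as failures. The sketch below assumes that exclusions can be expressed as path patterns; `EXCLUDED_PATTERNS` and the function names are hypothetical and do not reflect how the plugin actually decides what to cache.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; the real rules live in the caching logic.
EXCLUDED_PATTERNS = ["/cart/*", "/checkout/*", "/my-account/*"]


def is_excluded(path):
    """Return True if the path matches any exclusion pattern."""
    return any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS)


def plan_crawl(paths):
    """Split paths into those to crawl and those skipped as excluded."""
    to_crawl, skipped = [], []
    for path in paths:
        (skipped if is_excluded(path) else to_crawl).append(path)
    return to_crawl, skipped


to_crawl, skipped = plan_crawl(["/", "/cart/", "/shop/", "/checkout/payment/"])
print(to_crawl)  # ['/', '/shop/']
print(skipped)   # ['/cart/', '/checkout/payment/']
```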

chesio · 2020-05-29T08:25:00Z

Seems to be a large enough feature to warrant a major version bump.

chesio · 2020-05-29T08:28:46Z

There are third-party solutions to this problem, probably worth further investigation:

chesio · 2020-05-29T08:38:42Z

Things to keep in mind:
1. Crawling should not overload the web server.
2. Some URLs are more important than others.

One elegant solution that fits these criteria is to integrate cache warming with a statistics plugin like Statify and only warm up the cache with the M most-visited URLs from the last N days (where M and N are the respective Statify options).
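
The selection itself is a simple "top M within a time window" query. The Python sketch below works on an in-memory list of (URL, date) visit records; it does not use or assume any Statify API or storage format, and `most_visited` and the sample data are made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def most_visited(visits, today, days, limit):
    """Return the `limit` most-visited URLs from the last `days` days."""
    cutoff = today - timedelta(days=days)
    counts = Counter(url for url, day in visits if day > cutoff)
    return [url for url, _ in counts.most_common(limit)]


today = date(2020, 5, 29)
visits = [
    ("/", date(2020, 5, 28)),
    ("/", date(2020, 5, 27)),
    ("/", date(2020, 5, 20)),
    ("/blog/", date(2020, 5, 28)),
    ("/blog/", date(2020, 5, 26)),
    ("/about/", date(2020, 5, 25)),
    # Popular in the past, but outside the 14-day window.
    ("/old-post/", date(2020, 4, 1)),
    ("/old-post/", date(2020, 4, 2)),
    ("/old-post/", date(2020, 4, 3)),
    ("/old-post/", date(2020, 4, 4)),
]

top = most_visited(visits, today, days=14, limit=2)
print(top)  # ['/', '/blog/']
```

Because the list is bounded by M, this approach also caps the crawling load on the server by construction.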

Check the response of HTTP request and if it is anything else than 200, push the item back to the feeder for a later recrawl. Fix the handling of request variants (use key instead of name). See #15.
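
The retry behaviour described in that commit message could look roughly like the following Python sketch. The HTTP call is injected as `fetch_status` so the example runs without a network; the `max_attempts` cap is an assumption added here to keep the loop finite and is not mentioned in the commit.

```python
from collections import deque


def warm_up(urls, fetch_status, max_attempts=3):
    """Crawl URLs, pushing any non-200 response back for a later retry.

    `fetch_status(url)` must return the HTTP status code for the URL.
    """
    feeder = deque((url, 1) for url in urls)
    warmed, failed = [], []
    while feeder:
        url, attempt = feeder.popleft()
        if fetch_status(url) == 200:
            warmed.append(url)
        elif attempt < max_attempts:
            feeder.append((url, attempt + 1))  # back of the queue, retry later
        else:
            failed.append(url)  # give up after max_attempts (assumed cap)
    return warmed, failed


# Simulated responses: "/blog/" succeeds on its second try, "/broken/" never does.
responses = {
    "/": iter([200]),
    "/blog/": iter([503, 200]),
    "/broken/": iter([500, 500, 500]),
}
warmed, failed = warm_up(list(responses), lambda url: next(responses[url]))
print(warmed)  # ['/', '/blog/']
print(failed)  # ['/broken/']
```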

chesio added the enhancement New feature or request label Jan 30, 2019

chesio self-assigned this Jan 30, 2019

chesio mentioned this issue Jan 30, 2019

Implement "boost mode" #14

Closed

chesio added this to the 1.8 milestone Feb 3, 2020

chesio modified the milestones: 1.8, 1.9 May 8, 2020

chesio modified the milestones: 1.9.x, 2.0.x May 29, 2020

chesio closed this as completed in 2354b4e Aug 29, 2021
