Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Question about caching #103

Closed
matkoniecz opened this issue Jan 14, 2018 · 3 comments
Closed

Question about caching #103

matkoniecz opened this issue Jan 14, 2018 · 3 comments

Comments

@matkoniecz
Copy link

Is caching something that every user should reimplement or is it planned to add built-in caching? Or is cache that allows to avoid rerequesting the same data on each run available somehow and I missed it?

@siznax siznax changed the title caching Question about caching Jan 16, 2018
@siznax
Copy link
Owner

siznax commented Jan 16, 2018

Thanks for trying wptools @matkoniecz! Its caching features prevent requests for the same data, as described in the docs under Request actions. Let us know if you have a problem with caching.

@siznax siznax closed this as completed Jan 16, 2018
@matkoniecz
Copy link
Author

According to my tests it is using per-process cache, so there is no cache lasting between separate program runs.

My usecase requires cache persisting between program runs (probably stored as file) and I wonder is it easy to swap cache engine to achieve this.

@siznax
Copy link
Owner

siznax commented Jan 18, 2018

Hi @matkoniecz. We're not using any "cache engine", but it should be straightforward to keep track of cached requests and avoid calling them again (unless you think you need to).

Here's an example of what caching currently looks like:

>>> page = wptools.page('Jazz')

>>> page.get_query()
en.wikipedia.org (query) Jazz
pageen.wikipedia.org (imageinfo) File:Louis Armstrong restored.jpg
Jazz (en) data
{
  ...
}
>>> page.cache.keys()
>>> ['query', 'imageinfo']

>>> page.cache['query'].keys()
['info', 'query', 'response']
>>> page.cache['query']['info']
{'bytes': 21542.0,
 'content': 'application/json; charset=utf-8',
 'kB/s': '32.7',
 'seconds': '0.660',
 'status': 200,
 'url': 'https://en.wikipedia.org/w/api.php?action=query&exintro&inprop=url|watchers&list=random&pithumbsize=240&pllimit=500&ppprop=disambiguation|wikibase_item&prop=extracts|info|links|pageimages|pageprops|pageterms|redirects&redirects&rdlimit=500&rnlimit=1&rnnamespace=0&titles=Jazz',
 'user-agent': 'wptools/0.4.7 (https://github.com/siznax/wptools) PycURL/7.43.0.1 libcurl/7.54.0 LibreSSL/2.0.20 zlib/1.2.11 nghttp2/1.24.0'}
>>> page.cache['query']['query']
'https://en.wikipedia.org/w/api.php?action=query&exintro&format=json&formatversion=2&inprop=url|watchers&list=random&pithumbsize=240&pllimit=500&ppprop=disambiguation|wikibase_item&prop=extracts|info|links|pageimages|pageprops|pageterms|redirects&redirects&rdlimit=500&rnlimit=1&rnnamespace=0&titles=Jazz'
>>> page.cache['query']['response'][:80]
'{"continue":{"rncontinue":"0.072021668822|0.072022074465|22127666|0","plcontinue'

The page.cache['query']['query'] member (above) is the request URL that was made by get_query() against the Mediawiki API (also shown in page.cache['query']['info']['url']. This is briefly described for other types of requests in Request actions.

If your use case requires something more sophisticated for some reason, maybe you can tell us more about.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants