Consults the datasetter Pleiades dataset for places modified within a given time frame. For each of those places, the script checks the Pleiades URI live on the web to make sure the place hasn't been withdrawn or deleted, then checks the Internet Archive to see whether its snapshot of that page has been updated since the last Pleiades revision. If it has not, the script asks the Internet Archive to grab a new snapshot.
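The freshness check described above can be sketched roughly as follows. This is an illustration only, not the code in waybackit.py: the function names `parse_wayback_timestamp` and `needs_new_snapshot` are hypothetical, and the sketch assumes that snapshot times arrive as 14-digit `YYYYMMDDhhmmss` strings in UTC and that the Pleiades modification time is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_wayback_timestamp(stamp: str) -> datetime:
    """Convert a 14-digit timestamp (YYYYMMDDhhmmss, assumed UTC) to a datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_new_snapshot(last_modified: datetime, snapshot_stamp: Optional[str]) -> bool:
    """Return True when no snapshot exists or the snapshot predates the last revision.

    last_modified must be timezone-aware so it can be compared with the
    UTC snapshot time.
    """
    if snapshot_stamp is None:
        return True
    return parse_wayback_timestamp(snapshot_stamp) < last_modified


if __name__ == "__main__":
    # Hypothetical place last modified on 2022-07-20 at noon UTC.
    modified = datetime(2022, 7, 20, 12, 0, tzinfo=timezone.utc)
    print(needs_new_snapshot(modified, "20220701000000"))  # snapshot is older
    print(needs_new_snapshot(modified, "20220721000000"))  # snapshot is newer
    print(needs_new_snapshot(modified, None))              # no snapshot at all
```

Keeping the comparison in a pure function like this separates the decision from the network calls, which makes it easy to check without contacting Pleiades or the Internet Archive.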

Run it like:

python waybackit.py

That will run silently. To get some feedback, try:

python waybackit.py -v

By default, the script looks back over the past week. If you want to change that horizon:

python waybackit.py -s 2022-07-20
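For readers curious how a one-week default window like this might be wired up, here is a minimal sketch using `argparse`. It is not taken from waybackit.py: `build_parser` is a hypothetical name, and the sketch assumes dates are given in ISO `YYYY-MM-DD` form, as in the example above.

```python
import argparse
from datetime import date, timedelta


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a start/end date window defaulting to the past week."""
    today = date.today()
    parser = argparse.ArgumentParser(description="Date-window sketch (hypothetical)")
    parser.add_argument(
        "-s", "--start",
        type=date.fromisoformat,            # parses YYYY-MM-DD
        default=today - timedelta(days=7),  # one week ago
        help="date when to start archiving",
    )
    parser.add_argument(
        "-e", "--end",
        type=date.fromisoformat,
        default=today,
        help="date when to end archiving",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "2022-07-20"])
    print(args.start, args.end)
```

Passing `date.fromisoformat` as the `type` means a malformed date is rejected by `argparse` with a usage error rather than surfacing later in the run.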

There are more options:

python waybackit.py -h
usage: waybackit.py [-h] [-l LOGLEVEL] [-v] [-w] [-s START] [-e END] [-d DATASETTER]
                    [-f FROM] [-u USERAGENT]

Ensure recently added/changed Pleiades places are archived

options:
  -h, --help            show this help message and exit
  -l LOGLEVEL, --loglevel LOGLEVEL
                        desired logging level (case-insensitive string: DEBUG, INFO,
                        WARNING, or ERROR (default: NOTSET)
  -v, --verbose         verbose output (logging level == INFO) (default: False)
  -w, --veryverbose     very verbose output (logging level == DEBUG) (default: False)
  -s START, --start START
                        date when to start archiving (default: one week ago)
  -e END, --end END     date when to end archiving (default: today)
  -d DATASETTER, --datasetter DATASETTER
                        path to location of datasetter cache (default:
                        ~/Documents/files/D/datasetter/data/cache)
  -f FROM, --from FROM  email address for http request headers (default:
                        pleiades.admin@nyu.edu)
  -u USERAGENT, --useragent USERAGENT
                        user agent for http request headers (default:
                        PleiadesGazetteer/today (+https://pleiades.stoa.org))