Dataset snapshot release v0.1.1

@mattbierbaum released this 23 Mar 00:21
· 108 commits to master since this release

Initial release (apart from a security update), with the ability to gather and process:

  1. arXiv metadata provided by OAI
  2. PDFs downloaded from S3
  3. Full plain text generated by pdftotext
  4. Internal co-citation network

The binaries available are:

  • arxiv-metadata-hash-abstracts-v0.1.1-2019-03-01.json.gz
    Full metadata downloaded from (1), with a hash of each abstract in place of the abstract text.
  • internal-references-v0.1.1-2019-03-01.json.gz
    Snapshot of the internal co-citation network at the time of release generated with (4).
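
Both files are gzip-compressed JSON, so they can be read with the Python standard library alone. The sketch below is a minimal example rather than the project's own loader. The helper name `load_gzipped_json` is hypothetical, and because these notes do not say how the files are laid out internally, it assumes either a single JSON document or one JSON value per line and accepts both.

```python
import gzip
import json


def load_gzipped_json(path):
    """Load a .json.gz file holding one JSON document or JSON Lines.

    Hypothetical helper: the internal layout of the release files is an
    assumption, so both common layouts are handled.
    """
    # Read and decompress the whole file into memory as text.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    try:
        # Case 1: the whole file is a single JSON value.
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2 (assumed fallback): one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Usage would look like `refs = load_gzipped_json("internal-references-v0.1.1-2019-03-01.json.gz")`. The helper reads the entire decompressed file into memory, which may be impractical for the larger metadata file; streaming line by line is the usual alternative if the file turns out to be in JSON Lines form.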