Dataset snapshot release v0.1.1
Initial release (plus a security update), with the ability to gather and process:
1. arXiv metadata provided by OAI
2. PDFs downloaded from S3
3. Full plain text generated by pdftotext
4. Internal co-citation network
The binaries available are:
- arxiv-metadata-hash-abstracts-v0.1.1-2019-03-01.json.gz
Full metadata downloaded from (1), with hashed abstracts in place of the abstract text.
- internal-references-v0.1.1-2019-03-01.json.gz
Snapshot of the internal co-citation network at the time of release, generated with (4).
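
As a rough sketch of how the released .json.gz files might be read in Python: the release notes do not specify the layout inside the archives, so the loader below accepts either a single JSON document or one JSON object per line. The helper names load_json_gz and count_internal_citations are hypothetical, and the mapping shape assumed for the references file (citing ID to list of cited IDs) is an assumption, not something stated in this release.

```python
import gzip
import json
from pathlib import Path


def load_json_gz(path):
    """Read a gzip-compressed file holding one JSON document or JSON Lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    try:
        # Case 1: the whole file is a single JSON document.
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_internal_citations(references):
    """Count citations per ID, ASSUMING {citing_id: [cited_id, ...]}."""
    counts = {}
    for cited_ids in references.values():
        for cited in cited_ids:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


if __name__ == "__main__":
    path = Path("internal-references-v0.1.1-2019-03-01.json.gz")
    if path.exists():
        references = load_json_gz(path)
        print(len(references), "entries loaded")
```

If the references file turns out to use a different layout, only count_internal_citations needs to change; the loader makes no assumption beyond gzip-compressed UTF-8 JSON.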