This repository contains 672 folders, one per druid. Each druid identifies one PDF file associated with the Taxa project.
Within each folder, every page of the PDF has two files: a JSON file that records the output of the Transkribus Metagrapho API (see https://github.com/Miller-Library/taxa-ocr-scripts) and a text file that contains only the plain text.

Stats
-----
# Folders (== # druids/PDFs): 672
# JSON files (== # pages): 15,391 (~768MB)
# Text files (== # pages): 15,391 ( ~64MB)
# Lines of text: 390,216
# Characters: 2,299,977
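
The counts above can be recomputed with a short script. The following is a minimal sketch, not the script that produced these numbers. It assumes that every druid folder sits directly under the repository root, that the per-page files end in .json and .txt, and that the text files are UTF-8. The function name collect_stats is hypothetical.

```python
from pathlib import Path


def collect_stats(root):
    """Count folders, per-page files, lines, and characters under root.

    Assumed layout (not confirmed by this README):
    <root>/<druid>/<page>.json and <root>/<druid>/<page>.txt
    """
    root = Path(root)
    stats = {"folders": 0, "json_files": 0, "text_files": 0,
             "lines": 0, "characters": 0}
    for folder in sorted(root.iterdir()):
        # Skip plain files and hidden folders such as .git
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        stats["folders"] += 1
        stats["json_files"] += sum(1 for _ in folder.glob("*.json"))
        for txt in folder.glob("*.txt"):
            text = txt.read_text(encoding="utf-8")
            stats["text_files"] += 1
            stats["lines"] += len(text.splitlines())
            # Characters here include newline characters
            stats["characters"] += len(text)
    return stats
```

Calling collect_stats(".") from the repository root returns a dictionary with one entry per statistic. The line and character totals may differ from the figures listed above depending on how newlines and whitespace were counted originally.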