Miller-Library/taxa-ocr-output


taxa-ocr-output

This repository contains folders corresponding to 672 druids, each of which represents one PDF file associated with the Taxa project.

Each folder contains two files for every page of the PDF: a JSON file recording the output from the Transkribus Metagrapho API (see https://github.com/Miller-Library/taxa-ocr-scripts) and a text file containing only the plain text.
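
The layout described above can be walked with a few lines of Python. This is only a minimal sketch: the README does not state the file naming scheme, so the code assumes that page files end in `.json` and `.txt` and that the two files for a page share the same stem. The function name `pages_by_druid` is made up for illustration.

```python
from pathlib import Path


def pages_by_druid(root):
    """Map each druid folder name to a list of (json_path, text_path) pairs.

    Assumption (not confirmed by the README): the JSON and text files for a
    page share the same stem and use the extensions .json and .txt.
    """
    result = {}
    for folder in sorted(Path(root).iterdir()):
        # Skip plain files and hidden directories such as .git
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        pairs = []
        for json_path in sorted(folder.glob("*.json")):
            text_path = json_path.with_suffix(".txt")
            # Record None when no matching text file is found
            pairs.append((json_path, text_path if text_path.exists() else None))
        result[folder.name] = pairs
    return result
```

Calling `pages_by_druid(".")` from the root of a local clone would return one entry per druid folder. If the actual file names follow a different pattern, the glob and the suffix swap need to be adjusted.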

Stats
-----
# Folders (== # druids/pdfs):       672  
# JSON files (== # pages):       15,391  (~768MB)
# Text files (== # pages):       15,391  ( ~64MB)

# Lines of text:                390,216
# Characters:                 2,299,977
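
Counts like the ones above could be recomputed from a local clone along the following lines. This is a hedged sketch, not the method used to produce the published figures: it assumes `.json` and `.txt` extensions, UTF-8 text, and that newline characters count toward the character total. Different counting rules would give different numbers. The function name `corpus_stats` is hypothetical.

```python
from pathlib import Path


def corpus_stats(root):
    """Count folders, JSON files, text files, lines, and characters.

    Assumptions (not confirmed by the README): page files use the
    extensions .json and .txt, text is UTF-8, and newlines are counted
    as characters.
    """
    folders = [
        p for p in Path(root).iterdir()
        if p.is_dir() and not p.name.startswith(".")
    ]
    json_files = [f for d in folders for f in d.glob("*.json")]
    text_files = [f for d in folders for f in d.glob("*.txt")]

    lines = 0
    chars = 0
    for f in text_files:
        text = f.read_text(encoding="utf-8")
        lines += len(text.splitlines())
        chars += len(text)

    return {
        "folders": len(folders),
        "json_files": len(json_files),
        "text_files": len(text_files),
        "lines": lines,
        "characters": chars,
    }
```

Running `corpus_stats(".")` in the repository root returns a dictionary whose keys mirror the rows of the table above.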
