Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Updated May 20, 2025 - Python
Dafonts Free Dataset and the Python scripts used to make it
simLIBS provides a Python class for simulating LIBS spectra, with an interface to the NIST LIBS Database.
Library to programmatically build labeled datasets for Named-Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks
XFetcher is a Python library for downloading data from different sources for machine learning purposes
API for serving crawled wiki movie content and a list of movies
Python JSON ORM (a simple single-file module)
🫙 Event datasets used for training machine learning models.