library supporting NLP and CV research on scientific papers
Updated Nov 8, 2024 - Python
Text extraction from multiple and large PDF documents.
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. A separate script queries the Chroma DB for similarity search based on user input.
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
A side project for easily collecting and annotating questions and answers for the PsychometryBot project DB, using computer vision and PDF parsing.
PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.
A collection of useful mini projects that I worked on while teaching myself Python programming.
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
Opinionated and Sophisticated Document Region Analyzer.
MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.
PDF Professor 2.0 extracts and processes PDF text, which is then analyzed by Ollama for summarization, data extraction, and insights. More coming soon!
CLI tool to merge, compress, extract, or delete pages from PDF files.
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.
Resume Ranker is an AI-powered system that automatically analyzes and ranks resumes based on job-specific criteria. It fetches resumes from Google Drive, extracts text, scores candidates using Google Gemini API, and saves the results in a CSV file for easy review.
A powerful Q&A system using Google's Gemini Pro API with vector storage (AstraDB) and LLM monitoring. Supports text, images, PDFs, DOCX files, URLs, and YouTube videos.