Markdrop is a Python package that converts PDFs to Markdown, extracts images and tables, and generates text descriptions of the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.
- PDF to Markdown conversion with formatting preservation using Docling
- Automatic image extraction with quality preservation using XRef Id
- Table detection using Microsoft's Table Transformer
- PDF URL support for core functionalities
- AI-powered image and table descriptions using multiple LLM providers
- Interactive HTML output with downloadable Excel tables
- Customizable image resolution and UI elements
- Comprehensive logging system
- Support for other files
- Streamlit/web interface
pip install markdrop
Python Package Index (PyPI) Page: https://pypi.org/project/markdrop
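To confirm that the install succeeded, a minimal sketch using only the standard library can report the installed version. The helper name `installed_version` is hypothetical and not part of Markdrop.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("markdrop")
    print(f"markdrop {found}" if found else "markdrop is not installed")
```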
from markdrop import extract_images, make_markdown, extract_tables_from_pdf
source_pdf = 'url/or/path/to/pdf/file' # Replace with your local PDF file path or a URL
output_dir = 'data/output' # Replace with desired output directory's path
make_markdown(source_pdf, output_dir)
extract_images(source_pdf, output_dir)
extract_tables_from_pdf(source_pdf, output_dir=output_dir)
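Because the source in the calls above may be either a local path or a URL, it can help to check it before starting a long conversion. The following is a sketch under stated assumptions: `is_url` and `check_source` are hypothetical helpers, not Markdrop functions, and only `http`/`https` URLs are treated as remote.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """True when `source` looks like an http(s) URL with a host part."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_source(source: str) -> str:
    """Return 'url' or 'file'; raise FileNotFoundError for a missing local file."""
    if is_url(source):
        return "url"
    if not Path(source).is_file():
        raise FileNotFoundError(f"No such PDF file: {source}")
    return "file"
```

Calling `check_source(source_pdf)` before `make_markdown` surfaces a mistyped local path early. It does not verify that a URL is reachable.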
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
# Configure processing options
config = MarkDropConfig(
    image_resolution_scale=2.0,            # Scale factor for image resolution
    download_button_color='#444444',       # Color for download buttons in HTML
    log_level=logging.INFO,                # Logging detail level
    log_dir='logs',                        # Directory for log files
    excel_dir='markdropped-excel-tables'   # Directory for Excel table exports
)
# Process PDF document
input_doc_path = "path/to/input.pdf"
output_dir = Path('output_directory')
# Convert PDF and generate HTML with images and tables
html_path = markdrop(input_doc_path, output_dir, config)
# Add interactive table download functionality
downloadable_html = add_downloadable_tables(html_path, config)
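To get a feel for what `image_resolution_scale` means in pixels, here is a rough sketch. It assumes that a scale of 1.0 corresponds to 72 DPI (one pixel per PDF point), which is the convention Docling documents for its image scale. Whether Markdrop passes the value through unchanged is an assumption, and both function names are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, scale: float) -> Tuple[int, int]:
    """Pixel size of a page rendered at `scale`, assuming scale 1.0 maps 1 pt to 1 px."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(width_pt * scale), round(height_pt * scale)


def effective_dpi(scale: float) -> float:
    """DPI implied by `scale` under the 72 DPI baseline assumption."""
    return POINTS_PER_INCH * scale


# US Letter (612 x 792 pt) at the scale used in the example above
print(rendered_size(612, 792, 2.0))  # (1224, 1584)
print(effective_dpi(2.0))            # 144.0
```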
from markdrop import setup_keys, process_markdown, ProcessorConfig, AIProvider, logger
from pathlib import Path
# Set up API keys for AI providers
setup_keys(key='gemini')  # or setup_keys(key='openai')
# Configure AI processing options
config = ProcessorConfig(
    input_path="path/to/markdown/file.md",    # Input markdown file path
    output_dir=Path("output_directory"),      # Output directory
    ai_provider=AIProvider.GEMINI,            # AI provider (GEMINI or OPENAI)
    remove_images=False,                      # Keep or remove original images
    remove_tables=False,                      # Keep or remove original tables
    table_descriptions=True,                  # Generate table descriptions
    image_descriptions=True,                  # Generate image descriptions
    max_retries=3,                            # Number of API call retries
    retry_delay=2,                            # Delay between retries in seconds
    gemini_model_name="gemini-1.5-flash",     # Gemini model for images
    gemini_text_model_name="gemini-pro",      # Gemini model for text
    image_prompt=DEFAULT_IMAGE_PROMPT,        # Custom prompt for image analysis; define or import this name before use
    table_prompt=DEFAULT_TABLE_PROMPT         # Custom prompt for table analysis; define or import this name before use
)
# Process markdown with AI descriptions
output_path = process_markdown(config)
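The `max_retries` and `retry_delay` options above describe a retry-with-fixed-delay pattern. The sketch below illustrates that pattern in isolation. It is not Markdrop's implementation: `call_with_retries` is a hypothetical helper, and it assumes `max_retries` counts retries after the first failed attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`; on failure, wait `retry_delay` seconds and retry up to `max_retries` times."""
    retries_used = 0
    while True:
        try:
            return func()
        except Exception:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay)
```

The `sleep` parameter exists only so the delay can be replaced in tests. A production version would usually catch specific API errors rather than every `Exception`.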
from markdrop import generate_descriptions
prompt = "Give textual highly detailed descriptions from this image ONLY, nothing else."
input_path = 'path/to/img_file/or/dir'
output_dir = 'data/output'
llm_clients = ['gemini', 'llama-vision'] # Available: ['qwen', 'gemini', 'openai', 'llama-vision', 'molmo', 'pixtral']
generate_descriptions(
    input_path=input_path,
    output_dir=output_dir,
    prompt=prompt,
    llm_client=llm_clients
)
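Since `input_path` in the call above may point at a single image or at a directory, it can be useful to preview which files would be picked up. This is a sketch, not Markdrop's own logic: `collect_images` is a hypothetical helper, and the set of accepted suffixes is an assumption.

```python
from pathlib import Path
from typing import List

# Assumed set of image suffixes; the formats Markdrop actually accepts may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def collect_images(input_path: str) -> List[Path]:
    """Return image files for a path that is either one file or a directory (non-recursive)."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_SUFFIXES else []
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```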
markdrop(input_doc_path, output_dir, config)
Converts a PDF to Markdown and HTML with enhanced features.
Parameters:
- input_doc_path (str): Path to input PDF file
- output_dir (str): Output directory path
- config (MarkDropConfig, optional): Configuration options for processing
add_downloadable_tables(html_path, config)
Adds interactive table download functionality to the HTML output.
Parameters:
- html_path (Path): Path to HTML file
- config (MarkDropConfig, optional): Configuration options
MarkDropConfig
Configuration for PDF processing:
- image_resolution_scale (float): Scale factor for image resolution (default: 2.0)
- download_button_color (str): HTML color code for download buttons (default: '#444444')
- log_level (int): Logging level (default: logging.INFO)
- log_dir (str): Directory for log files (default: 'logs')
- excel_dir (str): Directory for Excel table exports (default: 'markdropped-excel-tables')
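The fields and defaults listed above can be mirrored in a small stand-in dataclass, which is handy for experimenting with settings without installing the package. `ConfigSketch` is hypothetical and is not the library's `MarkDropConfig`. The validation in `__post_init__` is an added illustration, and it deliberately accepts only hex color codes.

```python
import logging
import re
from dataclasses import dataclass

# Matches #RGB or #RRGGBB; named colors such as "gray" are rejected in this sketch.
_HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


@dataclass
class ConfigSketch:
    """Stand-in mirroring the documented fields and defaults."""
    image_resolution_scale: float = 2.0
    download_button_color: str = "#444444"
    log_level: int = logging.INFO
    log_dir: str = "logs"
    excel_dir: str = "markdropped-excel-tables"

    def __post_init__(self) -> None:
        if self.image_resolution_scale <= 0:
            raise ValueError("image_resolution_scale must be positive")
        if not _HEX_COLOR.fullmatch(self.download_button_color):
            raise ValueError(f"not a hex color: {self.download_button_color!r}")
```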
ProcessorConfig
Configuration for AI processing:
- input_path (str): Path to markdown file
- output_dir (str): Output directory path
- ai_provider (AIProvider): AI provider selection (GEMINI or OPENAI)
- remove_images (bool): Whether to remove original images
- remove_tables (bool): Whether to remove original tables
- table_descriptions (bool): Generate table descriptions
- image_descriptions (bool): Generate image descriptions
- max_retries (int): Maximum API call retries
- retry_delay (int): Delay between retries in seconds
- gemini_model_name (str): Gemini model for image processing
- gemini_text_model_name (str): Gemini model for text processing
- image_prompt (str): Custom prompt for image analysis
- table_prompt (str): Custom prompt for table analysis
make_markdown(source, output_dir, verbose)
Legacy function for basic PDF to Markdown conversion.
Parameters:
- source (str): Path to input PDF or URL
- output_dir (str): Output directory path
- verbose (bool): Enable detailed logging
extract_images(source, output_dir, verbose)
Legacy function for basic image extraction.
Parameters:
- source (str): Path to input PDF or URL
- output_dir (str): Output directory path
- verbose (bool): Enable detailed logging
extract_tables_from_pdf(pdf_path, start_page, end_page, threshold, output_dir)
Legacy function for basic table extraction.
Parameters:
- pdf_path (str): Path to input PDF or URL
- start_page (int, optional): Starting page number
- end_page (int, optional): Ending page number
- threshold (float, optional): Detection confidence threshold
- output_dir (str): Output directory path
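The `start_page`, `end_page`, and `threshold` parameters above suggest a filter over detected tables. The sketch below shows one plausible reading of that filter. It is not the library's code: `Detection` and `select_detections` are hypothetical, the page range is assumed to be 1-based and inclusive, and the default threshold of 0.8 is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Detection:
    """A detected table: the page it is on and the detector's confidence score."""
    page: int
    score: float


def select_detections(
    detections: Iterable[Detection],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    threshold: float = 0.8,
) -> List[Detection]:
    """Keep detections inside the inclusive page range whose score meets the threshold."""
    kept = []
    for det in detections:
        if start_page is not None and det.page < start_page:
            continue
        if end_page is not None and det.page > end_page:
            continue
        if det.score >= threshold:
            kept.append(det)
    return kept
```

Under this reading, lowering `threshold` keeps more candidate tables at the cost of more false positives, and leaving both page bounds unset scans the whole document.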
See run.py for an example.
We welcome contributions! Please see our Contributing Guidelines for details.
- Clone the repository:
git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install development dependencies:
pip install -r requirements.txt
markdrop/
├── LICENSE
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── requirements.txt
├── setup.py
└── markdrop/
    ├── __init__.py
    ├── src
    │   └── markdrop-logo.png
    ├── main.py
    ├── process.py
    ├── api_setup.py
    ├── parse.py
    ├── utils.py
    ├── helper.py
    ├── ignore_warnings.py
    ├── run.py
    └── models/
        ├── __init__.py
        ├── .env
        ├── img_descriptions.py
        ├── logger.py
        ├── model_loader.py
        ├── responder.py
        └── setup_keys.py
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for version history.
Please note that this project follows our Code of Conduct.