This Streamlit application provides a user-friendly interface for aggregating content from various source files (including PDFs, plain text, Markdown, and source code) into a single, structured context file suitable for Large Language Models (LLMs). It relies on two external command-line tools:

- `llama-parse`: extracts text content from PDF documents into Markdown via the LlamaCloud API.
- `files-to-prompt`: recursively gathers text-based files from specified directories and formats the combined content.

Additionally, it can suggest an expert persona system prompt based on the generated context, using Google's Gemini AI model.

The primary goal is to simplify preparing large, diverse context sets for LLMs, producing a single output file in which each source document is wrapped in Claude XML-style `<document>` tags (`files-to-prompt --cxml`).
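For reference, the kind of invocation the app wraps might look like the following (the directory and output filename here are placeholders; the app assembles the actual command from your sidebar settings):

```bash
# Combine every text file under txt_files/ into one Claude-XML context file
files-to-prompt txt_files --cxml -o combined_context.xml
```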
Key External Dependencies:

- PDF Parsing (`llama-parse`): Requires internet access and a one-time manual authentication (`llama-parse auth`) using a LlamaCloud API key.
- Prompt Suggestion (Gemini): Requires internet access and a Google AI API key set as the `GEMINI_API_KEY` environment variable.

This app checks for these prerequisites but does not store or directly handle API keys itself.
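If you want to confirm both prerequisites by hand before launching, the following is a rough equivalent of what the app checks:

```bash
# llama-parse authentication leaves a config file behind
test -f ~/.llama-parse/config.json && echo "llama-parse auth: done" || echo "llama-parse auth: missing"
# Gemini features need the API key in the environment
[ -n "$GEMINI_API_KEY" ] && echo "GEMINI_API_KEY: set" || echo "GEMINI_API_KEY: missing"
```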
- Multi-Source Processing: Handles PDF documents and various text-based files (`.txt`, `.md`, `.py`, `.js`, `.json`, `.xml`, etc.).
- PDF Parsing Integration: Uses `llama-parse` to convert PDF content to Markdown via LlamaCloud. Parsed files are stored in a configurable sub-directory.
- LlamaParse Authentication Check: Verifies that `llama-parse auth` has been completed before attempting PDF parsing.
- Flexible Processing Modes (see the example layout after this list):
  - Both: Parses PDFs, then recursively combines all files from the main TXT directory (including the parsed PDFs in its sub-directory).
  - PDF only: Parses PDFs, then combines only the results from the parsed PDF sub-directory.
  - TXT only: Skips PDF parsing, clears old parsed results, then recursively combines only the files from the main TXT directory.
- File Management UI: Upload, view, and delete PDF and plain-text files in their respective input directories directly through the app.
- Configurable Paths: Set input/output directories and filenames via the sidebar.
- ✨ System Prompt Suggestion: After generating context, uses Google's Gemini model (`gemini-2.5-pro-exp-03-25` as of 2025-03-27) to analyze a snippet of the context and suggest a system prompt instructing an AI to act as a relevant expert using that context. Requires the `GEMINI_API_KEY` environment variable.
- Structured Output Format: Generates context using Claude XML tags (`<document path="...">...</document>`).
- In-App Previews: Displays the generated context and suggested system prompts.
- Status & Error Feedback: Provides status messages, progress indicators, and toasts.
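For orientation, an input layout consistent with these modes might look like this (the top-level directory names are the defaults used later in this README; the parsed-PDF sub-directory name is a hypothetical example and is configurable in the sidebar):

```text
pdfs_to_parse/            # source PDFs handed to llama-parse
txt_files/                # main TXT directory, combined recursively
txt_files/parsed_pdfs/    # sub-directory where parsed PDF Markdown is written
```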
- Python: Version 3.8+ recommended (`python3 --version`).
- uv (Recommended Installer/Venv): A fast, modern Python package installer and virtual environment manager. While standard `pip`/`venv` works, `uv` significantly speeds up installation.
  - Install `uv`:

    ```bash
    # Linux/macOS/WSL
    curl -LsSf https://astral.sh/uv/install.sh | sh
    ```

  - (For other methods, such as `pipx` or the Windows standalone installer, see the uv documentation.)
  - Verify: `uv --version` (you might need to restart your terminal after installation).
- Node.js & npm: Required only for installing `llama-parse-cli`.
  - Check: `node -v` and `npm -v`
  - Install (e.g., Ubuntu/Debian): `sudo apt update && sudo apt install nodejs npm -y`
- LlamaCloud API Key: Needed only for the one-time `llama-parse auth` setup in your terminal. Get one from LlamaCloud.
- Google AI API Key: Needed for the "Suggest System Prompt" feature. Get one from Google AI Studio. Must be set as the `GEMINI_API_KEY` environment variable.
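A quick way to confirm the toolchain before starting setup:

```bash
python3 --version     # 3.8 or newer recommended
uv --version          # if you plan to use uv
node -v && npm -v     # only needed to install llama-parse-cli
```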
1. Get the Code:
   - Save the Streamlit Python scripts (`context_generator_app.py`, `gemini_interface.py`) and `requirements.txt` into a dedicated directory.
   - Navigate there: `cd /path/to/your/app_directory`
2. Create Virtual Environment & Install Dependencies (using `uv`):

   `uv` combines environment creation and package installation.

   ```bash
   # Create a virtual environment named .venv in the current directory
   uv venv

   # Activate the environment (syntax depends on shell)
   # Linux/macOS (bash/zsh):
   source .venv/bin/activate
   # Windows (Command Prompt):
   # .venv\Scripts\activate.bat
   # Windows (PowerShell):
   # .venv\Scripts\Activate.ps1

   # Install dependencies from requirements.txt using uv
   uv pip install -r requirements.txt
   ```

   - (Alternatively, without `requirements.txt`: `uv pip install streamlit files-to-prompt google-generativeai`)
   - (Using standard pip/venv: `python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt`)
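If `requirements.txt` is missing, it likely needs only the three Python packages from the alternative command above (any version pins may differ):

```text
streamlit
files-to-prompt
google-generativeai
```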
3. Install the `llama-parse` CLI Tool:
   - `npm install -g llama-parse-cli` (you might need `sudo` or specific npm configurations).
   - Verify: `llama-parse --version`
   - If the command is not found, ensure npm's global bin directory is on your system `PATH` and restart your terminal.
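If the global install succeeded but the command is still not found, one common fix is adding npm's global bin directory to `PATH` (this assumes a standard npm prefix; adjust for nvm or a custom prefix):

```bash
export PATH="$(npm config get prefix)/bin:$PATH"
llama-parse --version
```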
4. Set the Google AI API Key Environment Variable:
   - Set `GEMINI_API_KEY` before launching Streamlit. Choose the command for your terminal session:
     - Linux / macOS (bash, zsh, etc.): `export GEMINI_API_KEY="YOUR_API_KEY_HERE"`
     - Windows (Command Prompt, current session only): `set GEMINI_API_KEY=YOUR_API_KEY_HERE`
     - Windows (PowerShell, current session only): `$env:GEMINI_API_KEY = 'YOUR_API_KEY_HERE'`
   - Replace `YOUR_API_KEY_HERE` with your key. For persistence, add it to your shell profile or system environment variables (see the sketch below).
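As an example of making the key persistent (the exact profile file depends on your shell; treat these commands as illustrative):

```bash
# Linux/macOS (bash): new shells will pick the key up automatically
echo 'export GEMINI_API_KEY="YOUR_API_KEY_HERE"' >> ~/.bashrc

# Windows (run in a regular terminal): persists for new terminals
setx GEMINI_API_KEY "YOUR_API_KEY_HERE"
```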
5. Authenticate `llama-parse` (CRITICAL ONE-TIME STEP):
   - Links the CLI tool to your LlamaCloud account.
   - Run manually in your terminal: `llama-parse auth`
   - Enter your LlamaCloud API key (starts with `llx-`). This creates `~/.llama-parse/config.json`.
6. Delete `GEMINI_API_KEY.py` (Cleanup):
   - If you previously had this file, delete it.
- Prepare Input Folders: Create/populate directories (e.g., `pdfs_to_parse`, `txt_files`) or configure paths in the sidebar.
- Activate Environment & Run App (a combined command example follows this list):
  - Open a terminal and navigate to the app directory.
  - Activate the virtual environment (e.g., `source .venv/bin/activate`).
  - Set Environment Variable: Ensure `GEMINI_API_KEY` is set for the current session (Setup Step 4).
  - Launch: `streamlit run context_generator_app.py`
  - Open the provided local URL in your browser.
- Configure Settings (Sidebar): Adjust paths if needed. Verify directory status.
- Manage Files (Tabs 2 & 3, Optional): Upload, view, or delete source files.
- Select Processing Mode (Tab 1): Choose `TXT only`, `PDF only`, or `Both`.
- Generate Context (Tab 1): Click "Generate Context File". Monitor progress.
- Review Context Output (Tab 1): View the generated file path and preview the content.
- Suggest System Prompt (Tab 1, Optional): Click "✨ Suggest System Prompt" and view the suggested persona prompt in the text area.
- Use Output: Copy the prompt/context from the UI or from the generated output file.
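Putting the launch steps together, a typical session after setup looks roughly like this (directory names match the examples above; the export is only needed if the key is not already persistent):

```bash
mkdir -p pdfs_to_parse txt_files            # create input folders if they don't exist
source .venv/bin/activate                   # activate the virtual environment
export GEMINI_API_KEY="YOUR_API_KEY_HERE"   # needed for the Suggest System Prompt feature
streamlit run context_generator_app.py      # then open the printed local URL
```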
- Error: `LlamaParse Auth Missing...`: Run `llama-parse auth` manually (Setup Step 5). Check `~/.llama-parse/config.json`.
- Error: `'llama-parse'/'files-to-prompt' command not found`: Check tool installation and the system `PATH`. Restart the terminal. Verify with `--version`.
- Error: `uv: command not found`: Ensure `uv` is installed and its location is on `PATH` (Setup Step 2 / Prerequisites).
- Error: `Configuration Error: Environment variable 'GEMINI_API_KEY' not found...`: Ensure the variable is set before `streamlit run`. Check its name and value, and that it persists in the current session.
- Error: `API Error: Invalid Google AI API Key...`: Verify the key value in the environment variable.
- Error: `API Error: Quota exceeded...`: Check your Gemini API usage and limits in Google Cloud.
- Error: `API Error: Model '...' not found or permission denied...`: Check the model name used by the app and your API key permissions.
- Error: `Content generation blocked due to safety filters...`: Google's safety filters were triggered. Review the input/output content.
- PDF Parsing Failures: Check internet connectivity, LlamaCloud status and key, and PDF file validity/complexity. Check the UI error expander for details.
- Permission Errors: Check read/write permissions on the input/output directories for the user running Streamlit.
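When several of these errors appear at once, a few read-only commands narrow things down quickly:

```bash
which llama-parse files-to-prompt uv        # PATH / installation problems
ls -l ~/.llama-parse/config.json            # LlamaParse authentication state
echo "${GEMINI_API_KEY:+GEMINI_API_KEY is set}"
ls -ld pdfs_to_parse txt_files              # existence and permissions of input folders
```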
- `streamlit`: (Python, via `uv pip` or `pip`) Web application framework.
- `files-to-prompt`: (Python, via `uv pip` or `pip`) File gathering and formatting.
- `google-generativeai`: (Python, via `uv pip` or `pip`) Google AI SDK for Python (Gemini).
- `uv`: (Installer, via `curl`/`pipx`/etc.) Recommended package installer and virtual environment manager.
- `llama-parse-cli`: (Node.js, via `npm install -g`) CLI for LlamaParse PDF processing. Requires manual `llama-parse auth`.