Skip to content

Unstructured-IO/UNS-MCP

Repository files navigation

Unstructured API MCP Server

An MCP server implementation for interacting with the Unstructured API. This server provides tools to list sources and workflows.

Setup

  1. Install dependencies:
  • uv add "mcp[cli]"
  • uv pip install --upgrade unstructured-client python-dotenv

or use uv sync.

  1. Set your Unstructured API key as an environment variable.
    • Create a .env file in the root directory, and add a line with your key: UNSTRUCTURED_API_KEY="YOUR_KEY"

To test in local, any working key that pointing to prod env would work. However, to be able to return valid results from client's side (e.g, Claude for Desktop), your personal key that is fetched from https://platform.unstructured.io/app/account/api-keys is needed.

Running the Server

Using the MCP CLI:

mcp run uns_mcp/server.py

or:

uv run uns_mcp/server.py

Available Tools

Tool Description
list_sources Lists available sources from the Unstructured API.
get_source_info Get detailed information about a specific source connector.
create_[connector]_source Create a source connector. Currently, we have s3/google drive/azure connectors (more to come!)
update_[connector]_source Update an existing source connector by params.
delete_[connector]_source Delete a source connector by source id.
list_destinations Lists available destinations from the Unstructured API.
get_destination_info Get detailed info about a specific destination connector. Currently, we have s3/weaviate/astra/neo4j/mongo DB (more to come!)
create_[connector]_destination Create a destination connector by params.
update_[connector]_destination Update an existing destination connector by destination id.
delete_[connector]_destination Delete a destination connector by destination id.
list_workflows Lists workflows from the Unstructured API.
get_workflow_info Get detailed information about a specific workflow.
create_workflow Create a new workflow with source, destination id, etc.
run_workflow Run a specific workflow with workflow id
update_workflow Update an existing workflow by params.
delete_workflow Delete a specific workflow by id.
list_jobs Lists jobs for a specific workflow from the Unstructured API.
get_job_info Get detailed information about a specific job by job id.
cancel_job Delete a specific job by id.

To use the tool that creates/updates/deletes a connector, the credentials for that specific connector must be defined in your .env file. Below is the list of credentials for the connectors we support:

Credential Name Description
ANTHROPIC_API_KEY required to run the minimal_client to interact with our server.
AWS_KEY, AWS_SECRET required to create S3 connector via uns-mcp server, see how in documentation and here
WEAVIATE_CLOUD_API_KEY required to create Weaviate vector db connector, see how in documentation
FIRECRAWL_API_KEY required to use Firecrawl tools in external/firecrawl.py, # on Firecrawl and get an API key.
ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT required to create Astradb connector via uns-mcp server, see how in documentation
AZURE_CONNECTION_STRING required option 1 to create Azure connector via uns-mcp server, see how in documentation
AZURE_ACCOUNT_NAME+AZURE_ACCOUNT_KEY required option 2 to create Azure connector via uns-mcp server, see how in documentation
AZURE_ACCOUNT_NAME+AZURE_SAS_TOKEN required option 3 to create Azure connector via uns-mcp server, see how in documentation
NEO4J_PASSWORD required to create Neo4j connector via uns-mcp server, see how in documentation
MONGO_DB_CONNECTION_STRING required to create Mongodb connector via uns-mcp server, see how in documentation
GOOGLEDRIVE_SERVICE_ACCOUNT_KEY a string value. The original server account key (follow documentation) is in json file, run `cat /path/to/google_service_account_key.json
DATABRICKS_CLIENT_ID,DATABRICKS_CLIENT_SECRET required to create Databricks volume/delta table connector via uns-mcp server, see how in documentation and here
ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_CRED,ONEDRIVE_TENANT_ID required to create One Drive connector via uns-mcp server, see how in documentation
LOG_LEVEL Used to set logging level for our minimal_client, e.g. set to ERROR to get everything
CONFIRM_TOOL_USE set to true so that minimal_client can confirm execution before each tool call
DEBUG_API_REQUESTS set to true so that uns_mcp/server.py can output request parameters for better debugging

Firecrawl Source

Firecrawl is a web crawling API that provides two main capabilities in our MCP:

  1. HTML Content Retrieval: Using invoke_firecrawl_crawlhtml to start crawl jobs and check_crawlhtml_status to monitor them
  2. LLM-Optimized Text Generation: Using invoke_firecrawl_llmtxt to generate text and check_llmtxt_status to retrieve results

How Firecrawl works:

Web Crawling Process:

  • Starts with a specified URL and analyzes it to identify links
  • Uses the sitemap if available; otherwise follows links found on the website
  • Recursively traverses each link to discover all subpages
  • Gathers content from every visited page, handling JavaScript rendering and rate limits
  • Jobs can be cancelled with cancel_crawlhtml_job if needed
  • Use this if you require all the info extracted into raw HTML, Unstructured's workflow cleans it up really well 😄

LLM Text Generation:

  • After crawling, extracts clean, meaningful text content from the crawled pages
  • Generates optimized text formats specifically formatted for large language models
  • Results are automatically uploaded to the specified S3 location
  • Note: LLM text generation jobs cannot be cancelled once started. The cancel_llmtxt_job function is provided for consistency but is not currently supported by the Firecrawl API.

Note: A FIRECRAWL_API_KEY environment variable must be set to use these functions.

Claude Desktop Integration

To install in Claude Desktop:

  1. Go to ~/Library/Application Support/Claude/ and create a claude_desktop_config.json.
  2. In that file add:
{
    "mcpServers":
    {
        "UNS_MCP":
        {
            "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
            "args":
            [
                "--directory",
                "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
                "run",
                "server.py"
            ],
            "env":
            [
            "UNSTRUCTURED_API_KEY":"<your key>"
            ],
            "disabled": false
        }
    }
}
  1. Restart Claude Desktop.

  2. Example Issues seen from Claude Desktop.

    • You will see No destinations found when you query for a list of destination connectors. Check your API key in .env or in your config json, it needs to be your personal key in https://platform.unstructured.io/app/account/api-keys.

Debugging tools

Anthropic provides MCP Inspector tool to debug/test your MCP server. Run the following command to spin up a debugging UI. From there, you will be able to add environment variables (pointing to your local env) on the left pane. Include your personal API key there as env var. Go to tools, you can test out the capabilities you add to the MCP server.

mcp dev uns_mcp/server.py

If you need to log request call parameters to UnstructuredClient, set the environment variable DEBUG_API_REQUESTS=false. The logs are stored in a file with the format unstructured-client-{date}.log, which can be examined to debug request call parameters to UnstructuredClient functions.

Running locally minimal client

uv run python minimal_client/client.py uns_mcp/server.py

or

make local-client

Env variables to configure behavior of the client:

  • LOG_LEVEL="ERROR" # If you would like to hide outputs from the LLM and present clear messages for the user
  • CONFIRM_TOOL_USE='false' If you would like to disable the tool use confirmation before running it (True by default). BE MINDFUL about that option, as LLM can decide to purge all data from your account or run some expensive workflows; use only for development purposes.

Running locally minimal client, accessing local the MCP server over HTTP + SSE

The main difference here is it becomes easier to set breakpoints on the server side during development -- the client and server are decoupled.

# in one terminal, run the server:
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080

or
make sse-server

# in another terminal, run the client:
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"

or
make sse-client

Hint: ctrl+c out of the client first, then the server. Otherwise the server appears to hang.

CHANGELOG.md

Any new developed features/fixes/enhancements will be added to CHANGELOG.md. 0.x.x-dev pre-release format is preferred before we bump to a stable version.

About

No description, website, or topics provided.

Resources

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published