
deepseek-r1-unthink

An API proxy server for Ollama that captures /api/chat and /api/generate responses and removes the streamed <think></think> content, so we get rid of the "<THINK>-ing process", especially in DeepSeek-R1's generated output.

e.g. (screenshot) Zed's settings.json demo in action; notice it no longer shows the <think></think> content.

Ollama Proxy Server

A Flask-based proxy server for Ollama that provides enhanced streaming capabilities and thinking tag handling. This proxy sits between your client application and the Ollama server, filtering and processing the response stream.
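In essence, the proxy is a small Flask app that forwards each request to Ollama and re-streams the reply after filtering it. The sketch below only illustrates that shape; the handler and helper names are assumptions, not the repository's exact code.

from flask import Flask, Response, request
import requests

OLLAMA_SERVER = "http://localhost:11434"  # upstream Ollama server
PROXY_PORT = 11435                        # port this proxy listens on

app = Flask(__name__)

@app.route("/api/chat", methods=["POST"])
def chat():
    # Forward the client's JSON body to Ollama and stream the reply back line by line.
    upstream = requests.post(f"{OLLAMA_SERVER}/api/chat", json=request.get_json(), stream=True)

    def relay():
        for line in upstream.iter_lines():
            if line:
                # A real implementation filters <think>...</think> content here.
                yield line + b"\n"

    return Response(relay(), mimetype="application/x-ndjson")

if __name__ == "__main__":
    app.run(port=PROXY_PORT)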

Features

  • Proxies requests to Ollama server
  • Handles streaming responses efficiently
  • Processes special thinking tags (<think> and </think>) to filter out internal reasoning
  • Maintains CORS support for cross-origin requests
  • Supports both chat and generate endpoints
  • Cleans up response formatting and removes excessive newlines

Prerequisites

  • Python 3.7+
  • Ollama server running locally or remotely
  • pip (Python package manager)

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/ollama-proxy.git
cd ollama-proxy
  2. Install required dependencies:
pip install flask flask-cors requests

Configuration

The proxy server uses two main configuration variables at the top of the script:

OLLAMA_SERVER = "http://localhost:11434"  # Your Ollama server address
PROXY_PORT = 11435                        # Port for this proxy server

Modify these values in the code to match your setup if needed.

Usage

  1. Start the proxy server:
python app.py
  2. The server will start on port 11435 (default) and forward requests to your Ollama server.
  3. Use the proxy in your applications by pointing them to:
http://localhost:11435
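For instance, a Python client can talk to the proxy exactly as it would talk to Ollama directly. A minimal sketch (the model name "deepseek-r1" is an assumption; use whatever model you have pulled):

import json
import requests

with requests.post(
    "http://localhost:11435/api/chat",   # the proxy, not Ollama directly
    json={
        "model": "deepseek-r1",          # assumed model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)
            # Content arrives incrementally; <think> blocks have already been removed.
            print(chunk.get("message", {}).get("content", ""), end="", flush=True)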

API Endpoints

The proxy server supports the following Ollama endpoints:

  • /api/generate - For text generation
  • /api/chat - For chat interactions

All other Ollama endpoints are proxied as-is.
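The generate endpoint works the same way through the proxy; its streamed chunks carry the text in a "response" field. A short sketch (again, the model name is an assumption):

import json
import requests

resp = requests.post(
    "http://localhost:11435/api/generate",
    json={"model": "deepseek-r1", "prompt": "Why is the sky blue?"},  # streams by default
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)  # filtered token stream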

Thinking Tag Processing

The proxy server handles special thinking tags in the response stream:

  • Content between <think> and </think> tags is filtered out
  • Content before <think> and after </think> is preserved
  • Handles cases where tags may appear in any order or be unpaired

Example:

Input stream: "Hello <think>processing request</think> World!"
Output stream: "Hello World!"
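A minimal, non-streaming sketch of this filtering (illustrative only; the real proxy has to carry state across chunks, since a tag can be split between two streamed JSON messages):

import re

def strip_think(text: str) -> str:
    # Remove complete <think>...</think> blocks along with surrounding whitespace.
    text = re.sub(r"\s*<think>.*?</think>\s*", " ", text, flags=re.DOTALL)
    # An unpaired opening tag: everything after it is internal reasoning.
    text = re.sub(r"\s*<think>.*", "", text, flags=re.DOTALL)
    # An unpaired closing tag: everything before it was internal reasoning.
    text = re.sub(r".*</think>\s*", "", text, flags=re.DOTALL)
    return text.strip()

print(strip_think("Hello <think>processing request</think> World!"))  # -> Hello World!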

CORS Support

The server includes CORS support with the following configuration:

  • Allows all origins (*)
  • Supports GET, POST, and OPTIONS methods
  • Allows all headers
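With flask-cors, that configuration boils down to something like the following (one plausible way to express it, not necessarily the repository's exact call):

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
# Allow any origin, the methods the proxy serves, and any request header.
CORS(app, origins="*", methods=["GET", "POST", "OPTIONS"], allow_headers="*")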

Error Handling

  • Invalid JSON responses are forwarded as-is
  • Non-content messages (like 'done' signals) are preserved
  • Missing or malformed thinking tags are handled gracefully
  • HTTP errors from Ollama are properly proxied
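A sketch of how these rules fit together for one streamed line (a hypothetical helper; strip_think is the filter sketched above):

import json

def forward_line(line: bytes):
    # Forward one NDJSON line from Ollama, filtering only well-formed chat content.
    try:
        chunk = json.loads(line)
    except json.JSONDecodeError:
        yield line                      # invalid JSON is forwarded as-is
        return
    message = chunk.get("message")
    if isinstance(message, dict) and "content" in message:
        message["content"] = strip_think(message["content"])  # see the filtering sketch above
    yield json.dumps(chunk).encode()    # 'done' signals and other fields pass through unchanged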

Development

The server runs in debug mode by default when launched directly. To change this behavior, modify the debug parameter in:

app.run(port=PROXY_PORT, debug=True)

License

MIT License

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Support

For issues, questions, or contributions, please:

  1. Check existing GitHub issues
  2. Create a new issue if needed
  3. Provide as much context as possible

Acknowledgments

  • Ollama team for the base server
  • Contributors to Flask and related packages
