An API proxy server for Ollama that captures /api/chat and /api/generate responses and filters the streamed content, stripping out the `<think>`-ing process that shows up especially in DeepSeek-R1's generated output.
Example: Zed's settings.json demo in action; notice it no longer shows the `<think></think>` content.
A Flask-based proxy server for Ollama that handles streaming responses and filters thinking tags. The proxy sits between your client application and the Ollama server, processing the response stream as it passes through.
- Proxies requests to Ollama server
- Handles streaming responses efficiently
- Processes special thinking tags (`<think>` and `</think>`) to filter out the model's internal reasoning
- Maintains CORS support for cross-origin requests
- Supports both chat and generate endpoints
- Cleans up response formatting and removes excessive newlines
- Python 3.7+
- Ollama server running locally or remotely
- pip (Python package manager)
- Clone this repository:
```
git clone https://github.com/yourusername/ollama-proxy.git
cd ollama-proxy
```
- Install required dependencies:
```
pip install flask flask-cors requests
```
The proxy server uses two main configuration variables at the top of the script:
```python
OLLAMA_SERVER = "http://localhost:11434"  # Your Ollama server address
PROXY_PORT = 11435                        # Port for this proxy server
```
Modify these values in the code to match your setup if needed.
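If you would rather not edit the script, one option (an assumption about how you might adapt app.py, not something it does out of the box) is to read these values from environment variables, falling back to the defaults above:

```python
import os

# Hypothetical adaptation: let the environment override the defaults.
OLLAMA_SERVER = os.environ.get("OLLAMA_SERVER", "http://localhost:11434")
PROXY_PORT = int(os.environ.get("PROXY_PORT", "11435"))
```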
- Start the proxy server:
```
python app.py
```
- The server will start on port 11435 (default) and forward requests to your Ollama server
- Use the proxy in your applications by pointing them to:
```
http://localhost:11435
```
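For example, a minimal Python client that talks to the proxy instead of Ollama directly might look like this; the model name `deepseek-r1` is illustrative and depends on which models you have pulled:

```python
import json
import requests

# Send a chat request to the proxy (port 11435) rather than Ollama (11434).
resp = requests.post(
    "http://localhost:11435/api/chat",
    json={
        "model": "deepseek-r1",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,
)

# Ollama streams newline-delimited JSON; print the filtered content chunks.
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
```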
The proxy server supports the following Ollama endpoints:
- /api/generate - For text generation
- /api/chat - For chat interactions
All other Ollama endpoints are proxied as-is.
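A pass-through route in Flask can be sketched as follows; the route and function names here are illustrative rather than the exact ones in app.py:

```python
from flask import Flask, Response, request
import requests

OLLAMA_SERVER = "http://localhost:11434"
app = Flask(__name__)

@app.route("/<path:path>", methods=["GET", "POST", "OPTIONS"])
def proxy_passthrough(path):
    # Forward the request unchanged and stream the response body back.
    upstream = requests.request(
        method=request.method,
        url=f"{OLLAMA_SERVER}/{path}",
        data=request.get_data(),
        headers={k: v for k, v in request.headers if k.lower() != "host"},
        stream=True,
    )
    return Response(
        upstream.iter_content(chunk_size=8192),
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type"),
    )
```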
The proxy server handles special thinking tags in the response stream:
- Content between `<think>` and `</think>` tags is filtered out
- Content before `<think>` and after `</think>` is preserved
- Handles cases where tags may appear in any order or be unpaired
Example:

```
Input stream:  "Hello <think>processing request</think> World!"
Output stream: "Hello World!"
```
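Conceptually, the filter scans each streamed chunk while tracking whether it is currently inside a think block. The sketch below illustrates the idea under simplified assumptions; it is not the exact code from app.py, and it does not handle a tag split across two chunks:

```python
def filter_thinking(text, inside):
    """Return (visible_text, inside) after consuming one streamed chunk.

    `inside` carries the <think> state from the previous chunk.
    """
    out = []
    while text:
        if inside:
            end = text.find("</think>")
            if end == -1:
                return "".join(out), True  # still inside the think block
            text = text[end + len("</think>"):]
            inside = False
        else:
            start = text.find("<think>")
            if start == -1:
                out.append(text)  # no more tags: everything is visible
                break
            out.append(text[:start])
            text = text[start + len("<think>"):]
            inside = True
    return "".join(out), inside
```

Calling `filter_thinking('Hello <think>processing request</think> World!', False)` returns `('Hello  World!', False)`; collapsing the leftover double space is part of the separate formatting cleanup mentioned in the features above.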
The server includes CORS support with the following configuration:
- Allows all origins (*)
- Supports GET, POST, and OPTIONS methods
- Allows all headers
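With flask-cors, that policy boils down to something like the following sketch (assuming the `app` object from app.py):

```python
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)

# Allow all origins and headers for GET, POST, and OPTIONS requests.
CORS(app, origins="*", methods=["GET", "POST", "OPTIONS"], allow_headers="*")
```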
The stream handling is designed to fail gracefully:

- Invalid JSON responses are forwarded as-is
- Non-content messages (like 'done' signals) are preserved
- Missing or malformed thinking tags are handled gracefully
- HTTP errors from Ollama are properly proxied
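The per-line handling can be pictured with this simplified sketch; `process_line` is a hypothetical helper, and the real stream processing in app.py may be structured differently:

```python
import json

def process_line(raw_line):
    """Decide what to do with one newline-delimited JSON line from Ollama."""
    try:
        msg = json.loads(raw_line)
    except json.JSONDecodeError:
        return raw_line  # invalid JSON is forwarded as-is
    if msg.get("done"):
        return raw_line  # 'done' and similar non-content signals pass through
    # Content messages would be run through the thinking-tag filter here.
    return raw_line
```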
The server runs in debug mode by default when executed directly. To change this behavior, adjust the `debug` parameter in:

```python
app.run(port=PROXY_PORT, debug=True)
```
MIT License
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For issues, questions, or contributions, please:
- Check existing GitHub issues
- Create a new issue if needed
- Provide as much context as possible
- Ollama team for the base server
- Contributors to Flask and related packages