

Merge pull request #133 from Mac0q/jiaxu_dev
append claude method
vyokky authored Oct 31, 2024
2 parents 0334363 + d4e16d8 commit 6d12108
Showing 8 changed files with 182 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th
- 📅 2024-06-25: **New Release for v0.2.1!** We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
1. **HostAgent Refactor:** We've refactored the HostAgent to enhance its efficiency in managing AppAgents within UFO.
2. **Evaluation Agent:** Introducing an evaluation agent that assesses task completion and provides real-time feedback.
3. **Google Gemini Support:** UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in [documentation](https://microsoft.github.io/UFO/supported_models/gemini/).
3. **Google Gemini & Claude Support:** UFO now supports Google Gemini and Claude as inference engines. Refer to our detailed guides in the [Gemini documentation](https://microsoft.github.io/UFO/supported_models/gemini/) and [Claude documentation](https://microsoft.github.io/UFO/supported_models/claude/).
4. **Customized User Agents:** Users can now create customized agents by simply answering a few questions.
- 📅 2024-05-21: We have reached 5K stars!✨
- 📅 2024-05-08: **New Release for v0.1.1!** We've made some significant updates! Previously known as AppAgent and ActAgent, we've rebranded them to HostAgent and AppAgent to better align with their functionalities. Explore the latest enhancements:
29 changes: 29 additions & 0 deletions documents/docs/supported_models/claude.md
@@ -0,0 +1,29 @@
# Claude

## Step 1
To use the Claude API, you need to create an account on the [Claude website](https://www.anthropic.com/) and obtain an API key.

## Step 2
You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command:

```bash
pip install -U anthropic==0.37.1
```

## Step 3
Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Claude API. The following is an example configuration for the Claude API:

```yaml
VISUAL_MODE: True  # Whether to use visual mode to understand screenshots and take actions
API_TYPE: "Claude"
API_KEY: "YOUR_KEY"
API_MODEL: "YOUR_MODEL"
```
!!! tip
    If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
!!! tip
    `API_MODEL` is the model name of the Claude LLM API. You can find the model name in the [Claude LLM model](https://www.anthropic.com/##anthropic-api) list.

## Step 4
After configuring the `HOST_AGENT` and `APP_AGENT` with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
2 changes: 1 addition & 1 deletion documents/docs/supported_models/gemini.md
@@ -26,4 +26,4 @@ API_MODEL: "YOUR_MODEL"
`API_MODEL` is the model name of the Gemini LLM API. You can find the model name in the [Gemini LLM model](https://ai.google.dev/gemini-api) list. If you encounter a `429 Resource has been exhausted (e.g. check quota).` error, you may have hit the rate limit of your Gemini API.

## Step 4
After configuring the `HOST_AGENT` and `APP_AGENT` with the OpenAI API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
After configuring the `HOST_AGENT` and `APP_AGENT` with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
1 change: 1 addition & 0 deletions documents/docs/supported_models/overview.md
@@ -9,6 +9,7 @@ Please refer to the following sections for more information on the supported mod
| `OPENAI` | [OpenAI API](./openai.md) |
| `Azure OpenAI (AOAI)` | [Azure OpenAI API](./azure_openai.md) |
| `Gemini` | [Gemini API](./gemini.md) |
| `Claude` | [Claude API](./claude.md) |
| `QWEN` | [QWEN API](./qwen.md) |
| `Ollama` | [Ollama API](./ollama.md) |
| `Custom` | [Custom API](./custom_model.md) |
17 changes: 17 additions & 0 deletions model_worker/README.md
@@ -19,6 +19,23 @@ pip install -U google-generativeai==0.7.0
NOTE: `API_MODEL` is the model name of Gemini LLM API.
You can find the model name in the [Gemini LLM model list](https://ai.google.dev/gemini-api).
If you encounter a `429 Resource has been exhausted (e.g. check quota).` error, you may have hit the rate limit of your Gemini API.
### If you use Claude as the Agent

1. Create an account on [Claude](https://www.anthropic.com/) and get your API key.
2. Install the required `anthropic` package, or install from `requirements.txt` after uncommenting the Claude dependency:
```bash
pip install -U anthropic==0.37.1
```
3. Add the following configuration to `config.yaml`:
```json showLineNumbers
{
    "API_TYPE": "claude",
"API_KEY": "YOUR_KEY",
"API_MODEL": "YOUR_MODEL"
}
```
NOTE: `API_MODEL` is the model name of Claude LLM API.
You can find the model name in the [Claude LLM model list](https://www.anthropic.com/##anthropic-api).

### If you use QWEN as the Agent

3 changes: 3 additions & 0 deletions ufo/config/config_prices.yaml
@@ -39,6 +39,9 @@ PRICES: {
"gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105},
"gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105},
"gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015},
"claude/claude-3-5-sonnet-20241022": {"input": 0.0003, "output": 0.0015},
"claude/claude-3-5-sonnet": {"input": 0.0003, "output": 0.0015},
"claude/claude-3-5-opus": {"input": 0.0015, "output": 0.0075},
}
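The prices above are per 1K tokens, keyed as `"<api_type>/<model>"`. UFO's cost estimator multiplies them by the token counts reported in the API response. A minimal standalone sketch of that arithmetic (the `estimate_cost` helper and the trimmed price table here are illustrative, not UFO's actual code):

```python
# Illustrative price table, USD per 1K tokens, keyed as "<api_type>/<model>".
PRICES = {
    "claude/claude-3-5-sonnet-20241022": {"input": 0.0003, "output": 0.0015},
    "claude/claude-3-5-opus": {"input": 0.0015, "output": 0.0075},
}


def estimate_cost(
    api_type: str, model: str, prompt_tokens: int, completion_tokens: int
) -> float:
    # Build the lookup key from the lowercase API type and the model name.
    name = f"{api_type.lower()}/{model}"
    price = PRICES.get(name)
    if price is None:
        # Unknown model: no estimate (an assumption for this sketch).
        return 0.0
    return (
        price["input"] * prompt_tokens / 1000
        + price["output"] * completion_tokens / 1000
    )
```

For example, 1,000 input and 1,000 output tokens on `claude-3-5-sonnet-20241022` would be estimated at $0.0003 + $0.0015 = $0.0018.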


3 changes: 3 additions & 0 deletions ufo/llm/base.py
@@ -29,6 +29,7 @@ def get_service(name: str) -> "BaseService":
"qwen": "QwenService",
"ollama": "OllamaService",
"gemini": "GeminiService",
"claude": "ClaudeService",
"placeholder": "PlaceHolderService",
}
service_name = service_map.get(name, None)
@@ -67,6 +68,8 @@ def get_cost_estimator(
name = str("qwen/" + model)
elif api_type.lower() == "gemini":
name = str("gemini/" + model)
elif api_type.lower() == "claude":
name = str("claude/" + model)

if name in prices:
cost = (
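The `service_map` in the hunk above is a plain string-keyed registry: the lowercase `API_TYPE` from the config selects a service class name. A standalone sketch of that lookup pattern (a trimmed map and a hypothetical error path, not UFO's exact code):

```python
# String-keyed registry mapping an API type to a service class name.
service_map = {
    "openai": "OpenAIService",
    "gemini": "GeminiService",
    "claude": "ClaudeService",
}


def get_service_name(api_type: str) -> str:
    # Normalize the configured API_TYPE before the lookup.
    service_name = service_map.get(api_type.lower(), None)
    if service_name is None:
        # Hypothetical error handling for this sketch.
        raise ValueError(f"Unsupported API type: {api_type}")
    return service_name
```

Adding a new backend then amounts to registering one key, which is exactly what this commit does for `"claude"`.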
127 changes: 127 additions & 0 deletions ufo/llm/claude.py
@@ -0,0 +1,127 @@
import re
import time
from typing import Any, Dict, List, Optional, Tuple

import anthropic

from ufo.llm.base import BaseService
from ufo.utils import print_with_color


class ClaudeService(BaseService):
    """
    A service class for Claude models.
    """

    def __init__(self, config: Dict[str, Any], agent_type: str):
        """
        Initialize the Claude service.
        :param config: The configuration.
        :param agent_type: The agent type.
        """
        self.config_llm = config[agent_type]
        self.config = config
        self.model = self.config_llm["API_MODEL"]
        self.prices = self.config["PRICES"]
        self.max_retry = self.config["MAX_RETRY"]
        self.api_type = self.config_llm["API_TYPE"].lower()
        self.client = anthropic.Anthropic(api_key=self.config_llm["API_KEY"])

    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        n: int = 1,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        **kwargs: Any,
    ) -> Any:
        """
        Generates completions for a given list of messages.
        :param messages: The list of messages to generate completions for.
        :param n: The number of completions to generate for each message.
        :param temperature: Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make them more focused and deterministic. If not provided, the default value from the model configuration is used.
        :param max_tokens: The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration is used.
        :param top_p: Controls the diversity of the generated completions via nucleus sampling. If not provided, the default value from the model configuration is used.
        :param kwargs: Additional keyword arguments passed to the underlying completion method.
        :return: A tuple of the generated completions and the estimated cost.
        """

        temperature = (
            temperature if temperature is not None else self.config["TEMPERATURE"]
        )
        top_p = top_p if top_p is not None else self.config["TOP_P"]
        max_tokens = max_tokens if max_tokens is not None else self.config["MAX_TOKENS"]

        responses = []
        cost = 0.0
        system_prompt, user_prompt = self.process_messages(messages)

        for _ in range(n):
            for _ in range(self.max_retry):
                response = None
                try:
                    # The Anthropic API takes the system prompt separately and
                    # expects `messages` to be a list of message dictionaries.
                    response = self.client.messages.create(
                        max_tokens=max_tokens,
                        model=self.model,
                        system=system_prompt,
                        messages=[user_prompt],
                    )
                    # The response content is a list of content blocks; the
                    # completion text lives in the first block.
                    responses.append(response.content[0].text)
                    prompt_tokens = response.usage.input_tokens
                    completion_tokens = response.usage.output_tokens
                    cost += self.get_cost_estimator(
                        self.api_type,
                        self.model,
                        self.prices,
                        prompt_tokens,
                        completion_tokens,
                    )
                    break  # Success: stop retrying this completion.
                except Exception as e:
                    print_with_color(f"Error making API request: {e}", "red")
                    if response is not None:
                        print_with_color(str(response), "red")
                    time.sleep(3)
                    continue

        return responses, cost

    def process_messages(self, messages: List[Dict[str, str]]) -> Tuple[str, Dict]:
        """
        Processes the messages to generate the system and user prompts.
        :param messages: A list of message dictionaries.
        :return: A tuple containing the system prompt (str) and the user prompt (dict).
        """

        system_prompt = ""
        user_prompt = {"role": "user", "content": []}
        if isinstance(messages, dict):
            messages = [messages]
        for message in messages:
            if message["role"] == "system":
                system_prompt = message["content"]
            else:
                for content in message["content"]:
                    if content["type"] == "text":
                        user_prompt["content"].append(content)
                    elif content["type"] == "image_url":
                        # Convert an OpenAI-style base64 data URL into the
                        # image block format expected by the Anthropic API.
                        data_url = content["image_url"]["url"]
                        match = re.match(r"data:(.*?);base64,(.*)", data_url)
                        if match:
                            media_type = match.group(1)
                            base64_data = match.group(2)
                            user_prompt["content"].append(
                                {
                                    "type": "image",
                                    "source": {
                                        "type": "base64",
                                        "media_type": media_type,
                                        "data": base64_data,
                                    },
                                }
                            )
                        else:
                            raise ValueError("Invalid image URL")
        return system_prompt, user_prompt
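The `process_messages` method above bridges OpenAI-style `image_url` content (a base64 data URL) to Anthropic's image block format. A self-contained demonstration of just that conversion (a standalone re-implementation for illustration, not the class method itself):

```python
import re


def data_url_to_claude_block(data_url: str) -> dict:
    # Split "data:<media_type>;base64,<payload>" into its two parts.
    match = re.match(r"data:(.*?);base64,(.*)", data_url)
    if not match:
        raise ValueError("Invalid image URL")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": match.group(1),
            # The Anthropic API expects the payload under the "data" key.
            "data": match.group(2),
        },
    }


block = data_url_to_claude_block("data:image/png;base64,iVBORw0KGgo=")
```

The non-greedy `(.*?)` stops at the first `;`, so the media type and the base64 payload are captured cleanly even though the payload itself may contain arbitrary characters.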
