

Merge pull request #133 from Mac0q/jiaxu_dev
append claude method
vyokky authored Oct 31, 2024
2 parents 0334363 + d4e16d8 commit 6d12108
Showing 8 changed files with 182 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -41,7 +41,7 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th
- 📅 2024-06-25: **New Release for v0.2.1!** We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
1. **HostAgent Refactor:** We've refactored the HostAgent to enhance its efficiency in managing AppAgents within UFO.
2. **Evaluation Agent:** Introducing an evaluation agent that assesses task completion and provides real-time feedback.
3. **Google Gemini Support:** UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in [documentation](https://microsoft.github.io/UFO/supported_models/gemini/).
3. **Google Gemini & Claude Support:** UFO now supports Google Gemini and Claude as inference engines. Refer to our detailed guides in the [Gemini documentation](https://microsoft.github.io/UFO/supported_models/gemini/) and [Claude documentation](https://microsoft.github.io/UFO/supported_models/claude/).
4. **Customized User Agents:** Users can now create customized agents by simply answering a few questions.
- 📅 2024-05-21: We have reached 5K stars!✨
- 📅 2024-05-08: **New Release for v0.1.1!** We've made some significant updates! Previously known as AppAgent and ActAgent, we've rebranded them to HostAgent and AppAgent to better align with their functionalities. Explore the latest enhancements:
29 changes: 29 additions & 0 deletions documents/docs/supported_models/claude.md
@@ -0,0 +1,29 @@
# Claude

## Step 1
To use the Claude API, you need to create an account on the [Claude website](https://www.anthropic.com/) and obtain an API key.

## Step 2
You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command:

```bash
pip install -U anthropic==0.37.1
```

## Step 3
Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Claude API. The following is an example configuration for the Claude API:

```yaml
VISUAL_MODE: True  # Whether to use visual mode to understand screenshots and take actions
API_TYPE: "Claude"
API_KEY: "YOUR_KEY"
API_MODEL: "YOUR_MODEL"
```
!!! tip
    If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
!!! tip
    `API_MODEL` is the model name of the Claude LLM API. You can find the model name in the [Claude LLM model](https://www.anthropic.com/##anthropic-api) list.

## Step 4
After configuring the `HOST_AGENT` and `APP_AGENT` with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
2 changes: 1 addition & 1 deletion documents/docs/supported_models/gemini.md
@@ -26,4 +26,4 @@ API_MODEL: "YOUR_MODEL"
`API_MODEL` is the model name of the Gemini LLM API. You can find the model name in the [Gemini LLM model](https://ai.google.dev/gemini-api) list. If you encounter a `429 Resource has been exhausted (e.g. check quota).` error, you may have hit the rate limit of your Gemini API.

## Step 4
After configuring the `HOST_AGENT` and `APP_AGENT` with the OpenAI API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
After configuring the `HOST_AGENT` and `APP_AGENT` with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
1 change: 1 addition & 0 deletions documents/docs/supported_models/overview.md
@@ -9,6 +9,7 @@ Please refer to the following sections for more information on the supported mod
| `OPENAI` | [OpenAI API](./openai.md) |
| `Azure OpenAI (AOAI)` | [Azure OpenAI API](./azure_openai.md) |
| `Gemini` | [Gemini API](./gemini.md) |
| `Claude` | [Claude API](./claude.md) |
| `QWEN` | [QWEN API](./qwen.md) |
| `Ollama` | [Ollama API](./ollama.md) |
| `Custom` | [Custom API](./custom_model.md) |
17 changes: 17 additions & 0 deletions model_worker/README.md
@@ -19,6 +19,23 @@ pip install -U google-generativeai==0.7.0
NOTE: `API_MODEL` is the model name of Gemini LLM API.
You can find the model name in the [Gemini LLM model list](https://ai.google.dev/gemini-api).
If you encounter a `429 Resource has been exhausted (e.g. check quota).` error, you may have hit the rate limit of your Gemini API.
### If you use Claude as the Agent

1. Create an account on [Claude](https://www.anthropic.com/) and get your API key.
2. Install the required `anthropic` package, or install from `requirements.txt` after uncommenting the Claude dependency:
```bash
pip install -U anthropic==0.37.1
```
3. Add the following configuration to `config.yaml`:
```json showLineNumbers
{
    "API_TYPE": "claude",
"API_KEY": "YOUR_KEY",
"API_MODEL": "YOUR_MODEL"
}
```
NOTE: `API_MODEL` is the model name of Claude LLM API.
You can find the model name in the [Claude LLM model list](https://www.anthropic.com/##anthropic-api).

### If you use QWEN as the Agent

3 changes: 3 additions & 0 deletions ufo/config/config_prices.yaml
@@ -39,6 +39,9 @@ PRICES: {
"gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105},
"gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105},
"gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015},
"claude/claude-3-5-sonnet-20241022": {"input": 0.0003, "output": 0.0015},
"claude/claude-3-5-sonnet": {"input": 0.0003, "output": 0.0015},
"claude/claude-3-5-opus": {"input": 0.0015, "output": 0.0075},
}
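The prices above are per 1K tokens, keyed as `"<api_type>/<model>"`. UFO's cost estimator multiplies them by the token counts reported in the API response. A minimal standalone sketch of that arithmetic (the `estimate_cost` helper and the trimmed price table here are illustrative, not UFO's actual code):

```python
# Illustrative price table, USD per 1K tokens, keyed as "<api_type>/<model>".
PRICES = {
    "claude/claude-3-5-sonnet-20241022": {"input": 0.0003, "output": 0.0015},
    "claude/claude-3-5-opus": {"input": 0.0015, "output": 0.0075},
}


def estimate_cost(
    api_type: str, model: str, prompt_tokens: int, completion_tokens: int
) -> float:
    # Build the lookup key from the lowercase API type and the model name.
    name = f"{api_type.lower()}/{model}"
    price = PRICES.get(name)
    if price is None:
        # Unknown model: no estimate (an assumption for this sketch).
        return 0.0
    return (
        price["input"] * prompt_tokens / 1000
        + price["output"] * completion_tokens / 1000
    )
```

For example, 1,000 input and 1,000 output tokens on `claude-3-5-sonnet-20241022` would be estimated at $0.0003 + $0.0015 = $0.0018.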


3 changes: 3 additions & 0 deletions ufo/llm/base.py
@@ -29,6 +29,7 @@ def get_service(name: str) -> "BaseService":
"qwen": "QwenService",
"ollama": "OllamaService",
"gemini": "GeminiService",
"claude": "ClaudeService",
"placeholder": "PlaceHolderService",
}
service_name = service_map.get(name, None)
@@ -67,6 +68,8 @@ def get_cost_estimator(
name = str("qwen/" + model)
elif api_type.lower() == "gemini":
name = str("gemini/" + model)
elif api_type.lower() == "claude":
name = str("claude/" + model)

if name in prices:
cost = (
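The `service_map` in the hunk above is a plain string-keyed registry: the lowercase `API_TYPE` from the config selects a service class name. A standalone sketch of that lookup pattern (a trimmed map and a hypothetical error path, not UFO's exact code):

```python
# String-keyed registry mapping an API type to a service class name.
service_map = {
    "openai": "OpenAIService",
    "gemini": "GeminiService",
    "claude": "ClaudeService",
}


def get_service_name(api_type: str) -> str:
    # Normalize the configured API_TYPE before the lookup.
    service_name = service_map.get(api_type.lower(), None)
    if service_name is None:
        # Hypothetical error handling for this sketch.
        raise ValueError(f"Unsupported API type: {api_type}")
    return service_name
```

Adding a new backend then amounts to registering one key, which is exactly what this commit does for `"claude"`.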
127 changes: 127 additions & 0 deletions ufo/llm/claude.py
@@ -0,0 +1,127 @@
import re
import time
from typing import Any, Dict, List, Optional, Tuple

import anthropic

from ufo.llm.base import BaseService
from ufo.utils import print_with_color


class ClaudeService(BaseService):
    """
    A service class for Claude models.
    """

    def __init__(self, config: Dict[str, Any], agent_type: str):
        """
        Initialize the Claude service.
        :param config: The configuration.
        :param agent_type: The agent type.
        """
        self.config_llm = config[agent_type]
        self.config = config
        self.model = self.config_llm["API_MODEL"]
        self.prices = self.config["PRICES"]
        self.max_retry = self.config["MAX_RETRY"]
        self.api_type = self.config_llm["API_TYPE"].lower()
        self.client = anthropic.Anthropic(api_key=self.config_llm["API_KEY"])

    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        n: int = 1,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        **kwargs: Any,
    ) -> Any:
        """
        Generates completions for a given list of messages.
        :param messages: The list of messages to generate completions for.
        :param n: The number of completions to generate for each message.
        :param temperature: Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make them more focused and deterministic. If not provided, the default value from the model configuration is used.
        :param max_tokens: The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration is used.
        :param top_p: Controls the diversity of the generated completions via nucleus sampling. If not provided, the default value from the model configuration is used.
        :param kwargs: Additional keyword arguments passed to the underlying completion method.
        :return: A tuple of the generated completions and the estimated cost.
        """

        temperature = (
            temperature if temperature is not None else self.config["TEMPERATURE"]
        )
        top_p = top_p if top_p is not None else self.config["TOP_P"]
        max_tokens = max_tokens if max_tokens is not None else self.config["MAX_TOKENS"]

        responses = []
        cost = 0.0
        system_prompt, user_prompt = self.process_messages(messages)

        for _ in range(n):
            for _ in range(self.max_retry):
                response = None
                try:
                    # The Anthropic API takes the system prompt separately and
                    # expects `messages` to be a list of message dictionaries.
                    response = self.client.messages.create(
                        max_tokens=max_tokens,
                        model=self.model,
                        system=system_prompt,
                        messages=[user_prompt],
                    )
                    # The response content is a list of content blocks; the
                    # completion text lives in the first block.
                    responses.append(response.content[0].text)
                    prompt_tokens = response.usage.input_tokens
                    completion_tokens = response.usage.output_tokens
                    cost += self.get_cost_estimator(
                        self.api_type,
                        self.model,
                        self.prices,
                        prompt_tokens,
                        completion_tokens,
                    )
                    break  # Success: stop retrying this completion.
                except Exception as e:
                    print_with_color(f"Error making API request: {e}", "red")
                    if response is not None:
                        print_with_color(str(response), "red")
                    time.sleep(3)
                    continue

        return responses, cost

    def process_messages(self, messages: List[Dict[str, str]]) -> Tuple[str, Dict]:
        """
        Processes the messages to generate the system and user prompts.
        :param messages: A list of message dictionaries.
        :return: A tuple containing the system prompt (str) and the user prompt (dict).
        """

        system_prompt = ""
        user_prompt = {"role": "user", "content": []}
        if isinstance(messages, dict):
            messages = [messages]
        for message in messages:
            if message["role"] == "system":
                system_prompt = message["content"]
            else:
                for content in message["content"]:
                    if content["type"] == "text":
                        user_prompt["content"].append(content)
                    elif content["type"] == "image_url":
                        # Convert an OpenAI-style base64 data URL into the
                        # image block format expected by the Anthropic API.
                        data_url = content["image_url"]["url"]
                        match = re.match(r"data:(.*?);base64,(.*)", data_url)
                        if match:
                            media_type = match.group(1)
                            base64_data = match.group(2)
                            user_prompt["content"].append(
                                {
                                    "type": "image",
                                    "source": {
                                        "type": "base64",
                                        "media_type": media_type,
                                        "data": base64_data,
                                    },
                                }
                            )
                        else:
                            raise ValueError("Invalid image URL")
        return system_prompt, user_prompt
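The `process_messages` method above bridges OpenAI-style `image_url` content (a base64 data URL) to Anthropic's image block format. A self-contained demonstration of just that conversion (a standalone re-implementation for illustration, not the class method itself):

```python
import re


def data_url_to_claude_block(data_url: str) -> dict:
    # Split "data:<media_type>;base64,<payload>" into its two parts.
    match = re.match(r"data:(.*?);base64,(.*)", data_url)
    if not match:
        raise ValueError("Invalid image URL")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": match.group(1),
            # The Anthropic API expects the payload under the "data" key.
            "data": match.group(2),
        },
    }


block = data_url_to_claude_block("data:image/png;base64,iVBORw0KGgo=")
```

The non-greedy `(.*?)` stops at the first `;`, so the media type and the base64 payload are captured cleanly even though the payload itself may contain arbitrary characters.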
