Merge pull request #128 from MightyGaga/pre-release
add instantiation process
vyokky authored Oct 29, 2024
2 parents 70d228c + 9318a74 commit 5f8178a
Showing 35 changed files with 2,130 additions and 42 deletions.
16 changes: 0 additions & 16 deletions SUPPORT.md

This file was deleted.

9 changes: 9 additions & 0 deletions instantiation/.gitignore
@@ -0,0 +1,9 @@
# Ignore files
cache/
controls_cache/
tasks/*
!tasks/prefill
templates/word/*
logs/*
controller/utils/
config/config.yaml
219 changes: 219 additions & 0 deletions instantiation/README.md
@@ -0,0 +1,219 @@
## Introduction to Instantiation

**The instantiation process aims to filter and modify instructions according to the current environment.**

This process yields clearer, more specific instructions that are better suited to execution by UFO.

## How to Use

### 1. Install Packages

You should install the necessary packages in the UFO root folder:

```bash
pip install -r requirements.txt
```

### 2. Configure the LLMs

Before using the instantiation section, you need to provide your LLM configurations in `config.yaml` and `config_dev.yaml` located in the `instantiation/config` folder.

- `config_dev.yaml` specifies the paths of relevant files and contains default settings. The match strategy for the control filter supports options: `'contains'`, `'fuzzy'`, and `'regex'`, allowing flexible matching between application windows and target files.

- `config.yaml` stores the agent information. You should copy the `config.yaml.template` file and fill it out according to the provided hints.
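
To make the three match strategies concrete, here is a minimal sketch of how a target file name might be compared with an application window title. It is an illustration only: the function name `title_matches`, the `fuzzy_threshold` parameter, and the use of the standard-library `difflib` for fuzzy matching are assumptions, and the project's own control filter may work differently.

```python
import re
from difflib import SequenceMatcher


def title_matches(file_name, window_title, strategy="contains", fuzzy_threshold=0.7):
    """Return True if window_title matches file_name under the given strategy.

    Hypothetical helper: the name, signature, and threshold are assumptions.
    """
    if strategy == "contains":
        # Plain substring test.
        return file_name in window_title
    if strategy == "fuzzy":
        # Similarity ratio in [0, 1]; case-insensitive comparison.
        ratio = SequenceMatcher(None, file_name.lower(), window_title.lower()).ratio()
        return ratio >= fuzzy_threshold
    if strategy == "regex":
        # file_name is treated as a regular expression pattern.
        return re.search(file_name, window_title) is not None
    raise ValueError(f"Unsupported match strategy: {strategy!r}")
```

For example, `title_matches(r"template\d+\.docx", "template1.docx - Word", "regex")` treats the first argument as a pattern, whereas `"contains"` requires the literal file name to appear in the title.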

You will configure the prefill agent and the filter agent individually. The prefill agent is used to prepare the task, while the filter agent evaluates the quality of the prefilled task. You can choose different LLMs for each.

**BE CAREFUL!** If you publish your work on GitHub or a similar platform, do not expose your `config.yaml` online, as it contains your private keys.

Once you have filled out your copy of the template, make sure it is named `config.yaml` to complete the LLM configuration.

### 3. Prepare Files

Certain files need to be prepared before running the task.

#### 3.1. Tasks as JSON

The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to `instantiation/tasks`. This path can be changed through the `TASKS_HUB` setting in the `instantiation/config/config_dev.yaml` file, and the task folder can be specified in the terminal, as described in **4. Start Running**. For example, a task stored in `instantiation/tasks/prefill/` may look like this:

```json
{
    // The app you want to use
    "app": "word",
    // A unique ID to distinguish different tasks
    "unique_id": "1",
    // The task and steps to be instantiated
    "task": "Type 'hello' and set the font type to Arial",
    "refined_steps": [
        "Type 'hello'",
        "Set the font to Arial"
    ]
}
```
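
Before running the process, it can help to check that each task file carries the fields shown in the example. The sketch below is one way to do that; the function name `load_task` and the `REQUIRED_KEYS` constant are hypothetical and not part of the project. Note that the `//` comments in the example are explanatory only: Python's standard `json` module rejects them, so this sketch assumes the real files contain plain JSON.

```python
import json
from pathlib import Path

# Fields taken from the example task; the constant name is an assumption.
REQUIRED_KEYS = {"app", "unique_id", "task", "refined_steps"}


def load_task(path):
    """Load one task file and verify that it has the expected fields."""
    task = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - task.keys()
    if missing:
        raise ValueError(f"{path}: missing keys {sorted(missing)}")
    if not isinstance(task["refined_steps"], list):
        raise ValueError(f"{path}: 'refined_steps' must be a list of strings")
    return task
```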

#### 3.2. Templates and Descriptions

You should place an app file as a reference for instantiation in a folder named after the app.

For example, if you have `template1.docx` for Word, it should be located at `instantiation/templates/word/template1.docx`.

Additionally, for each app folder, there should be a `description.json` file located at `instantiation/templates/word/description.json`, which describes each template file in detail. It may look like this:

```json
{
    "template1.docx": "A document with a rectangle shape",
    "template2.docx": "A document with a line of text",
    "template3.docx": "A document with a chart"
}
```

If a `description.json` file is not present, one template file will be selected at random.
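
The fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `load_descriptions` and `pick_random_template` are hypothetical, and the project's own selection code may be organized differently.

```python
import json
import random
from pathlib import Path


def load_descriptions(app_dir):
    """Return {template file name: description}, or None if description.json is absent."""
    desc_file = Path(app_dir) / "description.json"
    if not desc_file.is_file():
        return None
    return json.loads(desc_file.read_text(encoding="utf-8"))


def pick_random_template(app_dir, rng=random):
    """Pick any template file in app_dir, ignoring description.json itself."""
    templates = sorted(
        p for p in Path(app_dir).iterdir()
        if p.is_file() and p.name != "description.json"
    )
    if not templates:
        raise FileNotFoundError(f"No template files found in {app_dir}")
    return rng.choice(templates)
```

A caller would try `load_descriptions` first and use `pick_random_template` only when it returns `None`.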

#### 3.3. Final Structure

Ensure the following files are in place:

- [X] JSON files to be instantiated
- [X] Templates as references for instantiation
- [X] Description file in JSON format

The structure of the files can be:

```bash
instantiation/
│
├── tasks/
│   ├── action_prefill/
│   │   ├── task1.json
│   │   ├── task2.json
│   │   └── task3.json
│   └── ...
│
├── templates/
│   ├── word/
│   │   ├── template1.docx
│   │   ├── template2.docx
│   │   ├── template3.docx
│   │   └── description.json
│   └── ...
└── ...
```
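
As a quick preflight check, the checklist above could be verified with a short script. The sketch below is hypothetical (the function `missing_inputs` is not part of the project) and assumes the folder layout shown in the tree.

```python
from pathlib import Path


def missing_inputs(root, task_folder, app):
    """Return a list of human-readable problems; an empty list means all inputs exist."""
    root = Path(root)
    problems = []

    # 1. JSON files to be instantiated
    task_dir = root / "tasks" / task_folder
    if not task_dir.is_dir() or not any(task_dir.glob("*.json")):
        problems.append(f"no task JSON files in {task_dir}")

    # 2. Templates used as references for instantiation
    template_dir = root / "templates" / app
    has_templates = template_dir.is_dir() and any(
        p.is_file() and p.name != "description.json" for p in template_dir.iterdir()
    )
    if not has_templates:
        problems.append(f"no template files in {template_dir}")

    # 3. Description file in JSON format
    if not (template_dir / "description.json").is_file():
        problems.append(f"missing {template_dir / 'description.json'}")

    return problems
```

Calling `missing_inputs("instantiation", "action_prefill", "word")` from the UFO root folder would then list anything that still needs to be prepared.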

### 4. Start Running

Run the `instantiation` package in module mode, which executes `instantiation/__main__.py`. You can do this by typing the following command in the terminal from the UFO root folder:

```bash
python -m instantiation
```

You can use `--task` to specify the task folder you want to use; the default is `action_prefill`:

```bash
python -m instantiation --task your_task_folder_name
```
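
For readers unfamiliar with this kind of option, here is a minimal sketch of how a `--task` flag with a default value can be parsed using the standard `argparse` module. The function name `parse_args` and the help text are assumptions; the default value is taken from the sentence above, and the project's own parser may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(prog="instantiation")
    parser.add_argument(
        "--task",
        default="action_prefill",  # default folder name, as stated in this README
        help="Name of the task folder to instantiate.",
    )
    return parser.parse_args(argv)
```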

After the process is completed, a new folder named after the task folder with an `_instantiated` suffix (for example, `prefill_instantiated`) will be created alongside the original one. This folder will contain the instantiated task, which will look like:

```json
{
    // A unique ID to distinguish different tasks
    "unique_id": "1",
    // The chosen template path
    "instantial_template_path": "copied template file path",
    // The instantiated task and steps
    "instantiated_request": "Type 'hello' and set the font type to Arial in the Word document.",
    "instantiated_plan": [
        {
            "step 1": "Select the target text 'text to edit'",
            "controlLabel": "",
            "controlText": "",
            "function": "select_text",
            "args": {
                "text": "text to edit"
            }
        },
        {
            "step 2": "Type 'hello'",
            "controlLabel": "101",
            "controlText": "Edit",
            "function": "type_keys",
            "args": {
                "text": "hello"
            }
        },
        {
            "step 3": "Select the typed text 'hello'",
            "controlLabel": "",
            "controlText": "",
            "function": "select_text",
            "args": {
                "text": "hello"
            }
        },
        {
            "step 4": "Click the font dropdown",
            "controlLabel": "",
            "controlText": "Consolas",
            "function": "click_input",
            "args": {
                "button": "left",
                "double": false
            }
        },
        {
            "step 5": "Set the font to Arial",
            "controlLabel": "",
            "controlText": "Arial",
            "function": "click_input",
            "args": {
                "button": "left",
                "double": false
            }
        }
    ],
    "result": {
        "filter": "Drawing or writing a signature using the drawing tools in the Word desktop app is a task that can be executed locally within the application."
    },
    "execution_time": {
        "choose_template": 10.650701761245728,
        "prefill": 44.23913502693176,
        "filter": 3.746831178665161,
        "total": 58.63666796684265
    }
}
```
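
To inspect an instantiated task programmatically, the plan can be flattened into one row per step. The sketch below assumes the structure shown in the example (each plan entry has one key starting with `step `, plus `function` and `args`); the function name `summarize_plan` is hypothetical. It expects an already-parsed dictionary, since the `//` comments in the example would not be accepted by a standard JSON parser.

```python
def summarize_plan(instance):
    """Return a list of (step key, description, function, args) tuples."""
    rows = []
    for entry in instance["instantiated_plan"]:
        # Each entry names its step with a key such as "step 1".
        step_key = next(key for key in entry if key.startswith("step "))
        rows.append((step_key, entry[step_key], entry["function"], entry["args"]))
    return rows


# A two-step excerpt of the example above, written as a Python dict.
example = {
    "instantiated_plan": [
        {"step 1": "Type 'hello'", "controlLabel": "101", "controlText": "Edit",
         "function": "type_keys", "args": {"text": "hello"}},
        {"step 2": "Set the font to Arial", "controlLabel": "", "controlText": "Arial",
         "function": "click_input", "args": {"button": "left", "double": False}},
    ]
}

for step, description, function, args in summarize_plan(example):
    print(f"{step}: {description} -> {function}({args})")
```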

Additionally, a `prefill_templates` folder will be created, which stores the copied chosen templates for each task.

## Workflow

There are three key steps in the instantiation process:

1. Choose a template file according to the specified app and instruction.
2. Prefill the task using the current screenshot.
3. Filter the established task.
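
The three steps, together with the per-stage timings reported under `execution_time` in the output, can be sketched as a small pipeline. Everything here is illustrative: `run_pipeline` and its callable parameters are hypothetical stand-ins for the project's template chooser, prefill agent, and filter agent.

```python
import time


def run_pipeline(task, choose_template, prefill, filter_task):
    """Run the three stages in order and record how long each one takes."""
    timings = {}

    def timed(name, func, *args):
        start = time.time()
        result = func(*args)
        timings[name] = time.time() - start
        return result

    template = timed("choose_template", choose_template, task)
    prefilled = timed("prefill", prefill, task, template)
    verdict = timed("filter", filter_task, prefilled)
    timings["total"] = sum(timings.values())
    return prefilled, verdict, timings
```

Passing the stages in as callables keeps the sketch independent of any particular LLM client, which is a design choice of this example rather than a claim about the project's code.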

#### 1. Choose Template File

Templates for your app must be defined and described in `instantiation/templates/<app>`, where `<app>` is the name of the app. For instance, if you want to instantiate tasks for the Word application, place the relevant `.docx` files in `instantiation/templates/word`, along with a `description.json` file.

The appropriate template will be selected based on how well its description matches the instruction.
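
As a rough illustration of description matching, the sketch below scores each description by word overlap (Jaccard similarity) with the instruction and returns the best file. This is a deliberately simple stand-in: the function `choose_template` and the scoring method are assumptions, and the project may rely on a different similarity measure.

```python
import re


def choose_template(descriptions, instruction):
    """Return the template file name whose description best overlaps the instruction."""

    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    wanted = tokens(instruction)

    def score(item):
        _, description = item
        have = tokens(description)
        union = wanted | have
        return len(wanted & have) / len(union) if union else 0.0

    return max(descriptions.items(), key=score)[0]


descriptions = {
    "template1.docx": "A document with a rectangle shape",
    "template2.docx": "A document with a line of text",
    "template3.docx": "A document with a chart",
}
print(choose_template(descriptions, "Insert a chart into the document"))
```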

#### 2. Prefill the Task

After selecting the template file, it will be opened, and a screenshot will be taken. If the template file is currently in use, errors may occur.

The screenshot will be sent to the action prefill agent, which will return a modified task.

#### 3. Filter Task

The completed task will be evaluated by a filter agent, which will assess it and provide feedback. If the task is deemed a good instance, it will be saved in `instantiation/tasks/your_folder_name_instantiated/instances_pass/`; otherwise, it will be saved in `instantiation/tasks/your_folder_name_instantiated/instances_fail/`.

All encountered error messages and tracebacks are saved in `instantiation/tasks/your_folder_name_instantiated/instances_error/`.
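
The routing rule above amounts to choosing one of three subfolders. A minimal sketch, assuming the folder names given in this section (the function `result_dir` and its parameters are hypothetical):

```python
from pathlib import Path


def result_dir(tasks_hub, task_folder, passed=False, error=False):
    """Return the folder where an instantiated task (or its error log) is saved."""
    base = Path(tasks_hub) / f"{task_folder}_instantiated"
    if error:
        return base / "instances_error"
    return base / ("instances_pass" if passed else "instances_fail")
```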

## Notes

1. Users should save any open files before using this project; otherwise, unsaved changes may be lost when the app is shut down and its files are closed.

2. After starting the project, users should not close the app window while the program is taking screenshots.
7 changes: 7 additions & 0 deletions instantiation/__main__.py
@@ -0,0 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from instantiation import instantiation

if __name__ == "__main__":
    # Execute the main script
    instantiation.main()
37 changes: 37 additions & 0 deletions instantiation/config/config.py
@@ -0,0 +1,37 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from ufo.config.config import Config


class Config(Config):
    _instance = None

    def __init__(self, config_path="instantiation/config/"):
        """
        Initializes the Config class.
        :param config_path: The path to the config file.
        """
        self.config_data = self.load_config(config_path)

    @staticmethod
    def get_instance():
        """
        Get the instance of the Config class.
        :return: The instance of the Config class.
        """
        if Config._instance is None:
            Config._instance = Config()

        return Config._instance

    def optimize_configs(self, configs):
        """
        Optimize the configurations.
        :param configs: The configurations to optimize.
        :return: The optimized configurations.
        """
        self.update_api_base(configs, "PREFILL_AGENT")
        self.update_api_base(configs, "FILTER_AGENT")

        return configs
43 changes: 43 additions & 0 deletions instantiation/config/config.yaml.template
@@ -0,0 +1,43 @@
# You will configure the prefill agent and the filter agent individually.
# Prefill agent is used to prefill the task.
# Filter agent is to evaluate the prefill quality.

PREFILL_AGENT: {
VISUAL_MODE: True, # Whether to use the visual mode

API_TYPE: "azure_ad" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
API_BASE: "https://cloudgpt-openai.azure-api.net/", # The OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API. As for the AAD, it should be your endpoint.
API_KEY: "YOUR_API_KEY", # The OpenAI API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4o-20240513", # The only OpenAI model by now that accepts visual input

### For the AOAI
API_DEPLOYMENT_ID: "gpt-4-0125-preview", # The deployment id for the AOAI API
### For Azure_AD
AAD_TENANT_ID: "YOUR_AAD_ID", # Set the value to your tenant id for the llm model
AAD_API_SCOPE: "openai", # Set the value to your scope for the llm model
AAD_API_SCOPE_BASE: "YOUR_AAD_API_SCOPE_BASE" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and the only need is the YOUR_SCOPE_BASE
}

FILTER_AGENT: {
VISUAL_MODE: False, # Whether to use the visual mode

API_TYPE: "azure_ad" , # The API type, "openai" for the OpenAI API, "aoai" for the Azure OpenAI.
API_BASE: "https://cloudgpt-openai.azure-api.net/", # The OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API. As for the aoai, it should be https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_API_KEY", # The aoai API key
API_VERSION: "2024-04-01-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4o-20240513", # The only OpenAI model by now that accepts visual input
API_DEPLOYMENT_ID: "gpt-4o-20240513-preview", # The deployment id for the AOAI API

### For Azure_AD
AAD_TENANT_ID: "YOUR_AAD_ID",
AAD_API_SCOPE: "openai", #"openai"
AAD_API_SCOPE_BASE: "YOUR_AAD_API_SCOPE_BASE", #API://YOUR_SCOPE_BASE
}

# For parameters
MAX_TOKENS: 2000 # The max token limit for the response completion
MAX_RETRY: 3 # The max retry limit for the response completion
TEMPERATURE: 0.0 # The temperature of the model: the lower the value, the more consistent the output of the model
TOP_P: 0.0 # The top_p of the model: the lower the value, the more conservative the output of the model
TIMEOUT: 60 # The call timeout (s), default is 10 mins
31 changes: 31 additions & 0 deletions instantiation/config/config_dev.yaml
@@ -0,0 +1,31 @@
version: 0.1

AOAI_DEPLOYMENT: "gpt-4-visual-preview" # Your AOAI deployment, if applicable
API_VERSION: "2024-02-15-preview" # "2024-02-15-preview" by default.
OPENAI_API_MODEL: "gpt-4-0125-preview" # The only OpenAI model by now that accepts visual input

CONTROL_BACKEND: "uia" # The backend for control action
CONTROL_LIST: ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem", "Spinner"]
PRINT_LOG: False # Whether to print the log
LOG_LEVEL: "INFO" # The log level
MATCH_STRATEGY: "regex" # The match strategy for the control filter, support 'contains', 'fuzzy', 'regex'

PREFILL_PROMPT: "instantiation/controller/prompts/{mode}/prefill.yaml" # The prompt for the action prefill
FILTER_PROMPT: "instantiation/controller/prompts/{mode}/filter.yaml" # The prompt for the filter
PREFILL_EXAMPLE_PROMPT: "instantiation/controller/prompts/{mode}/prefill_example.yaml" # The prompt for the action prefill example
API_PROMPT: "ufo/prompts/share/lite/api.yaml" # The prompt for the API

# Exploration Configuration
TASKS_HUB: "instantiation/tasks" # The tasks hub for the exploration
TEMPLATE_PATH: "instantiation/templates" # The template path for the exploration

# For control filtering
CONTROL_FILTER_TYPE: [] # The list of control filter type, support 'TEXT', 'SEMANTIC', 'ICON'
CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2" # The control filter model name of semantic similarity
CONTROL_EMBEDDING_CACHE_PATH: "instantiation/cache/" # The cache path for the control filter
CONTROL_FILTER_TOP_K_PLAN: 2 # The control filter effect on top k plans from UFO, default is 2

# log path
LOG_PATH: "instantiation/logs/{task}"
PREFILL_LOG_PATH: "instantiation/logs/{task}/prefill/"
FILTER_LOG_PATH: "instantiation/logs/{task}/filter/"