Merge pull request #149 from microsoft/vyokky/dev
Vyokky/dev
vyokky authored Dec 16, 2024
2 parents f34fefe + 9ac33c8 commit 7da39f0
Showing 12 changed files with 414 additions and 30 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -35,4 +35,7 @@ scripts/*
!vectordb/docs/example/
!vectordb/demonstration/example.yaml

.vscode
.vscode

# Ignore the record files
tasks_status.json
2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ Both agents leverage the multi-modal capabilities of GPT-4V(o) to comprehend the

## 📢 News
- 📅 2024-12-13: We have a **New Release for v1.2.0!** Check out our new features and improvements:
1. **Large Action Model (LAM) Data Collection:** We have released the code and sample data for Large Action Model (LAM) data collection with UFO! Please check out our [new paper](https://arxiv.org/abs/2412.07939), [code](dataflow/README.md) and [documentation](https://microsoft.github.io/UFO/dataflow/overview/) for more details.
1. **Large Action Model (LAM) Data Collection:** We have released the code and sample data for Large Action Model (LAM) data collection with UFO! Please check out our [new paper](https://arxiv.org/abs/2412.10047), [code](dataflow/README.md) and [documentation](https://microsoft.github.io/UFO/dataflow/overview/) for more details.
2. **Bash Command Support:** HostAgent also supports bash commands now!
3. **Bug Fixes:** We have fixed some bugs, improved error handling, and improved overall performance.
- 📅 2024-09-08: We have a **New Release for v1.1.0!**, which allows UFO to click on any region of the application and reduces its latency by up to 1/3!
16 changes: 10 additions & 6 deletions dataflow/README.md
@@ -5,7 +5,7 @@

<div align="center">

[![arxiv](https://img.shields.io/badge/Paper-arXiv:202402.07939-b31b1b.svg)](https://arxiv.org/abs/2402.07939)&ensp;
[![arxiv](https://img.shields.io/badge/Paper-arXiv:2412.10047-b31b1b.svg)](https://arxiv.org/abs/2412.10047)&ensp;
![Python Version](https://img.shields.io/badge/Python-3776AB?&logo=python&logoColor=white-blue&label=3.10%20%7C%203.11)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
[![Documentation](https://img.shields.io/badge/Documentation-%230ABAB5?style=flat&logo=readthedocs&logoColor=black)](https://microsoft.github.io/UFO/dataflow/overview/)&ensp;
@@ -20,13 +20,17 @@

This repository contains the implementation of the **Data Collection** process for training the **Large Action Models** (LAMs) in the [**UFO**](https://arxiv.org/abs/2402.07939) project. The **Data Collection** process is designed to streamline task processing, ensuring that all necessary steps are seamlessly integrated from initialization to execution. This module is part of the [**UFO**](https://arxiv.org/abs/2402.07939) project.

If you find this project useful, please consider giving a star ⭐, and cite our paper:
If you find this project useful, please give it a star ⭐ and consider citing our paper:

```bibtex
@article{UFO2024,
title={Large Action Models: From Inception to Implementation},
author={Microsoft},
year={2024}
@misc{wang2024largeactionmodelsinception,
title={Large Action Models: From Inception to Implementation},
author={Lu Wang and Fangkai Yang and Chaoyun Zhang and Junting Lu and Jiaxu Qian and Shilin He and Pu Zhao and Bo Qiao and Ray Huang and Si Qin and Qisheng Su and Jiayi Ye and Yudi Zhang and Jian-Guang Lou and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang},
year={2024},
eprint={2412.10047},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2412.10047},
}
```

67 changes: 67 additions & 0 deletions documents/docs/advanced_usage/batch_mode.md
@@ -0,0 +1,67 @@
# Batch Mode

Batch mode is a feature of UFO that allows the agent to automate tasks in batch.

## Quick Start

### Step 1: Create a Plan file

Before starting the Batch mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:

| Field | Description | Type |
| ------ | -------------------------------------------------------------------------------------------- | ------- |
| task | The task description. | String |
| object | The application or file to interact with. | String |
| close | Determines whether to close the corresponding application or file after completing the task. | Boolean |

Below is an example of a plan file:

```json
{
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "object": "draft.docx",
    "close": false
}
```

!!! note
    The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Batch mode.
    The structure of your files should be as follows, where `tasks` is the directory for your tasks and `files` is where your object files are stored:

    - Parent
        - tasks
        - files
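
The fields in the table above can be checked before a run. The following is a minimal sketch, not part of UFO: the helper name `validate_plan` and the choice to treat all three fields as required are assumptions made for illustration. It also shows that serializing with `json.dumps` yields the lowercase `false` that JSON requires.

```python
import json

# Field names and types come from the table above.
# Treating every field as required is an assumption of this sketch.
REQUIRED_FIELDS = {"task": str, "object": str, "close": bool}


def validate_plan(plan: dict) -> list:
    """Return a list of problems found in a plan dictionary (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in plan:
            problems.append(f"missing field: {field}")
        elif not isinstance(plan[field], expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")
    return problems


plan = {
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "object": "draft.docx",
    "close": False,  # Python's False is written as JSON's false
}

# Text that could be saved as a plan file under the `tasks` directory.
plan_text = json.dumps(plan, indent=4)
print(validate_plan(plan))
print(plan_text)
```

Generating the file from a Python dictionary avoids hand-written JSON mistakes such as a capitalized boolean.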


### Step 2: Start the Batch Mode
To start the Batch mode, run the following command:

```bash
# assume you are in the cloned UFO folder
python ufo.py --task_name {task_name} --mode batch_normal --plan {plan_file}
```

!!! tip
    Replace `{task_name}` with the name of the task and `{plan_file}` with the `Path_to_Parent/Plan_file`.
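
When several plan files need to run one after another, the command above can be assembled programmatically. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the example plan paths are made up, and deriving the task name from the plan file name is a convention chosen here, not something UFO requires. Only the command-line flags are taken from the command shown above.

```python
import sys
from pathlib import Path


def build_batch_command(task_name: str, plan_file: str) -> list:
    """Build the argument list for one batch run, using the flags shown above."""
    return [
        sys.executable, "ufo.py",
        "--task_name", task_name,
        "--mode", "batch_normal",
        "--plan", plan_file,
    ]


def commands_for_plans(plan_files: list) -> list:
    """One command per plan file; the task name is the file stem (an assumption)."""
    return [build_batch_command(Path(p).stem, p) for p in plan_files]


# Hypothetical plan files, used only to show the resulting commands.
commands = commands_for_plans(["Parent/tasks/heading.json", "Parent/tasks/table.json"])
for command in commands:
    print(" ".join(command[1:]))
    # To actually launch a run from the cloned UFO folder, one could call:
    # subprocess.run(command, check=True)
```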



## Evaluation
You may want to evaluate whether the `task` was completed successfully by following the plan. UFO calls the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.

You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.

# References
The batch mode employs a `PlanReader` to parse the plan file and create a `FromFileSession` to follow the plan.

## PlanReader
The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.

:::module.sessions.plan_reader.PlanReader

<br>
## FromFileSession

The `FromFileSession` is located in the `ufo/module/sessions/session.py` file.

:::module.sessions.session.FromFileSession
26 changes: 13 additions & 13 deletions documents/docs/agents/overview.md
@@ -2,12 +2,12 @@

In UFO, there are four types of agents: `HostAgent`, `AppAgent`, `FollowerAgent`, and `EvaluationAgent`. Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process:

| Agent | Description |
| --- | --- |
| [`HostAgent`](../agents/host_agent.md) | Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. |
| [`AppAgent`](../agents/app_agent.md) | Executes actions on the selected application. |
| [`FollowerAgent`](../agents/follower_agent.md) | Follows the user's instructions to complete the task. |
| [`EvaluationAgent`](../agents/evaluation_agent.md) | Evaluates the completeness of a session or a round. |
| Agent | Description |
| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| [`HostAgent`](../agents/host_agent.md) | Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. |
| [`AppAgent`](../agents/app_agent.md) | Executes actions on the selected application. |
| [`FollowerAgent`](../agents/follower_agent.md) | Follows the user's instructions to complete the task. |
| [`EvaluationAgent`](../agents/evaluation_agent.md) | Evaluates the completeness of a session or a round. |

In the normal workflow, only the `HostAgent` and `AppAgent` are involved in the user interaction process. The `FollowerAgent` and `EvaluationAgent` are used for specific tasks.

@@ -21,13 +21,13 @@ Please see below the orchestration of the agents in UFO:

An agent in UFO is composed of the following main components to fulfill its role in the UFO system:

| Component | Description |
| --- | --- |
| [`State`](../agents/design/state.md) | Represents the current state of the agent and determines the next action and agent to handle the request. |
| [`Memory`](../agents/design/memory.md) | Stores information about the user request, application state, and other relevant data. |
| [`Blackboard`](../agents/design/blackboard.md) | Stores information shared between agents. |
| [`Prompter`](../agents/design/prompter.md) | Generates prompts for the language model based on the user request and application state. |
| [`Processor`](../agents/design/processor.md) | Processes the workflow of the agent, including handling user requests, executing actions, and memory management. |
| Component | Description |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| [`State`](../agents/design/state.md) | Represents the current state of the agent and determines the next action and agent to handle the request. |
| [`Memory`](../agents/design/memory.md) | Stores information about the user request, application state, and other relevant data. |
| [`Blackboard`](../agents/design/blackboard.md) | Stores information shared between agents. |
| [`Prompter`](../agents/design/prompter.md) | Generates prompts for the language model based on the user request and application state. |
| [`Processor`](../agents/design/processor.md) | Processes the workflow of the agent, including handling user requests, executing actions, and memory management. |

## Reference

2 changes: 1 addition & 1 deletion documents/docs/dataflow/overview.md
@@ -1,6 +1,6 @@
# Introduction

This repository contains the implementation of the **Data Collection** process for training the **Large Action Models** (LAMs) in the paper of [Large Action Models: From Inception to Implementation]. The **Data Collection** process is designed to streamline task processing, ensuring that all necessary steps are seamlessly integrated from initialization to execution. This module is part of the [**UFO**](https://arxiv.org/abs/2402.07939) project.
This repository contains the implementation of the **Data Collection** process for training the **Large Action Models** (LAMs) in the paper of [Large Action Models: From Inception to Implementation](https://arxiv.org/abs/2412.10047). The **Data Collection** process is designed to streamline task processing, ensuring that all necessary steps are seamlessly integrated from initialization to execution. This module is part of the [**UFO**](https://arxiv.org/abs/2402.07939) project.

# Dataflow

14 changes: 11 additions & 3 deletions ufo/agents/agent/host_agent.py
@@ -39,6 +39,8 @@ def create_agent(agent_type: str, *args, **kwargs) -> BasicAgent:
            return AppAgent(*args, **kwargs)
        elif agent_type == "follower":
            return FollowerAgent(*args, **kwargs)
        elif agent_type == "batch_normal":
            return AppAgent(*args, **kwargs)
        else:
            raise ValueError("Invalid agent type: {}".format(agent_type))
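
The branch chain in this hunk maps an agent type string to an agent class, with `batch_normal` reusing `AppAgent`. A minimal sketch of the same dispatch written as a lookup table follows. The stub classes stand in for the real UFO agents, and the `"app"` key is an assumption, since the hunk does not show which type string the first branch tests.

```python
class BasicAgent:
    """Stand-in base class for this sketch only, not the UFO class."""

    def __init__(self, name: str):
        self.name = name


class AppAgent(BasicAgent):
    """Stub for the application agent."""


class FollowerAgent(BasicAgent):
    """Stub for the follower agent."""


# "batch_normal" maps to AppAgent, as in the hunk above.
# The "app" key is assumed for illustration.
AGENT_TYPES = {
    "app": AppAgent,
    "follower": FollowerAgent,
    "batch_normal": AppAgent,
}


def create_agent(agent_type: str, *args, **kwargs) -> BasicAgent:
    """Create an agent by type name, rejecting unknown types."""
    try:
        agent_class = AGENT_TYPES[agent_type]
    except KeyError:
        raise ValueError("Invalid agent type: {}".format(agent_type)) from None
    return agent_class(*args, **kwargs)


agent = create_agent("batch_normal", "BatchAgent/word/draft.docx")
print(type(agent).__name__, agent.name)
```

A table keeps the mapping in one place, so adding a mode that reuses an existing class is a one-line change.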

@@ -233,10 +235,16 @@ def create_app_agent(
        :return: The app agent.
        """

        if mode == "normal":
        if mode in ("normal", "batch_normal"):

            agent_name = "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            agent_name = (
                "AppAgent/{root}/{process}".format(
                    root=application_root_name, process=application_window_name
                )
                if mode == "normal"
                else "BatchAgent/{root}/{process}".format(
                    root=application_root_name, process=application_window_name
                )
            )

            app_agent: AppAgent = self.create_subagent(
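
A note on mode checks that accept more than one value: in Python, a condition of the form `mode == "normal" or "batch_normal"` is parsed as `(mode == "normal") or "batch_normal"`, and a non-empty string is always truthy. The sketch below is illustrative only; the function names are hypothetical and `"follower"` is used simply as a mode that should be rejected.

```python
def truthy_or_check(mode: str) -> bool:
    # Parsed as (mode == "normal") or "batch_normal". The right operand is a
    # non-empty string, so the expression is truthy for every mode.
    return bool(mode == "normal" or "batch_normal")


def membership_check(mode: str) -> bool:
    # True only when mode is one of the two listed values.
    return mode in ("normal", "batch_normal")


for mode in ("normal", "batch_normal", "follower"):
    print(mode, truthy_or_check(mode), membership_check(mode))
```

With the first form, any later `elif` branch for other modes would never be reached.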
3 changes: 2 additions & 1 deletion ufo/agents/states/host_agent_state.py
@@ -198,14 +198,15 @@ def next_state(self, agent: "HostAgent") -> AppAgentState:
        :param agent: The current agent.
        :return: The state for the next step.
        """

        # Transition to the app agent state.
        # Lazy import to avoid circular dependency.

        from ufo.agents.states.app_agent_state import ContinueAppAgentState

        return ContinueAppAgentState()


    def next_agent(self, agent: "HostAgent") -> AppAgent:
        """
        Get the agent for the next step.
6 changes: 6 additions & 0 deletions ufo/config/config_dev.yaml
@@ -101,3 +101,9 @@ DEFAULT_PNG_COMPRESS_LEVEL: 9 # The compress level for the PNG image, 0-9, 0 is

# Save UI tree
SAVE_UI_TREE: False # Whether to save the UI tree


# Record the status of the tasks
TASK_STATUS: True # Whether to record the status of the tasks in batch execution mode.
# TASK_STATUS_FILE # The path for the task status file.

57 changes: 56 additions & 1 deletion ufo/module/sessions/plan_reader.py
@@ -2,6 +2,7 @@
# Licensed under the MIT License.

import json
import os
from typing import List, Optional

from ufo.config.config import Config
@@ -20,9 +21,19 @@ def __init__(self, plan_file: str):
        :param plan_file: The path of the plan file.
        """

        self.plan_file = plan_file
        with open(plan_file, "r") as f:
            self.plan = json.load(f)
        self.remaining_steps = self.get_steps()
        self.support_apps = ["word", "excel", "powerpoint"]

    def get_close(self) -> bool:
        """
        Check whether the application or file should be closed after the task is completed.
        :return: True if it should be closed, False otherwise.
        """

        return self.plan.get("close", False)

    def get_task(self) -> str:
        """
@@ -46,7 +57,7 @@ def get_operation_object(self) -> str:
        :return: The operation object.
        """

        return self.plan.get("object", "")
        return self.plan.get("object", "").lower()

    def get_initial_request(self) -> str:
        """
"""
@@ -76,6 +87,42 @@ def get_host_agent_request(self) -> str:

        return request

    def get_file_path(self) -> str:
        """
        Get the path of the object file, which is expected in the `files` directory next to `tasks`.
        :return: The path of the object file.
        """

        file_path = os.path.dirname(os.path.abspath(self.plan_file)).replace(
            "tasks", "files"
        )
        file = os.path.basename(
            self.plan.get(
                "object",
            )
        )

        return os.path.join(file_path, file)

    def get_support_apps(self) -> List[str]:
        """
        Get the support apps in the plan.
        :return: The support apps in the plan.
        """

        return self.support_apps

    def get_host_request(self) -> str:
        """
        Get the request for the host agent.
        :return: The request for the host agent.
        """

        task = self.get_task()
        object_name = self.get_operation_object()
        if object_name in self.support_apps:
            request = task
        else:
            request = f"Open the application of {task}. You must output the selected application with their control text and label even if it is already open."

        return request

    def next_step(self) -> Optional[str]:
        """
        Get the next step in the plan.
@@ -95,3 +142,11 @@ def task_finished(self) -> bool:
        """

        return not self.remaining_steps

    def get_root_path(self) -> str:
        """
        Get the root path of the plan.
        :return: The root path of the plan.
        """

        return os.path.dirname(os.path.abspath(self.plan_file))
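
The object file lookup added in this file derives the `files` directory from the plan file's location by swapping `tasks` for `files` in the directory path. Below is a standalone sketch of that derivation, not the UFO code itself: the function name `object_file_path` and the example paths are hypothetical, and `posixpath` is used instead of `os.path` so that the results do not depend on the operating system.

```python
import posixpath


def object_file_path(plan_file: str, object_name: str) -> str:
    """Swap 'tasks' for 'files' in the plan's directory and append the object name."""
    plan_dir = posixpath.dirname(plan_file)
    # str.replace substitutes every occurrence of "tasks", not only the last
    # path component.
    files_dir = plan_dir.replace("tasks", "files")
    return posixpath.join(files_dir, posixpath.basename(object_name))


print(object_file_path("/data/Parent/tasks/plan.json", "draft.docx"))
print(object_file_path("/my_tasks/Parent/tasks/plan.json", "draft.docx"))
```

The second call shows a caveat of substring replacement: a parent directory whose name contains `tasks` is rewritten as well, so keeping that word out of the parent path avoids surprises.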