Refine the readme of software side (#51)
XinrunXu authored Jul 8, 2024
1 parent 103dba9 commit ec0523e
Showing 1 changed file with 29 additions and 30 deletions.
59 changes: 29 additions & 30 deletions docs/envs/software.md
@@ -1,10 +1,10 @@
Here are the settings for Software, like Chrome and Outlook.
Here are the settings for the software side.

## Software Setup

### 1. Install Software Dependencies

Download the [StableSAM](https://huggingface.co/spaces/abhishek/StableSAM/blob/main/sam_vit_h_4b8939.pth) model file and copy it to the /cache folder.
Download the [StableSAM](https://huggingface.co/spaces/abhishek/StableSAM/blob/main/sam_vit_h_4b8939.pth) model file and copy it to the `/cache` folder.
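
Before moving on, it can help to confirm that the model file actually landed in the cache folder. The following is a minimal sketch, not part of Cradle itself: it assumes that `/cache` means a `cache` folder at the repository root, and `model_file_ready` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: the README's `/cache` is a `cache` folder at the repository root.
DEFAULT_CACHE_DIR = Path("cache")
# File name taken from the download link above.
MODEL_FILENAME = "sam_vit_h_4b8939.pth"


def model_file_ready(cache_dir: Path = DEFAULT_CACHE_DIR,
                     filename: str = MODEL_FILENAME) -> bool:
    """Return True if the model file exists in cache_dir and is not empty."""
    path = Path(cache_dir) / filename
    return path.is_file() and path.stat().st_size > 0


# Example usage (run from the repository root):
#     if not model_file_ready():
#         print(f"Missing {MODEL_FILENAME} in {DEFAULT_CACHE_DIR}/; download it first.")
```

The size check only catches an empty or interrupted download; it does not verify the file's integrity.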

### 2. Change Computer Settings Before Running the Code

@@ -16,12 +16,7 @@ Then, set the folder that the agent will open to display in Large icons or Extra

![Large icons](../envs/images/software/large_icon.png)

### 3. Change Code Settings Before Running the Code

- To use debug mode, you need to change the --envConfig target in .vscode\launch.json to the software's JSON file in the conf\ directory that you want to test.
- To use terminal mode, you need to pass the --envConfig argument to the software's JSON file in the conf\ directory that you want to test.

### 4. Open the software you want to test.
### 3. Open the software you want to test

Below are the exact software versions utilized in our paper:

@@ -35,6 +30,26 @@ Below are the exact software versions utilized in our paper:

In theory, any version can be used. However, if you want to reproduce our experimental results, we recommend using the software versions listed above.

### 4. Run

To simplify operations, the default LLM we use is OpenAI's `GPT-4o`.
After opening the corresponding software on your main screen, use the following script to run Cradle.

```bash
# Run Chrome
python runner.py --envConfig "./conf/env_config_chrome.json"
# Run Outlook
python runner.py --envConfig "./conf/env_config_outlook.json"
# Run CapCut
python runner.py --envConfig "./conf/env_config_capcut.json"
# Run Meitu
python runner.py --envConfig "./conf/env_config_xiuxiu.json"
# Run Feishu
python runner.py --envConfig "./conf/env_config_feishu.json"
```

Or, if you want to use debug mode, change the `--envConfig` target in `.vscode\launch.json` to the JSON file in the `conf\` directory for the software that you want to test.
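
The terminal commands above can also be wrapped in a small launcher that maps a software name to its config file. This is only a sketch: `ENV_CONFIGS`, `build_command`, and `run` are hypothetical names that do not exist in Cradle, and the script assumes it is started from the repository root, next to `runner.py`.

```python
import subprocess
import sys
from pathlib import Path

# Config paths copied from the commands above.
ENV_CONFIGS = {
    "chrome": "./conf/env_config_chrome.json",
    "outlook": "./conf/env_config_outlook.json",
    "capcut": "./conf/env_config_capcut.json",
    "meitu": "./conf/env_config_xiuxiu.json",
    "feishu": "./conf/env_config_feishu.json",
}


def build_command(software: str) -> list[str]:
    """Return the runner.py command line for the given software name."""
    key = software.lower()
    if key not in ENV_CONFIGS:
        raise ValueError(
            f"Unknown software {software!r}; expected one of {sorted(ENV_CONFIGS)}"
        )
    return [sys.executable, "runner.py", "--envConfig", ENV_CONFIGS[key]]


def run(software: str) -> int:
    """Launch Cradle for one software after checking that its config exists."""
    command = build_command(software)
    config_path = Path(command[-1])
    if not config_path.is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    return subprocess.run(command, check=False).returncode


# Example usage: run("outlook")
```

Checking for the config file first gives a clearer error than letting `runner.py` fail on a mistyped path.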

## 25 Tasks in our Paper

Task Descriptions for Chrome, Outlook, CapCut, Meitu and Feishu. **Difficulty** refers to how hard it is for our agent to accomplish the corresponding tasks.
@@ -72,7 +87,6 @@ Task Descriptions for Chrome, Outlook, CapCut, Meitu and Feishu. **Difficulty**
| #4 Set User Status| Open the user profile menu and set my status to "In meeting". | Medium |
| #5 Start Video Conference | Create a new meeting and meet now. | Easy |


## Initial Stage for Every Software

### 1. Chrome
@@ -95,7 +109,6 @@ For each task in Outlook, the initial page is shown in the figure:
- For Task 2, ensure there is at least one email in the junk mail folder.
- For Task 4, ensure there is at least one email in the inbox with the subject "Urgent meeting."


### 3. CapCut

For each task in CapCut, the initial page is shown in the table below:
@@ -127,36 +140,22 @@ For each task in Feishu, the initial page is shown in the figure:
- For Task #3, ensure you have an AWS bill PDF file at your input path.
- For Task #4, set your status to "None" before you begin the test.

## How to Implement Cradle on Other Software

- Always pull the latest /main branch to your current work branch.
- Add [conf\env_config_xxx.json] to adapt to your target software.
- Copy the cradle\environment\chrome folder in cradle\environment\ and rename it to your software environment name. Replace all instances of "chrome" within the folder with your software's environment name.
- Copy the res\chrome folder in res\ and rename it to your software environment name. Replace all instances of "chrome" within the folder with your software's environment name. Modify the prompts and template-matching icon images as needed (for important UI elements that SAM2SOM cannot recognize).

Small tip: Use [log_proc.py] to visualize logs and see how to improve your prompts and skills.


## How to Implement Cradle on Other Software

1. Always Pull the Latest Branch:

- Ensure that you always pull the latest /main branch to your current work branch to keep your repository up to date.
- Ensure that you always pull the latest `/main` branch to your current work branch to keep your repository up to date.

2. Configuring the Environment:

- Add a configuration file in the format [conf\env_config_xxx.json] to adapt Cradle to your target software.
- Add a configuration file in the format `conf\env_config_xxx.json` to adapt Cradle to your target software.

3. Setting Up the Environment:

- Copy the cradle\environment\chrome folder located in cradle\environment\ and rename it to match your software environment name. Replace all instances of "chrome" within the folder with your software's environment name.
- Copy the res\chrome folder located in res\ and rename it to your software environment name. Replace all instances of "chrome" within the folder with your software's environment name. Modify the prompts and template-matching icon images as needed for important UI elements that SAM2SOM cannot recognize.
- Copy the `cradle\environment\chrome` folder located in `cradle\environment\` and rename it to match your software environment name. Replace all instances of "chrome" within the folder with your software's environment name.
- Copy the `res\chrome` folder located in `res\` and rename it to your software environment name. Replace all instances of "chrome" within the folder with your software's environment name. Modify the prompts and template-matching icon images as needed for important UI elements that SAM2SOM cannot recognize.

4. Debug and Terminal Modes:

- Debug Mode: Change the --envConfig target in .vscode\launch.json to point to the software's JSON file in the conf\ directory that you want to test.
- Terminal Mode: Pass the --envConfig argument to the software's JSON file in the conf\ directory that you want to test.

5. Visualizing Logs:

- Use [log_proc.py] to visualize logs. This helps you understand how your prompts and skills are performing and identify areas for improvement.
- Debug Mode: Change the `--envConfig` target in `.vscode\launch.json` to point to the software's JSON file in the `conf\` directory that you want to test.
- Terminal Mode: Pass the `--envConfig` argument to the software's JSON file in the `conf\` directory that you want to test.
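
The copy-and-rename work in step 3 can be sketched as a small script. It is an illustration under stated assumptions rather than a tool shipped with Cradle: `clone_env_folder` and `TEXT_SUFFIXES` are hypothetical names, the list of text file types is a guess, and the replacement is case-sensitive, so occurrences such as "Chrome" still need a manual pass.

```python
import shutil
from pathlib import Path

# Assumption: only these text file types need the "chrome" replacement.
TEXT_SUFFIXES = {".py", ".json", ".prompt", ".txt", ".md"}


def clone_env_folder(parent: Path, new_name: str, template: str = "chrome") -> Path:
    """Copy parent/template to parent/new_name and rename `template` inside it."""
    src, dst = parent / template, parent / new_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.copytree(src, dst)

    # Rewrite the contents of text files; binary files such as icons are untouched.
    for path in dst.rglob("*"):
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            path.write_text(text.replace(template, new_name), encoding="utf-8")

    # Rename files and folders deepest-first, so renaming a directory never
    # invalidates a path that still has to be visited.
    for path in sorted(dst.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if template in path.name:
            path.rename(path.with_name(path.name.replace(template, new_name)))
    return dst


# Example usage (run from the repository root; "myapp" is a placeholder name):
#     clone_env_folder(Path("cradle/environment"), "myapp")
#     clone_env_folder(Path("res"), "myapp")
```

The prompts and template-matching icon images still have to be adapted by hand afterwards, as described in step 3.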
