MobileVLM


Paper Link

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding

News

  • 2024.11.12 - Partial training data and random walk code for Mobile3M released!
  • 2024.10.4 - Test data for Mobile3M released!
  • 2024.9.26 - Our work was accepted to EMNLP 2024 Findings!

1. Quick Start

Requirements

  • transformers==4.32.0
  • accelerate
  • tiktoken
  • einops
  • transformers_stream_generator==0.0.4
  • scipy
  • torchvision
  • pillow
  • tensorboard
  • matplotlib
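
The pinned requirements above can be installed with pip; the command below simply mirrors that list (unpinned packages resolve to their latest compatible versions):

    pip install transformers==4.32.0 accelerate tiktoken einops \
        transformers_stream_generator==0.0.4 scipy torchvision pillow \
        tensorboard matplotlib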

2. Mobile3M Dataset


Training Data

Training data is available at the following link: data. We will gradually upload data for all apps.

Corpus Collection Script

To start collecting data, run the script main/corpus/googleCreatDataset/arm_graph_para_lock.py.

Example usage:

python googleCreatDataset/arm_graph_para_lock.py --device_name 10.53.89.79:6532 --systemPort 8112 --appid 8201 --command_executor http://127.0.0.1:4812/wd/hub --appPackage com.lucky.luckyclient --name_en lucky --diff_max 0.5 --diff_png 0.3 --waitadb 8 --prefix lucky0_3_1_2_ --recheck -1

Running the above collection command requires the following additional installations.

  • Install Node.js and Appium:

    curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
    sudo apt-get install -y nodejs
    sudo apt install npm
    npm install -g [email protected]
  • Install graphical libraries:
    sudo apt-get install xorg

  • Activate the Python virtual environment:
    source /path/to/new/virtual/environment/bin/activate

  • Install Appium Python Client 1.3.0:
    pip install Appium-Python-Client==1.3.0

Parameter Descriptions

  • device_name: Name of the emulator.
  • appid: Storage ID of the app being collected, e.g., 8201.
  • command_executor: Appium system endpoint URL.
  • --diff_max 0.5 --diff_png 0.3: Page similarity thresholds for differentiating screens.
  • --prefix lucky0_3_1_2_: Distributed starting path for data collection.
  • --recheck -1: Specifies whether to recheck previously collected data. Set to -1 for no recheck.
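
For intuition, the sketch below shows one way such similarity thresholds could be applied when deciding whether a newly reached screen is the same page as one already in the graph. The function names, the pixel-difference metric, and the xml_diff input are illustrative assumptions, not the actual logic of arm_graph_para_lock.py.

    from PIL import Image
    import numpy as np

    def screenshot_diff_ratio(png_a, png_b):
        """Fraction of pixels that differ between two screenshots (illustrative metric)."""
        a = np.asarray(Image.open(png_a).convert("L"), dtype=np.int16)
        b = np.asarray(Image.open(png_b).convert("L"), dtype=np.int16)
        if a.shape != b.shape:
            return 1.0  # different sizes: treat as entirely different pages
        return float(np.mean(np.abs(a - b) > 10))  # per-pixel tolerance is an assumption

    def is_same_page(xml_diff, png_a, png_b, diff_max=0.5, diff_png=0.3):
        """Treat two screens as the same page when both the UI-tree difference
        and the screenshot difference stay below their thresholds."""
        return xml_diff < diff_max and screenshot_diff_ratio(png_a, png_b) < diff_png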

Appium

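For reference, connecting to the Appium server from Python with Appium-Python-Client 1.3.0 typically looks like the sketch below. The capability values mirror the example collection command above; appActivity is an assumption and must match the launch activity of the target app.

    from appium import webdriver

    # Capabilities mirroring the example collection command above.
    desired_caps = {
        "platformName": "Android",
        "automationName": "UiAutomator2",
        "deviceName": "10.53.89.79:6532",       # --device_name
        "udid": "10.53.89.79:6532",
        "systemPort": 8112,                     # --systemPort
        "appPackage": "com.lucky.luckyclient",  # --appPackage
        "appActivity": ".MainActivity",         # assumption: the app's launch activity
        "noReset": True,
    }

    # --command_executor points at the Appium server's /wd/hub endpoint.
    driver = webdriver.Remote("http://127.0.0.1:4812/wd/hub", desired_caps)
    print(driver.page_source)  # current UI hierarchy as XML
    driver.quit()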


3. Data Generation Code for Each Task


The code for generating data for each task can be found in the following directories:

Our Test Data

Our test data is available at data.

4. License

The dataset of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

The source code of this project is licensed under the Apache 2.0 license.

Summary of Terms

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
  • NonCommercial: You may not use the material for commercial purposes.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.


License: CC BY-NC-SA 4.0

5. Citation

If you'd like to use our benchmark or cite this paper, please use the reference below:

@article{wu2024mobilevlm,
  title={Mobilevlm: A vision-language model for better intra-and inter-ui understanding},
  author={Wu, Qinzhuo and Xu, Weikai and Liu, Wei and Tan, Tao and Liu, Jianfeng and Li, Ang and Luan, Jian and Wang, Bin and Shang, Shuo},
  journal={arXiv preprint arXiv:2409.14818},
  year={2024}
}