Prerequisite 1: you need to have Poetry installed.
curl -sSL https://install.python-poetry.org | python3 -
Prerequisite 2: you need to have a compatible Python version available to Poetry. You only need to have it activated the first time you install the Poetry environment.
conda create -n py310 python=3.10
conda activate py310
Install the environment:
git clone [email protected]:apple/pfl-research.git
cd pfl-research/benchmarks/
# With the new Python 3.10 environment active, Poetry will pick up its interpreter.
poetry env use `which python`
# Install to run tf, pytorch and tests
poetry install -E pytorch -E tf
# Activate environment
poetry shell
# Add root directory to `PYTHONPATH` such that the utility modules can be imported
export PYTHONPATH=`pwd`:$PYTHONPATH
This default setup should enable you to run any of the official benchmarks.
- Complete setup as above.
- Download CIFAR10 data:
python -m dataset.cifar10.download_preprocess --output_dir data/cifar10
- Train a small CNN on CIFAR10 IID data:
python image_classification/pytorch/train.py --args_config image_classification/configs/baseline.yaml
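The baseline config trains on an IID split of CIFAR10. As a minimal sketch of what "IID data" means here (illustrative only, not pfl's actual API; the function name is an assumption): shuffle the dataset and deal it out evenly across simulated users, so each user's local data follows the same distribution.

```python
import random

def iid_partition(samples, num_users, seed=0):
    """Shuffle samples and split them evenly across simulated users."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Round-robin assignment: user u gets every num_users-th sample.
    return [shuffled[u::num_users] for u in range(num_users)]

# E.g. split 10 samples across 2 simulated users.
users = iid_partition(range(10), num_users=2)
```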
There are multiple official benchmarks for pfl, simulating various scenarios, split into categories:
- image_classification - train a small CNN on CIFAR10.
- lm - train a transformer model on StackOverflow.
- flair - train a ResNet18 on the FLAIR dataset.
Each benchmark can run in distributed mode with multiple cores, GPUs, and machines. See the distributed simulation guide for how it works. In summary, to get started quickly with distributed simulations:
- Install Horovod. We have a helper script here.
- Invoke your Python script with the horovodrun command. E.g. to run the same CIFAR10 training as described in the quickstart above, but with 2 processes on the same machine, the command looks like this:
horovodrun --gloo -np 2 -H localhost:2 python image_classification/pytorch/train.py --args_config image_classification/configs/baseline.yaml
If you have 2 GPUs on the machine, each process is allocated 1 GPU. If you only have 1 GPU, both processes share it. Sharing a GPU can still yield a speedup, depending on the use case, because of the inevitable overhead of FL.
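The GPU allocation described above can be sketched as follows (an illustrative example, not code from the repo): each process maps its local rank onto an available GPU, wrapping around when there are more processes than GPUs.

```python
def pick_device(local_rank, num_gpus):
    """Map a process's local rank to a device string.

    With 2 processes and 2 GPUs each process gets its own GPU;
    with 1 GPU both ranks wrap around to the same device;
    with no GPUs, fall back to CPU.
    """
    if num_gpus == 0:
        return "cpu"
    return f"cuda:{local_rank % num_gpus}"
```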
Alternatively, you can install a subset of dependencies:
# Install to run tf examples
poetry install -E tf --no-dev
# Install to run pytorch examples
poetry install -E pytorch --no-dev
# Install to run tf, pytorch and tests
poetry install -E pytorch -E tf