build: use poetry for reproducible virtual environments #209

Draft
wants to merge 45 commits into
base: main
Changes from 43 commits
Commits
6a8c5b0
feat: update pyproject.toml to use poetry
VassilisVassiliadis Jun 10, 2024
4aba398
fix: add the tuning package
VassilisVassiliadis Jun 11, 2024
97349ca
fix: make group dependencies optional
VassilisVassiliadis Jun 11, 2024
c3a1cf5
build: update poetry.lock contents
VassilisVassiliadis Jun 11, 2024
6638038
build: update the Dockerfile to use the poetry lock file
VassilisVassiliadis Jun 11, 2024
54163fc
feat: use poetry-dynamic-versioning
VassilisVassiliadis Jun 12, 2024
b404f8c
build: update aim version to 3.22.0 and trl to 0.8.6
VassilisVassiliadis Jun 21, 2024
3655a49
build: use poetry when running tests and building the wheel
VassilisVassiliadis Jun 21, 2024
fbacdd4
docs: document how to install the repository using poetry
VassilisVassiliadis Jun 21, 2024
51fea97
build: deal with git error regarding dubious ownership
VassilisVassiliadis Jun 21, 2024
4903eee
build: update flash-attn constraint to ^2.5.6
VassilisVassiliadis Jun 21, 2024
dfc818c
fix: add simpleeval to python dependencies
VassilisVassiliadis Jun 21, 2024
19c7acc
fix: support 3.9 to 3.11
VassilisVassiliadis Jun 21, 2024
8baa00a
fix: fms_acceleration dependency
VassilisVassiliadis Jun 21, 2024
13b1893
fix: install fms-hf-tuning with poetry before running pytest
VassilisVassiliadis Jun 21, 2024
1a31ba2
refactor: rename build python file and tests as launcher
VassilisVassiliadis Jun 23, 2024
72a92bc
feat: use gen_train_args() method to generate train args and use eval…
VassilisVassiliadis Jun 23, 2024
b8e4443
fix: logging.error doesn't exist in transformers.utils.logging (v4.39.3)
VassilisVassiliadis Jun 23, 2024
f40a21b
feat: add missing dependencies to dev group
VassilisVassiliadis Jun 23, 2024
2313ebc
build: update tox to use poetry
VassilisVassiliadis Jun 23, 2024
f0fc824
fix: add the missing launcher scripts for the python package
VassilisVassiliadis Jun 23, 2024
0af2888
fix: tox lint and tox coverage
VassilisVassiliadis Jun 24, 2024
fc5e362
fix: tox lint and tox coverage
VassilisVassiliadis Jun 24, 2024
1a719b2
build: point fms_acceleration dependency to its last known commit 40a…
VassilisVassiliadis Jun 24, 2024
0d71c23
build: remove the pinned version of fms_acceleration
VassilisVassiliadis Jun 24, 2024
6a46d26
build: update the lock file after modifying pyproject.toml
VassilisVassiliadis Jun 24, 2024
771e2f9
refactor: move launcher scripts back into build
VassilisVassiliadis Jun 25, 2024
1359e6f
build: remove `build` from the `dev` optional dependencies group
VassilisVassiliadis Jun 25, 2024
865bd6c
build: update tox not install poetry in the virtual environment it tests
VassilisVassiliadis Jun 25, 2024
db5832e
build: update CI/CD to install poetry in --user site
VassilisVassiliadis Jun 25, 2024
9739f90
build: install poetry for build-and-publish workflow
VassilisVassiliadis Jun 25, 2024
21a1e5c
docs: document how to build a dev environment with poetry and tox
VassilisVassiliadis Jun 25, 2024
109e478
docs: how to install optional dependency groups
VassilisVassiliadis Jun 25, 2024
3aa293a
docs: fix broken link to fms-acceleration docs
VassilisVassiliadis Jun 25, 2024
ecc62bb
docs: fix typo
VassilisVassiliadis Jun 26, 2024
c3321c1
refactor: update test_run_with_additional_callbacks() unit-test
VassilisVassiliadis Jun 26, 2024
e8ff81b
fix: use optional `extra` dependencies instead of `groups`
VassilisVassiliadis Jun 27, 2024
53025ee
fix: update dockerfile to install poetry in an isolated environment
VassilisVassiliadis Jun 27, 2024
2158078
fix: replace "poetry install --with" with "--extras" in tox.ini
VassilisVassiliadis Jul 2, 2024
896da24
fix: use poetry install --extras dev --no-root" for the fmt tox envir…
VassilisVassiliadis Jul 2, 2024
4a3316e
chore: upgrade trl version to ">=0.9.3,<1.0"
VassilisVassiliadis Jul 2, 2024
ca9ab4e
chore: remove fms-accel extra group
VassilisVassiliadis Jul 2, 2024
a7e2bbb
build: fix the dockerfile by installing/uninstalling wheel and build
VassilisVassiliadis Jul 2, 2024
5207283
Update CONTRIBUTING.md
Ssukriti Jul 3, 2024
0828eb9
Update README.md
Ssukriti Jul 3, 2024
2 changes: 2 additions & 0 deletions .github/workflows/build-and-publish.yaml
@@ -43,6 +43,8 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install tox
python -m pip install poetry --user
Collaborator

@fabianlim fabianlim Jul 4, 2024

Why is there a need to install this with --user? If you can install poetry the same way as tox, then this obviates the need for the PATH setting below.

If the aim is isolation, I don't think installing into the user site achieves it: wouldn't any further pip install commands still find poetry in the user site and try to update it?

Collaborator

There are quite a few instances of this in other workflows as well.

Collaborator

tox always creates its own isolated venv.
Agreed, it would probably work without --user.

Contributor Author

Good catch, tox and poetry should go in the same virtual environment, and that environment should not be the one that we install fms-hf-tuning in. Because tox creates a new virtual environment for each of the environments it processes (e.g. `py`, `fmt`, etc.), poetry and tox will be outside the tested virtual environment.

We don't need to run tox "inside" one of the virtual environments that tox creates, but we do need to run poetry inside them. Here's how I chose to do that.

First, I installed poetry under the user pip install directory (which by default on Linux is ~/.local/). Then I updated the file that $GITHUB_ENV points to so that in subsequent steps the $PATH environment variable contains the path ~/.local/bin.

This ensures that the poetry commands running inside tox can use the poetry executable. Alternatively, we can create a new virtual environment under a directory of our choosing, e.g. /tmp/isolated, install just poetry in there, and then update $GITHUB_ENV so that $PATH includes /tmp/isolated/bin.
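
A minimal sketch of that alternative, for illustration only. The function names `add_to_github_path` and `setup_isolated_poetry` are hypothetical and not part of this PR, and `/tmp/isolated` is just the example directory mentioned above.

```shell
# Append a PATH entry to the file that $GITHUB_ENV points to, so that
# subsequent steps of the job can find executables in the given directory.
add_to_github_path() {
  echo "PATH=$PATH:$1" >> "$GITHUB_ENV"
}

# Create a dedicated virtual environment, install only poetry into it,
# and expose its bin directory to later workflow steps.
setup_isolated_poetry() {
  isolated_dir="$1"
  python3 -m venv "$isolated_dir" &&
    "$isolated_dir/bin/pip" install poetry &&
    add_to_github_path "$isolated_dir/bin"
}

# In a workflow step this would be invoked as:
#   setup_isolated_poetry /tmp/isolated
```

GitHub Actions also provides a `$GITHUB_PATH` file that takes one directory per line, which might be a simpler way to achieve the same effect.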

echo "PATH=$PATH:~/.local/bin" >> "$GITHUB_ENV"
- name: Build and test with tox
run: tox -e ${{ matrix.python-version.tox }}
- name: Build and check wheel package
2 changes: 2 additions & 0 deletions .github/workflows/coverage.yaml
@@ -18,5 +18,7 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install tox
python -m pip install poetry --user
echo "PATH=$PATH:~/.local/bin" >> "$GITHUB_ENV"
- name: Check Coverage
run: tox -e coverage
2 changes: 2 additions & 0 deletions .github/workflows/format.yml
@@ -33,6 +33,8 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install tox
python -m pip install poetry --user
echo "PATH=$PATH:~/.local/bin" >> "$GITHUB_ENV"
- name: Check formatting
run: tox -e fmt
- name: Run pylint
2 changes: 2 additions & 0 deletions .github/workflows/test.yaml
@@ -23,5 +23,7 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install tox
python -m pip install poetry --user
echo "PATH=$PATH:~/.local/bin" >> "$GITHUB_ENV"
- name: Run unit tests
run: tox -e py
39 changes: 36 additions & 3 deletions CONTRIBUTING.md
@@ -82,12 +82,45 @@ The following tools are required:
- [git](https://git-scm.com)
- [python](https://www.python.org) (v3.8+)
- [pip](https://pypi.org/project/pip/) (v23.0+)
- [poetry](https://python-poetry.org/docs/#installation) (v1.8.3+)
- Poetry should always be installed in a dedicated virtual environment to isolate it from the rest of your system. It should in no case be installed in the environment of the project that is to be managed by Poetry. This ensures that Poetry’s own dependencies will not be accidentally upgraded or uninstalled.
- [tox](https://tox.wiki/en/4.15.1/installation.html) (v4.15.1+)
- Just like `poetry`, install `tox` in an isolated virtual environment.

Installation:
```
pip install -U datasets
pip install -e .

```bash
# Install poetry and tox in an isolated virtual environment
python3 -m venv isolated
./isolated/bin/pip install -U pip setuptools
./isolated/bin/pip install poetry tox

# Ensure you can access poetry and tox without activating
# the isolated virtual environment
export PATH="$PATH:$(pwd)/isolated/bin"

# Create your development virtual environment
python3 -m venv venv
. venv/bin/activate

# Install a dev version (similar to pip install -e ".[dev]") of fms-hf-tuning
poetry install --extras dev
```


> Note: After installing, if you wish to use [FlashAttention](https://github.com/Dao-AILab/flash-attention), then you need to install these requirements:

```
poetry install --extras "dev flash-attn"
```

If you wish to use [aim](https://github.com/aimhubio/aim), then you need to install it:
```
poetry install --extras aim
```

If you wish to use [fms-acceleration](https://github.com/foundation-model-stack/fms-acceleration) follow the instructions in [this section of README.md](README.md#fms-acceleration).

Collaborator

@Ssukriti Ssukriti Jul 3, 2024

Suggested change
Alternatively, you can continue to use `pip install -e .` if you do not wish to leverage the lock file or if you have environmental constraints.

<details>
<summary>Linting</summary>

4 changes: 2 additions & 2 deletions README.md
@@ -27,7 +27,7 @@ If you wish to use [fms-acceleration](https://github.com/foundation-model-stack/
```
pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/framework
```
`fms-acceleration` is a collection of plugins that packages that accelerate fine-tuning / training of large models, as part of the `fms-hf-tuning` suite. For more details on see [this section below](#fms-acceleration).
`fms-acceleration` is a collection of plugins that packages that accelerate fine-tuning / training of large models, as part of the `fms-hf-tuning` suite. For more details see [this section below](#fms-acceleration).

## Data format
We support two data formats:
@@ -385,7 +385,7 @@ Equally you can pass in a JSON configuration for running tuning. See [build doc]

### FMS Acceleration

`fms-acceleration` is fuss-free approach to access a curated collection of acceleration plugins that acclerate your `tuning/sft-trainer.py` experience. Accelerations that apply to a variety of use-cases, e.g., PeFT / full-finetuning, are being planned for. As such, the accelerations are grouped into *plugins*; only install the plugins needed for the acceleration of interest. The plugins are housed in the [seperate repository found here](https://github.com/foundation-model-stack/fms-acceleration).
`fms-acceleration` is fuss-free approach to access a curated collection of acceleration plugins that accelerate your `tuning/sft-trainer.py` experience. Accelerations that apply to a variety of use-cases, e.g., PeFT / full-finetuning, are being planned for. As such, the accelerations are grouped into *plugins*; only install the plugins needed for the acceleration of interest. The plugins are housed in the [separate repository found here](https://github.com/foundation-model-stack/fms-acceleration).

To access `fms-acceleration` features the `[fms-accel]` dependency must first be installed:
```
53 changes: 34 additions & 19 deletions build/Dockerfile
@@ -110,29 +110,44 @@ RUN dnf install -y git && \
rm -f /usr/share/doc/perl-Net-SSLeay/examples/server_key.pem && \
dnf clean all
USER ${USER}
WORKDIR /tmp
# Ensure that git directory is owned by current user, otherwise git raises
# "fatal: detected dubious ownership" for `/tmp`
WORKDIR /tmp/fms-hf-tuning

# Install poetry and its dependencies inside an isolated virtual environment which we
# will not copy into the release-base layer
RUN --mount=type=cache,target=/home/${USER}/.cache/pip,uid=${USER_UID} \
python -m pip install --user build
COPY --chown=${USER}:root tuning tuning
COPY .git .git
COPY pyproject.toml pyproject.toml
python -m venv /tmp/isolated && \
/tmp/isolated/bin/pip install poetry poetry-plugin-export

# Build a wheel if PyPi wheel_version is empty else download the wheel from PyPi
RUN if [[ -z "${WHEEL_VERSION}" ]]; \
then python -m build --wheel --outdir /tmp; \
else pip download fms-hf-tuning==${WHEEL_VERSION} --dest /tmp --only-binary=:all: --no-deps; \
fi && \
ls /tmp/*.whl >/tmp/bdist_name
COPY --chown=${USER}:root tuning tuning
COPY --chown=${USER}:root .git .git
COPY --chown=${USER}:root pyproject.toml pyproject.toml
COPY --chown=${USER}:root poetry.lock poetry.lock
COPY README.md README.md

# Install from the wheel
# Install using poetry if PyPi wheel_version is empty else download the wheel from PyPi
Collaborator

Suggested change
# Install using poetry if PyPi wheel_version is empty else download the wheel from PyPi
# Install using poetry if PyPi wheel_version is empty else download the wheel from PyPi
# If creating your own Dockerfile, we suggest using poetry export for a reproducible environment
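
For readers building their own image, the suggestion above might look roughly like the sketch below. It assumes poetry and the poetry-plugin-export plugin are already installed and on PATH, and that it runs from the directory containing pyproject.toml and poetry.lock. The function name `install_locked_requirements` is hypothetical and not part of this PR.

```shell
# Sketch only: export the versions pinned in poetry.lock in requirements.txt
# format, then install exactly those versions with pip.
# Assumes `poetry` (with poetry-plugin-export) is on PATH.
install_locked_requirements() {
  poetry export --format requirements.txt --output requirements.txt &&
    python -m pip install --requirement requirements.txt
}
```

The export covers the locked dependencies only, so the project itself still has to be installed afterwards.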

RUN --mount=type=cache,target=/home/${USER}/.cache/pip,uid=${USER_UID} \
python -m pip install --user wheel && \
python -m pip install --user "$(head bdist_name)" && \
python -m pip install --user "$(head bdist_name)[flash-attn]" && \
# Clean up the wheel module. It's only needed by flash-attn install
python -m pip uninstall wheel build -y && \
# Cleanup the bdist whl file
rm $(head bdist_name) /tmp/bdist_name
if [[ -z "${WHEEL_VERSION}" ]]; then \
# Extract requirements from poetry and install them in ~/.local \
# Need wheel and build for the flash-attn package \
python -m pip install --user wheel build && \
python -m pip install --user --requirement <(/tmp/isolated/bin/poetry export --format requirements.txt) && \
# Next install the package with flash-attn \
python -m pip install --user ".[flash-attn]" && \
python -m pip uninstall wheel build -y ; \
else \
# This will use whatever dependencies versions satisfy the pyproject.toml constraints \
# but they won't necessarily be the exact same versions as present in poetry.lock \
# First, install fms-hf-tuning to get its dependencies which include torch. \
# Then install with the flash-attn extras as the latter expects torch to be present \
python -m pip install --user wheel build && \
python -m pip install --user "fms-hf-tuning==${WHEEL_VERSION}" && \
python -m pip install --user "fms-hf-tuning[flash-attn]==${WHEEL_VERSION}" && \
python -m pip uninstall wheel build -y ; \
fi

RUN python -m pip freeze

## Final image ################################################
FROM release-base as release
53 changes: 32 additions & 21 deletions build/accelerate_launch.py
@@ -18,30 +18,33 @@
"""

# Standard
import os
from pathlib import Path
import logging
import os
import shutil
import subprocess
import sys
import traceback
import tempfile
import shutil
from pathlib import Path
import traceback

# Third Party
from accelerate.commands.launch import launch_command
import torch.distributed.elastic.multiprocessing.errors

# Local
# First Party
from build.utils import (
get_highest_checkpoint,
process_accelerate_launch_args,
serialize_args,
get_highest_checkpoint,
)
from tuning.utils.config_utils import get_json_config

# Local
from tuning.config.tracker_configs import FileLoggingTrackerConfig
from tuning.utils.config_utils import get_json_config
from tuning.utils.error_logging import (
write_termination_log,
USER_ERROR_EXIT_CODE,
INTERNAL_ERROR_EXIT_CODE,
USER_ERROR_EXIT_CODE,
write_termination_log,
)

ERROR_LOG = "/dev/termination-log"
@@ -89,6 +92,20 @@ def main():
# Launch training
#
##########

def handle_sft_trainer_exit_error(return_code):
# If the subprocess throws an exception, the base exception is hidden in the
# subprocess call and is difficult to access at this level. However, that is not
# an issue because sft_trainer.py would have already written the exception
# message to termination log.
logging.error(traceback.format_exc())
# The exit code that sft_trainer.py threw is passed in as return_code

if return_code not in [INTERNAL_ERROR_EXIT_CODE, USER_ERROR_EXIT_CODE]:
return_code = INTERNAL_ERROR_EXIT_CODE
# `e` is the exception bound by the enclosing `except` clause in main()
write_termination_log(f"Unhandled exception during training. {e}")
sys.exit(return_code)

original_output_dir = job_config.get("output_dir")
with tempfile.TemporaryDirectory() as tempdir:
try:
@@ -98,19 +115,13 @@ def main():
os.environ["SFT_TRAINER_CONFIG_JSON_ENV_VAR"] = updated_args

launch_command(args)
except torch.distributed.elastic.multiprocessing.errors.ChildFailedError as e:
# This is what accelerate.commands.launch.multi_gpu_launcher() raises
# (when using >1 GPUs)
handle_sft_trainer_exit_error(e.get_first_failure()[1].exitcode)
except subprocess.CalledProcessError as e:
# If the subprocess throws an exception, the base exception is hidden in the
# subprocess call and is difficult to access at this level. However, that is not
# an issue because sft_trainer.py would have already written the exception
# message to termination log.
logging.error(traceback.format_exc())
# The exit code that sft_trainer.py threw is captured in e.returncode

return_code = e.returncode
if return_code not in [INTERNAL_ERROR_EXIT_CODE, USER_ERROR_EXIT_CODE]:
return_code = INTERNAL_ERROR_EXIT_CODE
write_termination_log(f"Unhandled exception during training. {e}")
sys.exit(return_code)
# This is what accelerate.commands.launch.simple_launcher() raises
handle_sft_trainer_exit_error(e.returncode)
except Exception as e: # pylint: disable=broad-except
logging.error(traceback.format_exc())
write_termination_log(f"Unhandled exception during training. {e}")
6 changes: 3 additions & 3 deletions build/utils.py
@@ -13,14 +13,14 @@
# limitations under the License.

# Standard
import os
import base64
import logging
import os
import pickle
import base64

# Third Party
import torch
from accelerate.commands.launch import launch_command_parser
import torch


def get_highest_checkpoint(dir_path):