-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: use poetry for reproducible virtual environments #209
base: main
Are you sure you want to change the base?
Changes from all commits
6a8c5b0
4aba398
97349ca
c3a1cf5
6638038
54163fc
b404f8c
3655a49
fbacdd4
51fea97
4903eee
dfc818c
19c7acc
8baa00a
13b1893
1a31ba2
72a92bc
b8e4443
f40a21b
2313ebc
f0fc824
0af2888
fc5e362
1a719b2
0d71c23
6a46d26
771e2f9
1359e6f
865bd6c
db5832e
9739f90
21a1e5c
109e478
3aa293a
ecc62bb
c3321c1
e8ff81b
53025ee
2158078
896da24
4a3316e
ca9ab4e
a7e2bbb
5207283
0828eb9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -42,6 +42,8 @@ If additional new Python module dependencies are required, think about where to | |||||
- If they're optional dependencies for additional functionality, then put them in the pyproject.toml file like were done for [flash-attn](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/pyproject.toml#L44) or [aim](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/pyproject.toml#L45). | ||||||
- If it's an additional dependency for development, then add it to the [dev](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/pyproject.toml#L43) dependencies. | ||||||
|
||||||
if using `poetry` to install dependencies, you can optionally leverage [poetry add](https://python-poetry.org/docs/cli/#add) to add new dependencies to pyproject. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel this is abit confusing. If we have a peoetry flow, and there is a new dep, shouldnt we have some guidance on when it has to be added to the lockfile? Also some guidance to say that for optional deps it should not be added. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. true, maybe we can continue adding to pyproject.yml , like we do now and then we have to run There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There're 2 ways to add a new dependency. One is to use If you opt for the 2nd route but forget to run |
||||||
|
||||||
#### Code Review | ||||||
|
||||||
Once you've [created a pull request](#how-can-i-contribute), maintainers will review your code and may make suggestions to fix before merging. It will be easier for your pull request to receive reviews if you consider the criteria the reviewers follow while working. Remember to: | ||||||
|
@@ -82,12 +84,45 @@ The following tools are required: | |||||
- [git](https://git-scm.com) | ||||||
- [python](https://www.python.org) (v3.8+) | ||||||
- [pip](https://pypi.org/project/pip/) (v23.0+) | ||||||
- [poetry](https://python-poetry.org/docs/#installation) (v1.8.3+) | ||||||
- Poetry should always be installed in a dedicated virtual environment to isolate it from the rest of your system. It should in no case be installed in the environment of the project that is to be managed by Poetry. This ensures that Poetry’s own dependencies will not be accidentally upgraded or uninstalled. | ||||||
- [tox](https://tox.wiki/en/4.15.1/installation.html) (v4.15.1+) | ||||||
- Just like `poetry` install `tox` in an isolated virtual environment | ||||||
|
||||||
Installation: | ||||||
``` | ||||||
pip install -U datasets | ||||||
pip install -e . | ||||||
|
||||||
```bash | ||||||
: Install poetry and tox in an isolated virtual environment | ||||||
python3 -m venv isolated | ||||||
fabianlim marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
./isolated/bin/pip install -U pip setuptools | ||||||
./isolated/bin/pip install poetry tox | ||||||
|
||||||
: Ensure you can access poetry and tox without activating the | ||||||
: the isolated virtual environment | ||||||
export PATH=$PATH:`pwd`/isolated/bin | ||||||
|
||||||
: Create your development virtual environment | ||||||
python3 -m venv venv | ||||||
. venv/bin/activate | ||||||
|
||||||
: Install a dev version (similar to pip -e ".[dev]") of fms-hf-tuning | ||||||
poetry install --extras dev | ||||||
``` | ||||||
|
||||||
|
||||||
> Note: After installing, if you wish to use [FlashAttention](https://github.com/Dao-AILab/flash-attention), then you need to install these requirements: | ||||||
|
||||||
``` | ||||||
poetry install --extras dev,flash-attn | ||||||
``` | ||||||
|
||||||
If you wish to use [aim](https://github.com/aimhubio/aim), then you need to install it: | ||||||
``` | ||||||
poetry install --extras aim | ||||||
``` | ||||||
|
||||||
If you wish to use [fms-acceleration](https://github.com/foundation-model-stack/fms-acceleration) follow the instructions in [this section of README.md](README.md#fms-acceleration). | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
<details> | ||||||
<summary>Linting</summary> | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,6 +7,7 @@ This repo provides basic tuning scripts with support for specific models. The re | |
|
||
## Installation | ||
|
||
### 1. Install from wheel | ||
``` | ||
pip install fms-hf-tuning | ||
``` | ||
|
@@ -27,7 +28,13 @@ If you wish to use [fms-acceleration](https://github.com/foundation-model-stack/ | |
``` | ||
pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/framework | ||
``` | ||
`fms-acceleration` is a collection of plugins that packages that accelerate fine-tuning / training of large models, as part of the `fms-hf-tuning` suite. For more details on see [this section below](#fms-acceleration). | ||
`fms-acceleration` is a collection of plugins that packages that accelerate fine-tuning / training of large models, as part of the `fms-hf-tuning` suite. For more details see [this section below](#fms-acceleration). | ||
|
||
### 2. Build from source | ||
|
||
We have committed a `poetry.lock` file to allow reproducible enviornments. If building from source, you can clone the repository and use poetry to install as mentioned in [development docs](/CONTRIBUTING.md#development) | ||
|
||
If building in a dockerfile use `poetry export --format requirements.txt` can install same dependencies from the lock file. Maintainers regularly update the lock file. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this line is not very clear.. this will do not any installation correct? Its just an export of the locked deps to a requirements file? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ya its just to export the requirements and same lock file and then we have to install. hmm should be just link our dockerfile as example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@fabianlim I'll rebase the PR once #223 makes its way into the
I'm actually not sure. I assume that they do this so that |
||
|
||
## Data format | ||
We support two data formats: | ||
|
@@ -385,7 +392,7 @@ Equally you can pass in a JSON configuration for running tuning. See [build doc] | |
|
||
### FMS Acceleration | ||
|
||
`fms-acceleration` is fuss-free approach to access a curated collection of acceleration plugins that acclerate your `tuning/sft-trainer.py` experience. Accelerations that apply to a variety of use-cases, e.g., PeFT / full-finetuning, are being planned for. As such, the accelerations are grouped into *plugins*; only install the plugins needed for the acceleration of interest. The plugins are housed in the [seperate repository found here](https://github.com/foundation-model-stack/fms-acceleration). | ||
`fms-acceleration` is fuss-free approach to access a curated collection of acceleration plugins that accelerate your `tuning/sft-trainer.py` experience. Accelerations that apply to a variety of use-cases, e.g., PeFT / full-finetuning, are being planned for. As such, the accelerations are grouped into *plugins*; only install the plugins needed for the acceleration of interest. The plugins are housed in the [separate repository found here](https://github.com/foundation-model-stack/fms-acceleration). | ||
|
||
To access `fms-acceleration` features the `[fms-accel]` dependency must first be installed: | ||
``` | ||
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -110,29 +110,44 @@ RUN dnf install -y git && \ | |||||||
rm -f /usr/share/doc/perl-Net-SSLeay/examples/server_key.pem && \ | ||||||||
dnf clean all | ||||||||
USER ${USER} | ||||||||
WORKDIR /tmp | ||||||||
# Ensure that git directory is owned by current user, otherwise git raises | ||||||||
# "fatal: detected dubious ownership" for `/tmp` | ||||||||
WORKDIR /tmp/fms-hf-tuning | ||||||||
|
||||||||
# Install poetry and its dependencies inside an isolated virtual environment which we | ||||||||
# will not copy into the release-base layer | ||||||||
RUN --mount=type=cache,target=/home/${USER}/.cache/pip,uid=${USER_UID} \ | ||||||||
python -m pip install --user build | ||||||||
COPY --chown=${USER}:root tuning tuning | ||||||||
COPY .git .git | ||||||||
COPY pyproject.toml pyproject.toml | ||||||||
python -m venv venv /tmp/isolated && \ | ||||||||
/tmp/isolated/bin/pip install poetry poetry-plugin-export | ||||||||
|
||||||||
# Build a wheel if PyPi wheel_version is empty else download the wheel from PyPi | ||||||||
RUN if [[ -z "${WHEEL_VERSION}" ]]; \ | ||||||||
then python -m build --wheel --outdir /tmp; \ | ||||||||
else pip download fms-hf-tuning==${WHEEL_VERSION} --dest /tmp --only-binary=:all: --no-deps; \ | ||||||||
fi && \ | ||||||||
ls /tmp/*.whl >/tmp/bdist_name | ||||||||
COPY --chown=${USER}:root tuning tuning | ||||||||
COPY --chown=${USER}:root .git .git | ||||||||
COPY --chown=${USER}:root pyproject.toml pyproject.toml | ||||||||
COPY --chown=${USER}:root poetry.lock poetry.lock | ||||||||
COPY README.md README.md | ||||||||
|
||||||||
# Install from the wheel | ||||||||
# Install using poetry if PyPi wheel_version is empty else download the wheel from PyPi | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
RUN --mount=type=cache,target=/home/${USER}/.cache/pip,uid=${USER_UID} \ | ||||||||
python -m pip install --user wheel && \ | ||||||||
python -m pip install --user "$(head bdist_name)" && \ | ||||||||
python -m pip install --user "$(head bdist_name)[flash-attn]" && \ | ||||||||
# Clean up the wheel module. It's only needed by flash-attn install | ||||||||
python -m pip uninstall wheel build -y && \ | ||||||||
# Cleanup the bdist whl file | ||||||||
rm $(head bdist_name) /tmp/bdist_name | ||||||||
if [[ -z "${WHEEL_VERSION}" ]]; then \ | ||||||||
# Extract requirements from poetry and install them in ~/.local \ | ||||||||
# Need wheel and build for the flash-attn package \ | ||||||||
python -m pip install --user wheel build && \ | ||||||||
python -m pip install --user --requirement <(/tmp/isolated/bin/poetry export --format requirements.txt) && \ | ||||||||
# Next install the package with flash-attn \ | ||||||||
python -m pip install --user ".[flash-attn]" && \ | ||||||||
python -m pip uninstall wheel build -y ; \ | ||||||||
else \ | ||||||||
# This will use whatever dependencies versions satisfy the pyproject.toml constraints \ | ||||||||
# but they won't necessarily be the exact same versions as present in poetry.lock \ | ||||||||
# First, install fms-hf-tuning to get its dependencies which include torch. \ | ||||||||
# Then install with the flash-attn extras as the latter expects torch to be present \ | ||||||||
python -m pip install --user wheel build && \ | ||||||||
python -m pip install --user "fms-hf-tuning==${WHEEL_VERSION}" && \ | ||||||||
python -m pip install --user "fms-hf-tuning[flash-attn]==${WHEEL_VERSION}" && \ | ||||||||
python -m pip uninstall wheel build -y ; \ | ||||||||
fi | ||||||||
|
||||||||
RUN python -m pip freeze | ||||||||
|
||||||||
## Final image ################################################ | ||||||||
FROM release-base as release | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a need to install this to
--user
? If you can install peotry the same way as tox, that this obliviates the need for thePATH
setting below.If the aim is for isolation, then I dont think installing in the user space may achieve it. Since any further
pip install commands
will still find thepeoetry
inuser
and try to update it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There quite a few instances of these in other worflows as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tox always creates it's own isolated venv.
Agree it would probably work without
--user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch,
tox
andpoetry
should go in the same virtual-environment and that environment should not be the one that we installfms-hf-tuning
in. Becausetox
creates a new virtual environment for theenvironments
it processes (e.g.py
, `fmt, etc) poetry and tox will be outside the tested virtual-environment.We don't need to run tox "inside" one of the virtual environments that tox creates but we do need to run
poetry
. Here's how I chose to do that.First, I installed poetry under the user pip install directory (which by default for linux is
~/.local/
. Then I updated the file that $GITHUB_ENV points to so that in subsequent steps the$PATH
environment variable contains the path~/.local/bin
.This ensures that the
poetry
commands running insidetox
can use thepoetry
executable. Alternatively, we can create a new virtual environment under a directory of our choosing e.g./tmp/isolated
in there, we install just poetry and then update$GITHUB_ENV
so that$PATH
includes/tmp/isolated/bin
.