Release 1.7.0 #308

Merged
merged 10 commits into from
Nov 22, 2023
22 changes: 22 additions & 0 deletions .github/workflows/ci_action.yml
Original file line number Diff line number Diff line change
@@ -90,6 +90,11 @@ jobs:
pip install -e .[DEV,ML]
pip install gdal==$(gdal-config --version)

- name: Set up local cluster # we need to install async-timeout until ray 2.9.0 fixes the issue
run: |
pip install async-timeout
ray start --head

- name: Run fast tests
if: ${{ !matrix.full_test_suite }}
run: pytest -m "not integration"
@@ -113,3 +118,20 @@ jobs:
files: coverage.xml
fail_ci_if_error: true
verbose: false

mirror-to-gitlab:
if: github.event_name == 'push'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- name: Mirror + trigger CI
uses: SvanBoxel/gitlab-mirror-and-ci-action@master
with:
args: "https://git.sinergise.com/eo/code/eo-grow"
env:
FOLLOW_TAGS: "true"
GITLAB_HOSTNAME: "git.sinergise.com"
GITLAB_USERNAME: "github-action"
GITLAB_PASSWORD: ${{ secrets.GITLAB_PASSWORD }}
GITLAB_PROJECT_ID: "878"
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
30 changes: 10 additions & 20 deletions .github/workflows/ci_trigger.yml
@@ -1,29 +1,19 @@
name: mirror_and_trigger
name: trigger

on:
pull_request:
push:
branches:
- "master"
- "develop"
workflow_call:
release:
types:
- published

jobs:
mirror-to-gitlab:
trigger:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- name: Mirror + trigger CI
uses: SvanBoxel/gitlab-mirror-and-ci-action@master
with:
args: "https://git.sinergise.com/eo/code/eo-grow"
env:
FOLLOW_TAGS: "true"
GITLAB_HOSTNAME: "git.sinergise.com"
GITLAB_USERNAME: "github-action"
GITLAB_PASSWORD: ${{ secrets.GITLAB_PASSWORD }}
GITLAB_PROJECT_ID: "878"
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Trigger API
run: >
curl -X POST --fail \
-F token=${{ secrets.GITLAB_PIPELINE_TRIGGER_TOKEN }} \
-F ref=main \
-F variables[CUSTOM_RUN_TAG]=auto \
-F variables[LAYER_NAME]=dotai-eo \
https://git.sinergise.com/api/v4/projects/1031/trigger/pipeline
18 changes: 0 additions & 18 deletions .gitlab-ci.yml

This file was deleted.

6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -13,20 +13,20 @@ repos:
- id: debug-statements

- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.0.3"
rev: "v3.1.0"
hooks:
- id: prettier
exclude: "tests/(test_stats|test_project)/"
types_or: [json]

- repo: https://github.com/psf/black
rev: 23.10.1
rev: 23.11.0
hooks:
- id: black
language_version: python3

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.1.4"
rev: "v0.1.6"
hooks:
- id: ruff

11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,14 @@
## [Version 1.7.0] - 2023-11-22
With this release we push `eo-grow` towards a more `ray`-centered execution model.

- The local EOExecutor models with multiprocessing/multithreading have been removed. (Most) pipelines no longer have the `use_ray` and `workers` parameters. In order to run instances locally one has to set up a local cluster (via `ray start --head`). We included a `debug` parameter that uses `EOExecutor` instead of `RayExecutor` so that IDE breakpoints work in most pipelines.
- Pipeline chain configs have been adjusted. The user can now specify what kind of resources the main pipeline process would require. This also allows one to run pipelines entirely on worker instances.
- The `ray_worker_type` field was replaced with `worker_resources` that allows for precise resource request specifications.
- Fixed a bug where CLI variables were not applied for config chains.
- Removed `TestPipeline` and the `eogrow-test` command.
- Some `ValueError` exceptions were changed to `TypeError`.


## [Version 1.6.3] - 2023-11-07

- Pipelines can request a specific type of worker when run on a ray cluster with the `ray_worker_type` field.
61 changes: 45 additions & 16 deletions docs/source/common-configuration-patterns.md
@@ -9,8 +9,6 @@ Invoking `eogrow-template "eogrow.pipelines.zipmap.ZipMapPipeline" "zipmap.json"
{
"pipeline": "eogrow.pipelines.zipmap.ZipMapPipeline",
"pipeline_name": "<< Optional[str] >>",
"workers": "<< 1 : int >>",
"use_ray": "<< 'auto' : Union[Literal['auto'], bool] >>",
"input_features": {
"<< type >>": "List[InputFeatureSchema]",
"<< nested schema >>": "<class 'eogrow.pipelines.zipmap.InputFeatureSchema'>",
@@ -104,11 +102,11 @@ In certain use cases we have multiple pipelines that are meant to be run in a ce
However, the user still has to run them by hand and in the correct order. We can automate this with a simple pipeline chain that links them together:
```
[ // end_to_end_run.json
{"**download": "${config_path}/01_download.json"},
{"**preprocess": "${config_path}/02_preprocess_data.json"},
{"**predict": "${config_path}/03_use_model.json"},
{"**export": "${config_path}/04_export_maps.json"},
{"**ingest": "${config_path}/05_ingest_byoc.json"},
{"pipeline_config": {"**download": "${config_path}/01_download.json"}},
{"pipeline_config": {"**preprocess": "${config_path}/02_preprocess_data.json"}},
{"pipeline_config": {"**predict": "${config_path}/03_use_model.json"}},
{"pipeline_config": {"**export": "${config_path}/04_export_maps.json"}},
{"pipeline_config": {"**ingest": "${config_path}/05_ingest_byoc.json"}},
]
```
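
Chains written in the older format, where each list element was the pipeline config itself, can be wrapped mechanically. The following is a minimal sketch, assuming the chain has already been loaded into Python as a list of dictionaries. `wrap_chain` is a hypothetical helper for illustration and is not part of `eo-grow`.

```python
from __future__ import annotations

from typing import Any


def wrap_chain(old_chain: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wrap each old-style element in a `pipeline_config` key (hypothetical helper)."""
    new_chain = []
    for element in old_chain:
        if "pipeline_config" in element:
            # Already in the new format, keep it unchanged.
            new_chain.append(element)
        else:
            new_chain.append({"pipeline_config": element})
    return new_chain


old_chain = [
    {"**download": "${config_path}/01_download.json"},
    {"**export": "${config_path}/04_export_maps.json"},
]
new_chain = wrap_chain(old_chain)
```

Applying the helper twice leaves the chain unchanged, so it should be safe to run on files that were already migrated.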

@@ -121,28 +119,59 @@ In experimentation we often want to run the same pipeline for multiple parameter
```
[ // run_threshold_experiments.json
{
"variables": {"threshold": 0.1},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.1},
"**pipeline": "${config_path}/extract_trees.json"
}
},
{
"variables": {"threshold": 0.2},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.2},
"**pipeline": "${config_path}/extract_trees.json"
}
},
{
"variables": {"threshold": 0.3},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.3},
"**pipeline": "${config_path}/extract_trees.json"
}
},
{
"variables": {"threshold": 0.4},
"**pipeline": "${config_path}/extract_trees.json"
"pipeline_config": {
"variables": {"threshold": 0.4},
"**pipeline": "${config_path}/extract_trees.json"
}
}
]
```
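
Since the entries differ only in the variable value, such a chain can also be generated instead of written by hand. This is a sketch under the assumption that the chain file is plain JSON in the new `pipeline_config` format. `build_threshold_chain` is a hypothetical name, not an `eo-grow` function.

```python
import json


def build_threshold_chain(thresholds, pipeline_path="${config_path}/extract_trees.json"):
    """Build one chain entry per threshold value (hypothetical helper)."""
    return [
        {
            "pipeline_config": {
                "variables": {"threshold": threshold},
                "**pipeline": pipeline_path,
            }
        }
        for threshold in thresholds
    ]


chain = build_threshold_chain([0.1, 0.2, 0.3, 0.4])
# Serialized form that could be saved as run_threshold_experiments.json.
chain_json = json.dumps(chain, indent=2)
```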

### Using variables with pipelines
### Using variables with pipeline chains

While there is no syntactic sugar for specifying pipeline-chain-wide variables in JSON files, one can do that through the CLI. Running `eogrow end_to_end_run.json -v "year:2019"` will set the variable `year` to 2019 for all pipelines in the chain.

### Specifying resources for pipeline execution

Pipeline chains also allow the user to specify the resources needed by the main process of each pipeline, in much the same way that a pipeline config specifies the resources needed by its workers.

```
[ // end_to_end_run.json
{
"pipeline_config": {"**download": "${config_path}/01_download.json"}
},
{
"pipeline_config": {"**predict": "${config_path}/03_use_model.json"},
"pipeline_resources": {"memory": 2e9} // ~ 2GB RAM reserved for the main process
},
{
"pipeline_config": {"**export": "${config_path}/04_export_maps.json"}
}
]
```

This also allows us to run certain pipelines on specially tagged workers. When setting up the cluster, one can tag workers with custom resources, for instance an `r5.4xlarge` worker with `big_RAM_worker: 1`. If we set `"pipeline_resources": {"resources": {"big_RAM_worker": 1}}` then the pipeline will run ONLY on such workers, and the whole worker instance will be assigned to it. This is useful for pipelines that have a large workload in the main process.

Pipeline chains can be 1 pipeline long, so this can also be used with a single pipeline.
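
A chain entry can be sanity-checked before it is submitted. The sketch below assumes that an entry may contain only the `pipeline_config` and `pipeline_resources` keys described in this documentation. `check_chain` is a hypothetical helper, and the validation that `eo-grow` itself performs may differ.

```python
from __future__ import annotations

from typing import Any

# Assumption: a chain entry holds only these two keys, as listed in the docs.
ALLOWED_KEYS = {"pipeline_config", "pipeline_resources"}


def check_chain(chain: list[dict[str, Any]]) -> None:
    """Raise ValueError if an entry is malformed (hypothetical check)."""
    for index, entry in enumerate(chain):
        if "pipeline_config" not in entry:
            raise ValueError(f"Entry {index} is missing 'pipeline_config'.")
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"Entry {index} has unexpected keys: {sorted(unknown)}")


chain = [
    {"pipeline_config": {"**download": "${config_path}/01_download.json"}},
    {
        "pipeline_config": {"**predict": "${config_path}/03_use_model.json"},
        # Custom resource tag from the text above.
        "pipeline_resources": {"resources": {"big_RAM_worker": 1}},
    },
]
check_chain(chain)
```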

## Path modification via variables

In some cases one wants fine-grained control over path specifications. The following is a simplified example of how one can provide separate download paths for a large number of batch pipelines.
23 changes: 16 additions & 7 deletions docs/source/config-language.md
@@ -26,19 +26,28 @@ Additional notes:

### Pipeline chains

A typical configuration is a dictionary with pipeline parameters. However, it can also be a list of dictionaries. In this case each dictionary must contain parameters of a single pipeline. The order of dictionaries defines the consecutive order in which pipelines will be run. Example:
A typical configuration is a dictionary with pipeline parameters. However, it can also be a list of pipeline-execution dictionaries that specify:
- `pipeline_config`: a configuration for a single pipeline,
- `pipeline_resources` (optional): a dictionary that is passed to `ray.remote` to configure which resources the main pipeline process will request from the cluster (see [here](https://docs.ray.io/en/latest/ray-core/api/doc/ray.remote_function.RemoteFunction.options.html) for options). The pipeline requests 1 CPU by default (and nothing else).

The order of dictionaries defines the consecutive order in which pipelines will be run. Example:

```
[
{
"pipeline": "FirstPipeline",
"param1": "value1",
...
"pipeline_config": {
"pipeline": "FirstPipeline",
"param1": "value1",
...
},
},
{
"pipeline": "SecondPipeline",
"param2": "value2",
...
"pipeline_config": {
"pipeline": "SecondPipeline",
"param2": "value2",
...
},
"pipeline_resources": {"num_cpus": 2}
},
...
]
2 changes: 1 addition & 1 deletion eogrow/__init__.py
@@ -1,3 +1,3 @@
"""The main module of the eo-grow package."""

__version__ = "1.6.3"
__version__ = "1.7.0"