
3dgan integration #98

Merged
59 commits merged on Dec 13, 2023
Changes shown from 54 commits
443b39e
commiting integration of 3dgan scripts
Oct 24, 2023
0f50aaf
ADD: Download dataset
matbun Oct 24, 2023
41d2b66
FIX: DDP distributed training with manual optimization
matbun Oct 25, 2023
ddfa59d
ADD: log with MLFlow
matbun Oct 25, 2023
e89a433
Sqaaas code (#88)
matbun Oct 25, 2023
adc6c91
Sqaaas code (#89)
matbun Oct 27, 2023
291b4f3
ADD: draft predictor and saver
matbun Nov 6, 2023
7da9ba4
ADD: stub for inference pipeline
matbun Nov 6, 2023
c73fb08
ADD: small docs
matbun Nov 6, 2023
1866a81
UPDATE: inference pipeline components
matbun Nov 7, 2023
22aed46
UPDATE: reorg
matbun Nov 7, 2023
0242790
ADD: image generation for inference
matbun Nov 7, 2023
17915b1
update tag
matbun Nov 7, 2023
c3ff733
ADD: threshold
matbun Nov 7, 2023
0a0f56e
ADD: draft inference
matbun Nov 7, 2023
95661c1
ADD: draft inference wf
matbun Nov 7, 2023
94254cf
ADD: working inference workflow
matbun Nov 8, 2023
63a7aa0
ADD: 3D scatter plots
matbun Nov 8, 2023
61c3666
ADD: Dockerfile + refactor
matbun Nov 8, 2023
d690192
ADD: .dockerignore
matbun Nov 8, 2023
a2a9875
Update .dockerignore
matbun Nov 8, 2023
3bcc410
REMOVE: keras dependency
matbun Nov 8, 2023
be0c115
ADD: skip download option
matbun Nov 9, 2023
77d939e
ADD: cern pipeline.yaml
matbun Nov 9, 2023
b603b05
UPDATE: dataset loading function
matbun Nov 9, 2023
bee1317
UPDATE: dataset loading function
matbun Nov 9, 2023
b6c3ee2
UPDATE conf
matbun Nov 9, 2023
466b150
UPDATE refactor
matbun Nov 9, 2023
3e1d6ab
UPDATE refactor
matbun Nov 9, 2023
a814e65
Merge branch 'dev' into 3dgan_integration
matbun Nov 9, 2023
ca60e19
UPDATE training docs
matbun Nov 9, 2023
307ed65
Update readme
matbun Nov 9, 2023
f47f40d
update README
matbun Nov 9, 2023
fc0697e
FIX typo
matbun Nov 9, 2023
a8c9d6d
Update README
matbun Nov 9, 2023
3faa062
Update mkdir
matbun Nov 9, 2023
0935e83
Merge branch 'dev' into 3dgan_integration
matbun Nov 10, 2023
b50c610
UPDATE data paths
matbun Nov 15, 2023
2cedfe7
UPDATE Dockerfile
matbun Nov 16, 2023
1efba3f
UPDATE Dockerfiles
matbun Nov 16, 2023
60ab87d
UPDATE for Singularity execution
matbun Nov 16, 2023
881ae47
FIX version mismatch
matbun Nov 16, 2023
9ab6ec1
UPDATE Singularity docs
matbun Nov 16, 2023
59fd74b
Named steps pipe (#100)
matbun Nov 23, 2023
8f13d92
UPDATE Singularity exec command
matbun Nov 23, 2023
8e19c62
UPDATE: Image version
matbun Nov 23, 2023
d3a2630
UPDATE: load components from pipeline
matbun Nov 23, 2023
33de0b4
ADD: docs
matbun Nov 23, 2023
f2ccfae
Simplify 3DGAN model config
matbun Nov 23, 2023
1af8ba7
ADD: mlflow autologging support for PL trainer
matbun Nov 23, 2023
acf7782
UPDATE container info
matbun Nov 24, 2023
656ab67
Refactor
matbun Dec 1, 2023
b176abf
UPDATE dependencies
matbun Dec 1, 2023
087c7ec
FIX linter problem
matbun Dec 1, 2023
8d9f51f
Simplified workflow configuration (#108)
matbun Dec 13, 2023
dd2c5ea
Simplified workflow configuration (#109)
matbun Dec 13, 2023
debc6a4
ADD integration tests
matbun Dec 13, 2023
9e8eafe
FIX test
matbun Dec 13, 2023
c9b1c17
FIX 3dgan inference test
matbun Dec 13, 2023
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -34,7 +34,9 @@ dependencies = [
"submitit>=1.4.6",
"typing-extensions==4.5.0",
"typing_extensions==4.5.0",
"urllib3>=2.0.5",
"urllib3>=1.26.18",
"lightning>=2.0.0",
"torchmetrics>=1.2.0",
]

# dynamic = ["version", "description"]
110 changes: 104 additions & 6 deletions src/itwinai/components.py
@@ -2,12 +2,15 @@
from typing import Iterable, Dict, Any, Optional, Tuple, Union
from abc import ABCMeta, abstractmethod
import time
from jsonargparse import ArgumentParser

# import logging
# from logging import Logger as PythonLogger

from .cluster import ClusterEnvironment
from .types import ModelML, DatasetML
from .serialization import ModelLoader
from .utils import load_yaml


class Executable(metaclass=ABCMeta):
@@ -231,12 +234,12 @@ def save(self, *args, **kwargs):
class Executor(Executable):
"""Sets-up and executes a sequence of Executable steps."""

steps: Iterable[Executable]
steps: Union[Dict[str, Executable], Iterable[Executable]]
constructor_args: Dict

def __init__(
self,
steps: Iterable[Executable],
steps: Union[Dict[str, Executable], Iterable[Executable]],
name: Optional[str] = None,
# logs_dir: Optional[str] = None,
# debug: bool = False,
@@ -247,9 +250,20 @@ def __init__(
self.steps = steps
self.constructor_args = kwargs

def __getitem__(self, subscript) -> Executor:
def __getitem__(self, subscript: Union[str, int, slice]) -> Executor:
if isinstance(subscript, slice):
s = self.steps[subscript.start:subscript.stop: subscript.step]
# First, convert to list if is a dict
if isinstance(self.steps, dict):
steps = list(self.steps.items())
else:
steps = self.steps
# Second, perform slicing
s = steps[subscript.start:subscript.stop: subscript.step]
# Third, reconstruct dict, if it is a dict
if isinstance(self.steps, dict):
s = dict(s)
# Fourth, return sliced sub-pipeline, preserving its
# initial structure
sliced = self.__class__(
steps=s,
**self.constructor_args
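The dict-aware slicing added to `__getitem__` above can be sketched in isolation — convert the ordered dict of steps to a list of pairs, slice, and rebuild the dict (hypothetical step names, plain strings standing in for `Executable` instances):

```python
# Slicing an ordered dict of pipeline steps, as __getitem__ does above:
# dicts are not sliceable directly, but their item lists are, and
# Python dicts preserve insertion order when reconstructed.
steps = {"download": "step-a", "train": "step-b", "save": "step-c"}

items = list(steps.items())  # [('download', 'step-a'), ...]
sub = dict(items[0:2])       # first two steps, order preserved

print(sub)  # {'download': 'step-a', 'train': 'step-b'}
```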
Expand All @@ -270,7 +284,12 @@ def setup(self, parent: Optional[Executor] = None) -> None:
Defaults to None.
"""
super().setup(parent)
for step in self.steps:
if isinstance(self.steps, dict):
steps = list(self.steps.values())
else:
steps = self.steps

for step in steps:
step.setup(self)
step.is_setup = True

@@ -303,7 +322,12 @@ def execute(
Tuple[Optional[Tuple], Optional[Dict]]: tuple structured as
(results, config).
"""
for step in self.steps:
if isinstance(self.steps, dict):
steps = list(self.steps.values())
else:
steps = self.steps

for step in steps:
if not step.is_setup:
raise RuntimeError(
f"Step '{step.name}' was not setup!"
@@ -318,3 +342,77 @@ def _pack_args(self, args) -> Tuple:
if not isinstance(args, tuple):
args = (args,)
return args


def add_replace_field(
config: Dict,
key_chain: str,
value: Any
) -> None:
"""Replace or add (if not present) a field in a dictionary, following a
path of dot-separated keys. In-place operation.

Args:
config (Dict): dictionary to be modified.
key_chain (str): path of dot-separated keys to specify the location
of the new value (e.g., 'foo.bar.line' adds/overwrites the value
located at config['foo']['bar']['line']).
value (Any): the value to insert.
"""
sub_config = config
for idx, k in enumerate(key_chain.split('.')):
if idx >= len(key_chain.split('.')) - 1:
# Last key reached
break
if not isinstance(sub_config.get(k), dict):
sub_config[k] = dict()
sub_config = sub_config[k]
sub_config[k] = value
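As a quick illustration (hypothetical keys and values), `add_replace_field` both overwrites existing leaves and creates missing intermediate dictionaries. The helper below mirrors the definition above so the snippet is self-contained:

```python
def add_replace_field(config, key_chain, value):
    # Mirrors the implementation above: walk the dot-separated keys,
    # creating intermediate dicts as needed, then set the final leaf.
    sub_config = config
    keys = key_chain.split('.')
    for k in keys[:-1]:
        if not isinstance(sub_config.get(k), dict):
            sub_config[k] = {}
        sub_config = sub_config[k]
    sub_config[keys[-1]] = value

cfg = {"foo": {"bar": {"line": 1}}}
add_replace_field(cfg, "foo.bar.line", 42)       # overwrite existing leaf
add_replace_field(cfg, "foo.new.leaf", "hello")  # create missing path

print(cfg)
# {'foo': {'bar': {'line': 42}, 'new': {'leaf': 'hello'}}}
```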


def load_pipeline_step(
pipe: Union[str, Dict],
step_id: Union[str, int],
override_keys: Optional[Dict[str, Any]] = None,
verbose: bool = False
) -> Executable:
"""Instantiates a specific step from a pipeline configuration file, given
its ID (index if steps are a list, key if steps are a dictionary). It
allows overriding the step configuration with user-defined values.

Args:
pipe (Union[str, Dict]): pipeline configuration. Either a path to a
YAML file (if string), or a configuration in memory (if dict object).
step_id (Union[str, int]): step identifier: list index if steps are
represented as a list, string key if steps are represented as a
dictionary.
override_keys (Optional[Dict[str, Any]], optional): if given, maps key
path to the value to add/override. A key path is a string of
dot-separated keys (e.g., 'foo.bar.line' adds/overwrites the value
located at pipe['foo']['bar']['line']). Defaults to None.
verbose (bool, optional): if True, prints the new configuration
(obtained after overriding) to the console. Defaults to False.

Returns:
Executable: an instance of the selected step in the pipeline.
"""
if isinstance(pipe, str):
# Load pipe from YAML file path
pipe = load_yaml(pipe)
step_dict_config = pipe['executor']['init_args']['steps'][step_id]

# Override fields
if override_keys is not None:
for key_chain, value in override_keys.items():
add_replace_field(step_dict_config, key_chain, value)
if verbose:
import json
print(f"NEW STEP <ID:{step_id}> CONFIG:")
print(json.dumps(step_dict_config, indent=4))

# Wrap config under "step" field and parse it
step_dict_config = dict(step=step_dict_config)
step_parser = ArgumentParser()
step_parser.add_subclass_arguments(Executable, "step")
parsed_namespace = step_parser.parse_object(step_dict_config)
return step_parser.instantiate_classes(parsed_namespace)["step"]
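Before any jsonargparse parsing happens, the step selection and override in `load_pipeline_step` are plain dictionary operations. A sketch with a hypothetical in-memory pipeline config (class paths and values are illustrative only):

```python
# Shape of the config that load_pipeline_step expects, with steps as a
# dict keyed by step name (hypothetical values):
pipe = {
    "executor": {"init_args": {"steps": {
        "dataloader": {
            "class_path": "dataloader.Lightning3DGANDownloader",
            "init_args": {"data_path": "exp_data/", "data_url": None},
        },
    }}},
}

# Selecting a step by its ID is a nested lookup...
step_cfg = pipe["executor"]["init_args"]["steps"]["dataloader"]

# ...and override_keys={"init_args.data_path": "/tmp/3dgan"} would
# rewrite the nested field before the step is instantiated:
step_cfg["init_args"]["data_path"] = "/tmp/3dgan"

print(step_cfg["init_args"]["data_path"])  # /tmp/3dgan
```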
77 changes: 77 additions & 0 deletions src/itwinai/torch/mlflow.py
@@ -0,0 +1,77 @@
from typing import Dict, Optional
import os

import mlflow
import yaml


def _get_mlflow_logger_conf(pl_config: Dict) -> Optional[Dict]:
"""Extract MLFlowLogger configuration from a PyTorch Lightning
configuration file, if present.

Args:
pl_config (Dict): lightning configuration loaded in memory.

Returns:
Optional[Dict]: if present, the MLFlowLogger constructor arguments
(under 'init_args' key).
"""
if isinstance(pl_config['trainer']['logger'], list):
# If multiple loggers are provided
for logger_conf in pl_config['trainer']['logger']:
if logger_conf['class_path'].endswith('MLFlowLogger'):
return logger_conf['init_args']
elif pl_config['trainer']['logger']['class_path'].endswith('MLFlowLogger'):
return pl_config['trainer']['logger']['init_args']


def _mlflow_log_pl_config(pl_config: Dict, local_yaml_path: str) -> None:
os.makedirs(os.path.dirname(local_yaml_path), exist_ok=True)
with open(local_yaml_path, 'w') as outfile:
yaml.dump(pl_config, outfile, default_flow_style=False)
mlflow.log_artifact(local_yaml_path)


def init_lightning_mlflow(
pl_config: Dict,
default_experiment_name: str = 'Default',
**autolog_kwargs
) -> None:
"""Initialize MLflow for PyTorch Lightning, also setting up
auto-logging (mlflow.pytorch.autolog(...)). Creates a new MLflow
run and attaches it to the MLflow auto-logger.

Args:
pl_config (Dict): pytorch lightning configuration loaded in memory.
default_experiment_name (str, optional): used as experiment name
if it is not given in the lightning conf. Defaults to 'Default'.
**autolog_kwargs (kwargs): args for mlflow.pytorch.autolog(...).
"""
mlflow_conf: Optional[Dict] = _get_mlflow_logger_conf(pl_config)
if not mlflow_conf:
return

tracking_uri = mlflow_conf.get('tracking_uri')
if not tracking_uri:
save_path = mlflow_conf.get('save_dir')
tracking_uri = "file://" + os.path.abspath(save_path)

experiment_name = mlflow_conf.get('experiment_name')
if not experiment_name:
experiment_name = default_experiment_name

mlflow.set_tracking_uri(tracking_uri)
mlflow.set_experiment(experiment_name)
mlflow.pytorch.autolog(**autolog_kwargs)
mlflow.start_run()

mlflow_conf['experiment_name'] = experiment_name
mlflow_conf['run_id'] = mlflow.active_run().info.run_id

_mlflow_log_pl_config(pl_config, '.tmp/pl_config.yml')


def teardown_lightning_mlflow() -> None:
"""End active mlflow run, if any."""
if mlflow.active_run() is not None:
mlflow.end_run()
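For reference, `_get_mlflow_logger_conf` simply scans the Lightning `trainer.logger` section, which may hold either a single logger or a list. A self-contained sketch (hypothetical logger entries; the logic is mirrored from the function above):

```python
from typing import Dict, Optional

def find_mlflow_conf(pl_config: Dict) -> Optional[Dict]:
    # Mirrors _get_mlflow_logger_conf above: handle both a single
    # logger and a list of loggers, matching on the class-path suffix.
    logger = pl_config["trainer"]["logger"]
    if isinstance(logger, list):
        for conf in logger:
            if conf["class_path"].endswith("MLFlowLogger"):
                return conf["init_args"]
    elif logger["class_path"].endswith("MLFlowLogger"):
        return logger["init_args"]
    return None

pl_config = {"trainer": {"logger": [
    {"class_path": "lightning.pytorch.loggers.CSVLogger",
     "init_args": {"save_dir": "csv_logs"}},
    {"class_path": "lightning.pytorch.loggers.MLFlowLogger",
     "init_args": {"experiment_name": "3dgan", "save_dir": "ml_logs"}},
]}}

print(find_mlflow_conf(pl_config)["experiment_name"])  # 3dgan
```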
File renamed without changes.
41 changes: 37 additions & 4 deletions use-cases/3dgan/README.md
@@ -1,19 +1,42 @@
# 3DGAN use case

First of all, from the repository root, create a torch environment:

```bash
make torch-gpu
```

Now, install custom requirements for 3DGAN:

```bash
micromamba activate ./.venv-pytorch
cd use-cases/3dgan
pip install -r requirements.txt
```

**NOTE**: Python commands below are assumed to be executed from within the
micromamba virtual environment.

## Training

At CERN, use the dedicated configuration file:

```bash
cd use-cases/3dgan
python train.py -p cern-pipeline.yaml

# Or better:
micromamba run -p ../../.venv-pytorch/ torchrun --nproc_per_node gpu train.py -p cern-pipeline.yaml
```

Anywhere else, use the general purpose training configuration:

```bash
cd use-cases/3dgan
python train.py -p pipeline.yaml

# Or better:
micromamba run -p ../../.venv-pytorch/ torchrun --nproc_per_node gpu train.py -p pipeline.yaml
```

To visualize the logs with MLflow, run the following in the terminal:
@@ -85,11 +108,11 @@ Build from project root with

```bash
# Local
docker buildx build -t itwinai-mnist-torch-inference -f use-cases/3dgan/Dockerfile .
docker buildx build -t itwinai-mnist-torch-inference -f use-cases/3dgan/Dockerfile.inference .

# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.1 -f use-cases/3dgan/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.1
docker buildx build -t ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.3 -f use-cases/3dgan/Dockerfile.inference .
docker push ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.3
```

From wherever a sample of MNIST jpg images is available
@@ -106,7 +129,7 @@ From wherever a sample of MNIST jpg images is available
```

```bash
docker run -it --rm --name running-inference -v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.1
docker run -it --rm --name running-inference -v "$PWD":/tmp/data ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.3
```

This command will store the results in a folder called "3dgan-generated-data":
@@ -120,3 +143,13 @@ This command will store the results in a folder called "3dgan-generated-data":
| │ ├── energy=1.664689540863037&angle=1.4906378984451294.pth
| │ ├── energy=1.664689540863037&angle=1.4906378984451294.jpg
```

### Singularity

Run the container restoring Docker's `WORKDIR` (here via `cd /usr/src/app`)
and providing a writable bind mount (`-B "$PWD":/usr/data`):

```bash
singularity exec -B "$PWD":/usr/data docker://ghcr.io/intertwin-eu/itwinai-3dgan-inference:0.0.3 \
bash -c "cd /usr/src/app && python train.py -p inference-pipeline.yaml"
```
36 changes: 18 additions & 18 deletions use-cases/3dgan/cern-pipeline.yaml
@@ -4,7 +4,7 @@ executor:
steps:
- class_path: dataloader.Lightning3DGANDownloader
init_args:
data_path: /eos/user/k/ktsolaki/data/3dgan_data # exp_data/
data_path: /eos/user/k/ktsolaki/data/3dgan_data
data_url: null # https://drive.google.com/drive/folders/1uPpz0tquokepptIfJenTzGpiENfo2xRX

- class_path: trainer.Lightning3DGANTrainer
@@ -17,22 +17,22 @@ executor:
accumulate_grad_batches: 1
barebones: false
benchmark: null
# callbacks:
# # - class_path: lightning.pytorch.callbacks.early_stopping.EarlyStopping
# # init_args:
# # monitor: val_loss
# # patience: 2
# - class_path: lightning.pytorch.callbacks.lr_monitor.LearningRateMonitor
# init_args:
# logging_interval: step
# # - class_path: lightning.pytorch.callbacks.ModelCheckpoint
# # init_args:
# # dirpath: checkpoints
# # filename: best-checkpoint
# # mode: min
# # monitor: val_loss
# # save_top_k: 1
# # verbose: true
callbacks:
- class_path: lightning.pytorch.callbacks.early_stopping.EarlyStopping
init_args:
monitor: val_generator_loss
patience: 2
- class_path: lightning.pytorch.callbacks.lr_monitor.LearningRateMonitor
init_args:
logging_interval: step
- class_path: lightning.pytorch.callbacks.ModelCheckpoint
init_args:
dirpath: checkpoints
filename: best-checkpoint
mode: min
monitor: val_generator_loss
save_top_k: 1
verbose: true
check_val_every_n_epoch: 1
default_root_dir: null
detect_anomaly: false
@@ -92,4 +92,4 @@ executor:
datapath: /eos/user/k/ktsolaki/data/3dgan_data/*.h5 # exp_data/*/*.h5
batch_size: 128
num_workers: 0
max_samples: 3000
max_samples: 10000