Update dev #152

matbun · 2024-06-06T15:58:50Z

Update dev with changes from main.

* Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: 
Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: 
adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * 
ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * 
FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD 
serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE 
Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate 
tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed 
tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify 
redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Distributed strategy launcher (#127) Update ParseConfig * Distributed strategy launcher (#128) Remove experimental files * Docs dev (#132) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages 
appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#131) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete 
tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts 
* FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * 3dgan integration (#134) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * 
Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial 
* ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Docs dev (#135) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating 
readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update 
.readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#137) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * 
Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * 
Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Update README.md * Distributed strategy launcher (#141) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * 
Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances 
improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX 
openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml * Update README.md * Cyclone tf dist (#130) * get_stretegy * UPDATE distributed strategy * change req file * cycline tf dist * small bugs * fix bug in train.py * REFACTOR cyclones use case * Activate pytest * NEW TensorFlow trainer * ADD user information --------- Co-authored-by: ruettgers1 <[email protected]> Co-authored-by: Matteo Bunino <[email protected]> * Interactive distrib ml (#139) Add examples for distributed ml in interactive mode * Interactive distrib ml (#140) Update tutorial * Disable documentation GH action * Remove action --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> * Merge main (#142) Bring changes on main into dev * Virgo integration (#143) * ADD Virgo data pipeline and some refactoring * FIX typo * UPDATE README * ADD training * ADD TrainingConfiguration * ADD distributed training and refactor * update readme * UPDATE loggers and add tests * Refactor * FIX typo * UPDATE use cases instructions * ADD checkpointing and refactor. 
* FIX linter * FIX jscpd * FIX jscpd * Disable jscpd * Refactor loggers * ADD loggers to Virgo use case * Update AUTHORS.md * Update AUTHORS.md * Docs dev (#144) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * 
Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements * Remove unnecessary dependencies * Add docstring * adding latest changes from dev * new content and changes * Update index.rst toctree revise * adding pages for distributed ml tutorials * new shpinx reqs to solve build failing * Docs update: - python code format fixed - added brief explanation on ddp in new section * requirements changed * UPDATE requirements * UPDATE requirements and itwinai.types * ADD CMake and GCC installation * UPDATE CMake and GCC installation * UPDATE CMake and GCC installation * ADD notebooks * Disable notebooks section * FIX TOC * Saving local changes before pulling from remote * saving updates before pull from origin * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * adding cyclones and virgo use cases pages * FIX build errors * Update TOC * Update TOC --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: Killian Verder <[email protected]> --------- Co-authored-by: Roman Machacek <[email protected]> Co-authored-by: linxUser3574 <[email protected]> Co-authored-by: orviz <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: 
KalliopiTsolaki <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> Co-authored-by: Killian Verder <[email protected]>

* ADD quick install for users
* UPDATE installer
* fix framework selection
* UPDATE installer

* UPDATE print patch and refactor
* Cleanup
* Cleanup
* Cleanup
* Cleanup
* FIX broken import
* UPDATE docs
* FIX docstring parsing
* Preserve ordering

* Update README.md
* ADD missing docstrings

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6>

* Update train.py
* Update generic_tf.sh
* Update pyproject.toml
* Update train.py
* Fix: head problems with MacOS
* Fixes for MacOS support
* Fix: Update basic_components.py
* Addition of cerfacs use-case
* Update README.md
* Update train.py

review-notebook-app · 2024-06-06T15:58:55Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

* updating doc pages
* testing if changing the GH edit url works
* adding repo link in toc

---------

Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6>

* Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: 
Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: 
adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * 
ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * 
FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD 
serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE 
Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate 
tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed 
tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify 
redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Distributed strategy launcher (#127) Update ParseConfig * Distributed strategy launcher (#128) Remove experimental files * Docs dev (#132) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages 
appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#131) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete 
tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts 
* FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * 3dgan integration (#134) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * 
Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial 
* ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Docs dev (#135) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating 
readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update 
.readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#137) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * 
Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * 
Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Update README.md * Distributed strategy launcher (#141) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * 
Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances 
improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX 
openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml * Update README.md * Cyclone tf dist (#130) * get_stretegy * UPDATE distributed strategy * change req file * cycline tf dist * small bugs * fix bug in train.py * REFACTOR cyclones use case * Activate pytest * NEW TensorFlow trainer * ADD user information --------- Co-authored-by: ruettgers1 <[email protected]> Co-authored-by: Matteo Bunino <[email protected]> * Interactive distrib ml (#139) Add examples for distributed ml in interactive mode * Interactive distrib ml (#140) Update tutorial * Disable documentation GH action * Remove action --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> * Merge main (#142) Bring changes on main into dev * Virgo integration (#143) * ADD Virgo data pipeline and some refactoring * FIX typo * UPDATE README * ADD training * ADD TrainingConfiguration * ADD distributed training and refactor * update readme * UPDATE loggers and add tests * Refactor * FIX typo * UPDATE use cases instructions * ADD checkpointing and refactor. 
* FIX linter * FIX jscpd * FIX jscpd * Disable jscpd * Refactor loggers * ADD loggers to Virgo use case * Update AUTHORS.md * Update AUTHORS.md * Docs dev (#144) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * 
Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements * Remove unnecessary dependencies * Add docstring * adding latest changes from dev * new content and changes * Update index.rst toctree revise * adding pages for distributed ml tutorials * new shpinx reqs to solve build failing * Docs update: - python code format fixed - added brief explanation on ddp in new section * requirements changed * UPDATE requirements * UPDATE requirements and itwinai.types * ADD CMake and GCC installation * UPDATE CMake and GCC installation * UPDATE CMake and GCC installation * ADD notebooks * Disable notebooks section * FIX TOC * Saving local changes before pulling from remote * saving updates before pull from origin * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * adding cyclones and virgo use cases pages * FIX build errors * Update TOC * Update TOC --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: Killian Verder <[email protected]> * Update dev (#152) * Dev - itwinai 0.0.2 (#138) * Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into 

* Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: 
Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: 
adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * 
ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * 
FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD 
serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE 
Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate 
tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed 
tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify 
redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Distributed strategy launcher (#127) Update ParseConfig * Distributed strategy launcher (#128) Remove experimental files * Docs dev (#132) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages 
appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#131) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete 
tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts 
* FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * 3dgan integration (#134) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * 
Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial 
* ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Docs dev (#135) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating 
readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update 
.readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#137) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * 
Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * 
Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Update README.md * Distributed strategy launcher (#141) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * 
Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances 
improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX 
openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml * Update README.md * Cyclone tf dist (#130) * get_stretegy * UPDATE distributed strategy * change req file * cycline tf dist * small bugs * fix bug in train.py * REFACTOR cyclones use case * Activate pytest * NEW TensorFlow trainer * ADD user information --------- Co-authored-by: ruettgers1 <[email protected]> Co-authored-by: Matteo Bunino <[email protected]> * Interactive distrib ml (#139) Add examples for distributed ml in interactive mode * Interactive distrib ml (#140) Update tutorial * Disable documentation GH action * Remove action --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> * Merge main (#142) Bring changes on main into dev * Virgo integration (#143) * ADD Virgo data pipeline and some refactoring * FIX typo * UPDATE README * ADD training * ADD TrainingConfiguration * ADD distributed training and refactor * update readme * UPDATE loggers and add tests * Refactor * FIX typo * UPDATE use cases instructions * ADD checkpointing and refactor. 
* FIX linter * FIX jscpd * FIX jscpd * Disable jscpd * Refactor loggers * ADD loggers to Virgo use case * Update AUTHORS.md * Update AUTHORS.md * Docs dev (#144) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * 
Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements * Remove unnecessary dependencies * Add docstring * adding latest changes from dev * new content and changes * Update index.rst toctree revise * adding pages for distributed ml tutorials * new shpinx reqs to solve build failing * Docs update: - python code format fixed - added brief explanation on ddp in new section * requirements changed * UPDATE requirements * UPDATE requirements and itwinai.types * ADD CMake and GCC installation * UPDATE CMake and GCC installation * UPDATE CMake and GCC installation * ADD notebooks * Disable notebooks section * FIX TOC * Saving local changes before pulling from remote * saving updates before pull from origin * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * adding cyclones and virgo use cases pages * FIX build errors * Update TOC * Update TOC --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: Killian Verder <[email protected]> * Update dev (#152) * Dev - itwinai 0.0.2 (#138) * Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into 
torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * 
ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: 
working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset 
loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * 
Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests 
* FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info 
* Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override 
example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed to…

matbun and others added 16 commits May 31, 2024 16:39

Delete .github/workflows/pages.yml

eaf8825

ADD quick install for users (#145)

268fc83

User install (#146)

848091e

* ADD quick install for users
* UPDATE installer
* fix framework selection
* UPDATE installer

Update README.md

fbf05a3

Update README.md

f54b2fc

Improve docstring parsing and refactor (#147)

6b0c90f

* UPDATE print patch and refactor
* Cleanup
* Cleanup
* Cleanup
* Cleanup
* FIX broken import
* UPDATE docs
* FIX docstring parsing
* Preserve ordering

Update cli.py

05d2067

Update docs (#148)

ab1bb97

* Update README.md
* ADD missing docstrings

Update README.md

bef03a5

Update README.md

9d65a6c

Update README.md

573a671

updating doc pages (#150)

ff94c99

Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6>

Update cyclones_doc.rst

3c06056

matbun and others added 9 commits June 6, 2024 18:22

Update cyclones_doc.rst

17c8e94

Update startscript.sh

f656fa3

Update pyproject.toml

1a77bb1

Update mnist.py

d38f7f3

Update mnist.py

a5b0168

Update generic_tf.sh

53c822f

Update requirements.txt

479c73d

Update requirements.txt

bb2e5ff

Docs changes (#153)

802fb7b

* updating doc pages
* testing if changing the GH edit url works
* adding repo link in toc

---------

Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6>

matbun had a problem deploying to pypi June 11, 2024 11:52 with GitHub Actions (status: Failure)

matbun had a problem deploying to pypi June 11, 2024 11:55 with GitHub Actions (status: Failure)

Update pyproject.toml

96fd2f5

matbun temporarily deployed to pypi June 11, 2024 12:02 with GitHub Actions (status: Inactive)

matbun merged commit be3ec87 into dev Jun 11, 2024
9 checks passed

matbun commented Jun 6, 2024

review-notebook-app bot commented Jun 6, 2024
