add multiprocessing #92

Open
wants to merge 99 commits into developer
Changes from 94 commits
99 commits
0e2242e
Add files for multiprocessing
Apr 18, 2024
2add9b9
Update identify_associations_multiprocess.py
LFT18 Apr 18, 2024
f645ea4
Clean multiprocessing script
LFT18 Apr 19, 2024
f471704
Update __main__.py multiprocessing
LFT18 Apr 19, 2024
85c28e5
Update schema.py multiprocessing
LFT18 Apr 19, 2024
6330e92
Update __init__.py multiprocessing
LFT18 Apr 19, 2024
bbe1b4e
Update preprocessing.py
LFT18 Apr 19, 2024
820c554
:fire: clean-up duplicated src/move files (pkg was in main folder)
Apr 22, 2024
ce9a9dc
:sparkles: add identify_associations_multiprocess to src/move/tasks
Apr 23, 2024
5327223
:bug: make multiprocessing not stale: assign # of threads for each pr…
Apr 23, 2024
eaa858a
Merge pull request #1 from enryH/main
LFT18 Apr 23, 2024
5ab5e59
Updated identify_associations_multiprocess.py
Apr 23, 2024
33f565a
Update config files for small tries
Apr 23, 2024
63f128b
Multiprocessing for analyze_latent
Apr 24, 2024
ca389d2
Analyze latent multiprocessing
Apr 24, 2024
e08a94b
Analyze latent multiprocessing
Apr 24, 2024
e94ef90
Fix bayes_k calculation
Apr 25, 2024
f4f0aa3
Fix analyze_latent_multiprocessing
Apr 26, 2024
6a0b665
Update and new functions
May 11, 2024
a5310a6
Delete files and fix multiloop
May 11, 2024
e67bb75
Clean identify_association_multiprocess.py
May 21, 2024
86bfed5
Clean analyze_latent multiprocessing.py
May 21, 2024
f9d4961
Update perturbations.py
LFT18 Jun 7, 2024
c2c49e8
Update perturbations.py
LFT18 Jun 10, 2024
0a4bcae
Delete src/move/tasks/analyze_latent_efficient.py
LFT18 Jun 13, 2024
ce20dac
Delete src/move/tasks/analyze_latent_multiprocessing.py
LFT18 Jun 13, 2024
4a72842
Delete src/move/tasks/identify_associations_multiprocess_loop.py
LFT18 Jun 13, 2024
f537d21
Delete src/move/tasks/identify_associations_multiprocess_may.py
LFT18 Jun 13, 2024
568aaa8
Delete src/move/tasks/identify_associations_selected.py
LFT18 Jun 13, 2024
ede3707
Delete src/move/tasks/analyze_latent_original.py
LFT18 Jun 13, 2024
5df8a01
Remove multiprocess_loop
LFT18 Jun 13, 2024
52c37fd
Remove multiprocess_loop
LFT18 Jun 13, 2024
2e29f23
:art: format with black
Jun 18, 2024
13678cb
Merge branch 'main' into LFT18-main
Jun 18, 2024
f92a862
Merge branch 'developer' into LFT18-main
Jun 18, 2024
b7824f7
:art: add trigger of actions from PR
Jun 20, 2024
e5253a2
:art: format with black
Jun 20, 2024
d4118a3
:fire: remove duplicated code and intermediate scripts
Jun 20, 2024
3e19e24
Merge branch 'developer' into LFT18-main
Jun 21, 2024
dbc0238
:bug: fix f-string formatting errors
Jun 24, 2024
80352e6
:bug: remove unused imports
Jun 24, 2024
488b4a4
:rewind: add configuration files back in from developer branch
Jul 3, 2024
9b3a27e
:art: isort imports
Jul 3, 2024
05d4c34
:construction: see if this advances CI to the next step
Jul 3, 2024
ebb72ad
:fire: remove intermediate files of development
Jul 3, 2024
efbfd5c
:construction: multiprocess only defined for bayes factors
Jul 3, 2024
fbbeb19
:bug: remove non-existing, intermediate tasks (used for developing), …
Jul 3, 2024
cb3ad30
:bug: also deactivate multiprocessing for KS as it's not implemented
Jul 3, 2024
8c61d35
:art: fix flake8-bugbear issues except missing multiprocessing of t-t…
Jul 3, 2024
5aa03ff
:bug: format and fix import
Jul 3, 2024
8b06298
:bug: use perturb_continuous_data_extended from perturbations
Jul 3, 2024
6cfd1f8
:fire: comments and old configurations; format
Jul 4, 2024
8277891
:fire: remove duplicated functionality
Jul 4, 2024
44802eb
:sparkles: integrate multiprocessing into analyze_latent.py
Jul 4, 2024
f64d779
:sparkles: merge multiprocessing bayes factors into identify_associat…
Jul 4, 2024
6a17110
:fire: remove old schema entries, increase run time
Jul 5, 2024
b8b4769
:zip: do not save intermediate files for single-process bayes_approach
Jul 5, 2024
2927d3f
:fire: remove comments
Jul 5, 2024
202eb74
:fire: remove unused code
Jul 5, 2024
d6bc896
:art: reorder functions
Jul 5, 2024
b99ce97
:construction: move bayes_parallel to own module
Jul 5, 2024
171c915
:construction: unify interface
Jul 5, 2024
95c9f40
:fire: remove not-used code
Jul 5, 2024
e556fae
:art: start separating recurrent code into fcts
Jul 5, 2024
1a2774d
:art: adapt to look more similar to single-core bayes factor fct
Jul 5, 2024
680c164
:sparkles: add back masking of self-perturbed feat.
Jul 5, 2024
8f43c90
:art: initialize logger at the top of the module
Jul 8, 2024
6178c8f
:sparkles: pass feature_mask to bayes_parallel
Jul 8, 2024
60ed227
:bug: add condition for masking
Jul 8, 2024
f5da671
:art: align masking strategies
Jul 8, 2024
7eae82d
:bug: fix cont perturbation
Jul 8, 2024
50a623e
:bug: remove redefinition of nan_mask
Jul 8, 2024
2df057b
:art: only define logger once in module
Jul 8, 2024
2e87e12
:art: align single process bayes and multiprocess bayes fct
Jul 8, 2024
aa2e5d9
:art: just document in code that this cannot happen
Jul 8, 2024
4041fcb
:zap: improve CI speed, reduce stability (-> one refit only)
Jul 8, 2024
2d004e0
:bug: use default no. of epochs + t-test needs 4 refits
Jul 8, 2024
8d65528
:zap: do not run t-test check (for now)
Jul 9, 2024
1dd6788
:zap: bump up bayes factor training
Jul 9, 2024
dc9020e
:art: train both refits with 100 epochs
Jul 9, 2024
9cd2a7b
:sparkles: add log2 option
Jul 9, 2024
a4911d7
:art: document some more
Jul 9, 2024
e0421bd
:zap: test multiprocess on continuous tutorial
Jul 9, 2024
c70d328
:bug: remove non-existing key
Jul 9, 2024
1c72316
:sparkles: build dataloader fct
Jul 9, 2024
58f08e4
:bug: fix minor bug (wrongly assigned feat)
Jul 9, 2024
f895237
:zap: move masking code into main fct of module
Jul 9, 2024
5eb7954
:art: move feat_mask creation out
Jul 9, 2024
8c4e53b
:ambulance: temp. fix of CI
Jul 9, 2024
49a93d0
:zap: do not build dataloaders for multiprocessing
Jul 9, 2024
709c674
:construction: test t-test again, re-run pert. w/o model training
Jul 10, 2024
6e65cc6
:sparkles: add categorical pert. to multiprocessing
Jul 10, 2024
4efbdd9
:fire: remove unused code
Jul 10, 2024
dab767a
:art: remove unused argument
Jul 10, 2024
980bbce
:rewind: checkout developer version
Jul 12, 2024
c26b2dd
:art: move shared key to base class
Jul 12, 2024
05c1735
:fire: remove comments and code duplications
Jul 12, 2024
c5002cd
:art: update type hints, remove unused import
Jul 12, 2024
fe8c48b
Merge branch 'developer' into main
enryH Aug 12, 2024
118 changes: 102 additions & 16 deletions .github/workflows/release.yaml
@@ -24,8 +24,9 @@ jobs:
run: pip install flake8 flake8-bugbear
- name: Lint with flake8
run: flake8 src
run-tutorial:
name: Run tutorial - random_small
# legacy testing of t-test
run-tutorial-ttest:
name: Run - random_small - t-test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@@ -39,23 +40,107 @@ jobs:
cd tutorial
move-dl data=random_small task=encode_data --cfg job
move-dl data=random_small task=encode_data
- name: Train model and analyze latent space
run: |
cd tutorial
move-dl data=random_small task=random_small__latent --cfg job
move-dl data=random_small task=random_small__latent
# - name: Identify associations - t-test
# at least 4 refits needed for t-test
- name: Identify associations - t-test
run: |
cd tutorial
move-dl data=random_small task=random_small__id_assoc_ttest --cfg job
move-dl data=random_small task=random_small__id_assoc_ttest task.training_loop.num_epochs=30 task.num_refits=4
# categorical dataset perturbation - single and multiprocessed
run-tutorial-cat-pert-single:
name: Run - random_small - singleprocess
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dependencies
run: pip install .
- name: Prepare tutorial data
run: |
cd tutorial
move-dl data=random_small task=encode_data --cfg job
move-dl data=random_small task=encode_data
- name: Train model and analyze latent space
run: |
cd tutorial
move-dl data=random_small task=random_small__latent --cfg job
move-dl data=random_small task=random_small__latent task.training_loop.num_epochs=100
- name: Identify associations - bayes factors
run: |
cd tutorial
move-dl data=random_small task=random_small__id_assoc_bayes --cfg job
move-dl data=random_small task=random_small__id_assoc_bayes task.training_loop.num_epochs=30 task.num_refits=20
run-tutorial-cont:
name: Run tutorial - random_continuous
move-dl data=random_small task=random_small__id_assoc_bayes task.training_loop.num_epochs=100 task.num_refits=2
- name: Identify associations - bayes factors - w/o training
run: |
cd tutorial
move-dl data=random_small task=random_small__id_assoc_bayes --cfg job
move-dl data=random_small task=random_small__id_assoc_bayes task.training_loop.num_epochs=100 task.num_refits=2
run-tutorial-cat-pert-multi:
name: Run - random_small - multiprocess
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dependencies
run: pip install .
- name: Prepare tutorial data
run: |
cd tutorial
move-dl data=random_small task=encode_data --cfg job
move-dl data=random_small task=encode_data
- name: Train model and analyze latent space - multiprocess
run: |
cd tutorial
move-dl data=random_small task=random_small__latent --cfg job
move-dl data=random_small task=random_small__latent task.training_loop.num_epochs=100 task.multiprocess=true
- name: Identify associations - bayes factors - multiprocess
run: |
cd tutorial
move-dl data=random_small task=random_small__id_assoc_bayes --cfg job
move-dl data=random_small task=random_small__id_assoc_bayes task.training_loop.num_epochs=100 task.num_refits=2 task.multiprocess=true
- name: Identify associations - bayes factors - multiprocess w/o training
run: |
cd tutorial
move-dl data=random_small task=random_small__id_assoc_bayes --cfg job
move-dl data=random_small task=random_small__id_assoc_bayes task.training_loop.num_epochs=100 task.num_refits=2 task.multiprocess=true
# continuous dataset perturbation - single and multiprocessed
run-tutorial-cont-pert-multi:
name: Run - random_continuous - multiprocess
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Install dependencies
run: pip install .
- name: Prepare tutorial data
run: |
cd tutorial
move-dl data=random_continuous task=encode_data --cfg job
move-dl data=random_continuous task=encode_data
- name: Train model and analyze latent space - multiprocess
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__latent task.multiprocess=true --cfg job
move-dl data=random_continuous task=random_continuous__latent task.multiprocess=true
- name: Identify associations - bayes factors - multiprocess
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.multiprocess=true --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.num_refits=1 task.multiprocess=true
- name: Identify associations - bayes factors - multiprocess w/o training
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.multiprocess=true --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.num_refits=1 task.multiprocess=true
run-tutorial-cont-pert-single:
name: Run - random_continuous - singleprocess
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
@@ -73,21 +158,22 @@ jobs:
cd tutorial
move-dl data=random_continuous task=random_continuous__latent --cfg job
move-dl data=random_continuous task=random_continuous__latent
- name: Identify associations - t-test
- name: Identify associations - bayes factors
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__id_assoc_ttest --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_ttest task.training_loop.num_epochs=30 task.num_refits=4
- name: Identify associations - bayes factors
move-dl data=random_continuous task=random_continuous__id_assoc_bayes --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.num_refits=1
- name: Identify associations - bayes factors - w/o training (repeat)
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__id_assoc_bayes --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.training_loop.num_epochs=30 task.num_refits=4
move-dl data=random_continuous task=random_continuous__id_assoc_bayes task.num_refits=1
# this reuses the same model trained in analyze latent space
- name: Identify associations - KS
run: |
cd tutorial
move-dl data=random_continuous task=random_continuous__id_assoc_ks --cfg job
move-dl data=random_continuous task=random_continuous__id_assoc_ks task.training_loop.num_epochs=30 task.num_refits=4
move-dl data=random_continuous task=random_continuous__id_assoc_ks task.num_refits=1

publish:
name: Publish package
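Each CI step above runs `move-dl` twice: `--cfg job` (a Hydra flag) prints the composed job config, and the second call runs it with dotted overrides such as `task.multiprocess=true`, `task.num_refits=2`, and `task.training_loop.num_epochs=100`. Hydra resolves and type-checks these against the structured configs in `schema.py`. As a simplified, stdlib-only illustration of that mapping (not Hydra's actual override grammar), the sketch walks the dotted path through nested dataclasses and coerces the string to the field's current type. `RootConfig`, `TaskConfig`, `TrainingLoopConfig`, and `apply_overrides` are hypothetical stand-ins; group selections like `data=random_small` are not handled.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class TrainingLoopConfig:  # hypothetical stand-in
    num_epochs: int = 100


@dataclass
class TaskConfig:  # hypothetical stand-in for a task config
    num_refits: int = 1
    multiprocess: bool = False
    training_loop: TrainingLoopConfig = field(default_factory=TrainingLoopConfig)


@dataclass
class RootConfig:  # hypothetical root that holds the `task` node
    task: TaskConfig = field(default_factory=TaskConfig)


def _coerce(raw: str, current: Any) -> Any:
    """Convert an override string to the type of the field's current value."""
    if isinstance(current, bool):  # test bool before int: bool subclasses int
        lowered = raw.lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"expected true/false, got {raw!r}")
        return lowered == "true"
    return type(current)(raw)


def apply_overrides(cfg: Any, overrides: list[str]) -> Any:
    """Apply `a.b.c=value` overrides to nested dataclasses in place."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = getattr(node, name)  # raises AttributeError for unknown nodes
        if not is_dataclass(node) or leaf not in {f.name for f in fields(node)}:
            raise KeyError(f"unknown config key: {key}")
        setattr(node, leaf, _coerce(raw, getattr(node, leaf)))
    return cfg
```

Because coercion follows the field's existing type, `task.multiprocess=yes` is rejected instead of being stored as a string, which is similar in spirit to the validation Hydra applies to structured configs.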
3 changes: 0 additions & 3 deletions .gitignore
enryH marked this conversation as resolved.
@@ -54,14 +54,11 @@ docs/build/
docs/source/_templates/

# VS Code settings
.vscode

# macOS
.DS_Store

# Root folder
/*.*
!/.gitignore
!/.readthedocs.yaml
!/LICENSE
!/MANIFEST.in
6 changes: 6 additions & 0 deletions src/move/conf/schema.py
@@ -32,6 +32,7 @@ class InputConfig:
@dataclass
class ContinuousInputConfig(InputConfig):
scale: bool = True
log2: bool = False


@dataclass
@@ -134,6 +135,7 @@ class AnalyzeLatentConfig(TaskConfig):

feature_names: list[str] = field(default_factory=list)
reducer: dict[str, Any] = MISSING
multiprocess: bool = False


@dataclass
@@ -170,6 +172,8 @@ class IdentifyAssociationsConfig(TaskConfig):
class IdentifyAssociationsBayesConfig(IdentifyAssociationsConfig):
"""Configure the probabilistic approach to identify associations."""

multiprocess: bool = False # Default value is False
enryH marked this conversation as resolved.

...


@@ -184,6 +188,7 @@ class IdentifyAssociationsTTestConfig(IdentifyAssociationsConfig):
"""

num_latent: list[int] = MISSING
multiprocess: bool = False # Multiprocessing not implemented for the t-test approach


@dataclass
@@ -205,6 +210,7 @@ class IdentifyAssociationsKSConfig(IdentifyAssociationsConfig):

perturbed_feature_names: list[str] = field(default_factory=list)
target_feature_names: list[str] = field(default_factory=list)
multiprocess: bool = False # Multiprocessing not implemented for the KS approach


@dataclass
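The `schema.py` diff adds `log2: bool = False` next to `scale: bool = True` on `ContinuousInputConfig`, but it does not show where the flag is applied. A minimal sketch of one plausible preprocessing order, assuming `log2` means log2(x + 1) on non-negative values, followed by per-feature z-scoring when `scale` is true. `preprocess_continuous` and the pseudocount of 1 are assumptions for illustration, not the MOVE implementation.

```python
import numpy as np


def preprocess_continuous(
    x: np.ndarray, scale: bool = True, log2: bool = False
) -> np.ndarray:
    """Transform a 2-D (samples x features) array; NaNs are ignored when scaling."""
    out = np.array(x, dtype=float)  # copy so the caller's array is not modified
    if log2:
        # Assumption: non-negative inputs and a pseudocount of 1, so 0 maps to 0.
        out = np.log2(out + 1.0)
    if scale:
        mean = np.nanmean(out, axis=0)
        std = np.nanstd(out, axis=0)
        std[std == 0] = 1.0  # constant features: avoid division by zero
        out = (out - mean) / std
    return out
```

The log transform has to run before scaling in this sketch: z-scoring first would produce values below -1, for which log2(x + 1) is undefined.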