Developer continuous #84

Merged
merged 73 commits into from
Jun 7, 2024

Conversation

mpielies
Member

No description provided.

ri-heme and others added 30 commits January 5, 2023 16:26
- new preprocessing
- work on cont. associations
- checking different config settings.

Highlights:
- preprocessing.py: feature_min_max function
- perturbations.py: perturb...extended: change target_dataset feature by feature for min/max
- identify_associations.py: branched flow depending on target_value
- *preprocessing.py:* feature_min_max changed to feature_stats (added std)
- *perturbations.py:* added std (1 for now) feature by feature
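
The feature statistics and the one-standard-deviation perturbation mentioned in this commit can be illustrated with a minimal sketch. It is not the code from this pull request: the function bodies, the `perturb_feature` name, the `num_std` parameter, the list-of-rows data layout, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import pstdev


def feature_stats(rows):
    """Return a (min, max, std) tuple for each column of a list of rows.

    Hypothetical stand-in for the feature_stats function named in the
    commit message; the real signature may differ.
    """
    columns = list(zip(*rows))
    return [(min(col), max(col), pstdev(col)) for col in columns]


def perturb_feature(rows, feature_idx, num_std=1.0):
    """Return a copy of rows with num_std standard deviations added to one feature.

    Only the column at feature_idx changes; the input rows are left untouched.
    """
    std = feature_stats(rows)[feature_idx][2]
    return [
        [value + num_std * std if j == feature_idx else value
         for j, value in enumerate(row)]
        for row in rows
    ]
```

Calling `perturb_feature(rows, feature_idx)` once per column gives the "feature by feature" behaviour described in the commit, with `num_std=1.0` matching the "1 for now" note.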

- *identify_associations.py:* added predefined list of continuous target values. It is used in:
  - encode data (show before and after preprocessing)
  - identify associations (plot each feature after perturbation)
Reorganizing the file identify_associations:
- ttest and bayes functions put outside
- dataloader preparation defined in functions
- save results function added
- single identify_associations function
- Most comments addressed
Reformat constant CONTINUOUS_TARGET_VALUES
Reused code put in main function:
Identify associations

Working branch for both modes (Continuous assoc finds self correlations)
- Tested bayes and ttest on new data
- Added config files for continuous test
TODO:
- Create folder with files to create synthetic datasets
- Add GPU compatibility
- Fix typing
- Make test dataloader batch size configurable
- Add extra columns to output table
- Exploring bayes behaviour on continuous pert
- Plot feature associations and vae architecture as graphs
- Use VSCode debugger (json edited)
Added KS method to calculate distances
Functioning Kolmogorov-Smirnov method
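
For readers unfamiliar with the Kolmogorov-Smirnov distance named in these commits, the following is a minimal pure-Python sketch of the two-sample KS statistic. It is not the implementation merged here: the `ks_distance` name is hypothetical, and comparing a feature's values before and after a perturbation is only an assumed use case.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples, a value between 0 (identical) and 1 (fully separated).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is enough to find the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

A larger value means the two distributions differ more, which is what makes the statistic usable as a distance score per feature.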

- QC of features is given as separate csv from KS scores
- schema: feature names of feature to visualize added
- Basic visualization functions added to dataset_distribution.py
Henry added 3 commits May 16, 2024 18:09
Henry added 4 commits May 31, 2024 16:45
Collaborator

ri-heme left a comment

Thanks for the help, Henry!

tutorial/config/task/random_continuous__id_assoc_ks.yaml (outdated, resolved)
tutorial/config/task/random_small__latent.yaml (outdated, resolved)
src/move/visualization/style.py (outdated, resolved)
enryH force-pushed the developer-continuous-v3 branch from 23cb783 to 0dc2a0f, June 4, 2024 11:52
setup.cfg (outdated, resolved)
enryH merged commit 6f5737a into developer, Jun 7, 2024
5 checks passed
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perturbations with continuous data
3 participants