# 04. Model for Stain3 #6
### Baseline calculation Stain3

Just like Stain2, I calculated the baseline PR and mAP for all plates to compare the model performance against.

### Benchmark Stain3
---
### Analysis

The first analysis will be done on plates BR00115125, BR00115126, and BR00115127 of Stain3 (S3), using the best model trained on Stain2 (S2) (#5 (comment)). Processing the plates takes quite some time, so a full S3 analysis may take a few days. The S3 plates also appear to be twice as large on disk as the S2 plates. I assume that is due to higher cell seeding (because the number of features remains the same), but this is not mentioned in the S3 issue https://github.com/jump-cellpainting/pilot-analysis/issues/20. @niranjchandrasekaran Do you have an idea of why this is the case?

### Hypothesis

Based on previous experiments in S2 (#5), there appears to be a correlation between the model's performance on a given plate and the correlation between that plate's PC1 loadings and those of the plates used for training. Given that, I calculated the correlation between the PC1 loadings of both S2 and S3 plates, shown below.

### Results
As expected, the baseline outperforms the model when it is trained on Stain2 plates.

### Next up

Before training the model on Stain3 plates and evaluating it on those (which I expect to be relatively easy), I will try to improve the generalization of the model a bit more by tweaking the training process on Stain2 plates.

---
### Experiment

To see if the model's dependency on specific features can be reduced, I added a dropout layer on top of the input. This means that some fraction of the values in the feature-cell array is set to zero during training. I started with a fraction of 0.2 and then tried 0.3, i.e. 20% or 30% of the input values are zeroed at random during training.

### Main takeaways

Although we are giving up some performance on the training plates and training compounds in terms of mAP, there are some (small) gains on validation compounds (in both training and validation plates). I think this is a desired effect, so I will keep dropout in mind as a future hyperparameter to tweak.

### Dropout 0.2 results

**Stain3**
**Stain2**

### Dropout 0.3 results

**Stain3**

**Stain2**
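The input-dropout step described above can be sketched as follows in NumPy. This is my own minimal illustration, not the repository's training code, and it assumes a `(cells × features)` array; note that frameworks such as PyTorch additionally rescale the kept values by 1 / (1 − rate), a variant omitted here for clarity.

```python
import numpy as np

def input_dropout(cells, rate=0.2, rng=None):
    """Randomly zero a fraction `rate` of the values in a
    (cells x features) array during training.

    At inference time the array is passed through unchanged.
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(cells.shape) >= rate  # True with probability 1 - rate
    return cells * keep

# Roughly 20% of the input values end up zeroed.
batch = np.ones((1500, 600))
dropped = input_dropout(batch, rate=0.2, rng=np.random.default_rng(0))
```

Because the mask is resampled every batch, the model cannot rely on any single feature always being present, which is the intended regularization effect.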
---
### Experiment

This experiment uses the same model setup as in #5 (comment) to train and validate on cluster 4, which contains Stain3 plates. The results are compared against the benchmark results (mean aggregation with feature selection). For clarity, I also plotted the zoomed-in hierarchical clustering map of cluster 4. I used the best validation model for inference.

### Main takeaways
**Results**

Note that BR00115126 is not in cluster 4 and serves as a sanity check, i.e. the model is expected to underperform on that plate.
---
### Experiment

I noticed that the number of cells per well is much higher in the Stain3 plates than in the previously used Stain2 plates (~6000-7000 versus ~3000-4000, respectively; see the first comment in this issue). So in this experiment I increased the number of cells sampled from 1500 (std 400) to 4000 (std 900).

### Main takeaways
| Plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| BR00115128highexp | 0.6 | 0.4 | 0.32 | 0.33 | 100 | 58.9 |
| BR00115125highexp | 0.3 | 0.37 | 0.17 | 0.3 | 75.6 | 55.6 |
| BR00115134 | 0.62 | 0.37 | 0.26 | 0.33 | 98.9 | 58.9 |
| BR00115125 | 0.29 | 0.36 | 0.16 | 0.28 | 76.7 | 54.4 |
| BR00115131 | 0.4 | 0.38 | 0.3 | 0.29 | 96.7 | 58.9 |
| BR00115126 | 0.18 | 0.32 | 0.18 | 0.28 | 43.3 | 53.3 |
| BR00115133 | 0.31 | 0.38 | 0.19 | 0.3 | 75.6 | 62.2 |
| BR00115127 | 0.4 | 0.38 | 0.26 | 0.31 | 86.7 | 58.9 |
| BR00115133highexp | 0.31 | 0.38 | 0.13 | 0.31 | 68.9 | 60 |
| BR00115128 | 0.47 | 0.39 | 0.32 | 0.32 | 100 | 61.1 |
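The cell-sampling step with the increased count could look like this. This is a hedged sketch under my reading of the description (a per-well cell count drawn from a normal distribution); the function name and the with-replacement fallback for small wells are my own assumptions, not the repository's code.

```python
import numpy as np

def sample_cells(well, mean_n=4000, std_n=900, rng=None):
    """Draw a random number of cells, n ~ N(mean_n, std_n), from a
    (cells x features) array for one well.

    Assumption: wells holding fewer than n cells are sampled with
    replacement so the requested count is always met.
    """
    rng = rng or np.random.default_rng()
    n = max(1, int(round(rng.normal(mean_n, std_n))))
    idx = rng.choice(len(well), size=n, replace=n > len(well))
    return well[idx]
```

With the previous setting this would be called as `sample_cells(well, mean_n=1500, std_n=400)`; the experiment above simply raises both parameters.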
---
### Experiment

I increased the number of training plates to 5 and switched them around as well. I then trained the model with the current default setup.

### Main takeaways

The training compounds on training plates do not have higher mAP values than the BM. Using 5 training plates requires the model to train for longer.

Table
---
### Experiment

I discovered a mistake in my code which led to the model always being trained on 2 plates (instead of the 3 or 5 I intended in the previous experiments). I have now fixed this and trained the model again with 3 plates, while still using the higher cell count from the previous two experiments.

### Main takeaways
Table

Time
---
Sorry, I tried to send this comment via email yesterday, but it apparently didn't take! For Stain2, because we were in a time crunch, we only analyzed 4 sites per well; all other experiments should typically be 9 sites per well. The discrepancy in cell count is therefore expected. (Link for Broad folks only - scroll up for more context if needed https://broadinstitute.slack.com/archives/C3QFQ3WQM/p1594911343036200?thread_ts=1594911343.036200&cid=C3QFQ3WQM )

---
Thanks Beth!! I had completely forgotten this and have been going through my emails and notes for a week looking for an explanation.

---
### MOA prediction / percent matching benchmark for Stain3

These scores are calculated by treating the MOA tag as the replicate label. I calculate the mAP and percent matching both by including all compounds that target a given MOA (inc. all compounds) and by only counting sister compounds (different compounds targeting the same MOA) as replicates (exclusively sister compounds).

**BM all plates Stain3 cluster (inc. all compounds)**
**Benchmark MOA prediction with model (inc. all compounds)**

Using the model shown in #6 (comment)
**BM all plates Stain3 cluster (exclusively sister compounds)**
**Benchmark MOA prediction with model (exclusively sister compounds)**

Using the model shown in #6 (comment)
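The MOA-matching metric can be sketched as below. This is my own minimal implementation, not the repository's evaluation code: each profile queries all the others, profiles sharing its MOA label count as hits, and average precision is computed over the cosine-similarity ranking. Dropping same-compound rows from the candidate pool before calling it would give the "exclusively sister compounds" variant.

```python
import numpy as np

def mean_average_precision(profiles, labels):
    """mAP where profiles sharing a label (e.g. an MOA tag) are replicates.

    For each profile, all other profiles are ranked by cosine similarity;
    AP averages the precision at each same-label hit in that ranking.
    """
    X = np.asarray(profiles, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    labels = np.asarray(labels)
    aps = []
    for i, lab in enumerate(labels):
        cand_sims = np.delete(sims[i], i)          # exclude the query itself
        cand_labels = np.delete(labels, i)
        order = np.argsort(-cand_sims)             # rank by similarity, descending
        hits = (cand_labels == lab)[order]
        if not hits.any():
            continue                               # label has no replicate in the pool
        ranks = np.nonzero(hits)[0] + 1            # 1-indexed ranks of the hits
        aps.append((np.arange(1, len(ranks) + 1) / ranks).mean())
    return float(np.mean(aps))
```

Percent matching would then be the fraction of queries whose AP (or top-ranked hit) beats a null distribution threshold; that step is not shown here.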
---
### Hyperparameter sweep

I ran a large hyperparameter sweep to get an idea of which hyperparameters influence model generalization to validation compounds. The most important parameters I varied were:
### Main takeaways
The full sweep report can be viewed here:

---
### Experiment

Using the best hyperparameters from the previous sweep, I trained a model for 150 epochs. This experiment tests whether:
### Main takeaways

### Next up

**Model with best training loss**

**Model with best validation loss**
---
### Experiment: wide model

Following current deep learning generalization research, I trained a very wide model (layers of width 4096 instead of the usual 512) and evaluated its performance. I also trained this model for more than 150 epochs to better understand the generalization curve.

### Main takeaways

The overfit is larger than before. One possible explanation is that the model is still underfitting the training data. Exploring this further would require an even wider model, but I will leave that option for now.

### Next up

Instead, I will use more training plates to see if the mAP on the validation compounds of the training plates can be increased.

### Results

**Last epoch, wide model**
---
### Experiment: 4 plates

This model is trained with plates BR00115134_FS, BR00115125_FS, BR00115133highexp_FS, and BR00115128_FS, the first three of which were used in the previous experiments.

### Main takeaways

Using 4 training plates did not significantly improve the validation mAP of the model. I used the last-epoch model for inferring the profiles.

### Results

**mAPs**
---
As referenced in #6 (comment), there was an issue with the way the validation loss was calculated. I have since updated it to compute the validation loss on profiles inferred the same way they are during evaluation-metric calculation, i.e. using all of the cells in a given well.

### Main takeaways
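Under my reading of this fix, validation profiles are now inferred from every cell in a well, mirroring what happens at evaluation time instead of reusing the subsampled training batches. A hedged sketch, where the function names and the callable `model`/`loss_fn` interfaces are illustrative assumptions:

```python
import numpy as np

def infer_well_profile(model, well_cells):
    """Infer a well-level profile the same way it is done at evaluation
    time: feed *all* cells of the well (no subsampling) through the model.

    `model` is assumed to map a (cells x features) array to one profile.
    """
    return model(np.asarray(well_cells))

def validation_loss(model, wells, targets, loss_fn):
    """Validation loss on full-well inferred profiles rather than on the
    randomly subsampled batches seen during training."""
    profiles = [infer_well_profile(model, w) for w in wells]
    return float(np.mean([loss_fn(p, t) for p, t in zip(profiles, targets)]))
```

The point of the change is that the quantity being monitored during training now matches the quantity the evaluation metrics are computed on.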
---
### Hyperparameter optimization 2

I wanted to see whether changing the batch size, the number of cells sampled each epoch, and the number of samples drawn each epoch significantly impacts the generalization of the model. I performed another hyperparameter sweep over these factors, along with a few model-width hyperparameters.

### Main conclusions
### Results

https://wandb.ai/rvdijk/FeatureAggregation/sweeps/6ssjjxt9?workspace=user-rvdijk

**Mean average precision**

Training plates are marked in bold
---
### More hyperparameter tuning

After another round of tuning (this time based on the actual evaluation metric we want to optimize for: validation mAP), I trained a model for a longer period on Stain3.

### Main takeaways
---
To test the generalization of the model trained on Stain2, I will now evaluate it on Stain3. Based on the results, further improvements will be made by training on Stain3 plates (and possibly then evaluating on Stain2 in turn).
Stain3 consists of 17 plates, which were divided into 4 "batches" defined by the analysis pipeline used.
To analyze the relations between the different plates in Stain3, I calculated the correlation between the PC1 loadings of the mean-aggregated profiles of every plate. The BR00115130 plate stands out. This agrees with the findings in https://github.com/jump-cellpainting/pilot-analysis/issues/20, where this plate achieved the lowest PR, likely because many wells had evaporated or something else had gone wrong. Another plate that stands out is BR00115132, which contains a 4-fold dilution of all dyes. BR00115130 will be left out of the analysis altogether; BR00115132 will be kept, although it will not be used as a training plate.
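The PC1-loading comparison can be sketched as below. This is my own minimal version, not the repository's code: each plate's mean-aggregated profile matrix (wells × features) is reduced to its first principal-component loadings via SVD, and the loadings of all plates are then correlated pairwise. Since the sign of a principal component is arbitrary, it is fixed by a convention before correlating.

```python
import numpy as np

def pc1_loadings(profiles):
    """First principal-component loadings of a (wells x features) profile
    matrix, via SVD of the mean-centered data."""
    X = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]
    # PC sign is arbitrary; fix it so the largest-magnitude entry is positive,
    # which makes loadings from different plates comparable.
    return v if v[np.argmax(np.abs(v))] >= 0 else -v

def plate_pc1_correlation(plates):
    """Pearson correlation matrix between the PC1 loadings of each plate."""
    loadings = np.array([pc1_loadings(p) for p in plates])
    return np.corrcoef(loadings)
```

An outlier plate such as BR00115130 would then show up as a row of low correlations against all other plates.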
Number of cells per well per plate