04. Model for Stain3 #6

EchteRobert opened this issue Mar 22, 2022 · 18 comments


EchteRobert commented Mar 22, 2022

To test the generalization of the model trained on Stain2, I will now evaluate it on Stain3. Based on the results, further advancements will be made by training on plates of Stain3 (and possibly then evaluating on Stain2 in turn).

Stain3 consists of 17 plates, divided into 4 "batches" that are defined by the analysis pipeline used:

  • Standard
  • Multiplane
  • HighExp
  • Bin1

To analyze the relations between the different plates in Stain3, I calculated the correlation between the PC1 loadings of the mean aggregated profiles of every plate. The BR00115130 plate stands out. This agrees with the findings in https://github.com/jump-cellpainting/pilot-analysis/issues/20, where this plate achieved the lowest PR, likely because many wells had evaporated or something else was messed up. Another plate that stands out is BR00115132, which contains a 4-fold dilution of all dyes. BR00115130 will be left out of the analysis altogether; BR00115132 will be kept, although it will not be used as a training plate.
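A minimal sketch of how such a plate-by-plate correlation of PC1 loadings can be computed (not the actual pipeline code; the `Metadata_` column prefix and the function names are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def pc1_loadings(profiles: pd.DataFrame) -> np.ndarray:
    """PC1 loadings (one value per feature) of a plate's mean-aggregated well profiles.
    Assumes metadata columns are prefixed with 'Metadata_' (hypothetical convention)."""
    features = profiles.loc[:, ~profiles.columns.str.startswith("Metadata_")]
    pca = PCA(n_components=1)
    pca.fit(features.values)
    return pca.components_[0]  # shape: (n_features,)

def plate_correlation_matrix(plates: dict) -> pd.DataFrame:
    """plates: mapping of plate name -> DataFrame of mean-aggregated profiles."""
    loadings = {name: pc1_loadings(df) for name, df in plates.items()}
    loadings_df = pd.DataFrame(loadings)  # features x plates
    return loadings_df.corr()             # plate-by-plate Pearson correlation
```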

PC1loadings_meanProfiles

Number of cells per well per plate

NRCELLS_dist

@EchteRobert

Baseline calculation Stain3

As with Stain2, I calculated the baseline PR and mAP for all plates to compare the model performance against.
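For reference, the benchmark profiles are built by mean aggregation of the single-cell features per well (with feature selection, as used throughout this issue). A minimal pandas sketch, assuming a hypothetical `Metadata_Well` column and `Metadata_`-prefixed metadata columns:

```python
import pandas as pd

def mean_aggregate(single_cells: pd.DataFrame) -> pd.DataFrame:
    """Benchmark-style profile: average all single-cell features per well."""
    feature_cols = [c for c in single_cells.columns if not c.startswith("Metadata_")]
    return single_cells.groupby("Metadata_Well")[feature_cols].mean().reset_index()
```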

Benchmark Stain3
| plate | Training mAP model | Validation mAP model | PR model |
|---|---|---|---|
| BR00115126highexp | 0.32 | 0.31 | 53.3 |
| BR00115129 | 0.38 | 0.32 | 52.2 |
| BR00115132highexp | 0.36 | 0.33 | 63.3 |
| BR00115128 | 0.39 | 0.32 | 61.1 |
| BR00115133highexp | 0.38 | 0.31 | 60 |
| BR00115134multiplane | 0.33 | 0.3 | 56.7 |
| BR00115126 | 0.32 | 0.28 | 53.3 |
| BR00115132 | 0.29 | 0.27 | 58.9 |
| BR00115133 | 0.38 | 0.3 | 62.2 |
| BR00115127 | 0.38 | 0.31 | 58.9 |
| BR00115131 | 0.38 | 0.29 | 58.9 |
| BR00115125 | 0.36 | 0.28 | 54.4 |
| BR00115134 | 0.37 | 0.33 | 58.9 |
| BR00115125highexp | 0.37 | 0.3 | 55.6 |
| BR00115134bin1 | 0.37 | 0.31 | 57.8 |
| BR00115128highexp | 0.4 | 0.33 | 58.9 |


EchteRobert commented Mar 23, 2022

Analysis

The first analysis will be done on plates BR00115125, BR00115126, and BR00115127 of S3, using the best model trained on Stain2 (S2) (#5 (comment)). Processing the plates takes quite some time, so it may take a few days until I can do a full Stain3 (S3) analysis. It also appears that the S3 plates are twice the size of the S2 plates. I'm assuming that's due to higher cell seeding (because the number of features remains the same), but this is not mentioned in the S3 issue https://github.com/jump-cellpainting/pilot-analysis/issues/20. @niranjchandrasekaran Do you have an idea of why this is the case?

Hypothesis

Based on previous experiments in S2 (#5), I found that there appears to be a correlation between the performance of the model on a given plate and the correlation between the PC1 loadings of that plate and the plates used to train the model. Given that relationship, I calculated the correlation between the PC1 loadings of both the S2 and S3 plates, shown below.
Based on these correlations, I expect the model to not perform well on these new S3 plates.

Results

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00115125 | 0.25 | 0.36 | 0.18 | 0.28 | 68.9 | 54.4 |
| BR00115126 | 0.22 | 0.32 | 0.23 | 0.28 | 46.7 | 53.3 |
| BR00115127 | 0.29 | 0.38 | 0.22 | 0.31 | 80 | 58.9 |

As expected, the baseline outperforms the Stain2-trained model on these plates.

Next up

Before training the model on Stain3 plates and evaluating it on those (as I expect this will be relatively easy), I will try to improve the generalization of the model a bit more by tweaking the training process on the Stain2 plates.

PC1 loadings S2 and S3 plates

S2 plates span the left side of the plot (13 plates) and S3 plates the right side (17 plates)
PC1loadings_meanProfiles_S2vsS3


EchteRobert commented Mar 24, 2022

Experiment

To see if the model's dependency on specific features can be reduced, I added a dropout layer on top of the input. This means that some fraction of the values in the feature-cell array is set to zero during training. I started with a fraction of 0.2 and then used 0.3, i.e. 20% or 30% of the input values are set to zero at random during training.
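A minimal PyTorch sketch of what input dropout looks like here; only the `nn.Dropout` on the input reflects the change described above, the surrounding architecture and names are hypothetical:

```python
import torch
import torch.nn as nn

class AggregationModelWithInputDropout(nn.Module):
    """Hypothetical wrapper: zero out a fraction of input values during training."""
    def __init__(self, n_features: int, latent_dim: int = 512, p: float = 0.2):
        super().__init__()
        self.input_dropout = nn.Dropout(p=p)  # p = 0.2 or 0.3 in these experiments
        self.encoder = nn.Sequential(
            nn.Linear(n_features, latent_dim),
            nn.ReLU(),
        )

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (batch, n_cells, n_features); dropout is only active in train mode
        x = self.input_dropout(cells)
        x = self.encoder(x)       # per-cell embedding
        return x.mean(dim=1)      # aggregate over cells -> one profile per well
```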

Main takeaways

Although we are giving up some performance on the training plates and training compounds in terms of mAP, there are some (small) gains on the validation compounds (in both training and validation plates). I think this is a desirable effect, so I will keep dropout in mind as a future hyperparameter to tweak.

Dropout 0.2 results

Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00115125 | 0.25 | 0.36 | 0.21 | 0.28 | 62.2 | 54.4 |
| BR00115126 | 0.23 | 0.32 | 0.31 | 0.28 | 57.8 | 53.3 |
| BR00115127 | 0.3 | 0.38 | 0.21 | 0.31 | 74.4 | 58.9 |

Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00112202 | 0.44 | 0.34 | 0.38 | 0.3 | 94.4 | 54.4 |
| BR00112197standard | 0.46 | 0.4 | 0.38 | 0.28 | 93.3 | 56.7 |
| BR00112203 | 0.17 | 0.3 | 0.26 | 0.27 | 52.2 | 56.7 |
| BR00112199 | 0.28 | 0.32 | 0.24 | 0.28 | 70 | 57.8 |
| BR00113818 | 0.31 | 0.28 | 0.27 | 0.3 | 68.9 | 52.2 |
| BR00113819 | 0.31 | 0.28 | 0.24 | 0.25 | 63.3 | 48.9 |
| BR00112198 | 0.57 | 0.35 | 0.42 | 0.3 | 100 | 56.7 |
| BR00112197repeat | 0.45 | 0.41 | 0.41 | 0.31 | 95.6 | 63.3 |
| BR00112204 | 0.59 | 0.35 | 0.41 | 0.29 | 98.9 | 58.9 |
| BR00113820 | 0.27 | 0.3 | 0.24 | 0.3 | 64.4 | 55.6 |
| BR00113821 | 0.14 | 0.24 | 0.16 | 0.22 | 35.6 | 47.8 |
| BR00112197binned | 0.4 | 0.41 | 0.36 | 0.3 | 84.4 | 58.9 |
| BR00112201 | 0.66 | 0.4 | 0.46 | 0.32 | 97.8 | 66.7 |
Dropout 0.3 results

Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00115125 | 0.24 | 0.36 | 0.2 | 0.28 | 62.2 | 54.4 |
| BR00115126 | 0.25 | 0.32 | 0.24 | 0.28 | 60 | 53.3 |
| BR00115127 | 0.31 | 0.38 | 0.21 | 0.31 | 81.1 | 58.9 |

Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00112202 | 0.43 | 0.34 | 0.36 | 0.3 | 94.4 | 54.4 |
| BR00112197standard | 0.45 | 0.4 | 0.33 | 0.28 | 92.2 | 56.7 |
| BR00112203 | 0.18 | 0.3 | 0.24 | 0.27 | 56.7 | 56.7 |
| BR00112199 | 0.29 | 0.32 | 0.22 | 0.28 | 77.8 | 57.8 |
| BR00113818 | 0.33 | 0.28 | 0.24 | 0.3 | 76.7 | 52.2 |
| BR00113819 | 0.33 | 0.28 | 0.25 | 0.25 | 78.9 | 48.9 |
| BR00112198 | 0.58 | 0.35 | 0.39 | 0.3 | 100 | 56.7 |
| BR00112197repeat | 0.45 | 0.41 | 0.39 | 0.31 | 95.6 | 63.3 |
| BR00112204 | 0.56 | 0.35 | 0.36 | 0.29 | 100 | 58.9 |
| BR00113820 | 0.25 | 0.3 | 0.25 | 0.3 | 66.7 | 55.6 |
| BR00113821 | 0.14 | 0.24 | 0.14 | 0.22 | 38.9 | 47.8 |
| BR00112197binned | 0.4 | 0.41 | 0.34 | 0.3 | 87.8 | 58.9 |
| BR00112201 | 0.65 | 0.4 | 0.42 | 0.32 | 97.8 | 66.7 |


EchteRobert commented Apr 1, 2022

Experiment

This experiment uses the same model setup as in #5 (comment) to train and validate on cluster 4, which contains Stain3 plates. The results are compared against the benchmark results (mean aggregation with feature selection). For clarification, I also plotted the zoomed-in hierarchical clustering map of cluster 4. I used the best validation model for inference.
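A sketch of how a plate clustering map like the one below can be produced from a plate-by-plate correlation matrix (e.g. the PC1-loading correlations from the first comment); the linkage method and styling here are assumptions, not the actual plotting code:

```python
import seaborn as sns
from scipy.cluster.hierarchy import linkage

def plot_plate_clustering(corr):
    """Hierarchically cluster plates by the correlation of their PC1 loadings.
    corr: square plate-by-plate correlation DataFrame."""
    link = linkage(corr.values, method="average")   # assumed linkage method
    sns.clustermap(corr, row_linkage=link, col_linkage=link,
                   cmap="vlag", vmin=-1, vmax=1)    # mind the colorbar scale
```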

Main takeaways

  • The model is not generalizing as well as in the previous, similar experiment on Stain2. My intuition is that the training plates are a bit too similar here, whereas there was slightly more diversity in the Stain2 training plates. I will repeat this experiment with BR00115133, BR00115127, and BR00115125highexp.
Clusters are all you need

mind the colorbar scale
Plateclustering

Click here for results!

Note that BR00115126 is not in cluster 4 and serves as a sanity check, i.e. the model is expected to underperform on that plate.

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training** | | | | | | |
| BR00115128highexp | 0.55 | 0.4 | 0.36 | 0.33 | 100 | 58.9 |
| BR00115134 | 0.59 | 0.37 | 0.26 | 0.33 | 95.6 | 58.9 |
| BR00115131 | 0.41 | 0.38 | 0.33 | 0.29 | 93.3 | 58.9 |
| **Validation** | | | | | | |
| BR00115125 | 0.3 | 0.36 | 0.14 | 0.28 | 74.4 | 54.4 |
| BR00115126 | 0.21 | 0.32 | 0.17 | 0.28 | 54.4 | 53.3 |
| BR00115127 | 0.4 | 0.38 | 0.26 | 0.31 | 97.8 | 58.9 |
| BR00115128 | 0.45 | 0.39 | 0.32 | 0.32 | 97.8 | 61.1 |
| BR00115129 | 0.42 | 0.38 | 0.27 | 0.32 | 87.8 | 52.2 |


EchteRobert commented Apr 5, 2022

Experiment

I noticed that the number of cells per well is much higher in the Stain3 plates than in the previously used Stain2 plates (~6000-7000 versus ~3000-4000, respectively; see the first comment in this issue). So in this experiment I increased the mean number of cells sampled from 1500 (std 400) to 4000 (std 900).
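A minimal sketch of the per-epoch cell sampling described above, with the number of cells drawn from a Gaussian; the function and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng()

def sample_cells(well_cells: np.ndarray, mean: int = 4000, std: int = 900) -> np.ndarray:
    """Draw a random number of cells for one well, n ~ N(mean, std).
    Previously mean=1500, std=400; now mean=4000, std=900 for Stain3.
    Samples with replacement if the well has fewer cells than requested."""
    n = max(int(rng.normal(mean, std)), 1)
    replace = n > len(well_cells)
    idx = rng.choice(len(well_cells), size=n, replace=replace)
    return well_cells[idx]
```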

Main takeaways

  • Training compound performance has increased for both training and validation plates. However, this comes at the cost of performance on the validation compounds in all plates.
  • It seems like I am missing some factor that is causing a stronger overfit on these plates than the Stain2 plates, i.e. there is a larger discrepancy between the training data and the validation data (both plates and compounds). The last option I can currently think of is to use different (and/or more) training plates.
table
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training** | | | | | | |
| BR00115128highexp | 0.62 | 0.4 | 0.34 | 0.33 | 98.9 | 58.9 |
| BR00115134 | 0.66 | 0.37 | 0.23 | 0.33 | 96.7 | 58.9 |
| BR00115131 | 0.42 | 0.38 | 0.31 | 0.29 | 91.1 | 58.9 |
| **Validation** | | | | | | |
| BR00115125highexp | 0.34 | 0.37 | 0.19 | 0.3 | 75.6 | 55.6 |
| BR00115125 | 0.33 | 0.36 | 0.19 | 0.28 | 71.1 | 54.4 |
| BR00115126 | 0.2 | 0.32 | 0.18 | 0.28 | 42.2 | 53.3 |
| BR00115133 | 0.31 | 0.38 | 0.22 | 0.3 | 85.6 | 62.2 |
| BR00115127 | 0.41 | 0.38 | 0.27 | 0.31 | 91.1 | 58.9 |
| BR00115133highexp | 0.32 | 0.38 | 0.14 | 0.31 | 75.6 | 60 |
| BR00115128 | 0.47 | 0.39 | 0.33 | 0.32 | 94.4 | 61.1 |
| BR00115129 | 0.41 | 0.38 | 0.3 | 0.32 | 93.3 | 52.2 |


EchteRobert commented Apr 5, 2022

Experiment

I increased the number of training plates to 5 and switched them around as well. Then I trained the model with the current default setup.

Main takeaways

The training compounds on training plates do not have higher mAP values than the BM. Using 5 training plates requires the model to train for longer.

table!
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training** | | | | | | |
| BR00115133 | 0.55 | 0.38 | 0.23 | 0.3 | 94.4 | 62.2 |
| BR00115125 | 0.5 | 0.36 | 0.22 | 0.28 | 96.7 | 54.4 |
| BR00115128 | 0.34 | 0.39 | 0.25 | 0.32 | 75.6 | 61.1 |
| BR00115129 | 0.37 | 0.38 | 0.23 | 0.32 | 81.1 | 52.2 |
| BR00115127 | 0.38 | 0.38 | 0.29 | 0.31 | 88.9 | 58.9 |
| **Validation** | | | | | | |
| BR00115128highexp | 0.35 | 0.4 | 0.24 | 0.33 | 76.7 | 58.9 |
| BR00115125highexp | 0.37 | 0.37 | 0.2 | 0.3 | 93.3 | 55.6 |
| BR00115134 | 0.36 | 0.37 | 0.25 | 0.33 | 82.2 | 58.9 |
| BR00115131 | 0.36 | 0.38 | 0.25 | 0.29 | 83.3 | 58.9 |
| BR00115126 | 0.22 | 0.32 | 0.2 | 0.28 | 45.6 | 53.3 |
| BR00115133highexp | 0.43 | 0.38 | 0.18 | 0.31 | 88.9 | 60 |
loss curves Screen Shot 2022-04-05 at 11 27 52 AM

@EchteRobert

Experiment

I discovered a mistake in my code, which led to the model always being trained on 2 plates (instead of the 3 or 5 I wanted to use in the previous experiments). I have now fixed this and trained the model again with 3 plates, while still using the higher cell count from the previous two experiments.

Main takeaways

  • The problem we saw before in overfitting the training compounds on the training plates has now been resolved (by using an extra training plate).
  • The problem of overfitting the validation compounds is still here. At least the problem is consistent across both training and validation plates, which may mean that solving this problem for the training plates will result in solving it for the validation plates.
TableTime!
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training** | | | | | | |
| BR00115134 | 0.56 | 0.37 | 0.24 | 0.33 | 97.8 | 58.9 |
| BR00115125 | 0.54 | 0.36 | 0.3 | 0.28 | 96.7 | 54.4 |
| BR00115133highexp | 0.54 | 0.38 | 0.26 | 0.31 | 96.7 | 60 |
| **Validation** | | | | | | |
| BR00115128highexp | 0.39 | 0.4 | 0.27 | 0.33 | 87.8 | 58.9 |
| BR00115125highexp | 0.41 | 0.37 | 0.21 | 0.3 | 92.2 | 55.6 |
| BR00115131 | 0.42 | 0.38 | 0.28 | 0.29 | 92.2 | 58.9 |
| BR00115133 | 0.43 | 0.38 | 0.22 | 0.3 | 93.3 | 62.2 |
| BR00115127 | 0.43 | 0.38 | 0.33 | 0.31 | 93.3 | 58.9 |
| BR00115128 | 0.4 | 0.39 | 0.3 | 0.32 | 87.8 | 61.1 |
| BR00115129 | 0.44 | 0.38 | 0.26 | 0.32 | 92.2 | 52.2 |
| BR00115126 | 0.24 | 0.32 | 0.19 | 0.28 | 44.4 | 53.3 |


bethac07 commented Apr 6, 2022

Sorry, I tried to send this comment via email yesterday, but it apparently didn't take!

For Stain2, because we were in a time crunch, we only analyzed 4 sites per well; all other experiments should typically be 9 sites per well. The discrepancy in cell count is therefore expected.

(Link for Broad folks only - scroll up for more context if needed https://broadinstitute.slack.com/archives/C3QFQ3WQM/p1594911343036200?thread_ts=1594911343.036200&cid=C3QFQ3WQM )

@niranjchandrasekaran

Thanks Beth!! I had completely forgotten this and have been going through my emails and notes for a week, looking for an explanation.


EchteRobert commented Apr 6, 2022

MOA prediction / percent matching benchmark for Stain3

These scores are calculated by treating compounds with the same MOA tag as replicate pairs. I calculate the mAP and percent matching both by including all compounds that target a specific MOA (inc. all compounds) and by only considering sister compounds (different compounds targeting the same MOA) as replicates (exclusively sister compounds).
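A sketch of how the two replicate definitions can be constructed, assuming hypothetical `Metadata_moa` and `Metadata_broad_sample` columns on the profile DataFrame (this is not the actual evaluation code):

```python
import itertools
import pandas as pd

def moa_replicate_pairs(profiles: pd.DataFrame, sisters_only: bool = False):
    """Yield index pairs treated as replicates for the MOA benchmark.

    sisters_only=False -> 'inc. all compounds': every pair sharing an MOA.
    sisters_only=True  -> 'exclusively sister compounds': only pairs of
    different compounds that share an MOA."""
    for _, group in profiles.groupby("Metadata_moa"):
        for i, j in itertools.combinations(group.index, 2):
            same_compound = (group.loc[i, "Metadata_broad_sample"]
                             == group.loc[j, "Metadata_broad_sample"])
            if sisters_only and same_compound:
                continue  # skip same-compound pairs in the sister-only setting
            yield i, j
```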

BM all plates Stain3 cluster (inc. all compounds)
| plate | mAP BM | PR BM |
|---|---|---|
| BR00115126highexp | 0.23 | 40.4 |
| BR00115129 | 0.27 | 55.3 |
| BR00115132highexp | 0.25 | 66 |
| BR00115128 | 0.27 | 59.6 |
| BR00115133highexp | 0.27 | 57.4 |
| BR00115134multiplane | 0.24 | 51.1 |
| BR00115126 | 0.23 | 46.8 |
| BR00115132 | 0.21 | 57.4 |
| BR00115133 | 0.26 | 61.7 |
| BR00115127 | 0.27 | 53.2 |
| BR00115131 | 0.27 | 61.7 |
| BR00115125 | 0.25 | 55.3 |
| BR00115134 | 0.26 | 53.2 |
| BR00115125highexp | 0.26 | 53.2 |
| BR00115134bin1 | 0.25 | 55.3 |
| BR00115128highexp | 0.28 | 63.8 |
Benchmark MOA prediction with model (inc. all compounds)

Using model shown in #6 (comment)

| plate | mAP model | mAP BM | PR model | PR BM |
|---|---|---|---|---|
| BR00115128highexp | 0.26 | 0.28 | 70.2 | 63.8 |
| BR00115125highexp | 0.27 | 0.26 | 70.2 | 53.2 |
| BR00115134 | 0.33 | 0.26 | 70.2 | 53.2 |
| BR00115125 | 0.34 | 0.25 | 66 | 55.3 |
| BR00115131 | 0.28 | 0.27 | 68.1 | 61.7 |
| BR00115126 | 0.18 | 0.23 | 42.6 | 46.8 |
| BR00115133 | 0.27 | 0.26 | 68.1 | 61.7 |
| BR00115127 | 0.29 | 0.27 | 68.1 | 53.2 |
| BR00115133highexp | 0.32 | 0.27 | 66 | 57.4 |
| BR00115128 | 0.26 | 0.27 | 68.1 | 59.6 |
| BR00115129 | 0.28 | 0.27 | 70.2 | 55.3 |
Percent matching histograms (inc. all compounds)

Benchmark_Stain3_PercentMatching

BM all plates Stain3 cluster (exclusively sister compounds)
| plate | mAP BM | PR BM |
|---|---|---|
| BR00115126highexp | 0.11 | 34 |
| BR00115129 | 0.13 | 48.9 |
| BR00115132highexp | 0.11 | 46.8 |
| BR00115128 | 0.13 | 42.6 |
| BR00115133highexp | 0.14 | 46.8 |
| BR00115134multiplane | 0.13 | 36.2 |
| BR00115126 | 0.11 | 34 |
| BR00115132 | 0.1 | 44.7 |
| BR00115133 | 0.13 | 40.4 |
| BR00115127 | 0.14 | 46.8 |
| BR00115131 | 0.14 | 51.1 |
| BR00115125 | 0.13 | 44.7 |
| BR00115134 | 0.13 | 40.4 |
| BR00115125highexp | 0.13 | 44.7 |
| BR00115134bin1 | 0.12 | 34 |
| BR00115128highexp | 0.14 | 44.7 |
Benchmark MOA prediction with model (exclusively sister compounds)

Using model shown in #6 (comment)

| plate | mAP model | mAP BM | PR model | PR BM |
|---|---|---|---|---|
| BR00115128highexp | 0.11 | 0.14 | 46.8 | 44.7 |
| BR00115125highexp | 0.12 | 0.13 | 46.8 | 44.7 |
| BR00115134 | 0.12 | 0.13 | 42.6 | 40.4 |
| BR00115125 | 0.13 | 0.13 | 48.9 | 44.7 |
| BR00115131 | 0.12 | 0.14 | 44.7 | 51.1 |
| BR00115133 | 0.11 | 0.13 | 48.9 | 40.4 |
| BR00115127 | 0.11 | 0.14 | 48.9 | 46.8 |
| BR00115133highexp | 0.12 | 0.14 | 46.8 | 46.8 |
| BR00115128 | 0.1 | 0.13 | 44.7 | 42.6 |
| BR00115129 | 0.1 | 0.13 | 57.4 | 48.9 |
Percent matching histograms (exclusively sister compounds)

Benchmark_Stain3


EchteRobert commented Apr 11, 2022

Hyperparameter sweep

I ran a large hyperparameter sweep to get an idea of which hyperparameters influence model generalization to validation compounds. The most important parameters I varied were (a sweep-config sketch follows the list):

  • cell_layers (layers before the summation operation)
  • latent_dim (dimension of the collapsed representation)
  • initial_cells (mean of the Gaussian distribution used to pick the number of cells to sample each epoch)
  • cell_variance (standard deviation of that Gaussian distribution)
  • kFilters (indication of the width of all layers in the model; smaller means larger width!)
  • output_dim (dimension of the loss space)
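A hedged sketch of what a sweep over these parameters could look like in wandb's sweep-config format; the search ranges, metric name, and sweep method are illustrative assumptions, not the settings actually used:

```python
# Hypothetical wandb sweep configuration over the parameters listed above.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_mAP", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "cell_layers":   {"values": [1, 2, 3]},
        "proj_layers":   {"values": [1, 2, 3]},
        "latent_dim":    {"values": [256, 512, 1024]},
        "output_dim":    {"values": [512, 1024, 2048]},
        "kFilters":      {"values": [0.5, 1, 2]},       # smaller -> wider layers
        "initial_cells": {"values": [1500, 3000, 4000]},
        "cell_variance": {"values": [400, 900]},
    },
}

# import wandb
# sweep_id = wandb.sweep(sweep_config, project="FeatureAggregation")
# wandb.agent(sweep_id, function=train)  # 'train' is the user's training function
```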

Main takeaways

  • Decreasing model depth, i.e. cell_layers and proj_layers, and increasing model width, i.e. latent_dim, kFilters, and output_dim, results in better generalization. This seems to be in line with current deep learning theory.
  • The mean and standard deviation of the Gaussian that selects the number of cells each epoch have less influence on model performance than expected, but they are still among the top explanatory variables.

The full sweep report can be viewed here:
https://wandb.ai/rvdijk/FeatureAggregation/reports/Hyperparameter-sweep-Stain3--VmlldzoxODIzNzE2?accessToken=9goex3hbjwxddwwvafwmtuudkx6qz02t4ao9pazo507o2xgnv7323svgf9wqldwh


EchteRobert commented Apr 11, 2022

Experiment

Using the best hyperparameters from the previous sweep I trained a model for 150 epochs. This experiment is to see if:

  1. Training a model for a longer time increases generalization (despite overfitting)
  2. Training a model using 'optimal' hyperparameters increases generalization

Main takeaways

  • Both training for longer and using the 'optimal' hyperparameters increase generalization to the validation compounds.
  • Interestingly, the model that trained for 150 epochs but grossly overfit the data (Tloss: 1.23, Vloss: 2.60) performed better on validation compounds than the best-validation-loss model selected at 41 epochs (Tloss: 1.74, Vloss: 2.34). This points to an issue with the way the validation loss is calculated.
  • Using this best model, the model is now able to beat the BM on most plates, except for BR00115133highexp (training), BR00115125highexp (validation), and BR00115133 (validation).
  • I am a bit suspicious of the mAP on the training compounds, especially on the training plates. This extremely high mAP does not seem to translate directly to validation plates, which means it is still finding some general way to aggregate the single-cell features. I think it may mean that this method should not be used to infer profiles on the plates it is trained on.

Next up

  • Recalculate the validation loss so that it better represents the evaluation metric (mAP on validation compounds). Results: 03. Model for Stain2 #5 (comment)
  • Train a model with these hyperparameters on the Stain2 data and evaluate performance there. Results: 03. Model for Stain2 #5 (comment)
  • Investigate other methods to further increase the validation mAP of the model. Current ideas:
    • Reconstruction loss
    • Further increasing model width
    • Further increasing training time
Model with best training loss
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training plates** | | | | | | |
| BR00115134 | 0.92 | 0.37 | 0.33 | 0.33 | 97.8 | 58.9 |
| BR00115125 | 0.84 | 0.36 | 0.34 | 0.29 | 100 | 54.4 |
| BR00115133highexp | 0.95 | 0.38 | 0.28 | 0.31 | 97.8 | 60 |
| **Validation plates** | | | | | | |
| BR00115128highexp | 0.47 | 0.4 | 0.36 | 0.33 | 88.9 | 58.9 |
| BR00115125highexp | 0.54 | 0.37 | 0.3 | 0.31 | 96.7 | 55.6 |
| BR00115131 | 0.5 | 0.38 | 0.34 | 0.29 | 91.1 | 58.9 |
| BR00115133 | 0.56 | 0.38 | 0.29 | 0.3 | 95.6 | 62.2 |
| BR00115127 | 0.53 | 0.38 | 0.36 | 0.31 | 94.4 | 58.9 |
| BR00115128 | 0.5 | 0.39 | 0.36 | 0.32 | 87.8 | 61.1 |
| BR00115129 | 0.49 | 0.38 | 0.39 | 0.32 | 93.3 | 52.2 |
| BR00115126 | 0.3 | 0.32 | 0.27 | 0.28 | 52.2 | 53.3 |
Model with best validation loss
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training plates** | | | | | | |
| BR00115134 | 0.61 | 0.37 | 0.29 | 0.33 | 98.9 | 58.9 |
| BR00115125 | 0.61 | 0.36 | 0.34 | 0.29 | 97.8 | 54.4 |
| BR00115133highexp | 0.62 | 0.38 | 0.29 | 0.31 | 97.8 | 60 |
| **Validation plates** | | | | | | |
| BR00115128highexp | 0.45 | 0.4 | 0.35 | 0.33 | 92.2 | 58.9 |
| BR00115125highexp | 0.49 | 0.37 | 0.27 | 0.31 | 94.4 | 55.6 |
| BR00115131 | 0.48 | 0.38 | 0.32 | 0.29 | 92.2 | 58.9 |
| BR00115133 | 0.5 | 0.38 | 0.26 | 0.3 | 93.3 | 62.2 |
| BR00115127 | 0.49 | 0.38 | 0.32 | 0.31 | 95.6 | 58.9 |
| BR00115128 | 0.46 | 0.39 | 0.36 | 0.32 | 96.7 | 61.1 |
| BR00115129 | 0.47 | 0.38 | 0.39 | 0.32 | 94.4 | 52.2 |
| BR00115126 | 0.27 | 0.32 | 0.23 | 0.28 | 52.2 | 53.3 |


EchteRobert commented Apr 13, 2022

Experiment wide model

To explore current deep learning generalization research, I trained a very wide model (4096 units in each layer instead of the usual 512) and evaluated its performance. I also trained this model for more than 150 epochs to better understand the generalization curve.
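A minimal sketch of the widening, reading the 4096 as the hidden-layer width; the layer structure and names are hypothetical:

```python
import torch.nn as nn

def make_projection_head(n_features: int, width: int = 4096, output_dim: int = 1024) -> nn.Module:
    """Hypothetical projection head; width=512 was the usual setting,
    width=4096 is the 'wide' variant tested here."""
    return nn.Sequential(
        nn.Linear(n_features, width),
        nn.ReLU(),
        nn.Linear(width, output_dim),
    )
```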

Main takeaways

The overfit is larger than it was before. One possible explanation is that the model is still underfitting the training data. To explore this further I would have to train an even wider model, but I will leave that option for now.

Next up

Instead I will use more training plates to see if the mAP on the validation compounds of the training plates can be increased.

Results

Last epoch wide model
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training plates** | | | | | | |
| BR00115134 | 0.84 | 0.37 | 0.27 | 0.33 | 98.9 | 58.9 |
| BR00115125 | 0.83 | 0.36 | 0.36 | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | 0.89 | 0.38 | 0.29 | 0.31 | 96.7 | 60 |
| **Validation plates** | | | | | | |
| BR00115128highexp | 0.46 | 0.4 | 0.28 | 0.33 | 92.2 | 58.9 |
| BR00115125highexp | 0.53 | 0.37 | 0.27 | 0.31 | 98.9 | 55.6 |
| BR00115131 | 0.48 | 0.38 | 0.32 | 0.29 | 94.4 | 58.9 |
| BR00115133 | 0.51 | 0.38 | 0.25 | 0.3 | 90 | 62.2 |
| BR00115127 | 0.49 | 0.38 | 0.34 | 0.31 | 92.2 | 58.9 |
| BR00115128 | 0.45 | 0.39 | 0.29 | 0.32 | 91.1 | 61.1 |
| BR00115129 | 0.47 | 0.38 | 0.32 | 0.32 | 94.4 | 52.2 |
| BR00115126 | 0.27 | 0.32 | 0.25 | 0.28 | 56.7 | 53.3 |


EchteRobert commented Apr 14, 2022

Experiment 4 plates

This model is trained with plates BR00115134_FS, BR00115125_FS, BR00115133highexp_FS, and BR00115128_FS, where the first three have been used in the previous experiments.

Main takeaways

Using 4 training plates did not result in significant improvements in the validation mAP of the model. I used the last-epoch model for inferring the profiles.

Results

mAPs
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training plates** | | | | | | |
| BR00115134 | 0.66 | 0.37 | 0.31 | 0.33 | 98.9 | 58.9 |
| BR00115125 | 0.67 | 0.36 | 0.38 | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | 0.66 | 0.38 | 0.29 | 0.31 | 97.8 | 60 |
| BR00115128 | 0.7 | 0.39 | 0.38 | 0.32 | 100 | 61.1 |
| **Validation plates** | | | | | | |
| BR00115128highexp | 0.54 | 0.4 | 0.4 | 0.33 | 96.7 | 58.9 |
| BR00115125highexp | 0.54 | 0.37 | 0.26 | 0.31 | 98.9 | 55.6 |
| BR00115131 | 0.52 | 0.38 | 0.34 | 0.29 | 97.8 | 58.9 |
| BR00115133 | 0.52 | 0.38 | 0.27 | 0.3 | 95.6 | 62.2 |
| BR00115127 | 0.55 | 0.38 | 0.36 | 0.31 | 95.6 | 58.9 |
| BR00115129 | 0.52 | 0.38 | 0.39 | 0.32 | 100 | 52.2 |
| BR00115126 | 0.32 | 0.32 | 0.26 | 0.28 | 63.3 | 53.3 |


EchteRobert commented Apr 15, 2022

As referenced in #6 (comment), there was an issue with the way the validation loss was calculated. I have since updated it to calculate the validation loss on profiles that are inferred the same way as they would be during evaluation metric calculation, i.e. by using all of the cells in a given well.
However, even after that I still consistently got better results using the model from the last trained epoch rather than the one with the best validation loss. This non-scientific source gives some good intuition about that: https://twitter.com/fchollet/status/1469282405420781569.

Main takeaways

  • I now track the validation mAP and select the best model based on that metric.
  • Even though we are overfitting according to the loss curve, the validation mAP steadily increases and then flattens out! This explains the performance behaviour I have been seeing.
  • As long as the validation loss is decreasing, the validation mAP is increasing, showing that they are still inversely correlated (but only while the validation loss decreases). This is an important finding, because if it were not the case, the hyperparameter sweep I did before would not have been very relevant.
  • Unfortunately I mixed up a '<' with a '>' and thus did not save the best validation mAP model, but I now know that we can squeeze some more performance out of the model by using that model (a checkpoint-selection sketch follows this list). The final validation mAP was 0.46, while the best validation mAP was 0.51.
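For completeness, a minimal sketch of the intended checkpoint selection on validation mAP, with the comparison in the correct direction; the function and file names are hypothetical:

```python
import torch

best_val_map = float("-inf")

def maybe_save_best(model, val_map: float, path: str = "best_val_mAP.pt") -> None:
    """Keep the checkpoint with the highest validation mAP seen so far.
    Higher mAP is better, so the new value must be GREATER than the best
    (the '>' that was accidentally written as '<')."""
    global best_val_map
    if val_map > best_val_map:
        best_val_map = val_map
        torch.save(model.state_dict(), path)
```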
Majestic training curves

Screen Shot 2022-04-15 at 12 00 41 PM

Screen Shot 2022-04-15 at 12 00 52 PM

Screen Shot 2022-04-15 at 12 01 07 PM


EchteRobert commented Apr 20, 2022

Hyperparameter optimization 2

I wanted to see if changing the batch size, the number of cells sampled each epoch, and the number of samples drawn each epoch significantly impacted the generalization of the model. I performed another hyperparameter sweep over these factors, along with a few model-width hyperparameters, to investigate this.

Main conclusions

  • Although the sweep achieved a slightly higher best validation mAP (0.53 instead of 0.51), optimizing these hyperparameters did not result in a significant increase in the mAP across all plates.
  • The batch size has significant influence over the generalization of the model, but also over the training loss and process itself. This will be one of the most important parameters to precisely tune for the final model.

Results

https://wandb.ai/rvdijk/FeatureAggregation/sweeps/6ssjjxt9?workspace=user-rvdijk

mean average precision

Training plates are marked in bold

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| BR00115128highexp | 0.43 | 0.4 | 0.36 | 0.33 | 96.7 | 58.9 |
| BR00115125highexp | 0.47 | 0.37 | 0.26 | 0.31 | 95.6 | 55.6 |
| **BR00115134** | 0.61 | 0.37 | 0.32 | 0.33 | 98.9 | 58.9 |
| **BR00115125** | 0.6 | 0.36 | 0.34 | 0.29 | 98.9 | 54.4 |
| BR00115131 | 0.47 | 0.38 | 0.32 | 0.29 | 96.7 | 58.9 |
| BR00115126 | 0.25 | 0.32 | 0.22 | 0.28 | 54.4 | 53.3 |
| BR00115133 | 0.47 | 0.38 | 0.26 | 0.3 | 93.3 | 62.2 |
| BR00115127 | 0.47 | 0.38 | 0.35 | 0.31 | 98.9 | 58.9 |
| **BR00115133highexp** | 0.6 | 0.38 | 0.27 | 0.31 | 97.8 | 60 |
| BR00115128 | 0.43 | 0.39 | 0.38 | 0.32 | 95.6 | 61.1 |
| BR00115129 | 0.46 | 0.38 | 0.34 | 0.32 | 96.7 | 52.2 |
Loss curves

Screen Shot 2022-04-20 at 10 11 44 AM


EchteRobert commented Apr 25, 2022

More hyperparameter tuning

After another round of tuning (this time based on the actual evaluation metric we want to optimize for: validation mAP), I trained a model for a longer period on Stain3.

Main takeaways

  • The model is now able to generalize to nearly every plate, except for BR00115125highexp, which achieves performance close to the baseline.
  • Notably, the model even generalizes to plate BR00115126, which is considered an outlier plate based on the PC1 loadings plot in the first comment.
  • The model was improved by doubling the output feature vector from 1024 to 2048 and decreasing the learning rate from 1e-3 to 5e-4.
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|---|---|---|---|---|---|---|
| **Training plates** | | | | | | |
| BR00115133highexp | 0.7 | 0.38 | 0.36 | 0.31 | 97.8 | 60 |
| BR00115134 | 0.7 | 0.37 | 0.38 | 0.33 | 97.8 | 58.9 |
| BR00115125 | 0.66 | 0.36 | 0.38 | 0.29 | 98.9 | 54.4 |
| **Validation plates** | | | | | | |
| BR00115128highexp | 0.48 | 0.4 | 0.41 | 0.33 | 92.2 | 58.9 |
| BR00115125highexp | 0.54 | 0.37 | 0.29 | 0.31 | 97.8 | 55.6 |
| BR00115131 | 0.51 | 0.38 | 0.4 | 0.29 | 93.3 | 58.9 |
| BR00115133 | 0.51 | 0.38 | 0.31 | 0.3 | 94.4 | 62.2 |
| BR00115127 | 0.52 | 0.38 | 0.44 | 0.31 | 94.4 | 58.9 |
| BR00115128 | 0.49 | 0.39 | 0.4 | 0.32 | 94.4 | 61.1 |
| BR00115129 | 0.51 | 0.38 | 0.43 | 0.32 | 90 | 52.2 |
| BR00115126 | 0.32 | 0.32 | 0.3 | 0.28 | 56.7 | 53.3 |


bethac07 commented Oct 11, 2022 via email
