Thank you for integrating and open-sourcing the benchmark dataset.
I noticed some inconsistencies between the statistics reported in the paper and the released data in benchmarks/CodonBERT/data. Here are the confusing parts:
For the MLOS flu vaccine data, Table 1 in the paper lists 543 mRNA samples, but I only found 167 samples in the released data.
For the SARS-CoV-2 vaccine degradation data, Table 1 in the paper lists 2400 mRNA samples, but I only found 233 samples in the released data.
Could you kindly clarify these discrepancies?
BTW, I noticed that some of the datasets are very small. With a 0.7/0.15/0.15 split on such a small dataset, the test set holds only about 15% of the samples (roughly 25 of the 167 MLOS samples I found), so metrics like correlation computed on it are not reliable. It would be better to use k-fold cross-validation.
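To make the suggestion concrete, here is a minimal sketch of k-fold evaluation in plain Python. It is only an illustration under stated assumptions: the names `k_fold_indices`, `pearson`, `cross_validate`, `fit_line`, and `predict_line` are hypothetical and not part of the repository, the data in the usage part is synthetic, and Pearson correlation stands in for whichever correlation metric the paper actually reports.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx); every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean and standard deviation of the per-fold correlation."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(model, xs[i]) for i in test_idx]
        scores.append(pearson(preds, [ys[i] for i in test_idx]))
    return mean(scores), stdev(scores)


# Toy stand-in model: 1-D least-squares line (hypothetical, not CodonBERT).
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(167)]        # synthetic inputs
    ys = [2 * x + rng.gauss(0, 0.1) for x in xs]   # synthetic labels
    avg, spread = cross_validate(xs, ys, fit_line, predict_line, k=5)
    print(f"correlation over 5 folds: mean={avg:.3f}, std={spread:.3f}")
```

Reporting the mean together with the spread across folds shows how much the metric depends on which samples end up in the test set, which a single 0.7/0.15/0.15 split cannot reveal.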