Chronologer Retention Time model predictor #761

Status: Open. Wants to merge 109 commits into base: master.

Commits (109), showing all changes:
a6b1639
correct Within calculation
Nov 18, 2021
fa4da8b
update unit tests
Nov 18, 2021
3246567
conflicts resolved back to upstream
Feb 4, 2022
a018d4d
Merge remote-tracking branch 'upstream/master'
Feb 15, 2022
15a37d0
Merge remote-tracking branch 'upstream/master'
Feb 17, 2022
892fa45
this is the spot
Feb 18, 2022
211013c
Merge remote-tracking branch 'upstream/master'
Feb 25, 2022
68104ee
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Mar 9, 2022
d715a08
Merge remote-tracking branch 'upstream/master'
Mar 16, 2022
3565522
Merge remote-tracking branch 'upstream/master'
Mar 23, 2022
72e7b53
Merge remote-tracking branch 'upstream/master'
Mar 29, 2022
593872a
Merge remote-tracking branch 'upstream/master'
trishorts Apr 13, 2022
42dd034
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Apr 13, 2022
fbeaec0
Merge remote-tracking branch 'upstream/master'
trishorts Jun 1, 2022
614ded7
Merge remote-tracking branch 'upstream/master'
Jun 14, 2022
47307c8
Merge branch 'master' of https://github.com/trishorts/mzLib
Jun 14, 2022
28e05ae
Merge remote-tracking branch 'upstream/master'
Jul 6, 2022
0a7c609
Merge remote-tracking branch 'upstream/master'
Jul 26, 2022
630d8c7
Merge remote-tracking branch 'upstream/master'
trishorts Jul 27, 2022
f6a386b
Merge branch 'master' of https://github.com/trishorts/mzLib
trishorts Jul 27, 2022
d673800
Merge remote-tracking branch 'upstream/master'
Sep 11, 2022
675a0ae
Merge branch 'master' of https://github.com/trishorts/mzLib
Sep 11, 2022
15d4baf
Merge remote-tracking branch 'upstream/master'
Sep 27, 2022
03ca9f7
Merge remote-tracking branch 'upstream/master'
Oct 4, 2022
d0a4c79
Merge remote-tracking branch 'upstream/master'
Jan 30, 2023
894b998
Merge remote-tracking branch 'upstream/master'
Mar 15, 2023
88269a1
Merge remote-tracking branch 'upstream/master'
trishorts Apr 24, 2023
9a9b24a
Merge remote-tracking branch 'upstream/master'
trishorts Jun 29, 2023
b4ad231
add space
trishorts Jun 29, 2023
bc59b38
Merge remote-tracking branch 'upstream/master'
trishorts Oct 10, 2023
f3c83ae
first move
trishorts Nov 6, 2023
d6d934b
psmFromTsv unit tests
trishorts Nov 6, 2023
2db71cd
moved library spectrum
trishorts Nov 6, 2023
562f69d
empty unit test for library spectrum
trishorts Nov 6, 2023
d3dcbe9
m
trishorts Nov 6, 2023
2c4334a
library spectrum unit tests
trishorts Nov 7, 2023
a86d68e
lib spec unit tests
trishorts Nov 7, 2023
c7ce32d
PSMTSV unit tests
trishorts Nov 7, 2023
c610791
add tests for variants and localized glycans
trishorts Nov 7, 2023
5e09c14
capitalization convention
trishorts Nov 7, 2023
9055644
read internal ions test
trishorts Nov 7, 2023
74b80ad
uncomment lines
trishorts Nov 7, 2023
d1bc75c
moved fragmentation and library spectrum to new project Omics
trishorts Nov 8, 2023
cec311a
Revert "moved fragmentation and library spectrum to new project Omics"
trishorts Nov 9, 2023
8d88b32
someInterfaces
trishorts Nov 9, 2023
df0f605
good midpont
trishorts Nov 9, 2023
cad0d1c
omics classes and interfaces seem tobe working
trishorts Nov 9, 2023
8991e14
move LibrarySpectrum class to Omics. Create SpectrumMatchFromTsvHeade…
trishorts Nov 10, 2023
02bf807
not working
trishorts Nov 15, 2023
b7d15d6
Fixed up the PR
nbollis Nov 15, 2023
2502322
Merge pull request #2 from trishorts/tempPsmFromTsv
trishorts Nov 16, 2023
924e99f
fix broken test
trishorts Nov 16, 2023
10f53a2
some unit tests
trishorts Nov 16, 2023
d0a55b2
dhg
trishorts Nov 16, 2023
81f9338
Expanded test coverage on file classes
nbollis Nov 16, 2023
382c0da
new header and xlink psmtsv reader unit tests
trishorts Nov 20, 2023
3abe9a3
CPU(windows, linux, and mac) dll
elaboy Nov 20, 2023
71c3ead
init
elaboy Nov 21, 2023
7a84810
Merge branch 'pr/737' into TrainingMethodsForChronologer
elaboy Nov 21, 2023
79e3d09
Custom Datasets and training functions
elaboy Nov 21, 2023
848f81c
cool progress
elaboy Nov 21, 2023
d8576aa
training working
elaboy Nov 22, 2023
81fe5b6
Working
elaboy Nov 24, 2023
d9bf11a
updated Directory
elaboy Feb 13, 2024
4b4d624
cleaning code
elaboy Feb 13, 2024
7ecc7c6
Update ChronologerRetentionTimeEstimator.cs
elaboy Feb 13, 2024
e786bd6
Merge branch 'master' into Chronologer
elaboy Feb 13, 2024
816f031
.
elaboy Feb 13, 2024
4cdc4cf
Update TerminusSpecificProductTypes.cs
elaboy Feb 13, 2024
367cd94
Delete ChronologerTest.tsv
elaboy Feb 13, 2024
d139c70
Merge branch 'Chronologer' of https://github.com/elaboy/mzLib-Fork in…
elaboy Feb 13, 2024
638a635
Update TestFlashLFQ.cs
elaboy Feb 13, 2024
143f45f
internal and comments
elaboy Feb 16, 2024
1dddd92
Merge branch 'master' into Chronologer
elaboy Feb 21, 2024
ba8e65c
changed estimator class and added comments
elaboy Feb 22, 2024
684842a
Merge branch 'Chronologer' of https://github.com/elaboy/mzLib-Fork in…
elaboy Feb 22, 2024
ebee682
oops
elaboy Feb 22, 2024
10fd45b
Merge branch 'master' into Chronologer
trishorts Feb 29, 2024
adfe301
Merge branch 'master' into Chronologer
trishorts Mar 1, 2024
06f0798
Merge branch 'master' into Chronologer
elaboy May 1, 2024
a484e08
static method to access chronologer
elaboy May 1, 2024
fb5e618
Got rid of variables that were not being used
elaboy Jul 11, 2024
383ea53
Merge branch 'master' into Chronologer
elaboy Jul 11, 2024
371a077
fixed the terminus integers and N-acetylation
elaboy Jul 12, 2024
a70cc12
Merge branch 'Chronologer' of https://github.com/elaboy/mzLib-Fork in…
elaboy Jul 12, 2024
d2f0a34
Merge branch 'master' into Chronologer
elaboy Jul 12, 2024
80f6373
updated the dictionary to include all the supported mods
elaboy Jul 17, 2024
01e339b
more testing and changes to the tensorize method
elaboy Jul 17, 2024
3279426
removed unused/repeated packagess
elaboy Jul 17, 2024
d9b34e3
Merge branch 'master' into Chronologer
elaboy Jul 22, 2024
c599e9d
tensorize method now looks at the base sequence for selenocysteine an…
elaboy Jul 23, 2024
c5b852b
Merge branch 'master' into Chronologer
nbollis Jul 25, 2024
0e0fba3
Merge branch 'master' into Chronologer
trishorts Jul 30, 2024
345577b
Merge branch 'master' into Chronologer
trishorts Aug 5, 2024
d93cf8f
making gpu available
elaboy Aug 30, 2024
d96e0cb
Revert "Merge branch 'master' into Chronologer"
elaboy Aug 30, 2024
bd08d85
making gpu available
elaboy Aug 30, 2024
a9fba92
Merge branch 'master' into Chronologer
elaboy Aug 30, 2024
6b0d3f7
cuda support
elaboy Aug 30, 2024
4b8bf13
Merge branch 'Chronologer' of https://github.com/elaboy/mzLib-Fork in…
elaboy Aug 30, 2024
833d810
adds cuda support
elaboy Aug 30, 2024
abb88a5
add cuda support
elaboy Aug 30, 2024
c3369bb
try catch for cuda intitialization
elaboy Aug 30, 2024
38d6ada
removed tests that where not supposed to be in this branch
elaboy Aug 30, 2024
629c2a3
contructor is now internal, now use to it being public and some cleanup
elaboy Aug 30, 2024
1635fee
fixing tests and making identification of non-compatible sequences cl…
elaboy Sep 3, 2024
79af77c
removing unused code
elaboy Sep 3, 2024
495444b
same
elaboy Sep 3, 2024
a99a4c4
cpu torchsharp for OSX
elaboy Sep 3, 2024
Files changed

TerminusSpecificProductTypes.cs

@@ -1,5 +1,6 @@
+
 namespace Omics.Fragmentation.Peptide
 {
     public class TerminusSpecificProductTypes
     {
         /// <summary>

[Inline review comment (Contributor) on the added blank line: "extra space"]
11 changes: 5 additions & 6 deletions mzLib/Proteomics/Protein/Protein.cs

@@ -78,14 +78,14 @@ public Protein(string sequence, string accession, string organism = null, List<T

     /// <summary>
     /// Protein construction that clones a protein but assigns a different base sequence
-    /// For use in SILAC experiments and in decoy construction
+    /// For use in SILAC experiments
     /// </summary>
     /// <param name="originalProtein"></param>
-    /// <param name="newBaseSequence"></param>
+    /// <param name="silacSequence"></param>
     /// <param name="silacAccession"></param>
-    public Protein(Protein originalProtein, string newBaseSequence)
+    public Protein(Protein originalProtein, string silacSequence)
     {
-        BaseSequence = newBaseSequence;
+        BaseSequence = silacSequence;
         Accession = originalProtein.Accession;
         NonVariantProtein = originalProtein.NonVariantProtein;
         Name = originalProtein.Name;

[A review conversation on this constructor was marked as resolved by nbollis.]

@@ -158,7 +158,7 @@ public Protein(string variantBaseSequence, Protein protein, IEnumerable<Sequence

     /// <summary>
     /// Base sequence, which may contain applied sequence variations.
     /// </summary>
-    public string BaseSequence { get; private set; }
+    public string BaseSequence { get; }

     public string Organism { get; }
     public bool IsDecoy { get; }

@@ -208,7 +208,6 @@ public string GetUniProtFastaHeader()
     {
         var n = GeneNames.FirstOrDefault();
         string geneName = n == null ? "" : n.Item2;
-
         return string.Format("mz|{0}|{1} {2} OS={3} GN={4}", Accession, Name, FullName, Organism, geneName);
     }
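For context on the renamed constructor above, here is a minimal hypothetical sketch of how it might be used when building a SILAC-labeled copy of a protein. The sequences, accession, and the lowercase heavy-label placeholder are all made up for illustration; only the two Protein constructor signatures come from the diff context:

```csharp
// Hypothetical usage sketch (not part of the diff).
Protein light = new Protein("PEPTIDEKR", "P12345");

// Clone the protein, swapping in a sequence whose lysines are replaced by a
// placeholder heavy-label residue; the real SILAC notation is up to the caller.
string silacSequence = light.BaseSequence.Replace('K', 'k');
Protein heavy = new Protein(light, silacSequence);
```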
18 changes: 10 additions & 8 deletions mzLib/Proteomics/Proteomics.csproj

@@ -10,6 +10,13 @@
     <DebugSymbols>true</DebugSymbols>
   </PropertyGroup>

+  <ItemGroup>
+    <PackageReference Include="TorchSharp" Version="0.103.0" />
+    <PackageReference Include="TorchSharp-cuda-windows" Version="0.103.0" Condition="'$(OS)' == 'Windows_NT'"/>
+    <PackageReference Include="TorchSharp-cuda-linux" Version="0.103.0" Condition="'$(OS)' == 'Linux'"/>
+    <PackageReference Include="TorchSharp-cpu" Version="0.103.0" Condition="'$(OS)' == 'OSX'"/>
+  </ItemGroup>
+
   <ItemGroup>
     <ProjectReference Include="..\Chemistry\Chemistry.csproj" />
     <ProjectReference Include="..\MassSpectrometry\MassSpectrometry.csproj" />

@@ -22,14 +29,9 @@
     <None Update="ProteolyticDigestion\proteases.tsv">
       <CopyToOutputDirectory>Always</CopyToOutputDirectory>
     </None>
-  </ItemGroup>
-
-  <ItemGroup>
-    <Folder Include="Fragmentation\" />
-  </ItemGroup>
-
-  <ItemGroup>
-    <PackageReference Include="CsvHelper" Version="32.0.3" />
+    <None Update="RetentionTimePrediction\ChronologerModel\Chronologer_20220601193755_TorchSharp.dat">
+      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
+    </None>
   </ItemGroup>

 </Project>
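Because the CUDA packages are referenced only on Windows and Linux while macOS falls back to the CPU-only build, callers need a runtime device check to stay portable. A minimal sketch, assuming the standard TorchSharp API; this helper class is not part of the PR:

```csharp
using System;
using TorchSharp;

// Editorial sketch: choose CUDA when a usable GPU build is present, otherwise CPU,
// matching the OS-conditional package references above.
public static class ChronologerDevice
{
    public static torch.Device Resolve()
    {
        try
        {
            // is_available() returns false on the CPU-only TorchSharp package.
            return torch.cuda.is_available() ? torch.CUDA : torch.CPU;
        }
        catch (Exception)
        {
            // Echoes the PR's CUDA-initialization try/catch commit: if native
            // CUDA initialization throws, fall back to the CPU.
            return torch.CPU;
        }
    }
}
```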
151 changes: 151 additions & 0 deletions mzLib/Proteomics/RetentionTimePrediction/ChronologerModel/Chronologer.cs

@@ -0,0 +1,151 @@
using System;
using System.IO;
using TorchSharp;
using TorchSharp.Modules;

namespace Proteomics.RetentionTimePrediction.ChronologerModel;
/// <summary>
/// Chronologer is a deep learning model for highly accurate prediction of peptide C18 retention times (reported in % ACN).
/// Chronologer was trained on a new large harmonized database of > 2.6 million retention time observations
/// (2.25M unique peptides) constructed from 11 community datasets
/// and natively supports prediction of 17 different modification types.
/// With only a few observations of a new modification type (> 10 peptides),
/// Chronologer can be easily re-trained to predict up to 10 user supplied modifications.
///
/// Damien Beau Wilburn, Ariana E. Shannon, Vic Spicer, Alicia L. Richards, Darien Yeung, Danielle L. Swaney, Oleg V. Krokhin, Brian C. Searle
/// bioRxiv 2023.05.30.542978; doi: https://doi.org/10.1101/2023.05.30.542978
///
/// https://github.com/searlelab/chronologer
///
/// Licensed under Apache License 2.0
///
/// </summary>
internal class Chronologer : torch.nn.Module<torch.Tensor, torch.Tensor>
{
internal Chronologer() : this(Path.Combine(AppDomain.CurrentDomain.BaseDirectory,
"RetentionTimePrediction",
"ChronologerModel", "Chronologer_20220601193755_TorchSharp.dat"))
{
RegisterComponents();
}

/// <summary>
/// Initializes a new instance of the Chronologer model class with pre-trained weights from the paper
/// "Deep learning from harmonized peptide libraries enables retention time prediction of diverse
/// post-translational modifications" (https://github.com/searlelab/chronologer).
/// Eval mode is set to true and training mode is set to false by default.
///
/// Use .Predict() to run the model, not .forward().
/// </summary>
/// <param name="weightsPath"></param>
/// <param name="evalMode"></param>
private Chronologer(string weightsPath, bool evalMode = true) : base(nameof(Chronologer))
{
RegisterComponents();

LoadWeights(weightsPath);//loads weights from the file

if (evalMode)
{
eval(); //evaluation mode doesn't update the weights
train(false);
}
}

/// <summary>
/// Do not use this method for inference; use .Predict() instead. For why forward() is not called
/// directly when predicting outside of training, see:
/// https://stackoverflow.com/questions/58508190/in-pytorch-what-is-the-difference-between-forward-and-an-ordinary-method
/// </summary>
/// <param name="x"></param>
/// <returns></returns>
public override torch.Tensor forward(torch.Tensor x)
{
var input = seq_embed.forward(x).transpose(1, -1);

var residual = input.clone(); //clones the tensor, later will be added to the input (residual connection)
input = conv_layer_1.forward(input); //resnet block
input = norm_layer_1.forward(input); //batch normalization
input = relu.forward(input); //relu activation
input = conv_layer_2.forward(input); //convolutional layer
input = norm_layer_2.forward(input); //batch normalization
input = relu.forward(input); //relu activation
input = term_block.forward(input); //identity block
input = residual + input; //residual connection
input = relu.forward(input); //relu activation

residual = input.clone(); //clones the tensor, later will be added to the input (residual connection)
input = conv_layer_4.forward(input); //resnet block
input = norm_layer_4.forward(input); //batch normalization
input = relu.forward(input); //relu activation
input = conv_layer_5.forward(input); //convolutional layer
input = norm_layer_5.forward(input); //batch normalization
input = relu.forward(input); //relu activation
input = term_block.forward(input); //identity block
input = residual + input; //residual connection
input = relu.forward(input); //relu activation

residual = input.clone(); //clones the tensor, later will be added to the input (residual connection)
input = conv_layer_7.forward(input); //resnet block
input = norm_layer_7.forward(input); //batch normalization
input = term_block.forward(input); //identity block
input = relu.forward(input); //relu activation
input = conv_layer_8.forward(input); //convolutional layer
input = norm_layer_8.forward(input); //batch normalization
input = relu.forward(input); //relu activation
input = term_block.forward(input); //identity block
input = residual + input; //residual connection
input = relu.forward(input); //relu activation

input = dropout.forward(input); //dropout layer
input = flatten.forward(input); //flatten layer
input = output.forward(input); //output layer

return input;
}
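
    // Editorial note (not part of the diff): forward() embeds the 52-token input, then runs
    // three ResNet-style blocks whose convolutions use dilations 1, 2, and 3 respectively
    // (see the module fields below), and finally applies dropout, flattens the 64 x 52
    // feature map, and maps it through a single linear layer to one retention-time value.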

/// <summary>
/// Loads pre-trained weights from the file Chronologer_20220601193755_TorchSharp.dat.
/// </summary>
/// <param name="weightsPath"></param>
private void LoadWeights(string weightsPath)
{
//load weights from the file
load(weightsPath, true);
}

/// <summary>
/// Predicts the retention time of the input peptide sequence. The input must be a torch.Tensor of shape (1, 52).
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
internal torch.Tensor Predict(torch.Tensor input)
{
return call(input);
}

//All modules (the "shortcut" modules are not used in forward(); they are defined only so the saved weights load correctly)
private Embedding seq_embed = torch.nn.Embedding(55, 64, 0);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_1 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 1);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_2 = torch.nn.Conv1d(64, 64, 7, Padding.Same, dilation: 1);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_3 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 1); //shortcut
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_4 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 2);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_5 = torch.nn.Conv1d(64, 64, 7, Padding.Same, dilation: 2);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_6 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 2); //shortcut
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_7 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 3);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_8 = torch.nn.Conv1d(64, 64, 7, Padding.Same, dilation: 3);
private torch.nn.Module<torch.Tensor, torch.Tensor> conv_layer_9 = torch.nn.Conv1d(64, 64, 1, Padding.Same, dilation: 3); //shortcut
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_1 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_2 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_3 = torch.nn.BatchNorm1d(64); //shortcut
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_4 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_5 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_6 = torch.nn.BatchNorm1d(64); //shortcut
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_7 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_8 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> norm_layer_9 = torch.nn.BatchNorm1d(64);
private torch.nn.Module<torch.Tensor, torch.Tensor> term_block = torch.nn.Identity();
private torch.nn.Module<torch.Tensor, torch.Tensor> relu = torch.nn.ReLU(true);
private torch.nn.Module<torch.Tensor, torch.Tensor> dropout = torch.nn.Dropout(0.01);
private torch.nn.Module<torch.Tensor, torch.Tensor> flatten = torch.nn.Flatten(1);
private torch.nn.Module<torch.Tensor, torch.Tensor> output = torch.nn.Linear(52 * 64, 1);
}
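To make the expected call pattern concrete, here is a hypothetical driver. It is not part of the diff: the integer codes are placeholders, since the real peptide encoding (a vocabulary of 55 tokens with 0 as padding, per Embedding(55, 64, 0)) is produced by the PR's tensorize method, which this hunk does not show:

```csharp
using System;
using TorchSharp;

// Hypothetical usage sketch (not in the diff). Predict expects an int64 tensor of
// shape (1, 52): encoded terminus/residue tokens first, zero padding after.
internal static class ChronologerExample
{
    internal static void Run()
    {
        var model = new Chronologer(); // loads the bundled pre-trained weights

        var encoded = torch.zeros(new long[] { 1, 52 }, torch.int64);
        long[] fakeCodes = { 38, 1, 5, 16, 20, 9, 4, 5, 44 }; // made-up token ids
        for (int i = 0; i < fakeCodes.Length; i++)
            encoded[0, i] = torch.tensor(fakeCodes[i]);

        var rt = model.Predict(encoded);      // tensor of shape (1, 1)
        Console.WriteLine(rt.item<float>());  // predicted retention time (% ACN)
    }
}
```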