PTM Stoichiometry #797

Draft: wants to merge 23 commits into base: master

Commits (23)
8bf52b1
Bug fix. Previous ParseModifications implementation could give negati…
pcruzparri Aug 27, 2024
dcede87
Saving draft implementation of a site-occupancy calculation.
pcruzparri Aug 27, 2024
0e59e82
Merge branch 'master' of https://github.com/smith-chem-wisc/mzLib int…
pcruzparri Sep 12, 2024
41ef6f4
Saving some initial progress on the occupancy calculation. Started Mx…
pcruzparri Sep 13, 2024
f06af28
temp
pcruzparri Sep 17, 2024
2ebe188
Merge branch 'master' of https://github.com/smith-chem-wisc/mzLib int…
pcruzparri Oct 11, 2024
b8fb4cb
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
8d8658d
Removed the sandbox test Peter and changed the default arguments of P…
pcruzparri Oct 11, 2024
8fb7360
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
74ed705
Fixed flipped logic in FlashLFQ/Peptide.GetTotalIntensity(). Cleaned …
pcruzparri Oct 14, 2024
6c18e9f
Transcriptomics Digestion and Fragmentation (#801)
nbollis Oct 15, 2024
68165b0
Refactored the PositionFrequencyAnalysis code to eliminate the nested…
pcruzparri Oct 18, 2024
58e6346
Bug fix. Previous ParseModifications implementation could give negati…
pcruzparri Aug 27, 2024
f0d67d0
Saving draft implementation of a site-occupancy calculation.
pcruzparri Aug 27, 2024
7b04937
Saving some initial progress on the occupancy calculation. Started Mx…
pcruzparri Sep 13, 2024
d2c240e
temp
pcruzparri Sep 17, 2024
af278f0
PTM calculation implemented into FlashLFQ engine. Base method exists …
pcruzparri Oct 11, 2024
ef3ec35
Removed the sandbox test Peter and changed the default arguments of P…
pcruzparri Oct 11, 2024
f577298
Added fixes to the FlashLFQResults and PositionFrequencyAnalysis impl…
pcruzparri Oct 11, 2024
f21d365
Fixed flipped logic in FlashLFQ/Peptide.GetTotalIntensity(). Cleaned …
pcruzparri Oct 14, 2024
f6caa30
Refactored the PositionFrequencyAnalysis code to eliminate the nested…
pcruzparri Oct 18, 2024
b146768
Merge branch 'ptm_stoich' of https://github.com/pcruzparri/mzLib into…
pcruzparri Oct 18, 2024
848413d
saving progress on PeptideToProteinPTMOccupancy and updated Regex mod…
pcruzparri Dec 6, 2024
1 change: 1 addition & 0 deletions mzLib/Chemistry/ClassExtensions.cs
@@ -48,6 +48,7 @@ public static double ToMass(this double massToChargeRatio, int charge)
return Math.Abs(charge) * massToChargeRatio - charge * Constants.ProtonMass;
}

public static double? RoundedDouble(this double myNumber, int places = 9) => RoundedDouble(myNumber as double?, places);
public static double? RoundedDouble(this double? myNumber, int places = 9)
{
if (myNumber != null)
24 changes: 24 additions & 0 deletions mzLib/FlashLFQ/FlashLFQResults.cs
@@ -1,5 +1,7 @@
using Easy.Common.Extensions;
using MathNet.Numerics.Statistics;
using MzLibUtil;
using Proteomics;
using System;
using System.Collections.Generic;
using System.IO;
@@ -14,6 +16,7 @@
public readonly Dictionary<string, Peptide> PeptideModifiedSequences;
public readonly Dictionary<string, ProteinGroup> ProteinGroups;
public readonly Dictionary<SpectraFileInfo, List<ChromatographicPeak>> Peaks;
public Dictionary<string, MzLibUtil.UtilProteinGroup> ModInfo { get; private set; }
private readonly HashSet<string> _peptideModifiedSequencesToQuantify;

public FlashLfqResults(List<SpectraFileInfo> spectraFiles, List<Identification> identifications, HashSet<string> peptides = null)
@@ -331,6 +334,27 @@
}
}
}
/// <summary>
/// Calculates peptide-level PTM occupancy using either all peptides to be quantified (by intensity) or a subset of FlashLFQ-identified peptides with an arbitrary peptide-level quantifier.
/// </summary>
/// <param name="quantifiedPeptides"> Dictionary whose keys are peptide full sequences present in PeptideModifiedSequences and whose values are the quantifier for that peptide. Peptides without an entry fall back to their total intensity.</param>
/// <param name="IncludeNTerminus"> If true, the index of modifications at the N-terminus will be 0 (zero-based indexing). Otherwise, it is the index of the first amino acid (one-based indexing).</param>
/// <param name="IncludeCTerminus"> If true, the index of modifications at the C-terminus will be one more than the index of the last amino acid. Otherwise, it is the index of the last amino acid.</param>
/// <remarks> The method returns nothing; the result is stored in the ModInfo property.</remarks>
public void CalculatePTMOccupancy(Dictionary<string, double> quantifiedPeptides = null, bool IncludeNTerminus = true, bool IncludeCTerminus = true)
{
if (quantifiedPeptides == null) quantifiedPeptides = new Dictionary<string, double>();

var peptides = _peptideModifiedSequencesToQuantify
.Where(pep => PeptideModifiedSequences.ContainsKey(pep))
.Select(pep => Tuple.Create(
PeptideModifiedSequences[pep].Sequence,
PeptideModifiedSequences[pep].BaseSequence,
PeptideModifiedSequences[pep].ProteinGroups.Select(pg => pg.ProteinGroupName).ToList(),
quantifiedPeptides.GetValueOrDefault(pep, PeptideModifiedSequences[pep].GetTotalIntensity()))).ToList();

ModInfo = PositionFrequencyAnalysis.PeptidePTMOccupancy(peptides, IncludeNTerminus, IncludeCTerminus);
}

/// <summary>
/// This method uses the median polish algorithm to calculate protein quantities in each biological replicate.
@@ -345,7 +369,7 @@
{
proteinGroup.Value.SetIntensity(file, 0);
}
}

Check failure on line 372 in mzLib/FlashLFQ/FlashLFQResults.cs (GitHub Actions / build):
An object reference is required for the non-static field, method, or property 'PositionFrequencyAnalysis.PeptidePTMOccupancy(List<Tuple<string, string, List<string>, double>>, bool, bool)'
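The build failure above is the C# compiler's complaint (error CS0120) about calling an instance method through the type name. The following is only a minimal sketch of the two usual remedies. `OccupancyAnalyzer`, `StaticOccupancyAnalyzer`, and `CountPeptides` are hypothetical stand-ins, since the body of `PositionFrequencyAnalysis` is not shown in this hunk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class that exposes an instance (non-static) method.
public class OccupancyAnalyzer
{
    public int CountPeptides(List<string> peptides) => peptides.Count;
}

// Hypothetical stand-in where the same method is declared static.
public static class StaticOccupancyAnalyzer
{
    public static int CountPeptides(List<string> peptides) => peptides.Count;
}

public static class Cs0120Demo
{
    public static int ViaInstance(List<string> peptides)
    {
        // OccupancyAnalyzer.CountPeptides(peptides); // would not compile: CS0120
        var analyzer = new OccupancyAnalyzer();       // remedy 1: create an instance first
        return analyzer.CountPeptides(peptides);
    }

    public static int ViaStatic(List<string> peptides)
    {
        // Remedy 2: the method is static, so calling it through the type name is valid.
        return StaticOccupancyAnalyzer.CountPeptides(peptides);
    }
}
```

Which remedy fits depends on whether the analysis class is meant to hold state between calls; if it does not, a static method keeps the call site in `CalculatePTMOccupancy` unchanged.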

// associate peptide w/ proteins in a dictionary for easy lookup
List<Peptide> peptides = PeptideModifiedSequences.Values.Where(p => p.UnambiguousPeptideQuant()).ToList();
3 changes: 3 additions & 0 deletions mzLib/FlashLFQ/FlashLfqEngine.cs
@@ -236,6 +236,9 @@ public FlashLfqResults Run()
// do top3 protein quantification
_results.CalculateProteinResultsMedianPolish(UseSharedPeptidesForProteinQuant);

// calculate ptm occupancy at the peptide level
_results.CalculatePTMOccupancy();

// do Bayesian protein fold-change analysis
if (BayesianProteinQuant)
{
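The body of `PeptidePTMOccupancy` is not part of the visible hunks, so the following is not the PR's implementation. It is a minimal sketch of one common definition of site occupancy, assumed here for illustration: for peptidoforms that share a base sequence, the occupancy of a modification at a position is the summed quantity of the forms carrying it divided by the summed quantity of all forms. `OccupancySketch`, `QuantifiedPeptide`, and `SiteOccupancy` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OccupancySketch
{
    // Hypothetical input: base sequence, modifications keyed by position, and a quantity.
    public record QuantifiedPeptide(string BaseSequence, Dictionary<int, string> Mods, double Intensity);

    // Assumes every peptidoform passed in shares the same base sequence.
    // Returns (position, mod) -> fraction of the total intensity carrying that mod.
    public static Dictionary<(int Position, string Mod), double> SiteOccupancy(
        IEnumerable<QuantifiedPeptide> peptidoforms)
    {
        var forms = peptidoforms.ToList();
        double total = forms.Sum(p => p.Intensity);
        var occupancy = new Dictionary<(int Position, string Mod), double>();
        if (total <= 0)
        {
            return occupancy; // nothing quantified, so no occupancy can be reported
        }

        foreach (var form in forms)
        {
            foreach (var mod in form.Mods)
            {
                var key = (mod.Key, mod.Value);
                occupancy.TryGetValue(key, out double current);
                occupancy[key] = current + form.Intensity / total;
            }
        }
        return occupancy;
    }
}
```

With an unmodified form at intensity 300 and a form phosphorylated at position 4 at intensity 100, this sketch reports an occupancy of 0.25 for that site.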
16 changes: 15 additions & 1 deletion mzLib/FlashLFQ/Peptide.cs
@@ -1,5 +1,7 @@
using System.Collections.Generic;
using Easy.Common.Extensions;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;

namespace FlashLFQ
@@ -67,6 +69,18 @@ public void SetIntensity(SpectraFileInfo fileInfo, double intensity)
}
}

public double GetTotalIntensity()
{
if (Intensities.IsNotNullOrEmpty())
{
return Intensities.Sum(i => i.Value);
}
else
{
return 0;
}
}

public DetectionType GetDetectionType(SpectraFileInfo fileInfo)
{
if (DetectionTypes.TryGetValue(fileInfo, out DetectionType detectionType))
5 changes: 5 additions & 0 deletions mzLib/MassSpectrometry/Enums/DissociationType.cs
@@ -109,6 +109,11 @@ public enum DissociationType
/// </summary>
LowCID,

/// <summary>
pcruzparri marked this conversation as resolved.
/// activated ion electron photo detachment dissociation
/// </summary>
aEPD,

Unknown,
AnyActivationType,
Custom,
79 changes: 79 additions & 0 deletions mzLib/MzLibUtil/ClassExtensions.cs
@@ -19,12 +19,91 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

namespace MzLibUtil
{
public static class ClassExtensions
{
/// <summary>
/// Parses the full sequence to identify mods.
/// </summary>
/// <param name="fullSequence"> Full sequence of the peptide in question</param>
/// <param name="modOnNTerminus"> If true, the index of modifications at the N-terminus will be 0 (zero-based indexing). Otherwise, it is the index of the first amino acid (one-based indexing).</param>
/// <param name="modOnCTerminus"> If true, the index of modifications at the C-terminus will be one more than the index of the last amino acid. Otherwise, it is the index of the last amino acid.</param>
/// <returns> Dictionary with the key being the amino acid position of the mod and the value being the string representing the mod</returns>
public static Dictionary<int, List<string>> ParseModifications(this string fullSequence, bool modOnNTerminus=false, bool modOnCTerminus=false)
{
// use a regex to get all modifications
string pattern = @"\[(.+?)\](?<!\[I+\])"; // The negative look-behind rejects a ] that closes a metal ion charge state (e.g. [III]), so the match continues to the mod's own closing bracket
Regex regex = new(pattern);
Review comment (Contributor):
We need to make sure that this method never accidentally identifies [hydroxylation]EPT[phospho] as a mod in P[hydroxylation]EPT[phospho]IDE. I'm not sure that ]EPT[ will be ignored by your regex.

Reply (Author):
After finding an opening bracket, the regex always matches up to the next closing bracket, except (as of the latest update) when that closing bracket belongs to an ion charge state.
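The matching behaviour discussed in this thread can be checked directly. The sketch below reuses the pattern from the diff and lists what it captures; `ModRegexDemo` and `CapturedMods` are hypothetical names, and the metal ion sequence is an illustrative nested-bracket input rather than a confirmed mzLib modification string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ModRegexDemo
{
    // Same pattern as in the diff: lazy capture between brackets, with a negative
    // look-behind that refuses to end the match on a "[I...]" charge-state bracket.
    private static readonly Regex ModPattern = new Regex(@"\[(.+?)\](?<!\[I+\])");

    // Returns the text captured inside each matched pair of brackets, in order.
    public static List<string> CapturedMods(string fullSequence) =>
        ModPattern.Matches(fullSequence).Select(m => m.Groups[1].Value).ToList();
}
```

For `P[hydroxylation]EPT[phospho]IDE` this yields `hydroxylation` and `phospho` as two separate captures, because every match has to start at an opening bracket and the lazy quantifier stops at the first acceptable closing bracket. For `PEPT[Metal:Fe[III] on X]IDE` it yields the single capture `Metal:Fe[III] on X`, because the look-behind rejects the `]` that closes `[III]`.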


// Matches are not removed from the sequence. Instead, the cumulative length of the
// matched modifications is tracked so that each match index can be converted to a position.
//int patternMatches = regex.Matches(fullSequence).Count;
Dictionary<int, List<string>> modDict = new();

MatchCollection matches = regex.Matches(fullSequence);
int captureLengthSum = 0;
foreach (Match match in matches)
{
GroupCollection group = match.Groups;
string val = group[1].Value;
int startIndex = group[0].Index;
int captureLength = group[0].Length;

List<string> modList = new List<string>();
modList.Add(val);

// The position of the amino acids is tracked by the positionToAddToDict variable. It takes the
// startIndex of the modification Match and removes the cumulative length of the modifications
// found (including the brackets). The difference will be the number of nonmodification characters,
// or the number of amino acids prior to the startIndex in the sequence.
int positionToAddToDict = startIndex - captureLengthSum;

// Handle N terminus indexing
if ((positionToAddToDict == 0) && !modOnNTerminus)
{
positionToAddToDict++;
}

// Handle C terminus indexing
if ((fullSequence.Length == startIndex + captureLength) && modOnCTerminus)
{
positionToAddToDict++;
}

// Check whether the key already exists.
// If it does, append the modification to the list at that position;
// otherwise, add a new entry to the dict.
if (modDict.ContainsKey(positionToAddToDict))
{
modDict[positionToAddToDict].Add(val);
}
else
{
modDict.Add(positionToAddToDict, modList);
}
captureLengthSum += captureLength;
}
return modDict;
}

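/// <summary>
/// Removes a special character (by default |) that throws off the position numbering when there are multiple mods on a single amino acid.
/// </summary>
/// <param name="fullSequence"> Full sequence to clean; it is modified in place through the ref parameter</param>
/// <param name="replacement"> Text substituted for each match (empty by default)</param>
/// <param name="specialCharacter"> Regex pattern for the character to remove (an escaped | by default)</param>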
public static void RemoveSpecialCharacters(ref string fullSequence, string replacement = @"", string specialCharacter = @"\|")
{
// next regex is used in the event that multiple modifications are on a missed cleavage Lysine (K)
pcruzparri marked this conversation as resolved.
Regex regexSpecialChar = new(specialCharacter);
fullSequence = regexSpecialChar.Replace(fullSequence, replacement);
}

public static double[] BoxCarSmooth(this double[] data, int points)
{
// Force to be odd
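The position arithmetic in `ParseModifications` (match start index minus the cumulative length of the bracketed text seen so far, plus the two terminus adjustments) is easier to follow on a concrete input. The sketch below re-implements only that index calculation under the same rules as the diff; `ModPositionDemo` and `Positions` are hypothetical names, and it returns a flat list instead of the dictionary used in the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ModPositionDemo
{
    public static List<(int Position, string Mod)> Positions(
        string fullSequence, bool modOnNTerminus = false, bool modOnCTerminus = false)
    {
        var regex = new Regex(@"\[(.+?)\](?<!\[I+\])");
        var result = new List<(int Position, string Mod)>();
        int bracketedLength = 0; // total length of bracketed text before the current match

        foreach (Match match in regex.Matches(fullSequence))
        {
            // Number of amino acid characters that precede this modification.
            int position = match.Index - bracketedLength;

            // A mod before the first residue is reported at 0 only when requested.
            if (position == 0 && !modOnNTerminus) position++;

            // A mod that ends the string is pushed past the last residue only when requested.
            if (match.Index + match.Length == fullSequence.Length && modOnCTerminus) position++;

            result.Add((position, match.Groups[1].Value));
            bracketedLength += match.Length;
        }
        return result;
    }
}
```

For `[acetyl]PEPT[phospho]IDE` with the default flags, the acetyl mod lands on position 1 and the phospho mod on position 4 (the T). With `modOnNTerminus` set to true the acetyl mod moves to position 0. For `PEPTIDE[amide]`, the mod is at position 7 by default and at position 8 when `modOnCTerminus` is true.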
9 changes: 2 additions & 7 deletions mzLib/MzLibUtil/MzLibException.cs
@@ -3,11 +3,6 @@
namespace MzLibUtil
{
[Serializable]
- public class MzLibException : Exception
- {
- public MzLibException(string message)
- : base(message)
- {
- }
- }
+ public class MzLibException(string message, Exception innerException = null)
+ : Exception(message, innerException);
}
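The new `MzLibException` declaration uses a C# 12 primary constructor and adds an optional inner exception. A minimal sketch of how a caller might use the extra parameter follows. The class is restated (without its namespace and attribute) so the example compiles on its own, and `ExceptionDemo.ParseCharge` is a hypothetical helper, not part of mzLib.

```csharp
using System;

// Restated from the diff so that the sketch is self-contained.
public class MzLibException(string message, Exception innerException = null)
    : Exception(message, innerException);

public static class ExceptionDemo
{
    // Hypothetical helper: wraps a low-level parse failure while keeping the original cause.
    public static int ParseCharge(string text)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException e)
        {
            throw new MzLibException($"Could not parse charge '{text}'", e);
        }
    }
}
```

Because `innerException` defaults to null, existing call sites that pass only a message should keep compiling unchanged.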
1 change: 0 additions & 1 deletion mzLib/MzLibUtil/MzLibUtil.csproj
@@ -16,5 +16,4 @@
<PackageReference Include="MathNet.Numerics" Version="5.0.0" />
<PackageReference Include="Microsoft.Win32.Registry" Version="5.0.0" />
</ItemGroup>

</Project>