Peptidoform output from Proteome Discoverer #510

mlocardpaulet · 2024-12-17T15:46:51Z

I added a test file: test/data/dda_quant/ProteomeDIscoverer_PeptideGroups_trimmed.txt. It is trimmed and contains all the columns that can be exported at this level.

mlocardpaulet · 2024-12-17T17:05:39Z

From what I understand:

- There is one peptidoform per row.
- Stripped sequences are in Sequence.
- Modifications are in Modifications.
- All protein accessions matching a given peptidoform, together with the species, are in the column Protein Accessions (this requires a specific FASTA parsing rule within Proteome Discoverer that I'll need to add to the documentation).
- Multiple accessions are separated with ";".
- Contaminants can be identified by "Cont_" in the column Master Protein Descriptions, which does not contain all the accessions matching the peptide. This may actually be an issue. I could change the parsing rule so that the prefix appears in the column Protein Accessions, or we would need a list of all the contaminant accessions in the fasta somewhere. Alternatively, we could use the column Contaminant, but I would have to double-check how it was set up in the search parameters.
- For quantification, I would suggest using the values of the columns:
  Abundances Normalized F1 Sample ConditionA, Abundances Normalized F2 Sample ConditionA, Abundances Normalized F3 Sample ConditionA, Abundances Normalized F4 Sample ConditionB, Abundances Normalized F5 Sample ConditionB, Abundances Normalized F6 Sample ConditionB
- The sample names may vary depending on how users set up their analysis. I have to check how to make them consistent...
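The row layout described above can be sketched with Python's standard csv module. This is a minimal sketch, not the ProteoBench parser: it assumes the PD text export is tab-delimited, and the example row (sequence, modification string, and the ACC1_HUMAN / ACC2_YEAST accession placeholders) is made up for illustration.

```python
import csv
import io
from typing import Dict, List

# Column names as described in this thread; the tab delimiter is an
# assumption about the Proteome Discoverer text export.
SEQUENCE_COL = "Sequence"
MODS_COL = "Modifications"
ACCESSIONS_COL = "Protein Accessions"


def parse_peptide_groups(handle) -> List[Dict[str, object]]:
    """Read one peptidoform per row and split the ';'-separated accessions."""
    reader = csv.DictReader(handle, delimiter="\t")
    records = []
    for row in reader:
        # Strip whitespace around each accession and drop empty entries.
        accessions = [acc.strip() for acc in row[ACCESSIONS_COL].split(";") if acc.strip()]
        records.append(
            {
                "sequence": row[SEQUENCE_COL],
                "modifications": row[MODS_COL],
                "accessions": accessions,
            }
        )
    return records


if __name__ == "__main__":
    # Toy input with placeholder values, not taken from the uploaded test file.
    example = (
        "Sequence\tModifications\tProtein Accessions\n"
        "PEPMIDEK\t1xOxidation [M4]\tACC1_HUMAN; ACC2_YEAST\n"
    )
    for record in parse_peptide_groups(io.StringIO(example)):
        print(record)
```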

So... a few points are not completely clear yet, sorry.

mlocardpaulet · 2024-12-18T09:50:33Z

Regarding the contaminants:
I think we should proceed as if the "Cont_" prefix were in the column Protein Accessions.
I will change the parsing of the fasta so that it is.
This means that with the file I uploaded today, no contaminants will be removed, but it should work with the next one.

I don't think we should rely on the column Contaminants: it would require providing a fasta containing only the contaminants and setting up some parameters in PD.
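If the "Cont_" prefix ends up in Protein Accessions, flagging contaminants reduces to a prefix check on each accession. This sketch assumes a peptidoform counts as a contaminant when any of its accessions carries the prefix; whether peptides shared with a contaminant should be dropped this way is a design decision not settled in this thread, and `is_contaminant` is a hypothetical helper.

```python
from typing import List

# Prefix added to contaminant entries through the fasta parsing rule.
CONTAMINANT_PREFIX = "Cont_"


def is_contaminant(protein_accessions: str) -> bool:
    """Return True if any ';'-separated accession carries the contaminant prefix."""
    accessions: List[str] = [acc.strip() for acc in protein_accessions.split(";")]
    return any(acc.startswith(CONTAMINANT_PREFIX) for acc in accessions)


print(is_contaminant("ACC1_HUMAN; Cont_ACC9"))   # True
print(is_contaminant("ACC1_HUMAN; ACC2_YEAST"))  # False
```

Using `all` instead of `any` would keep peptides that are shared between a contaminant and a sample protein.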

mlocardpaulet · 2024-12-18T14:17:17Z

So... here is a new file, in which I did not specify any experimental design, so the quantification columns are named:

Abundances (Normalized): F1: Sample
Abundances (Normalized): F2: Sample
Abundances (Normalized): F3: Sample
Abundances (Normalized): F4: Sample
Abundances (Normalized): F5: Sample
Abundances (Normalized): F6: Sample

This is sub-optimal, and I'll have to find something else. I suspect that the files are ordered (A, A, A, then B, B, B), but I am not entirely sure.
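Until the column headers carry condition names, the file number can at least be extracted from headers such as "Abundances (Normalized): F1: Sample" with a regular expression. In this sketch, `abundance_columns` and `assign_conditions` are hypothetical helpers, and the A, A, A, B, B, B ordering is only the unconfirmed assumption stated above.

```python
import re
from typing import Dict, List

# Matches headers such as "Abundances (Normalized): F1: Sample"
# and captures the PD file number.
ABUNDANCE_RE = re.compile(r"^Abundances \(Normalized\): F(\d+): (.+)$")


def abundance_columns(header: List[str]) -> Dict[str, int]:
    """Map each normalized-abundance column to its file number (F1 -> 1, ...)."""
    columns = {}
    for name in header:
        match = ABUNDANCE_RE.match(name)
        if match:
            columns[name] = int(match.group(1))
    return columns


def assign_conditions(columns: Dict[str, int], per_condition: int = 3) -> Dict[str, str]:
    """Hypothetical: assume consecutive files belong to the same condition
    (F1-F3 -> A, F4-F6 -> B), which is not confirmed for PD exports."""
    ordered = sorted(columns, key=columns.get)  # sort by numeric file number
    return {name: chr(ord("A") + i // per_condition) for i, name in enumerate(ordered)}


if __name__ == "__main__":
    header = ["Sequence"] + [f"Abundances (Normalized): F{i}: Sample" for i in range(1, 7)]
    print(assign_conditions(abundance_columns(header)))
```

Sorting by the parsed integer, rather than by the header string, keeps F10 after F9 if more than nine files are present.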

But the issues with the column Protein Accessions are fixed: it now contains all the accessions matching a given peptidoform AND the prefix "Cont_" for contaminants.
MulticonsensusProteoBench_DDAmodule2_SequestHT_Percolator_Quanti_241218_PeptideGroups_trimmed.txt.zip

I will work on the documentation, and try to find a "simple" way to get column headers that make more sense for the quantification...

mlocardpaulet added 2 commits December 17, 2024 16:44

add test file with peptidoform output from Proteome Discoverer.

7e2da1f

rename file

d0e707f

Merge branch 'main' into peptidoform-module-ProteomeDiscoverer

8e53b55

RobbinBouwmeester and others added 3 commits December 19, 2024 08:16

Merge branch 'main' into peptidoform-module-ProteomeDiscoverer

1ec6eab

Merge branch 'main' into peptidoform-module-ProteomeDiscoverer

149aad0

Support PD

010fc0f

RobbinBouwmeester merged commit 392cb3c into main Dec 19, 2024
8 checks passed

RobbinBouwmeester deleted the peptidoform-module-ProteomeDiscoverer branch December 19, 2024 13:04
