Parse settings #101

enryH · 2023-09-25T16:02:46Z

Aim: Parse the parameter files of tools and map them to the parameters selected for a given task, so that tools can be compared.

Started with mqpar.xml files.
The key mappings between the tool parameter dictionaries and the selected parameters need to be crafted per task.

Issue: #58
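For illustration, parsing an mqpar.xml-like file into a nested dictionary could look roughly like the sketch below. This is not the code from this PR: the function name `element_to_dict` and the shortened example XML are assumptions, and XML attributes are ignored here.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into nested dicts, lists and strings.

    Attributes are ignored in this sketch; only tags and text are kept.
    """
    children = list(element)
    if not children:
        # Leaf node: keep the stripped text, or None for empty tags.
        text = element.text.strip() if element.text else None
        return text or None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags (e.g. several <string> entries) become a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical, heavily shortened parameter file used only for this example.
EXAMPLE = """
<MaxQuantParams>
  <maxQuantVersion>1.6.3.3</maxQuantVersion>
  <fixedModifications>
    <string>Carbamidomethyl (C)</string>
  </fixedModifications>
  <variableModifications>
    <string>Oxidation (M)</string>
    <string>Acetyl (Protein N-term)</string>
  </variableModifications>
</MaxQuantParams>
"""

root = ET.fromstring(EXAMPLE.strip())
params = {root.tag: element_to_dict(root)}
```

A single `<string>` child stays a plain string while repeated children become a list, so downstream code has to handle both cases.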

mlocardpaulet · 2023-11-10T12:20:22Z

proteobench/io/params/__init__.py

+    missed_cleavages: Optional[str] = None  # allowed_miscleavages
+    min_pep_length: Optional[str] = None  # min_peptide_length
+    max_pep_length: Optional[str] = None  # max_peptide_length
+    fixed_modifications: Optional[str] = None  # fixex_mods


should be 'fixed_mods'

mlocardpaulet · 2023-11-10T12:21:41Z

proteobench/io/params/__init__.py

+    fixed_modifications: Optional[str] = None  # fixex_mods
+    variable_modifications: Optional[str] = None  # variable_mods
+    max_num_modifications: Optional[str] = None  # max_mods
+    precursor_charge: Optional[str] = None  # min_precursor_charge


If we don't enter all the possible precursor charges, then we should have min_precursor_charge and max_precursor_charge.

But the maximum can be unlimited in some tools?

I am not sure. I guess that if this happens, we'll have a missing value there.
What I meant was more that in cases like MQ, where they give all the charges that they consider, we need to get the min. and max., no?

MQ specifies a range with a min. and a max. charge. We only have a problem if a tool would, e.g., allow peptides with charges 2, 3 and 5 but exclude charge 4. But I guess this is quite uncommon?

Yep, you are absolutely right. There could be an issue in this case. I don't like this.
Do you think that it is/was/should be discussed for the SDRF-extended format? Because this problem can also arise for them.

In that case yes, as we should not diverge from SDRF-extended.
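To make the edge case from this thread concrete, here is a minimal sketch of reducing a list of considered charges to a min/max pair while flagging gaps such as 2, 3, 5. The function name `summarise_charges` is hypothetical and not part of the PR.

```python
from typing import Iterable, Optional, Tuple


def summarise_charges(charges: Iterable[int]) -> Tuple[Optional[int], Optional[int], bool]:
    """Return (min_charge, max_charge, is_contiguous) for the charges a tool considers."""
    unique = sorted(set(charges))
    if not unique:
        # No charges reported: both bounds are missing values.
        return None, None, True
    low, high = unique[0], unique[-1]
    # A min/max pair only describes the setting fully if no charge in between is skipped.
    is_contiguous = unique == list(range(low, high + 1))
    return low, high, is_contiguous
```

With charges 2, 3 and 5 this returns the bounds 2 and 5 with `is_contiguous` set to `False`, which is exactly the information a plain min/max pair would lose.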

mlocardpaulet · 2023-11-10T12:30:21Z

proteobench/io/params/__init__.py

There are two parameters missing, and I did not find the correspondence in the SDRF doc. But maybe I missed it:
'software_name' for the software tool (that combines a search engine and some algorithms for quantification)
'search_engine_version'

Yes, me neither. I guess this is something SDRF does intend to have.

Could you check with Veit? Or anybody else? I think that we don't take any risk anyway if we name them this way, no?

No, we can decide the name for parameters specific to our needs. SDRF is independent of software tools, so it would not make sense to have this kind of information there.
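As a rough sketch of how tool-specific keys might be mapped onto the shared parameter names discussed in these review threads: the class name `SelectedParams`, the helper `map_params` and the tool keys in `KEY_MAP` are illustrative assumptions, not the PR's implementation.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class SelectedParams:
    """Hypothetical container using the field names from the review comments."""

    software_name: Optional[str] = None
    search_engine_version: Optional[str] = None
    missed_cleavages: Optional[str] = None
    min_pep_length: Optional[str] = None
    max_pep_length: Optional[str] = None
    fixed_modifications: Optional[str] = None


# Illustrative mapping for one tool; the tool-side keys are assumptions.
KEY_MAP: Dict[str, str] = {
    "maxQuantVersion": "search_engine_version",
    "maxMissedCleavages": "missed_cleavages",
    "minPepLen": "min_pep_length",
}


def map_params(tool_params: dict, key_map: Dict[str, str], software_name: str) -> SelectedParams:
    """Copy the mapped entries of a tool's parameter dict into SelectedParams."""
    known = {f.name for f in fields(SelectedParams)}
    selected = {"software_name": software_name}
    for tool_key, our_key in key_map.items():
        if our_key not in known:
            raise KeyError(f"Unknown target parameter: {our_key}")
        if tool_key in tool_params:
            # Values are stored as strings; absent keys stay None.
            selected[our_key] = str(tool_params[tool_key])
    return SelectedParams(**selected)


result = map_params(
    {"maxQuantVersion": "1.6.3.3", "maxMissedCleavages": 2, "unused": "x"},
    KEY_MAP,
    software_name="MaxQuant",
)
```

Parameters a tool does not report simply stay `None`, which matches the "missing value" behaviour mentioned in the precursor charge thread.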

- Update ProteoBenchParameters to reflect SDRF extended naming
- Update Proline parameter extraction

Could update DataPoint where old and new names match 1:1, but external changes are needed first: https://github.com/Proteobench/Results_Module2_quant_DDA
- test uses external git repo for validation...

enryH · 2023-11-13T20:28:41Z

One more question regarding fragment_mass_tolerance: which fragmentation technique was used? MQ seems to save several settings for different methods.

And it also has two versions across our three XML files:

1.6.0.0 and higher.
https://github.com/Proteobench/ProteoBench/blob/682825d27f5fb85692e6f2884b40d79334234920/test/params/mqpar_MQ1.6.3.3_MBR.xml#L384C1-L465C31

And before:
https://github.com/Proteobench/ProteoBench/blob/682825d27f5fb85692e6f2884b40d79334234920/test/params/mqpar1.5.3.30_MBR.xml#L220C1-L249C22

mlocardpaulet · 2023-11-14T07:47:59Z

Maybe check with Bart, but I'd say that you take the "FTMS" params. Is that what you asked?

enryH · 2023-11-14T13:18:25Z

Yes, I would like to have the information double-checked with respect to all tolerances. If it is FTMS, is this also noted somewhere in the parameter file?
@brvpuyve Could you check which tolerances you would select for reporting in the two MQ parameter files mentioned above?

mlocardpaulet · 2023-11-14T15:12:21Z

MQ will look for the information in the raw file (for the FTMS). And here we used an Orbitrap, so this should be it.
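A minimal sketch of picking the MS2 tolerance for a given analyzer name such as "FTMS" is shown below. The element names (`msmsParams`, `Name`, `MatchTolerance`, `MatchToleranceInPpm`) and the shortened snippet are assumptions modelled on the 1.6-style files linked above; the actual layout should be checked against those files, and the function name `fragment_tolerance` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened excerpt with one settings block per analyzer.
SNIPPET = """
<msmsParamsArray>
  <msmsParams>
    <Name>FTMS</Name>
    <MatchTolerance>20</MatchTolerance>
    <MatchToleranceInPpm>True</MatchToleranceInPpm>
  </msmsParams>
  <msmsParams>
    <Name>ITMS</Name>
    <MatchTolerance>0.5</MatchTolerance>
    <MatchToleranceInPpm>False</MatchToleranceInPpm>
  </msmsParams>
</msmsParamsArray>
"""


def fragment_tolerance(xml_text, analyzer="FTMS"):
    """Return the tolerance of the block whose <Name> matches, e.g. '20 ppm', or None."""
    root = ET.fromstring(xml_text.strip())
    for block in root.iter("msmsParams"):
        if block.findtext("Name") == analyzer:
            value = block.findtext("MatchTolerance")
            in_ppm = block.findtext("MatchToleranceInPpm", default="").lower() == "true"
            return f"{value} {'ppm' if in_ppm else 'Da'}"
    # No block for this analyzer: report a missing value.
    return None
```

The analyzer name has to come from outside the parameter file (here it is simply passed in), since, as noted above, MQ decides which block applies based on the raw file.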

RobbinBouwmeester · 2023-11-22T14:30:45Z

CONTRIBUTING.md

Was this not specifically removed by @RalfG?

enryH · 2023-11-22T15:03:08Z

No, you were fast :) I'll add one test later, but this is only to keep track of changes.

Henry added 7 commits September 25, 2023 11:35

🎨 format code (max-line-length=120!)

8fc4166

- allow longer lines
- code was not black8 formatted (which is mentioned in dev dependencies)

Merge branch 'main' of https://github.com/Proteobench/ProteoBench into parse_settings

f5445a7

Merge branch 'main' into parse_settings

f05c2a9

✨ parse mqpar.xml to dictionary

56e846f

- later: map selected entries per task (needs to be done)
- changed to pytest (as it is run in a github action)

Merge main into parse_settings

b8a56b1

🐛 add support for newer annotations in py38

d7e38b7

py37 support will be dropped soon

✨ DataFrame and csv format for MQ parameters

532157f

- potential: allow to combine MaxQuant parameter files
- easy to inspect csv parameter file

ToDo:
- could update the 4th index level to reflect some of the groups -> see comments in maxquant.py
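The idea of a flat, easy-to-inspect CSV with several index levels can be sketched with the standard library alone. The commit itself uses a DataFrame; the helper names `flatten` and `to_csv`, the column names and the default of four levels are assumptions made for this example.

```python
import csv
import io


def flatten(nested, prefix=()):
    """Yield (key_path, value) pairs for every leaf of a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def to_csv(nested, n_levels=4):
    """Write one row per leaf: n_levels index columns plus a value column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"level_{i}" for i in range(n_levels)] + ["value"])
    for path, value in flatten(nested):
        if len(path) > n_levels:
            raise ValueError(f"Path {path} is deeper than {n_levels} levels")
        # Shorter paths are padded so every row has the same number of columns.
        padded = list(path) + [""] * (n_levels - len(path))
        writer.writerow(padded + [value])
    return buffer.getvalue()


# Hypothetical nested parameters, e.g. the result of parsing a parameter file.
example = {
    "MaxQuantParams": {
        "maxQuantVersion": "1.6.3.3",
        "parameterGroups": {"parameterGroup": {"maxCharge": "7"}},
    }
}
lines = to_csv(example).splitlines()
```

Padding shorter key paths keeps the table rectangular, so files from different runs can be stacked or compared column by column.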

enryH force-pushed the parse_settings branch from ca06d9d to 532157f Compare October 12, 2023 18:28

Henry added 2 commits October 12, 2023 20:32

Merge branch 'main' into parse_settings

9b69350

🎨 format merged file

be17263

enryH mentioned this pull request Oct 17, 2023

Update search parameters of new data point with values from parsed meta data file before submission to proteobot #98

Closed

Henry Webel added 3 commits October 21, 2023 14:19

Merge branch 'main' into parse_settings

27079ef

✨ Parse MSFragger parameter files

fcce49f
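For plain-text parameter files with `key = value  # comment` lines, which is the layout assumed here for MSFragger's parameter file, a parser can be sketched as follows. The function name `parse_key_value_params` and the example keys are illustrative, and this is not the parser added in the commit.

```python
def parse_key_value_params(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    params = {}
    for raw_line in text.splitlines():
        # Drop trailing comments; a '#' inside a value would also be cut off here.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# Hypothetical excerpt; key names are for illustration only.
EXAMPLE_PARAMS = """
# search settings
precursor_mass_lower = -20   # lower tolerance
precursor_mass_upper = 20
allowed_missed_cleavage = 2
"""

parsed = parse_key_value_params(EXAMPLE_PARAMS)
```

All values stay strings, in line with the `Optional[str]` fields used for the selected parameters.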

✨ Proline params + mapping params to our names

dc1233e

- new dependency for excel file reading
- three excel sheets contain information needed

enryH linked an issue Oct 27, 2023 that may be closed by this pull request

Implement support for uploading (and storing) workflow configuration files #58

Closed

Henry Webel added 3 commits November 9, 2023 21:15

Merge branch 'main' into parse_settings

07ff672

🎨 correct spelling mistake fragmnent -> fragment

8ab896a

- otherwise updates from parameter files will continue to be aware of the spelling mistake

🚧 Prepare renaming according to SDRF

a83dbcf

Reference https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/assets/param2sdrf.yml

mlocardpaulet reviewed Nov 10, 2023


mlocardpaulet reviewed Nov 10, 2023


mlocardpaulet reviewed Nov 10, 2023

Henry added 2 commits November 11, 2023 17:09

🚧 start extracting MQ params

682825d

- most selected parameters are easy to get
- differences between version 1.6 and higher and the previous version (1.5)
- fragment_mass_tolerance -> which fragmentation method was used? -> missing information in extracted data for v1.5

Henry Webel added 2 commits November 13, 2023 21:28

Merge branch 'main' into parse_settings

33aebf4

🐛 update test ("fragmnent" -> "fragment")

df208d2

Henry added 3 commits November 19, 2023 17:39

✨ Add parsing of attributes for MaxQuant 1.5

5a95fa0

- from 1.6 onwards, information is given explicitly

✨ extract ms2 parameter based on specified method

f47c6fc

- MQ stores several settings in the parameter file, which are then applied based on information stored in the raw file metadata

👷

769624a

- see open PR #95

enryH force-pushed the parse_settings branch from 333c0ba to 769624a Compare November 19, 2023 18:19

enryH marked this pull request as ready for review November 22, 2023 09:22

Merge branch 'main' into parse_settings

27f8f1f

RobbinBouwmeester approved these changes Nov 22, 2023

RobbinBouwmeester merged commit 902dd1b into main Nov 22, 2023
4 checks passed

RobbinBouwmeester deleted the parse_settings branch November 22, 2023 14:33
