From 4472f8caee988989f105e3817def0907da070615 Mon Sep 17 00:00:00 2001
From: Yasset Perez-Riverol
Date: Fri, 29 Sep 2023 15:10:45 +0100
Subject: [PATCH] minor changes

---
 docs/index.rst        | 95 +------------------------------------------
 docs/introduction.rst | 81 ++++++++++++++++++++++++++++++++++++
 docs/tools.rst        | 43 ++++++++++++++++++++
 3 files changed, 126 insertions(+), 93 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index 9cd5729..f353c4f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -13,103 +13,12 @@ Contents
 .. toctree::
    :maxdepth: 1
 
-   usage
    introduction
-   formats
-   preprocessing
-   identification
-   dda
-   dia
-   statistics
-   pmultiqc
-   benchmarks
-   debug
-   faq
-   presentations
-   dev
-   contact
+   de
+   tools
 
 |
 
-The ``.qms`` folder will contain multiple metadata files that will be
-used to describe the project, the samples, the data acquisition and the
-data processing. Each of these files will be described in the following
-sections:
-
-- `METADATA.md `__: A json file for metadata about the
-  analyzed project
-- `AE `__ or `DE `__: A csv file based on the
-  MSstats (TODO link) format for either absolute expression or
-  differential expression.
-
-Some general rules for all the files:
-
-- The files are tab-delimited, json or parquet files
-- Parquet files are compressed and can be read with pandas.
-
-Common data structures and formats
-----------------------------------
-
-We have some concepts that are common for some outputs and would be good
-to define and explain them here:
-
-Peptidoform
-~~~~~~~~~~~
-
-- **Peptidoform**: A peptidoform is a peptide sequence with
-  modifications. For example, the peptide sequence ``PEPTIDM`` with a
-  modification of ``Oxidation`` would be ``PEPTIDM[Oxidation]``. The
-  peptidoform show be written using the `Proforma
-  specification `__. This concept
-  is used in the following outputs:
-
-  - `PSM `__
-  - `FEATURES `__
-  - `PEPTIDE `__
-
-Modifications
-~~~~~~~~~~~~~
-
-- **Modifications**: A modification is a chemical change in the peptide
-  sequence. Modifications can be annotated as part of the Proforma
-  notation inside the peptide or as a separate column. When annotating
-  the modification as a separate column, the format should be as close
-  as possible to the `mzTab format
-  notation `__.
-  The modifications will encode the following information on each
-  peptide or psm:
-
-  - Modification name or accession: For example, ``Oxidation`` or
-    ``UNIMOD:35``. Modifications SHOULD be reported using UNIMOD. If a
-    modification is not defined in UNIMOD, a CHEMMOD definition must
-    be used like ``CHEMMOD:-18.0913``, where the number is the mass
-    shift in Daltons.
-  - Position: The position of the modification in the peptide
-    sequence. Terminal modifications in proteins and peptides MUST be
-    reported with the position set to 0 (N-terminal) or the amino acid
-    length +1 (C-terminal) respectively. For example, ``1`` or
-    ``1,2,3``.
-  - Localization Probability: The probability of the modification
-    being in the reported position.
-
-Those three properties can be combined in one string as:
-
-::
-
-   {position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}
-
-For example:
-
-::
-
-   1(Probabilistic Score:0.8)|2(Probabilistic Score:0.9)|3-UNIMOD:35`.
-
-This concept is used in the following outputs:
-
-- `PSM `__
-- `FEATURES `__
-- `PEPTIDE `__
-
 The following links should be followed to get support and help with the quantms maintainers:
 
 |Get help on Slack| |Report Issue| |Get help on GitHub Forum|
diff --git a/docs/introduction.rst b/docs/introduction.rst
index e69de29..575457f 100644
--- a/docs/introduction.rst
+++ b/docs/introduction.rst
@@ -0,0 +1,81 @@
+Introduction to quantms.io
+======================================
+
+The ``.qms`` folder contains multiple metadata files that describe the
+project, the samples, the data acquisition and the data processing. Each
+of these files is described in the
+following sections:
+
+- `METADATA.md `__: A JSON file with metadata about the
+  analyzed project.
+- `AE `__ or `DE `__: A CSV file based on the
+  MSstats (TODO link) format for either absolute expression or
+  differential expression.
+
+General rules for all files:
+
+- Files are tab-delimited, JSON or Parquet files.
+- Parquet files are compressed and can be read with pandas.
+
+Common data structures and formats
+----------------------------------
+
+Several concepts are shared by more than one output; they are defined
+and explained here:
+
+Peptidoform
+~~~~~~~~~~~
+
+- **Peptidoform**: A peptidoform is a peptide sequence with
+  modifications. For example, the peptide sequence ``PEPTIDM`` with an
+  ``Oxidation`` modification would be written ``PEPTIDM[Oxidation]``. The
+  peptidoform should be written using the `ProForma
+  specification `__. This concept
+  is used in the following outputs:
+
+  - `PSM `__
+  - `FEATURES `__
+  - `PEPTIDE `__
+
+Modifications
+~~~~~~~~~~~~~
+
+- **Modifications**: A modification is a chemical change in the peptide
+  sequence. Modifications can be annotated as part of the ProForma
+  notation inside the peptide or in a separate column. When annotating
+  the modification in a separate column, the format should be as close
+  as possible to the `mzTab format
+  notation `__.
+  The modification annotation encodes the following information for
+  each peptide or PSM:
+
+  - Modification name or accession: For example, ``Oxidation`` or
+    ``UNIMOD:35``. Modifications SHOULD be reported using UNIMOD. If a
+    modification is not defined in UNIMOD, a CHEMMOD definition must
+    be used, such as ``CHEMMOD:-18.0913``, where the number is the mass
+    shift in Daltons.
+  - Position: The position of the modification in the peptide
+    sequence. Terminal modifications in proteins and peptides MUST be
+    reported with the position set to 0 (N-terminal) or the sequence
+    length + 1 (C-terminal), respectively. For example, ``1`` or
+    ``1,2,3``.
+  - Localization Probability: The probability that the modification
+    is at the reported position.
+
+These three properties can be combined into a single string:
+
+::
+
+   {position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}
+
+For example:
+
+::
+
+   1(Probabilistic Score:0.8)|2(Probabilistic Score:0.9)|3-UNIMOD:35
+
+This concept is used in the following outputs:
+
+- `PSM `__
+- `FEATURES `__
+- `PEPTIDE `__
\ No newline at end of file
diff --git a/docs/tools.rst b/docs/tools.rst
index e69de29..8a7ad7c 100644
--- a/docs/tools.rst
+++ b/docs/tools.rst
@@ -0,0 +1,43 @@
+quantms.io tools
+=================================
+
+PSM converter tool
+-------------------------
+
+Use cases
+~~~~~~~~~
+
+.. note:: Generate the ``project.json`` file before generating the PSM feature file.
+
+::
+
+    python psm_command.py convert-psm-file \
+    --mztab_file PXD014414.sdrf_openms_design_openms.mzTab \
+    --output_folder result
+
+- Non-PRIDE project (no need to run ``project_command.py``):
+
+::
+
+    python feature_command.py convert-psm-file \
+    --mztab_file PXD014414.sdrf_openms_design_openms.mzTab \
+    --generate_project False \
+    --output_folder result
+
+Optional parameters
+~~~~~~~~~~~~~~~~~~~
+
+- ``--output_prefix_file``: The prefix of the result file.
+- ``--verbose``: Output debug information.
+
+Compare psm.parquet
+-------------------
+
+Use case
+~~~~~~~~
+
+::
+
+    python feature_command.py compare-set-of-psms \
+    --parquets PXD014414-comet.parquet PXD014414-sage.parquet PXD014414-msgf.parquet \
+    --tags comet sage msgf
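The modification string notation that this patch adds to ``docs/introduction.rst`` (``{position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}``) can be illustrated with a small parser. What follows is a minimal Python sketch and relies on several assumptions. Each string holds exactly one modification. Sites are separated by ``|``. An optional score is written as ``(Name:value)``. The first ``-`` after the site list separates the sites from the accession or name, so ``CHEMMOD:-18.0913`` keeps its sign. ``Site``, ``Modification``, ``parse_modification`` and ``format_modification`` are hypothetical names and are not part of the quantms.io tools.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One site: a position, optionally followed by "(Score Name:value)".
_SITE = re.compile(r"(\d+)(?:\(([^():]+):([^()]+)\))?")
# Whole entry (assumed layout): "|"-separated sites, then "-", then the
# accession or name. Parentheses are skipped, so a "-" inside a score is safe.
_ENTRY = re.compile(r"^(\d+(?:\([^()]*\))?(?:\|\d+(?:\([^()]*\))?)*)-(.+)$")


@dataclass
class Site:
    position: int  # 0 = N-terminal, sequence length + 1 = C-terminal
    score_name: Optional[str] = None
    score: Optional[float] = None


@dataclass
class Modification:
    accession: str  # e.g. "UNIMOD:35" or "CHEMMOD:-18.0913"
    sites: List[Site]


def parse_modification(text: str) -> Modification:
    """Parse a single '{position}(...)|{position2}|...-{accession}' entry (hypothetical helper)."""
    match = _ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"not a modification entry: {text!r}")
    positions, accession = match.groups()
    sites = []
    for part in positions.split("|"):
        site = _SITE.fullmatch(part)
        if site is None:
            raise ValueError(f"bad site: {part!r}")
        pos, name, value = site.groups()
        sites.append(Site(int(pos), name, float(value) if value is not None else None))
    return Modification(accession, sites)


def format_modification(mod: Modification) -> str:
    """Write a Modification back into the same string notation."""
    parts = []
    for site in mod.sites:
        if site.score is None:
            parts.append(str(site.position))
        else:
            parts.append(f"{site.position}({site.score_name}:{site.score})")
    return "|".join(parts) + "-" + mod.accession


if __name__ == "__main__":
    mod = parse_modification("1(Probabilistic Score:0.8)|2(Probabilistic Score:0.9)|3-UNIMOD:35")
    print(mod.accession, [s.position for s in mod.sites])
    print(format_modification(mod))
```

Parsing the documented example yields the accession ``UNIMOD:35`` and sites 1, 2 and 3, where only the first two carry a score. Formatting that result reproduces the original string. Scores are stored as floats, so a value written as ``0.80`` would come back as ``0.8``. Strings that pack several modifications into one column are not handled here.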