minor changes
ypriverol committed Sep 29, 2023
1 parent 7006fad commit 4472f8c
Showing 3 changed files with 126 additions and 93 deletions.
95 changes: 2 additions & 93 deletions docs/index.rst
@@ -13,103 +13,12 @@ Contents
.. toctree::
   :maxdepth: 1

   usage
   introduction
   formats
   preprocessing
   identification
   dda
   dia
   statistics
   pmultiqc
   benchmarks
   debug
   faq
   presentations
   dev
   contact
   de
   tools

|

Use the following links to get support and help from the quantms maintainers:

|Get help on Slack| |Report Issue| |Get help on GitHub Forum|
81 changes: 81 additions & 0 deletions docs/introduction.rst
@@ -0,0 +1,81 @@
Introduction to quantms.io
======================================

The ``.qms`` folder contains several metadata files that describe the
project, the samples, the data acquisition and the data processing.
Each of these files is described in the following sections:

- `METADATA.md <METADATA.md>`__: A JSON file with metadata about the
  analyzed project.
- `AE <AE.rst>`__ or `DE <DE.rst>`__: A CSV file based on the
  MSstats (TODO link) format for either absolute expression or
  differential expression.

Some general rules for all the files:

- The files are tab-delimited, JSON or Parquet files.
- Parquet files are compressed and can be read with pandas.

Common data structures and formats
----------------------------------

Some concepts are shared by several outputs, so they are defined and
explained here:

Peptidoform
~~~~~~~~~~~

- **Peptidoform**: A peptidoform is a peptide sequence together with its
  modifications. For example, the peptide sequence ``PEPTIDM`` with an
  ``Oxidation`` modification would be ``PEPTIDM[Oxidation]``. The
  peptidoform should be written using the `ProForma
  specification <https://github.com/HUPO-PSI/ProForma>`__. This concept
  is used in the following outputs:

  - `PSM <PSM.rst>`__
  - `FEATURES <FEATURES.rst>`__
  - `PEPTIDE <PEPTIDE.rst>`__
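
As a minimal sketch of how such a peptidoform string could be assembled
(this is not part of quantms.io; the function name ``to_proforma`` and its
list of ``(position, name)`` pairs are assumptions made for illustration),
the snippet below writes each modification in square brackets after its
residue. It assumes 1-based residue positions and the terminal convention
given under Modifications: 0 for the N-terminus and length + 1 for the
C-terminus.

```python
from typing import Dict, List, Tuple


def to_proforma(sequence: str, modifications: List[Tuple[int, str]]) -> str:
    """Build a simple ProForma-style string (hypothetical helper).

    Positions are assumed to be 1-based; 0 means N-terminal and
    len(sequence) + 1 means C-terminal.
    """
    n_term = ""
    c_term = ""
    per_residue: Dict[int, List[str]] = {}

    for position, name in modifications:
        if position == 0:
            n_term += f"[{name}]"
        elif position == len(sequence) + 1:
            c_term += f"[{name}]"
        elif 1 <= position <= len(sequence):
            per_residue.setdefault(position, []).append(name)
        else:
            raise ValueError(f"position {position} is outside the peptide")

    # Append the bracketed modification names directly after each residue.
    parts = []
    for index, residue in enumerate(sequence, start=1):
        tags = "".join(f"[{name}]" for name in per_residue.get(index, []))
        parts.append(residue + tags)
    body = "".join(parts)

    # Terminal modifications are separated from the sequence by a hyphen.
    if n_term:
        body = f"{n_term}-{body}"
    if c_term:
        body = f"{body}-{c_term}"
    return body
```

Calling ``to_proforma("PEPTIDM", [(7, "Oxidation")])`` reproduces the
``PEPTIDM[Oxidation]`` example above.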

Modifications
~~~~~~~~~~~~~

- **Modifications**: A modification is a chemical change in the peptide
  sequence. Modifications can be annotated as part of the ProForma
  notation inside the peptide or as a separate column. When annotating
  the modification as a separate column, the format should be as close
  as possible to the `mzTab format
  notation <https://github.com/HUPO-PSI/mzTab/tree/master/specification_document-releases/1_0-Proteomics-Release>`__.
  The modifications encode the following information for each
  peptide or PSM:

  - Modification name or accession: For example, ``Oxidation`` or
    ``UNIMOD:35``. Modifications SHOULD be reported using UNIMOD. If a
    modification is not defined in UNIMOD, a CHEMMOD definition must
    be used, such as ``CHEMMOD:-18.0913``, where the number is the mass
    shift in Daltons.
  - Position: The position of the modification in the peptide
    sequence, for example ``1`` or ``1,2,3``. Terminal modifications
    in proteins and peptides MUST be reported with the position set to
    0 (N-terminal) or the amino acid length + 1 (C-terminal),
    respectively.
  - Localization probability: The probability that the modification
    is at the reported position.

Those three properties can be combined in one string as:

::

   {position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}

For example:

::

   1(Probabilistic Score:0.8)|2(Probabilistic Score:0.9)|3-UNIMOD:35

This concept is used in the following outputs:

- `PSM <PSM.rst>`__
- `FEATURES <FEATURES.rst>`__
- `PEPTIDE <PEPTIDE.rst>`__
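
The combined modification string described above can be split back into
its parts. The following is a sketch under stated assumptions rather than
the quantms.io implementation: ``parse_modification`` and
``ModificationSite`` are hypothetical names, and the position part is
assumed to contain no hyphen, so the first ``-`` separates the positions
from the modification name (this keeps ``CHEMMOD:-18.0913`` intact).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModificationSite:
    """One candidate position, optionally with a named localization score."""
    position: int
    score_name: Optional[str] = None
    score: Optional[float] = None


# Matches "3" or "1(Probabilistic Score:0.8)".
_SITE_PATTERN = re.compile(r"^(\d+)(?:\(([^:()]+):([^()]+)\))?$")


def parse_modification(text: str) -> Tuple[str, List[ModificationSite]]:
    """Parse '{pos}({score name}:{value})|{pos2}|...-{name}' (hypothetical helper)."""
    # Assumption: the first hyphen ends the position part.
    positions, separator, name = text.partition("-")
    if not separator or not name:
        raise ValueError(f"missing modification name in {text!r}")

    sites = []
    for token in positions.split("|"):
        match = _SITE_PATTERN.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse position {token!r}")
        position, score_name, score = match.groups()
        sites.append(
            ModificationSite(
                position=int(position),
                score_name=score_name,
                score=float(score) if score is not None else None,
            )
        )
    return name, sites
```

Applied to the example string above, the function returns the name
``UNIMOD:35`` and three sites, of which the first two carry a score and the
third does not.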
43 changes: 43 additions & 0 deletions docs/tools.rst
@@ -0,0 +1,43 @@
quantms.io tools
=================================

psm converter tool
-------------------------

Use cases
~~~~~~~~~

.. note:: Before generating the PSM feature file, make sure that you have generated ``project.json``.

::

   python psm_command.py convert-psm-file \
     --mztab_file PXD014414.sdrf_openms_design_openms.mzTab \
     --output_folder result

- Non-PRIDE project (no need to run ``project_command.py``)

::

   python feature_command.py convert-psm-file \
     --mztab_file PXD014414.sdrf_openms_design_openms.mzTab \
     --generate_project False \
     --output_folder result

Optional parameters
~~~~~~~~~~~~~~~~~~~

- ``--output_prefix_file``: The prefix of the result file.
- ``--verbose``: Output debug information.
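
When the converter is called from another Python script, the command line
can be assembled as an argument list. This is only a sketch: the helper
name ``build_convert_psm_command`` is hypothetical, the flags are the ones
listed above, and ``--output_prefix_file`` is assumed to take a single
value.

```python
import sys
from typing import List, Optional


def build_convert_psm_command(
    mztab_file: str,
    output_folder: str,
    output_prefix_file: Optional[str] = None,
    script: str = "psm_command.py",
) -> List[str]:
    """Return the argument list for the PSM conversion (hypothetical helper)."""
    command = [
        sys.executable,  # the Python interpreter running this script
        script,
        "convert-psm-file",
        "--mztab_file", mztab_file,
        "--output_folder", output_folder,
    ]
    # Assumption: the optional prefix flag is followed by one value.
    if output_prefix_file is not None:
        command += ["--output_prefix_file", output_prefix_file]
    return command
```

The resulting list can be passed to ``subprocess.run(command, check=True)``,
which avoids shell quoting problems with file names.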

Compare psm.parquet
-------------------

Use case
~~~~~~~~

::

   python feature_command.py compare-set-of-psms \
     --parquets PXD014414-comet.parquet PXD014414-sage.parquet PXD014414-msgf.parquet \
     --tags comet sage msgf
