Release 1.3 (#3)
* # Changed mutational-context-range according to pore type and updated plots

## mutational context
- r9 mut_context range is now 5
- r10 mut_context range is now 9
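The pore-specific ranges above could be kept in a small lookup table. This is a minimal sketch, not magnipore's code: the names `MUT_CONTEXT_RANGE` and `mut_context_range` are hypothetical, and only the values 5 (r9) and 9 (r10) come from this commit message. Raising `ValueError` for an unrecognized pore is likewise an assumption about how such a case might be handled.

```python
# Hypothetical lookup table; only the values 5 and 9 are taken from the commit message.
MUT_CONTEXT_RANGE = {"r9": 5, "r10": 9}


def mut_context_range(pore_type: str) -> int:
    """Return the mutational-context range for a pore type (case-insensitive)."""
    try:
        return MUT_CONTEXT_RANGE[pore_type.lower()]
    except KeyError:
        # Assumption: an unrecognized pore type is treated as an error.
        raise ValueError(f"Unknown pore type: {pore_type!r}") from None


print(mut_context_range("r9"))   # 5
print(mut_context_range("R10"))  # 9
```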

## plots
- *MeanDistAvgStdev* are now called *MeDAS* plots
- added a MeDAS plot for excluding low coverage positions
- added a MeDAS plot using sns.relplot with different markers for positions with coverage below 10 and above

* switched segmentation algorithm to f5c

* Fixed range check

* Update tests according to new r9 range

* added --rna and --r10

* Fix hue in MeDAS coverage plot

* Script to filter .magnipore for coverage

* Add entry point of cov_filter.py

* Added description to cov_filter

* Make plots prettier

* Update gitignore

* Update READMEs and descriptions

* Set default coverage threshold to 10
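A coverage filter with a default threshold of 10 might look like the sketch below. It is an illustration under stated assumptions, not the filter script from this release: the column names `cov_first` and `cov_sec`, the inclusive comparison, and the requirement that both samples meet the threshold are all hypothetical. The `.magnipore` file layout is not shown in this commit.

```python
from typing import Iterable, Iterator

DEFAULT_COVERAGE = 10  # default threshold named in the commit message


def filter_by_coverage(
    rows: Iterable[dict],
    threshold: int = DEFAULT_COVERAGE,
    columns: tuple = ("cov_first", "cov_sec"),  # hypothetical column names
) -> Iterator[dict]:
    """Yield only the rows whose coverage reaches the threshold in every column."""
    for row in rows:
        if all(int(row[col]) >= threshold for col in columns):
            yield row


# Made-up example rows, one per reference position.
rows = [
    {"pos": 1, "cov_first": 25, "cov_sec": 12},
    {"pos": 2, "cov_first": 9, "cov_sec": 40},
    {"pos": 3, "cov_first": 10, "cov_sec": 10},
]
kept = [row["pos"] for row in filter_by_coverage(rows)]
print(kept)  # [1, 3]
```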

* Update plots

* rename filter script

* sort imports

* Update

* Reduce Runtime, Multiprocessing

- Using multiprocessing for model comparisons
- Remove pandas dataframe
- Added plotting script after comparison

* Update tests

* Update meta data

* Move progress print to subprocess to see real progress

* Multiprocessing Model Building

* Update tests according to code changes

* Update tests

* Add modules to namespace

* Remove slower multiprocessing

* Update gitignore

* Add argument for the coverage

* Remove comments

* Remove namespace

* Update tests

* Fix bugs

* remove logs from tests

* Updated test

* Update test

* Make gzip overwrite already existing files

* Remove duplicate command logs

* Improve plots

* Improve plots

* Improve plots

* Improve plots and add more description

* Fix count for  `Positions with no data X, ...`

* Add magnicheck

* Change argument order

* Add total eval_pos to output

* Take strand into account

* Fix KeyError

* Fix Bug

* Fix output kmer value

* Add output

* Output more information

* Fix Bug

- multiprocessing.Value counters are now incremented using a Lock
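The fix described here addresses a common pitfall: `counter.value += 1` on a shared `multiprocessing.Value` is a read-modify-write, so concurrent processes can lose updates unless the whole step is guarded. The sketch below shows the general pattern only; the function names `count_events` and `run` are hypothetical and the code is not taken from magnipore.

```python
import multiprocessing as mp


def count_events(counter, lock, n_events: int) -> None:
    """Add n_events to a shared counter, one locked increment at a time."""
    for _ in range(n_events):
        # The Value's built-in lock only protects a single read or write.
        # The explicit lock makes the whole read-modify-write atomic.
        with lock:
            counter.value += 1


def run(n_workers: int = 4, n_events: int = 1000) -> int:
    """Start n_workers processes that all increment the same counter."""
    counter = mp.Value("i", 0)  # shared C int, initially 0
    lock = mp.Lock()
    workers = [
        mp.Process(target=count_events, args=(counter, lock, n_events))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

With the lock in place, every increment is counted, so the result equals `n_workers * n_events`. Without it, the total can come out lower on some runs.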

* still trying to fix incrementation bug

* Reduce plotting size

* Bug fixes

* Switched argument order in f5c index command

* do not print warnings

* sort imports

* Fix usage message

* Add --help messages for all scripts

* Check tests
JannesSP authored Sep 11, 2023
1 parent a9a19cf commit 81548a7
Showing 25 changed files with 4,379 additions and 4,002 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -132,3 +132,9 @@ pull_request.md
zenodo/trick_zenodo.py
readme.md
.vscode/settings.json
local_test.sh
test.py
test.txt
magnipore/nanosherlock_mp.py
magnipore/nanosherlock_old.py
tests/segmentation/test_*/log.txt
64 changes: 6 additions & 58 deletions README.md
@@ -148,64 +148,11 @@ magnipore --basecalls_first_sample basecalls_first_sample --basecalls_sec_sample

Using the same reference sequence for both samples results in no reported mutations. Magnipore will only report potential modifications in this case. If you assume there are mutations between the samples, try to provide different reference sequences containing these mutations.

### Help Message
### Help Messages

<details><summary>Click here to see help message:</summary>
[Complete help messages can be found here!](help/help_messages.md)

```bash
usage: Magnipore [-h] [--guppy_bin GUPPY_BIN] [--guppy_model GUPPY_MODEL] [--guppy_device GUPPY_DEVICE] [-b1 FASTQ] [-b2 FASTQ] [-s1 TXT] [-s2 TXT] [-d] [-t THREADS] [-fr]
[-mx {map-ont,splice,ava-ont}] [-mk MINIMAP2K] [--timeit] [--rna] [-v]
raw_data_first_sample reference_first_sample label_first_sample raw_data_sec_sample reference_sec_sample label_sec_sample working_dir

Required tools: see github https://github.com/JannesSP/magnipore

positional arguments:
raw_data_first_sample
Parent directory of FAST5 files of first sample, can also be a single SLOW5 or BLOW5 file of first sample, that contains all reads, if FASTQs are
provided
reference_first_sample
reference FASTA file of first sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
label_first_sample Name of the sample or pipeline run
raw_data_sec_sample Parent directory of FAST5 files of second sample, can also be SLOW5 or BLOW5 file of second sample, that contains all reads, if FASTQs are provided
reference_sec_sample reference FASTA file of second sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
label_sec_sample Name of the sample or pipeline run
working_dir Path to write all output files

optional arguments:
-h, --help show this help message and exit
--guppy_bin GUPPY_BIN
Guppy binary (default: None)
--guppy_model GUPPY_MODEL
Guppy model used for basecalling (default: None)
--guppy_device GUPPY_DEVICE
Use the GPU to basecall "cuda:0" to use the GPU with ID 0 (default: cuda:0)
-b1 FASTQ, --basecalls_first_sample FASTQ
Path to existing basecalls of first sample. Basecalls must be in one single file. (default: None)
-b2 FASTQ, --basecalls_sec_sample FASTQ
Path to existing basecalls of second sample. Basecalls must be in one single file. (default: None)
-s1 TXT, --sequencing_summary_first_sample TXT
Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of second sample. (default: None)
-s2 TXT, --sequencing_summary_sec_sample TXT
Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of first sample. (default: None)
-d, --calculate_data_density
Will calculate data density after building the models. Will increase runtime! (default: False)
-t THREADS, --threads THREADS
Number of threads to use (default: 1)
-fr, --force_rebuild Run commands regardless if files are already present (default: False)
-mx {map-ont,splice,ava-ont}, --minimap2x {map-ont,splice,ava-ont}
-x parameter for minimap2 (default: map-ont)
-mk MINIMAP2K, --minimap2k MINIMAP2K
-k parameter for minimap2 (default: 14)
--timeit Measure and print time used by submodules (default: False)
-rna Use when data is rna (default: False)
-r10 Use when data is from R10.4.1 flowcell (default: False)
-km KMER_MODEL, --kmer_model KMER_MODEL
custom kmer model file for f5c eventalign (default: None)
-v, --version show program's version number and exit
```
</details>
#### required arguments:
#### required arguments for magnipore:
use either the basecalling arguments or provide basecalls
- basecalling arguments:
- guppy_bin : Path to guppy binary
@@ -215,8 +162,6 @@ use either the basecalling arguments or provide basecalls
- basecalls_first_sample : Path
- basecalls_sec_sample : Path

For optional arguments see magnipore.py --help. Includes small number of mapping parameters and the option to skip basecalling.
## Output File Description

<details><summary>Click here to see overview:</summary>
@@ -252,6 +197,9 @@ same for second sample:
- 13: Running nanosherlock of the first sample failed
- 14: Running nanosherlock of the second sample failed
- 15: Number of provided reference sequences is not equal 1 or 2
- 16: Unknown pore type
- 17: Error in multiprocessing signal comparison
- 18: Error in magniplot
---
Errors of first sample:
- 119: Cannot basecall .slow5/.blow5 with guppy
66 changes: 8 additions & 58 deletions README.rst
@@ -139,63 +139,13 @@ reported mutations. Magnipore will only report potential modifications
in this case. If you assume there are mutations between the samples, try
to provide different reference sequences containing these mutations.

Help Message
------------
Help Messages
-------------

.. code:: bash
`Complete help messages can be found here! <help/help_messages.md>`__

usage: Magnipore [-h] [--guppy_bin GUPPY_BIN] [--guppy_model GUPPY_MODEL] [--guppy_device GUPPY_DEVICE] [-b1 FASTQ] [-b2 FASTQ] [-s1 TXT] [-s2 TXT] [-d] [-t THREADS] [-fr]
[-mx {map-ont,splice,ava-ont}] [-mk MINIMAP2K] [--timeit] [--rna] [-v]
raw_data_first_sample reference_first_sample label_first_sample raw_data_sec_sample reference_sec_sample label_sec_sample working_dir
Required tools: see github https://github.com/JannesSP/magnipore
positional arguments:
raw_data_first_sample
Parent directory of FAST5 files of first sample, can also be a single SLOW5 or BLOW5 file of first sample, that contains all reads, if FASTQs are
provided
reference_first_sample
reference FASTA file of first sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
label_first_sample Name of the sample or pipeline run
raw_data_sec_sample Parent directory of FAST5 files of second sample, can also be SLOW5 or BLOW5 file of second sample, that contains all reads, if FASTQs are provided
reference_sec_sample reference FASTA file of second sample, POSITIVE (+) or FORWARD strand, ATTENTION: can only contain a single sequence
label_sec_sample Name of the sample or pipeline run
working_dir Path to write all output files
optional arguments:
-h, --help show this help message and exit
--guppy_bin GUPPY_BIN
Guppy binary (default: None)
--guppy_model GUPPY_MODEL
Guppy model used for basecalling (default: None)
--guppy_device GUPPY_DEVICE
Use the GPU to basecall "cuda:0" to use the GPU with ID 0 (default: cuda:0)
-b1 FASTQ, --basecalls_first_sample FASTQ
Path to existing basecalls of first sample. Basecalls must be in one single file. (default: None)
-b2 FASTQ, --basecalls_sec_sample FASTQ
Path to existing basecalls of second sample. Basecalls must be in one single file. (default: None)
-s1 TXT, --sequencing_summary_first_sample TXT
Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of second sample. (default: None)
-s2 TXT, --sequencing_summary_sec_sample TXT
Use, when sequencing summary is not next to your FASTQ file. Path to existing sequencing summary file of first sample. (default: None)
-d, --calculate_data_density
Will calculate data density after building the models. Will increase runtime! (default: False)
-t THREADS, --threads THREADS
Number of threads to use (default: 1)
-fr, --force_rebuild Run commands regardless if files are already present (default: False)
-mx {map-ont,splice,ava-ont}, --minimap2x {map-ont,splice,ava-ont}
-x parameter for minimap2 (default: map-ont)
-mk MINIMAP2K, --minimap2k MINIMAP2K
-k parameter for minimap2 (default: 14)
--timeit Measure and print time used by submodules (default: False)
-rna Use when data is rna (default: False)
-r10 Use when data is from R10.4.1 flowcell (default: False)
-km KMER_MODEL, --kmer_model KMER_MODEL
custom kmer model file for f5c eventalign (default: None)
-v, --version show program's version number and exit
required arguments:
~~~~~~~~~~~~~~~~~~~
required arguments for magnipore:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

use either the basecalling arguments or provide basecalls

@@ -207,9 +157,6 @@ use either the basecalling arguments or provide basecalls
- basecalls_first_sample : Path
- basecalls_sec_sample : Path

For optional arguments see magnipore.py –help. Includes small number of
mapping parameters and the option to skip basecalling.
Output File Description
=======================

@@ -252,6 +199,9 @@ Error Codes Explanation
- 13: Running nanosherlock of the first sample failed
- 14: Running nanosherlock of the second sample failed
- 15: Number of provided reference sequences is not equal 1 or 2
- 16: Unknown pore type
- 17: Error in multiprocessing signal comparison
- 18: Error in magniplot

Errors of first sample:

6 changes: 6 additions & 0 deletions conda.recipe/meta.yaml
@@ -33,6 +33,9 @@ test:
commands:
- magnipore --help
- nanosherlock --help
- magnifilter --help
- magniplot --help
- magnicheck --help
- pytest -vv

about:
@@ -274,6 +277,9 @@ about:
- 13: Running nanosherlock of the first sample failed
- 14: Running nanosherlock of the second sample failed
- 15: Number of provided reference sequences is not equal 1 or 2
- 16: Unknown pore type
- 17: Error in multiprocessing signal comparison
- 18: Error in magniplot with error code
---
Errors of first sample: