adding HalfDeep #6592

Merged
merged 13 commits into from
Dec 5, 2024
10 changes: 10 additions & 0 deletions tools/halfdeep/.shed.yml
@@ -0,0 +1,10 @@
categories:
- Sequence Analysis
description: "HalfDeep: Automated detection of intervals covered at half depth by sequenced reads."
homepage_url: https://github.com/makovalab-psu/HalfDeep
long_description: |
Automated detection of intervals covered at half depth by sequenced reads.
name: halfdeep
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/halfdeep
type: unrestricted
92 changes: 92 additions & 0 deletions tools/halfdeep/halfdeep.xml
@@ -0,0 +1,92 @@
<tool id="halfdeep" name="HalfDeep" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>identifies genomic regions with half-depth coverage based on sequencing read mappings.</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
#import re
##
## reference
##
ln -s '$ref' 'ref.$ref.ext' &&
touch ref.idx &&
##
## reads
##
#set $reads_dir = "reads"
#set $mapped_reads_dir = "halfdeep/ref/mapped_reads"
mkdir -p '$reads_dir' '$mapped_reads_dir' &&
#for $read in $reads
#set $read_base = re.sub('[^\w\-\s]', '_', str($read.element_identifier))
ln -s '$read' '$reads_dir/${read_base}.$read.ext' &&
echo '$reads_dir/${read_base}.$read.ext' >> input.fofn &&
##
## mapped reads
##
#for $mapped_read in $mapped_reads
ln -s '$mapped_read' "$mapped_reads_dir/${read_base}.bam" &&
ln -s "${read_base}.bam" "$mapped_reads_dir/${read_base}.sort.bam" &&
ln -s '$mapped_read.metadata.bam_index' "$mapped_reads_dir/${read_base}.sort.bam.bai" &&
#end for
#end for
##
## run bam_depth.sh
##
#for $line_number in range(1, len($reads) + 1)
bam_depth.sh 'ref.$ref.ext' $line_number &&
#end for
##
## run halfdeep.sh
##
halfdeep.sh 'ref.$ref.ext'
]]></command>
<inputs>
<param name="ref" type="data" format="fasta,fasta.gz" label="Genome Assembly" help="A Genome Assembly in FASTA format."/>
<param name="reads" type="data" format="fastqsanger,fastqsanger.bz2,fastqsanger.gz" multiple="true" label="Sequencing Reads" help="Sequencing Reads for the Genome Assembly in FASTQ format."/>
<param name="mapped_reads" type="data" format="bam" multiple="true" label="Aligned Reads" help="Alignments of the Sequencing Reads to the Genome Assembly in BAM format."/>
</inputs>
<outputs>
<data name="scaffold_len" format="tabular" from_work_dir="halfdeep/ref/scaffold_lengths.dat" label="Scaffold lengths for ${on_string}"/>
<data name="depth_dat" format="tabular.gz" from_work_dir="halfdeep/ref/depth.dat.gz" label="Depth for ${on_string}"/>
<data name="pct_cmds" format="text" from_work_dir="halfdeep/ref/percentile_commands.sh" label="Percentile to value for ${on_string}"/>
<data name="halfdeep_dat" format="bed" from_work_dir="halfdeep/ref/halfdeep.dat" label="HalfDeep on ${on_string}"/>
</outputs>
<tests>
<test expect_num_outputs="4">
<param name="ref" value="ref.fasta.gz" ftype="fasta.gz"/>
<param name="reads" value="reads.fasta.gz" ftype="fasta.gz"/>
<param name="mapped_reads" value="mapped_reads.bam" ftype="bam"/>
<output name="scaffold_len" file="scaffold_lengths.tabular" ftype="tabular"/>
<output name="depth_dat" file="depth.tabular.gz" ftype="tabular.gz"/>
<output name="pct_cmds" file="percentile.txt" ftype="text"/>
<output name="halfdeep_dat" file="halfdeep.bed" ftype="bed"/>
</test>
</tests>
<help><![CDATA[

HalfDeep identifies genomic regions with half-depth coverage based on sequencing read mappings. These regions may reveal insights into heterogametic sex chromosomes, haplotype-specific variation, or potential assembly errors such as heterotypic duplications.

Given the following three inputs:

1. A genome assembly in FASTA format.
2. Reads in FASTQ format.
3. Mapped reads in BAM format.

HalfDeep automates the following tasks:

1. Mapping reads and merging individual mapping files.
2. Calculating per-base read depth.
3. Smoothing read coverage using a defined window with genodsp.
4. Determining the percentile of read coverage.
5. Identifying genomic regions with half-depth coverage based on a specified percentile threshold (e.g., 40–60%) and exporting them in BED file format.

HalfDeep produces the following outputs:

1. Scaffold lengths: A tabular file containing the name and length of each sequence in the genome assembly.
2. Depths: A gzip-compressed tabular file containing the read depths.
3. Percentile to value: A text file of shell export statements that map coverage percentiles to depth values.
4. HalfDeep: A BED file containing regions of the genome assembly that are "covered at half depth".
]]></help>
<expand macro="citations"/>
</tool>
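
Note on the command template above: each read's element identifier is passed through `re.sub('[^\w\-\s]', '_', ...)` before it is used as a file name. The Python sketch below reproduces only that substitution so its effect is easy to see. The function name `sanitize_identifier` and the sample identifiers are hypothetical and are not part of the wrapper.

```python
import re


def sanitize_identifier(identifier):
    """Replace every character that is not a word character, a hyphen,
    or whitespace with an underscore (same pattern as the template)."""
    return re.sub(r'[^\w\-\s]', '_', str(identifier))


# Hypothetical element identifiers, for illustration only.
for name in ["sample_1.fastq.gz", "my reads (run 2)", "a/b:c"]:
    print(repr(name), "->", repr(sanitize_identifier(name)))
```

Because `\s` is in the allowed set, whitespace survives the substitution, so the resulting paths still need the quoting that the template already applies. Dots are replaced, which is why the template appends the extension separately via `$read.ext`.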
23 changes: 23 additions & 0 deletions tools/halfdeep/macros.xml
@@ -0,0 +1,23 @@
<macros>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">halfdeep</requirement>
</requirements>
</xml>
<token name="@TOOL_VERSION@">0.1.0</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">21.05</token>
<xml name="citations">
<citations>
<citation type="bibtex">
@misc{github_halfdeep,
author = {Makova Lab PSU},
year = "2019",
title = {HalfDeep},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/makovalab-psu/HalfDeep}
}
</citation>
</citations>
</xml>
</macros>
Binary file added tools/halfdeep/test-data/depth.tabular.gz
Binary file not shown.
Empty file.
Binary file added tools/halfdeep/test-data/mapped_reads.bam
Binary file not shown.
6 changes: 6 additions & 0 deletions tools/halfdeep/test-data/percentile.txt
@@ -0,0 +1,6 @@
export percentile40=0.975
export percentile50=0.986
export percentile60=1.331
export halfPercentile40=0.4875
export halfPercentile50=0.493
export halfPercentile60=0.6655
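
Note on this fixture: each `halfPercentileN` value equals the matching `percentileN` value divided by two (for example 0.975 / 2 = 0.4875). The sketch below parses the six lines and checks that relation. That the relation holds for every HalfDeep run is an assumption inferred from this fixture alone, and the helper names `parse_exports` and `half_values_consistent` are hypothetical.

```python
import re

# Contents of test-data/percentile.txt as shown in the diff.
PERCENTILE_TXT = """\
export percentile40=0.975
export percentile50=0.986
export percentile60=1.331
export halfPercentile40=0.4875
export halfPercentile50=0.493
export halfPercentile60=0.6655
"""


def parse_exports(text):
    """Return {name: float} for lines of the form 'export NAME=VALUE'."""
    values = {}
    for line in text.splitlines():
        match = re.fullmatch(r"export (\w+)=([0-9.]+)", line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def half_values_consistent(values, tol=1e-9):
    """Check that every percentileN has a halfPercentileN equal to half of it."""
    for name, value in values.items():
        if name.startswith("percentile"):
            half_name = "halfP" + name[1:]  # percentile40 -> halfPercentile40
            if half_name not in values:
                return False
            if abs(values[half_name] - value / 2) > tol:
                return False
    return True


values = parse_exports(PERCENTILE_TXT)
print(half_values_consistent(values))
```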
Binary file added tools/halfdeep/test-data/reads.fasta.gz
Binary file not shown.
Binary file added tools/halfdeep/test-data/ref.fasta.gz
Binary file not shown.
3 changes: 3 additions & 0 deletions tools/halfdeep/test-data/scaffold_lengths.tabular
@@ -0,0 +1,3 @@
FAKE1 482501
FAKE2 366529
FAKE3 150970
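
Note on this fixture: per the tool help, the scaffold lengths file holds the name and length of each sequence in the assembly. The sketch below shows one way such a table could be derived from FASTA text. It is not HalfDeep's own implementation, the function name `scaffold_lengths` and the toy sequences are hypothetical, and taking the name as the first whitespace-delimited token of the header is an assumption.

```python
def scaffold_lengths(fasta_text):
    """Return {sequence name: total length} for FASTA-formatted text.

    The name is taken as the first whitespace-delimited token after '>'
    (an assumption; HalfDeep may name sequences differently).
    """
    lengths = {}
    name = None
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy input, for illustration only (not the ref.fasta.gz fixture).
toy_fasta = ">TOY1 first toy scaffold\nACGT\nAC\n>TOY2\nGGG\n"
for seq_name, seq_len in scaffold_lengths(toy_fasta).items():
    print(f"{seq_name}\t{seq_len}")
```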