Mmuphin wrapper #6584

Open
wants to merge 33 commits into
base: main
Changes from 1 commit
Commits
298cadd
mmuphin wrapper draft with errors
renu-pal Nov 22, 2024
ef26199
Update .shed.yml
renu-pal Nov 22, 2024
801cc88
removing long description from .shed.yml
renu-pal Nov 25, 2024
f0ca96f
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
c19da4d
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
52076ec
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
bab0879
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
7c09803
Update tools/mmuphin/macros.xml
renu-pal Nov 25, 2024
f340c26
Update tools/mmuphin/mmuphin.xml
renu-pal Nov 25, 2024
4ee8d65
reducing CRC_abd file size and adding adjust_batch.R file
renu-pal Nov 25, 2024
e861c0f
adding long description into .shed.yml due to linting issue
renu-pal Nov 25, 2024
f65885e
Update .shed.yml
renu-pal Nov 25, 2024
7fafc59
Update .shed.yml
renu-pal Nov 25, 2024
f79c79f
update
paulzierep Nov 28, 2024
1b335f1
update
paulzierep Nov 28, 2024
d86e487
rm unneeded requs
paulzierep Nov 28, 2024
47d426d
Merge pull request #4 from paulzierep/mmuphin_wrapper
renu-pal Nov 29, 2024
f815e86
changing batch value in test, as first column header is null
renu-pal Dec 3, 2024
f36aa66
removing control_output from test
renu-pal Dec 3, 2024
090b476
reducing file size
renu-pal Dec 3, 2024
c5af559
Update mmuphin.xml
renu-pal Dec 10, 2024
e60c159
fixed tests
paulzierep Dec 13, 2024
9c404da
Merge pull request #5 from paulzierep/mmuphin_wrapper
renu-pal Dec 19, 2024
7c2d2d7
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 12, 2025
c1b167c
getting column names in R directly
renu-pal Jan 13, 2025
550cd60
Apply suggestions from code review
bgruening Jan 13, 2025
8d38c99
removed unnecessary commented code
renu-pal Jan 13, 2025
b6e9c39
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 14, 2025
5109992
improving help section
renu-pal Jan 14, 2025
0c902c3
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 14, 2025
19a33c0
removing additional options
renu-pal Jan 15, 2025
663287a
Update tools/mmuphin/mmuphin.xml
renu-pal Jan 16, 2025
a139a38
adding test with covariate=null and few other updates
renu-pal Jan 16, 2025
22 changes: 22 additions & 0 deletions tools/mmuphin/.shed.yml
@@ -0,0 +1,22 @@
name: mmuphin
owner: iuc
description: "MMUPHin is an R package implementing meta-analysis methods for microbial community profiles"
homepage_url: https://huttenhower.sph.harvard.edu/mmuphin
long_description: |
  MMUPHin is a Bioconductor package implementing meta-analysis methods for microbial community profiles. It has interfaces for: a) covariate-controlled batch and study effect adjustment, b) meta-analytic differential abundance testing, and meta-analytic discovery of c) discrete (cluster-based) or d) continuous unsupervised population structure.

  Overall, MMUPHin enables the normalization and combination of multiple microbial community studies. It can then help in identifying microbes, genes, or pathways that are differential with respect to combined phenotypes. Finally, it can find clusters or gradients of sample types that reproduce consistently among studies.
remote_repository_url: https://github.com/biobakery/MMUPHin
type: unrestricted
categories:
  - Metagenomics
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for the mmuphin function: {{ tool_name }}"
suite:
  name: "suite_mmuphin"
  description: "A suite of tools that brings the mmuphin project into Galaxy."
  long_description: |
    MMUPHin is a Bioconductor package implementing meta-analysis methods for microbial community profiles. It has interfaces for: a) covariate-controlled batch and study effect adjustment, b) meta-analytic differential abundance testing, and meta-analytic discovery of c) discrete (cluster-based) or d) continuous unsupervised population structure.

    Overall, MMUPHin enables the normalization and combination of multiple microbial community studies. It can then help in identifying microbes, genes, or pathways that are differential with respect to combined phenotypes. Finally, it can find clusters or gradients of sample types that reproduce consistently among studies.
Contributor

long_description is redundant here.

Contributor

long_description was duplicated in earlier code. It is completely removed now in 801cc88

Contributor Author

I misunderstood and removed the whole thing but made the correction later: 7fafc59

27 changes: 27 additions & 0 deletions tools/mmuphin/macros.xml
@@ -0,0 +1,27 @@
<?xml version="1.0"?>
<macros>
    <token name="@TOOL_VERSION@">1.18.1</token>
    <token name="@VERSION_SUFFIX@">0</token>
    <token name="@PROFILE@">21.05</token>
    <xml name="xrefs">
        <xrefs>
            <xref type="bio.tools">mmuphin</xref>
            <xref type="bioconductor">mmuphin</xref>
        </xrefs>
    </xml>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="@TOOL_VERSION@">bioconductor-mmuphin</requirement>
            <requirement type="package" version="2.0.3">magrittr</requirement>
            <requirement type="package" version="1.1.4">dplyr</requirement>
            <requirement type="package" version="0.33">DT</requirement>
        </requirements>
    </xml>
    <xml name="citations">
        <citations>
            <citation type="doi">10.18129/B9.bioc.MMUPHin</citation>
        </citations>
    </xml>
</macros>
139 changes: 139 additions & 0 deletions tools/mmuphin/mmuphin.xml
@@ -0,0 +1,139 @@
<tool id="mmuphin" name="mmuphin" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
    <description>Performing meta-analyses of microbiome studies</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="xrefs"/>
    <expand macro="requirements"/>
    <command detect_errors="exit_code"><![CDATA[
        Rscript '$rscript'
    ]]></command>

    <configfiles>
        <configfile name="rscript"><![CDATA[

library(MMUPHin)
library(magrittr)
library(dplyr)
library(ggplot2)
library(readr)

source("adjust_batch.R")
Contributor

You will also need to add adjust_batch.R script to your test-data and source that here


#input files
print(" Read input files")
abd_data <- read_tsv("$input_data")
meta_data <- read_tsv("$input_metadata")

# Define control list
controls <- list("$zero_inflation",
"$pseudo_count",
"$conv",
"$maxit",
"$verbose",
"$diagnostic_plot")

# Perform batch adjustment
result <- adjust_batch(feature_abd = abd_data,
batch = "$batch_input",
covariates = "$covariates_input",
data = meta_data,
control=controls
)

# Save results into output files.
# Index with [[ ]] instead of R's dollar operator, which Cheetah would read as a template variable.
print(result)
write.table(result[["feature_abd_adj"]], file = "$output", sep = "\t", quote = FALSE)
write.table(result[["control"]], file = "$control_output", sep = "\t", quote = FALSE)
# The diagnostic figure (adjust_batch_diagnostic.pdf) still needs to be saved into diagnostic_plot_output.
        ]]></configfile>
    </configfiles>



Member

can you please clean those things up? thanks

    <inputs>
        <param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
        <param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
        <param argument="batch_input" type="data_column" data_ref="input_metadata" use_header_names="true" label="batch" />
Member

Can you please improve all labels and help text. They are not very user-friendly IMHO.

What does a metadata file need to look like? Or the feature file? "batch"? Maybe "the column in which the batch identifier is specified"?
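
To make the expected layout concrete, here is a minimal R sketch. It assumes the feature table has features as rows and samples as columns (as the help text in this PR describes) and that the metadata has one row per sample, keyed by the same sample IDs. The toy values, the column names `studyID` and `study_condition`, and the helper `check_inputs` are illustrative assumptions, not part of this PR.

```r
# Toy feature-by-sample abundance table: rows are features, columns are samples.
abd <- matrix(
  c(10, 0, 3, 7,
     5, 2, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B"), c("S1", "S2", "S3", "S4"))
)

# Toy metadata: one row per sample, row names equal to the sample IDs.
meta <- data.frame(
  studyID = factor(c("study1", "study1", "study2", "study2")),
  study_condition = c("control", "CRC", "control", "CRC"),
  row.names = c("S1", "S2", "S3", "S4")
)

# Hypothetical helper: stop early if the two tables do not line up.
check_inputs <- function(abd, meta, batch) {
  if (!batch %in% colnames(meta)) {
    stop("batch column '", batch, "' not found in metadata")
  }
  if (!setequal(colnames(abd), rownames(meta))) {
    stop("sample IDs in the abundance table and the metadata differ")
  }
  invisible(TRUE)
}

check_inputs(abd, meta, batch = "studyID")
```

A label such as "Metadata column containing the batch (study) identifier" could then point at a column like `studyID` here.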

Contributor Author

@bgruening does this work?
5109992

        <param argument="covariates_input" type="data_column" data_ref="input_metadata" use_header_names="true" optional="true" label="covariates" />
        <section name="additional_options" title="Additional Options" expanded="true">
            <param argument="zero_inflation" type="boolean" truevalue="zero_inflation TRUE" falsevalue="zero_inflation FALSE" checked="true" label="Run zero-inflated model"/>
            <param argument="pseudo_count" type="float" optional="true" label="Pseudo count" help="Pseudo count to add to feature_abd before the method's log transformation. Defaults to NULL, in which case it is set to half of the minimal non-zero value in feature_abd"/>
            <param argument="conv" type="float" value="0.0001" optional="true" label="Convergence threshold" help="Convergence threshold for the method's iterative algorithm for shrinking batch effect parameters"/>
            <param argument="maxit" type="integer" value="1000" optional="true" label="Maximum number of iterations" help="Maximum number of iterations allowed for the method's iterative algorithm. Defaults to 1000"/>
            <param argument="verbose" type="boolean" truevalue="verbose TRUE" falsevalue="verbose FALSE" checked="true" label="Print verbose information"/>
Member

we usually don't expose those parameters to the user

Contributor Author

@bgruening, so should I remove them?

Member

yes and set a useful default
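
One way to do this, sketched in R under stated assumptions: build the control list once in the script as a named list, with values fixed to the defaults quoted in the help texts above (conv 0.0001, maxit 1000, diagnostic file adjust_batch_diagnostic.pdf). The name `fixed_control` is hypothetical, and the sketch assumes `adjust_batch` falls back to its own default for any entry left out, such as `pseudo_count`.

```r
# Hypothetical fixed control list; entries are named so each value maps to the intended option.
fixed_control <- list(
  zero_inflation = TRUE,                            # run the zero-inflated model
  conv = 1e-4,                                      # convergence threshold
  maxit = 1000,                                     # maximum number of iterations
  verbose = TRUE,                                   # print progress information
  diagnostic_plot = "adjust_batch_diagnostic.pdf"   # file name of the diagnostic figure
)

# pseudo_count is deliberately not listed (assumption: the package then uses its own default).

# The list would then be passed on unchanged, for example:
# result <- MMUPHin::adjust_batch(feature_abd = abd, batch = "studyID",
#                                 data = meta, control = fixed_control)
str(fixed_control)
```

Compared with an unnamed `list("...", "...")` of strings, a named list keeps logical and numeric values typed and makes the hidden defaults easy to review in one place.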

Contributor Author

Hi @bgruening, I have made the required changes. Does this work?
19a33c0

            <param argument="diagnostic_plot" type="boolean" truevalue="diagnostic_plot TRUE" falsevalue="diagnostic_plot FALSE" checked="true" label="Generate diagnostic figure file, default: adjust_batch_diagnostic.pdf"/>
        </section>
    </inputs>

    <outputs>
        <data name="output" format="tabular" label="Adjusted abundance table"/>
        <data name="diagnostic_plot_output" format="pdf" label="diagnostic figure file"/>
        <data name="control_output" format="tabular" label="control list used in batch adjustment"/>
    </outputs>
    <tests>
        <test>
            <param name="input_data" value="CRC_abd.tsv"/>
            <param name="input_metadata" value="CRC_meta.tsv"/>
            <param name="batch_input" value="1"/>
            <param name="covariates_input" value="2"/>
            <section name="additional_options">
                <param name="zero_inflation" value="TRUE"/>
                <param name="pseudo_count" value="3"/>
                <param name="conv" value="0.0001"/>
                <param name="maxit" value="1000"/>
                <param name="verbose" value="TRUE"/>
                <param name="diagnostic_plot" value="TRUE"/>
            </section>
            <output name="output">
                <assert_contents>
                    <has_size value="150053" delta="1000" />
                </assert_contents>
            </output>
            <output name="diagnostic_plot_output" file="adjust_batch_diagnostic.pdf" ftype="pdf"/>
            <output name="control_output">
                <assert_contents>
                    <has_size value="1500" delta="100" />
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
MMUPHin
=======

MMUPHin is an R package implementing meta-analysis methods for microbial community profiles. It has interfaces for:

a) batch (study) effect adjustment with adjust_batch
b) meta-analytic differential abundance testing
c) meta-analytic discovery of discrete (cluster-based) or continuous unsupervised population structure

Meta-analysis methods are statistical techniques that combine data from multiple independent studies to reach a more precise or more generalizable conclusion. They are commonly used in fields such as medicine, psychology, and biology, where pooling data from different experiments or studies increases the statistical power of an analysis.

Batch (study) effect adjustment with adjust_batch
-------------------------------------------------

The function adjust_batch in the MMUPHin package corrects technical batch effects in microbial feature abundances. Batch effects are variations in the data that arise from differences in technical or procedural factors during data collection or processing, rather than from the biological or experimental variables of interest. For example:

- different equipment or lab environments
- different operators handling the experiment
- variations in sample preparation, sequencing runs, or platforms

These unwanted variations can obscure true biological signals and introduce bias, so adjusting for batch effects is critical for accurate and comparable results across datasets.

Inputs
------

- A feature-by-sample abundance matrix (e.g., microbial abundances).
- A metadata file with information about the samples, including batch identifiers and optional covariates.

Output
------

- A batch-adjusted abundance matrix for downstream analyses.
    ]]></help>
    <expand macro="citations"/>
</tool>
485 changes: 485 additions & 0 deletions tools/mmuphin/test-data/CRC_abd.tsv

Large diffs are not rendered by default.
