interaction:fixed effects and corrections modified #5928

Merged
merged 6 commits into from
Apr 29, 2024
159 changes: 143 additions & 16 deletions tools/maaslin2/maaslin2.xml
@@ -7,6 +7,34 @@
<expand macro="xrefs"/>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[

## get column names of fixed and random effect from the input file, since galaxy
## can only return indices with type="data_column"
## using awk so that the file is only parsed on command line execution

#if $fixed_effects
#set idx = []
Contributor commented:

@renu-pal you said: "Yeah, it did a bit, but it's returning the complete list in the case of 'random effect' when the user does not choose any option."
Could you try to fix that here, i.e. add an if statement so that no columns are selected when the user specifies none, and also add a test for it?

Contributor Author replied:

Okay, working on that.
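
For illustration, the header lookup discussed in this thread can be sketched in plain shell. This is not the wrapper's code: `header_names` is a hypothetical helper, the metadata file is made up, and the indices are passed with `awk -v` instead of being templated into the awk program. The guard reflects one likely cause of the reported behaviour: an awk `print` with no arguments prints the whole record, so an empty selection would return every column name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the wrapper): translate 1-based column
# indices such as "4,3" into the matching header names of a tab-separated file.
header_names() {
    hn_indices="$1"
    hn_file="$2"
    # Guard: an empty selection yields no output instead of the whole header.
    if [ -z "$hn_indices" ]; then
        return 0
    fi
    awk -F'\t' -v cols="$hn_indices" '
        NR == 1 {
            n = split(cols, idx, ",")
            out = ""
            sep = ""
            for (k = 1; k <= n; k++) {
                out = out sep $(idx[k])   # field lookup by index
                sep = ","
            }
            print out
            exit                          # only the header line is needed
        }' "$hn_file"
}

# Demo with a made-up metadata file (column names are illustrative only).
tmp=$(mktemp)
printf 'sample\tsite\tage\tdiagnosis\tsubject\n' > "$tmp"
printf 's1\tA\t30\tCD\tp1\n' >> "$tmp"
header_names "4,3" "$tmp"   # prints: diagnosis,age
header_names "" "$tmp"      # prints nothing
rm -f "$tmp"
```

Passing the indices as data keeps the awk program itself constant, so an empty selection can never turn into a bare `print`.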

#for $i in $fixed_effects:
#silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)

fixed_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned fixed effects as:' \$fixed_effects &&
#end if


#if $random_effects
#set idx = []
#for $i in $random_effects:
#silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)

random_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned random effects as:' \$random_effects &&
#end if

ln -s '$input_data' 'input_data.tsv'
&&
ln -s '$input_metadata' 'input_metadata.tsv'
@@ -31,10 +59,10 @@ Maaslin2.R
--analysis_method '$additional_options.analysis_method'
#end if
#if $random_effects
- --random_effects '$random_effects'
+ --random_effects \$random_effects
#end if
#if $fixed_effects
- --fixed_effects '$fixed_effects'
+ --fixed_effects \$fixed_effects
#end if
#if $additional_options.correction
--correction '$additional_options.correction'
@@ -51,19 +79,17 @@ Maaslin2.R
'outputFolder'
&&
cd outputFolder && mkdir -p figures/ && cp *.pdf figures

]]></command>
<inputs>
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
- <param argument="--fixed_effects" type="select" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects">
- <option value="diagnosis" selected="true">diagnosis</option>
- <option value="dysbiosisnonIBD" selected="true">dysbiosisnonIBD</option>
- <option value="dysbiosisUC" selected="true">dysbiosisUC</option>
- <option value="dysbiosisCD" selected="true">dysbiosisCD</option>
- <option value="antibiotics" selected="true">antibiotics</option>
- <option value="age" selected="true">age</option>
- </param>
- <param argument="--random_effects" type="text" multiple="true" optional="true" label="Random effects" help="The random effects for the model, comma-delimited for multiple effects"/>
+ <param argument="--fixed_effects" type="data_column" data_ref="input_metadata" use_header_names="true" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects, Default value: All " />

+ <param argument="--random_effects" type="data_column" data_ref="input_metadata" use_header_names="true" multiple="true" optional="true" label="Random effects" help="The random effects for the model, comma-delimited for multiple effects, Default: None" />



<section name="additional_options" title="Additional Options" expanded="true">
<param argument="--min_abundance" type="float" value="0.0" optional="true" label="Minimum abundance" help="The minimum abundance for each feature"/>
<param argument="--min_prevalence" type="float" value="0.1" optional="true" label="Minimum prevalence" help="The minimum percent of samples for which a feature is detected at minimum abundance"/>
@@ -87,7 +113,11 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<option value="NEGBIN">NEGBIN</option>
<option value="ZINB">ZINB</option>
</param>
- <param argument="--correction" type="text" value="BH" optional="true" label="Correction" help="The correction method for computing the q-value"/>
+ <param argument="--correction" type="select" value="BH" optional="true" label="Correction" help="The correction method for computing the q-value, Default: BH ">

+ <option value="BH">Benjamini-Hochberg(BH)</option>
+ <option value="BY">Benjamini-Yekutieli(BY)</option>
+ </param>
<param argument="--standardize" type="boolean" truevalue="--standardize TRUE" falsevalue="--standardize FALSE" checked="true" label="Apply z-score so continuous metadata are on the same scale"/>
</section>
<section name="output" title="Set Plotting Output" expanded="true">
@@ -115,8 +145,8 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>
- <param name="random_effects" value="site,subject"/>
- <param name="fixed_effects" value="diagnosis,dysbiosisnonIBD,dysbiosisUC,dysbiosisCD,antibiotics,age"/>
+ <param name="random_effects" value= "2,5"/>
+ <param name="fixed_effects" value="4,9,10,11,6,3"/>
<section name="additional_options">
<param name="min_abundance" value="0.0"/>
<param name="min_prevalence" value="0.1"/>
@@ -198,7 +228,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>
- <param name="fixed_effects" value="diagnosis,dysbiosisnonIBD"/>
+ <param name="fixed_effects" value="4,9"/>
<section name="additional_options">
<param name="min_abundance" value="0.0"/>
<param name="min_prevalence" value="0.1"/>
@@ -245,7 +275,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>
- <param name="fixed_effects" value="diagnosis,dysbiosisnonIBD"/>
+ <param name="fixed_effects" value="4,9"/>
<section name="additional_options">
<param name="min_abundance" value="0.0001"/>
<param name="min_prevalence" value="0.1"/>
@@ -304,6 +334,100 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
</element>
</output_collection>
</test>
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>
<param name="random_effects" value="3" />
<section name="additional_options">
<param name="min_abundance" value="0.0"/>
<param name="min_prevalence" value="0.1"/>
<param name="max_significance" value="0.25"/>
<param name="normalization" value="TSS"/>
<param name="transform" value="LOG"/>
<param name="analysis_method" value="LM"/>
<param name="correction" value="BY"/>
<param name="standardize" value="True"/>
</section>
<section name="output">
<param name="plot_heatmap" value="true"/>
<param name="heatmap_first_n" value="50"/>
<param name="plot_scatter" value="true"/>
<param name="residuals_output" value="true"/>
</section>
<output name="all_results">
<assert_contents>
<has_text text="feature"/>
<has_n_lines n="8092"/>
<has_n_columns n="9"/>
</assert_contents>
</output>
<output name="significant_results">
<assert_contents>
<has_text text="subject"/>
<has_n_lines n="216" delta="5"/>
<has_n_columns n="9"/>
</assert_contents>
</output>
<output name="residuals">
<assert_contents>
<has_size value="671142" delta="1000"/>
</assert_contents>
</output>
<output_collection name="figures_pdfs" type="list">
<element name="heatmap.pdf" ftype="pdf">
<assert_contents>
<has_size value="7000" delta="1000" />
</assert_contents>
</element>
</output_collection>
</test>
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>

<section name="additional_options">
<param name="min_abundance" value="0.0"/>
<param name="min_prevalence" value="0.1"/>
<param name="max_significance" value="0.25"/>
<param name="normalization" value="TSS"/>
<param name="transform" value="LOG"/>
<param name="analysis_method" value="LM"/>
<param name="correction" value="BH"/>
<param name="standardize" value="True"/>
</section>
<section name="output">
<param name="plot_heatmap" value="true"/>
<param name="heatmap_first_n" value="50"/>
<param name="plot_scatter" value="true"/>
<param name="residuals_output" value="true"/>
</section>
<output name="all_results">
<assert_contents>
<has_text text="feature"/>
<has_n_lines n="8092"/>
<has_n_columns n="9"/>
</assert_contents>
</output>
<output name="significant_results">
<assert_contents>
<has_text text="subject"/>
<has_n_lines n="880"/>
<has_n_columns n="9"/>
</assert_contents>
</output>
<output name="residuals">
<assert_contents>
<has_size value="670759" delta="1000"/>
</assert_contents>
</output>
<output_collection name="figures_pdfs" type="list">
<element name="heatmap.pdf" ftype="pdf">
<assert_contents>
<has_size value="7900" delta="1000" />
</assert_contents>
</element>
</output_collection>
</test>
</tests>
<help><![CDATA[
@HELP_HEADER@
@@ -347,6 +471,9 @@ Output
- It only includes associations with q-values <= to the threshold.
- Data frame with residuals for each feature (R data file)
- This file contains a data frame with residuals for each feature.

Correction methods to compute the q-value : https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust
Contributor commented:

If this function is called for the correction, why can the option only take BH or BY? Or is this fixed now?

Contributor Author replied:

Can you check this link?
https://forum.biobakery.org/t/invalid-p-value-correction-options/3321

It seems there was an issue on their end. I believe it should work if we update the version, but I am not sure whether they fixed the bug. Is there any way to check that?

Contributor commented:

OK, good to know. Then the wrapper is fine as is, but you can remove this text from the help section (it will confuse the user, since the option is not there) and put it back when we update the tool :)

Contributor Author replied:

Okay, will update that.
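
As background for the BH/BY choice discussed in this thread, here is a small sketch of the two step-up procedures as described in the `p.adjust` documentation linked above. It is an independent illustration, not MaAsLin2's implementation: `adjust_pvalues` is a hypothetical helper and the p-values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: adjust p-values read from stdin (one per line).
# $1 is "BH" or "BY"; output is one adjusted value per line,
# in ascending order of the original p-values.
adjust_pvalues() {
    awk -v method="$1" '
        { p[NR] = $1 + 0 }
        END {
            n = NR
            # Insertion sort, ascending.
            for (i = 2; i <= n; i++) {
                v = p[i]
                for (j = i - 1; j >= 1 && p[j] > v; j--) p[j + 1] = p[j]
                p[j + 1] = v
            }
            # BY scales BH by the harmonic sum 1 + 1/2 + ... + 1/n.
            c = 1
            if (method == "BY") {
                c = 0
                for (i = 1; i <= n; i++) c += 1 / i
            }
            # Step-up: running minimum from the largest p-value down, capped at 1.
            running = 1
            for (i = n; i >= 1; i--) {
                q = p[i] * n * c / i
                if (q < running) running = q
                adj[i] = running
            }
            for (i = 1; i <= n; i++) printf "%.4f\n", adj[i]
        }'
}

# Made-up p-values; BH prints 0.0200, 0.0200, 0.0400, 0.0400 (one per line).
printf '%s\n' 0.01 0.04 0.03 0.005 | adjust_pvalues BH
```

With the same input, `adjust_pvalues BY` gives larger values because every term is multiplied by the harmonic sum, which is why BY is the more conservative of the two options.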


2- Visualization output files
- Heatmap of the significant associations (PDF file)
- This file contains a heatmap of the significant associations.
2 changes: 1 addition & 1 deletion tools/maaslin2/macros.xml
@@ -21,4 +21,4 @@
<yield/>
</requirements>
</xml>
- </macros>
+ </macros>