-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mmuphin wrapper #6584
base: main
Are you sure you want to change the base?
Mmuphin wrapper #6584
Conversation
41ef84e
to
52c3ce8
Compare
52c3ce8
to
ef26199
Compare
tools/mmuphin/.shed.yml
Outdated
long_description: | | ||
MMUPHin is a Bioconductor package implementing meta-analysis methods for microbial community profiles. It has interfaces for: a) covariate-controlled batch and study effect adjustment, b) meta-analytic differential abundance testing, and meta-analytic discovery of c) discrete (cluster-based) or d) continuous unsupervised population structure. | ||
|
||
Overall, MMUPHin enables the normalization and combination of multiple microbial community studies. It can then help in identifying microbes, genes, or pathways that are differential with respect to combined phenotypes. Finally, it can find clusters or gradients of sample types that reproduce consistently among studies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long_description is redundant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long_description
was duplicated in earlier code. It is completely removed now in 801cc88
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood and removed the whole thing but made the correction later: 7fafc59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@renu-pal Thanks for the contribution. Some of the comments inline.
Please also reduce the test-data size for this file: tools/mmuphin/test-data/CRC_abd.tsv (Ideally, the file needs to be below 1M)
tools/mmuphin/mmuphin.xml
Outdated
library(ggplot2) | ||
library(readr) | ||
|
||
source(adjust_batch.R) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also need to add adjust_batch.R script to your test-data and source that here
Co-authored-by: Saim Momin <[email protected]>
Co-authored-by: Saim Momin <[email protected]>
Co-authored-by: Saim Momin <[email protected]>
Co-authored-by: Saim Momin <[email protected]>
Co-authored-by: Saim Momin <[email protected]>
Co-authored-by: Saim Momin <[email protected]>
Fixed with: renu-pal#4
@SaimMomin12 there is no need to add R files to the wrapper and source them, the package provides this functions |
Mmuphin wrapper
fixed tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening can we merge it ?
@renu-pal can you convert this to ready for review, I think we can merge it ! |
tools/mmuphin/mmuphin.xml
Outdated
#end for | ||
#set idx_for_awk = ','.join(idx) | ||
|
||
export covariates=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not doing this in R directly? I think passing the variables via ENV variables is not very clean. I would prefer to pass them to the script via files or even better do the awk filtering directly in R.
tools/mmuphin/mmuphin.xml
Outdated
<command detect_errors="exit_code"><![CDATA[ | ||
|
||
## get the batch name | ||
export batch=`awk -v idx='$batch_input' -F"\t" 'NR == 1 { print \$idx }' '$input_metadata'` && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my next comment. Can we do this filtering in the R script directly?
Co-authored-by: Björn Grüning <[email protected]>
Hi @bgruening , I have made the necessary changes, let me know if this works |
tools/mmuphin/mmuphin.xml
Outdated
|
||
<!-- <output name="output"> | ||
<assert_contents> | ||
<has_size value="150053" delta="1000" /> | ||
</assert_contents> | ||
</output> --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was to be removed later but I missed it. Removed it now. Thanks for pointing out :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening anything else you want me to look into?
tools/mmuphin/mmuphin.xml
Outdated
<param argument="pseudo_count" type="float" optional="true" label="Pseudo_count" help="Pseudo count to add feature_abd before the methods' log transformation.Default to NULL, in which case will be set to half of minimal non-zero values in feature_abd"/> | ||
<param argument="conv" type="float" value="0.0001" optional="true" label="Convergence threshold" help="Convergence threshold for the method's iterative algorithm for shrinking batch effect parameters"/> | ||
<param argument="maxit" type="float" value="1000" optional="true" label="Maximum number of iterations" help="Maximum number of iterations allowed for the method's iterative algorithm. Default to 1000"/> | ||
<param argument="verbose" type="boolean" truevalue="verbose TRUE" falsevalue="verbose FALSE" checked="true" label="Print verbose information"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually don't expose those parameters to the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening ,so should I remove them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and set a useful default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bgruening , I have the made required changes . Does this work ?
19a33c0
tools/mmuphin/mmuphin.xml
Outdated
<inputs> | ||
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/> | ||
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/> | ||
<param argument="batch_input" type="data_column" data_ref="input_metadata" use_header_names="true" label="batch" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please improve all labels and help text. They are not very user-friendly IMHO.
How does a metadata file needs to look like? Or the feature file? "batch"? Maybe "the column in which the batch identifier is species"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening does this work?
5109992
tools/mmuphin/mmuphin.xml
Outdated
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please clean those things up? thanks
Co-authored-by: Björn Grüning <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the option for no covariates
tools/mmuphin/mmuphin.xml
Outdated
<param argument="covariates_input" type="data_column" data_ref="input_metadata" use_header_names="true" multiple="true" optional="true" label="Covariates Column(s)" help="Select the column(s) in your metadata file that contain covariates (e.g., age, gender, or environmental factors) to be included in the batch correction process."/> | ||
<section name="additional_options" title="Additional Options" expanded="true"> | ||
<param argument="zero_inflation" type="boolean" truevalue="zero_inflation TRUE" falsevalue="zero_inflation FALSE" checked="true" label="Run zero-inflated model"/> | ||
<param argument="pseudo_count" type="hidden" value="NULL" label="Pseudo_count" help="Pseudo count to add feature_abd before the methods' log transformation.Default to NULL, in which case will be set to half of minimal non-zero values in feature_abd" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @renu-pal; I would not hide expert parameters, but put them in an extra block as you did ! What Björn meant is to not show to the user is the verbose
part. So technical parts hide, expert parts allow.
tools/mmuphin/mmuphin.xml
Outdated
</configfiles> | ||
|
||
<inputs> | ||
<param name="input_data" type="data" format="tabular" label="Abundance (or features) file" help="TSV file where columns represent samples and rows include abundance values for a microbial taxa across those samples"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more generic here, this tool is not limited to microbial taxa, it can also be genes, SNPs, path ways...
tools/mmuphin/mmuphin.xml
Outdated
|
||
batch <- colnames(meta_data[$batch_input-1]) | ||
cat("Batch Name:", batch, "\n") | ||
covariates_val <- list($covariates_input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Covariates can be empty (only correct batch). To handel the python None in R, this should work:
# Ensure covariates_input is checked for NULL
if (!exists("$covariates_input") || is.null($covariates_input)) {
covariates_val <- list() # Assign an empty list if input is NULL or does not exist
} else {
covariates_val <- list($covariates_input)
}
covariates <- c()
# Process covariates only if they exist
if (length(covariates_val) > 0) {
for (i in covariates_val) {
covariates <- c(covariates, colnames(meta_data[i - 1]))
}
cat("Covariates Names:", covariates, "\n")
} else {
cat("No covariates provided.\n")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulzierep this code does not seem to work for some reason when I set covariates input to null. I keep on getting the error : Object 'None' not found.
PS: I removed --- !exists("$covariates_input") from if since it was executing if condition even when covariate was not empty. I believe it is used to just test whether covariate_input exists in the current env , which kept turning false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have pushed the other updates you asked for below, and working on how to fix this null issue.
a139a38
</outputs> | ||
|
||
<tests> | ||
<test expect_num_outputs="2"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test without covariates
Co-authored-by: paulzierep <[email protected]>
New to tool wrapping , so need a bit of help in finding what is wrong with this code :)