Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mmuphin wrapper #6584

Open
wants to merge 33 commits into
base: main
Choose a base branch
from
Open

Mmuphin wrapper #6584

wants to merge 33 commits into from

Conversation

renu-pal
Copy link
Contributor

New to tool wrapping , so need a bit of help in finding what is wrong with this code :)

Comment on lines 19 to 22
long_description: |
MMUPHin is a Bioconductor package implementing meta-analysis methods for microbial community profiles. It has interfaces for: a) covariate-controlled batch and study effect adjustment, b) meta-analytic differential abundance testing, and meta-analytic discovery of c) discrete (cluster-based) or d) continuous unsupervised population structure.

Overall, MMUPHin enables the normalization and combination of multiple microbial community studies. It can then help in identifying microbes, genes, or pathways that are differential with respect to combined phenotypes. Finally, it can find clusters or gradients of sample types that reproduce consistently among studies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long_description is redundant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long_description was duplicated in earlier code. It is completely removed now in 801cc88

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood and removed the whole thing but made the correction later: 7fafc59

Copy link
Contributor

@SaimMomin12 SaimMomin12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renu-pal Thanks for the contribution. Some of the comments inline.

Please also reduce the test-data size for this file: tools/mmuphin/test-data/CRC_abd.tsv (Ideally, the file needs to be below 1M)

tools/mmuphin/macros.xml Outdated Show resolved Hide resolved
tools/mmuphin/macros.xml Outdated Show resolved Hide resolved
tools/mmuphin/macros.xml Outdated Show resolved Hide resolved
tools/mmuphin/macros.xml Outdated Show resolved Hide resolved
tools/mmuphin/macros.xml Show resolved Hide resolved
library(ggplot2)
library(readr)

source(adjust_batch.R)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will also need to add adjust_batch.R script to your test-data and source that here

tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
@paulzierep
Copy link
Contributor

paulzierep commented Nov 28, 2024

Fixed with: renu-pal#4
Main fix is:

  • the imports were not needed, mmuphin has what we need
  • parsing the needed column variable via awk as in maaslin2

@SaimMomin12 there is no need to add R files to the wrapper and source them, the package provides this functions

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening can we merge it ?

@paulzierep
Copy link
Contributor

@renu-pal can you convert this to ready for review, I think we can merge it !

@renu-pal renu-pal marked this pull request as ready for review January 7, 2025 14:35
tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
#end for
#set idx_for_awk = ','.join(idx)

export covariates=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not doing this in R directly? I think passing the variables via ENV variables is not very clean. I would prefer to pass them to the script via files or even better do the awk filtering directly in R.

tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
<command detect_errors="exit_code"><![CDATA[

## get the batch name
export batch=`awk -v idx='$batch_input' -F"\t" 'NR == 1 { print \$idx }' '$input_metadata'` &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my next comment. Can we do this filtering in the R script directly?

@renu-pal
Copy link
Contributor Author

renu-pal commented Jan 13, 2025

Hi @bgruening , I have made the necessary changes, let me know if this works
c1b167c

tools/mmuphin/macros.xml Outdated Show resolved Hide resolved
tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
Comment on lines 99 to 104

<!-- <output name="output">
<assert_contents>
<has_size value="150053" delta="1000" />
</assert_contents>
</output> -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was to be removed later but I missed it. Removed it now. Thanks for pointing out :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening anything else you want me to look into?

<param argument="pseudo_count" type="float" optional="true" label="Pseudo_count" help="Pseudo count to add feature_abd before the methods' log transformation.Default to NULL, in which case will be set to half of minimal non-zero values in feature_abd"/>
<param argument="conv" type="float" value="0.0001" optional="true" label="Convergence threshold" help="Convergence threshold for the method's iterative algorithm for shrinking batch effect parameters"/>
<param argument="maxit" type="float" value="1000" optional="true" label="Maximum number of iterations" help="Maximum number of iterations allowed for the method's iterative algorithm. Default to 1000"/>
<param argument="verbose" type="boolean" truevalue="verbose TRUE" falsevalue="verbose FALSE" checked="true" label="Print verbose information"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually don't expose those parameters to the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening ,so should I remove them ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and set a useful default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bgruening , I have the made required changes . Does this work ?
19a33c0

<inputs>
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
<param argument="batch_input" type="data_column" data_ref="input_metadata" use_header_names="true" label="batch" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please improve all labels and help text. They are not very user-friendly IMHO.

How does a metadata file needs to look like? Or the feature file? "batch"? Maybe "the column in which the batch identifier is species"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening does this work?
5109992

Comment on lines 57 to 59



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please clean those things up? thanks

tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved
Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the option for no covariates

<param argument="covariates_input" type="data_column" data_ref="input_metadata" use_header_names="true" multiple="true" optional="true" label="Covariates Column(s)" help="Select the column(s) in your metadata file that contain covariates (e.g., age, gender, or environmental factors) to be included in the batch correction process."/>
<section name="additional_options" title="Additional Options" expanded="true">
<param argument="zero_inflation" type="boolean" truevalue="zero_inflation TRUE" falsevalue="zero_inflation FALSE" checked="true" label="Run zero-inflated model"/>
<param argument="pseudo_count" type="hidden" value="NULL" label="Pseudo_count" help="Pseudo count to add feature_abd before the methods' log transformation.Default to NULL, in which case will be set to half of minimal non-zero values in feature_abd" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @renu-pal; I would not hide expert parameters, but put them in an extra block as you did ! What Björn meant is to not show to the user is the verbose part. So technical parts hide, expert parts allow.

</configfiles>

<inputs>
<param name="input_data" type="data" format="tabular" label="Abundance (or features) file" help="TSV file where columns represent samples and rows include abundance values for a microbial taxa across those samples"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more generic here, this tool is not limited to microbial taxa, it can also be genes, SNPs, path ways...

tools/mmuphin/mmuphin.xml Outdated Show resolved Hide resolved

batch <- colnames(meta_data[$batch_input-1])
cat("Batch Name:", batch, "\n")
covariates_val <- list($covariates_input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covariates can be empty (only correct batch). To handel the python None in R, this should work:

        # Ensure covariates_input is checked for NULL
        if (!exists("$covariates_input") || is.null($covariates_input)) {
            covariates_val <- list()  # Assign an empty list if input is NULL or does not exist
        } else {
            covariates_val <- list($covariates_input)
        }

        covariates <- c()

        # Process covariates only if they exist
        if (length(covariates_val) > 0) {     
            for (i in covariates_val) {
                covariates <- c(covariates, colnames(meta_data[i - 1]))
            }
            cat("Covariates Names:", covariates, "\n")
        } else {
            cat("No covariates provided.\n")
        }
        

Copy link
Contributor Author

@renu-pal renu-pal Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulzierep this code does not seem to work for some reason when I set covariates input to null. I keep on getting the error : Object 'None' not found.
PS: I removed --- !exists("$covariates_input") from if since it was executing if condition even when covariate was not empty. I believe it is used to just test whether covariate_input exists in the current env , which kept turning false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have pushed the other updates you asked for below, and working on how to fix this null issue.
a139a38

</outputs>

<tests>
<test expect_num_outputs="2">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test without covariates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants