add: implement tool to add rank names to phyloseq object #6625

Open
wants to merge 6 commits into
base: main
67 changes: 67 additions & 0 deletions tools/phyloseq/add_rank_names_to_phyloseq.R
@@ -0,0 +1,67 @@
#!/usr/bin/env Rscript

suppressPackageStartupMessages(library("optparse"))
suppressPackageStartupMessages(library("phyloseq"))
suppressPackageStartupMessages(library("tidyverse"))

# Option parsing
option_list <- list(
    make_option(c("--input"),
        action = "store", dest = "input",
        help = "Input file containing a phyloseq object"
    ),
    make_option(c("--output"), action = "store", dest = "output", help = "Output file for the updated phyloseq object")
)

parser <- OptionParser(usage = "%prog [options] file", option_list = option_list)
args <- parse_args(parser, positional_arguments = TRUE)
opt <- args$options

cat("Input file: ", opt$input, "\n")
cat("Output file: ", opt$output, "\n")

# Load the phyloseq object
physeq <- readRDS(opt$input)

# Check whether the phyloseq object was loaded successfully
if (is.null(physeq)) {
    stop("Error: Failed to load the Phyloseq object. Check the input file.")
}

cat("Phyloseq object successfully loaded.\n")
cat("Class of loaded object: ", class(physeq), "\n")

# Check the current taxonomy table
cat("Current tax_table:\n")
print(tax_table(physeq))

# Zuordnung der Rangnamen
Contributor

Guess English comments would be better.

rank_names <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species") # Anpassen je nach Bedarf
Contributor

There could be more ranks, e.g. superkingdom, subspecies, strain.
Could you make this more generic, e.g. the user provides a comma-separated list of ranks?
You could make "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species" the default.
If the number of ranks does not match the number of columns in the tax_table, the tool fails.
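
A minimal sketch of that suggestion, in base R only. The helper names `parse_ranks` and `set_rank_names` are hypothetical and not part of this PR, and a plain character matrix stands in for `tax_table(physeq)` so the snippet runs without phyloseq installed.

```r
# Hypothetical helper: split a comma-separated rank string into a character vector.
parse_ranks <- function(ranks_arg) {
    ranks <- trimws(strsplit(ranks_arg, ",", fixed = TRUE)[[1]])
    ranks[nzchar(ranks)] # drop empty entries, e.g. from a trailing comma
}

# Hypothetical helper: name the columns of a taxonomy matrix, failing on a mismatch.
set_rank_names <- function(tax, ranks) {
    if (ncol(tax) != length(ranks)) {
        stop(sprintf(
            "Number of ranks (%d) does not match number of tax_table columns (%d).",
            length(ranks), ncol(tax)
        ))
    }
    colnames(tax) <- ranks
    tax
}

# Default proposed in the review comment.
default_ranks <- "Kingdom,Phylum,Class,Order,Family,Genus,Species"

# Plain character matrix standing in for tax_table(physeq) (assumption).
tax <- matrix("x", nrow = 2, ncol = 7)
tax <- set_rank_names(tax, parse_ranks(default_ranks))
print(colnames(tax))
```

In the script itself, the rank string could come from an additional optparse option (the name `--ranks` is an assumption) whose `default` is the seven-rank string above, which would also make the special case for a missing Species column unnecessary.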


# Add an empty Species column if it is missing
Contributor

I think this is not needed if the logic I proposed above is used.

if (ncol(tax_table(physeq)) == 6) {
    tax_table(physeq) <- cbind(tax_table(physeq), Species = NA)
}

# Check that the number of columns matches the number of rank names
if (ncol(tax_table(physeq)) != length(rank_names)) {
    stop("Error: Number of columns in tax_table does not match the length of rank_names.")
}

# Set the column names
colnames(tax_table(physeq)) <- rank_names

# Confirm the changes
cat("Updated tax_table:\n")
print(tax_table(physeq))

# Extract the first character of the first tax_table entry (e.g. Kingdom)
first_char <- substr(tax_table(physeq)[1, 1], 1, 1)
Contributor

What is that for?

cat("Extracted first character: ", first_char, "\n")

# Save the updated phyloseq object
saveRDS(physeq, file = opt$output, compress = TRUE)
cat("Updated Phyloseq object saved to: ", opt$output, "\n")

# Print the first character (it is used later in the XML file)
cat(first_char, "\n")
36 changes: 36 additions & 0 deletions tools/phyloseq/add_rank_names_to_phyloseq.xml
@@ -0,0 +1,36 @@
<tool id="add_rank_names_to_phyloseq" name="Add Rank Names to Phyloseq Object" version="1.0">
MaraBesemer marked this conversation as resolved.
    <description>Add taxonomy rank names to a phyloseq object</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="bio_tools"/>
    <expand macro="requirements"/>
    <command detect_errors="exit_code"><![CDATA[
Rscript '${__tool_directory__}/add_rank_names_to_phyloseq.R' --input '$input' --output '$output'
    ]]></command>

    <inputs>
        <expand macro="phyloseq_input"/>
    </inputs>

    <outputs>
        <data name="output" format="phyloseq"/>
    </outputs>

    <tests>
        <test>
            <param name="input" value="output.phyloseq" ftype="phyloseq"/>
            <output name="output" ftype="phyloseq">
Contributor

You could compare the new phyloseq object to the expected output, like:

<output name="ampvis" value="AalborgWWTPs-subset_samples.rds" ftype="ampvis2" compare="sim_size"/>

Contributor

Better to just use a has_size assertion and remove value="AalborgWWTPs-subset_samples.rds".
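
A rough sketch of what such a check might look like in this tool's test. The `value` and `delta` numbers are placeholders, not measured from the real test output, and would need to be replaced with the actual size of the saved object.

```xml
<output name="output" ftype="phyloseq">
    <assert_contents>
        <!-- value and delta are placeholders (assumption); measure the real output first -->
        <has_size value="10000" delta="1000"/>
    </assert_contents>
</output>
```

A size check only guards against empty or truncated output; it does not verify that the rank names were actually set.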

                <!-- Check if the output contains the first character -->
                <assert_contents>
                    <has_text text="B"/>
Member

Can you please be a bit more strict in this test?

                </assert_contents>
            </output>
        </test>
    </tests>

    <help>
This tool adds taxonomy rank names to a phyloseq object in the `tax_table` slot.
    </help>
    <expand macro="citations"/>
</tool>