add spacexr #6212

Draft
wants to merge 46 commits into
base: main
from
Changes from 9 commits
Commits
46 commits
8e3eb4a  started spacexr  (nilchia, Aug 2, 2024)
a8c1906  started adding config file and help  (nilchia, Aug 5, 2024)
460e8f2  params for reference  (nilchia, Aug 5, 2024)
88fd911  added st param  (nilchia, Aug 5, 2024)
54ff8da  correct format  (nilchia, Aug 5, 2024)
8f1fb30  finished config file  (nilchia, Aug 5, 2024)
3287e68  update rctd  (nilchia, Aug 6, 2024)
ea3ad72  corrected tool name  (nilchia, Aug 6, 2024)
74481cf  add shed.yml  (nilchia, Aug 6, 2024)
fac4e1c  Update tools/spacexr/macros.xml  (nilchia, Aug 25, 2024)
3fca0f8  Merge branch 'galaxyproject:main' into spacexr  (nilchia, Sep 15, 2024)
f848cff  Merge branch 'galaxyproject:main' into spacexr  (nilchia, Sep 27, 2024)
26d2d47  add test-data  (nilchia, Aug 6, 2024)
e7d0fa3  add multi  (nilchia, Sep 27, 2024)
6433511  add output  (nilchia, Sep 27, 2024)
1adcbed  update output  (nilchia, Oct 8, 2024)
16bb7e0  add test (is failing)  (nilchia, Oct 8, 2024)
270f26d  first test pass  (nilchia, Oct 9, 2024)
7ba98dd  test for full and multi  (nilchia, Oct 9, 2024)
dae4129  correct categories  (nilchia, Oct 9, 2024)
544dd98  add doi of CSIDE  (nilchia, Oct 11, 2024)
1e0bad9  correct rds output of multi  (nilchia, Oct 11, 2024)
8c2c3a4  started CSIDE input  (nilchia, Oct 14, 2024)
f8dc7f7  add nonparametric script  (nilchia, Oct 15, 2024)
8a7f38c  add pathologic DE input  (nilchia, Oct 15, 2024)
4b1c4ec  cell2cell script  (nilchia, Oct 15, 2024)
ea4ae35  clean macros  (nilchia, Oct 15, 2024)
7b30145  add XY and custom part1  (nilchia, Oct 18, 2024)
1bd854d  custom input and command  (nilchia, Oct 18, 2024)
944acb2  output  (nilchia, Oct 18, 2024)
84ceba2  correct rds output rctd  (nilchia, Oct 18, 2024)
55cf2ca  better test-data for rctd  (nilchia, Oct 18, 2024)
5fb6a97  first test cside  (nilchia, Oct 18, 2024)
73624f3  fix some lint error  (nilchia, Oct 19, 2024)
591ffd1  better label and name  (nilchia, Oct 21, 2024)
df5391f  CDATA  (nilchia, Oct 22, 2024)
3cdc55d  add env varaible, validator for text input, and update macro  (nilchia, Oct 22, 2024)
8e448d5  select box for output  (nilchia, Oct 22, 2024)
642ed70  fix some cheetah errors  (nilchia, Oct 22, 2024)
db796c2  calling env variables is R  (nilchia, Oct 22, 2024)
9857527  better config indentation  (nilchia, Oct 22, 2024)
b83228d  tring to fix the problem with cheetah variables  (nilchia, Oct 22, 2024)
30754b5  cheetah in config  (nilchia, Oct 23, 2024)
7c39d4b  update cside.xml  (nilchia, Oct 25, 2024)
9ad3ea3  update macros and xml  (nilchia, Oct 29, 2024)
d0f4bef  Merge branch 'galaxyproject:main' into spacexr  (nilchia, Dec 6, 2024)
13 changes: 13 additions & 0 deletions tools/spacexr/.shed.yml
@@ -0,0 +1,13 @@
name: spacexr
owner: iuc
description: Cell type identification and cell type-specific differential expression in spatial transcriptomics
homepage_url: https://github.com/dmcable/spacexr/tree/master
long_description: Computational methods for cell type identification (RCTD) and differential expression (C-SIDE) on spatial transcriptomics datasets
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/spacexr
categories:
- Spatial Analysis
- Transcriptomics
suite:
  name: "suite_spacexr"
  description: "A suite of Galaxy tools designed to work with the spacexr-tools collection."
  type: repository_suite_definition
31 changes: 31 additions & 0 deletions tools/spacexr/macros.xml
@@ -0,0 +1,31 @@
<macros>
<token name="@TOOL_VERSION@">2.2.1</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">24.1</token>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">r-spacexr</requirement>
<yield/>
</requirements>
</xml>
<xml name="edam">
<edam_topics>
<edam_topic>topic_4019</edam_topic>
<edam_topic>topic_4028</edam_topic>
<edam_topic>topic_3308</edam_topic>
</edam_topics>
<edam_operations>
<edam_operation>operation_3223</edam_operation>
</edam_operations>
</xml>
<xml name="citations">
<citations>
<citation type="doi">10.1038/s41587-021-00830-w</citation>
<citation type="bibtex">@Manual{github,
title = {SpatialeXpressionR: Cell type identification and cell type-specific differential expression in spatial transcriptomics.},
author = {Dylan Cable},
url = {https://github.com/dmcable/spacexr}}
</citation>
</citations>
</xml>
</macros>
26 changes: 26 additions & 0 deletions tools/spacexr/spacexr_cside.xml
@@ -0,0 +1,26 @@
<tool id="spacexr_cside" name="CSIDE" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Cell type-specific differential expression with C-SIDE</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="edam"/>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[

]]></command>
<inputs>

</inputs>
<outputs>

</outputs>
<tests>

</tests>
<help><![CDATA[


]]></help>

<expand macro="citations" />
</tool>
258 changes: 258 additions & 0 deletions tools/spacexr/spacexr_rctd.xml
@@ -0,0 +1,258 @@
<tool id="spacexr_rctd" name="RCTD" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Cell type identification with RCTD</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="edam"/>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
mkdir -p 'results' &&
mkdir -p 'figures' &&
Rscript '$rctd_script'

]]></command>
<configfiles>
<configfile name="rctd_script">
# rctd script
# This file is used to specify the parameters for the rctd from spacexr package

# Load the spacexr library
library('spacexr')

### Load the scRNA-seq data
counts &lt;- read.table(file = '${sc_count}', header = TRUE, row.names = 1, sep = '\t', check.names = FALSE)
metadata &lt;- read.table(file = '${metadata}', header = TRUE, sep = '\t')

# create cell_types named list
cell_types &lt;- metadata[,"annotation"]; names(cell_types) &lt;- metadata[,"barcode"]

# convert to factor data type
cell_types &lt;- as.factor(cell_types)

#if str($sc_umi_input) == 'True':
# create nUMI named list
nUMI &lt;- metadata[, "nUMI"]; names(nUMI) &lt;- metadata[, "barcode"]
#end if

# Create reference object
reference &lt;- Reference(
counts = counts,
cell_types = cell_types,
#if str($sc_umi_input) == 'True':
nUMI = nUMI,
#end if
n_max_cells = $advanced_param.reference_param.n_max_cells,
min_UMI = $advanced_param.reference_param.min_UMI
)

### Load spatial data
counts &lt;- read.table(file = '${st_count}', header = TRUE, row.names = 1, sep = '\t', check.names = FALSE)
coords &lt;- read.table(file = '${coord}', header = TRUE, row.names = 1, sep = '\t')

nUMI &lt;- colSums(counts) # total counts per pixel, as in the spacexr tutorials

# Create SpatialRNA object
puck &lt;- SpatialRNA(
coords = coords,
counts = counts,
nUMI = nUMI
)

# basic plot of the nUMI of each pixel
pdf('figures/nUMI_plot.pdf')
plot_puck_continuous(
puck = puck,
barcodes = colnames(puck@counts),
plot_val = puck@nUMI,
ylimit = c(0,round(quantile(puck@nUMI,0.9))),
title ='plot of nUMI')
dev.off()

### Run the RCTD
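myRCTD &lt;- create.RCTD(
spatialRNA = puck,
reference = reference,
gene_cutoff = $advanced_param.st_param.gene_cutoff,
fc_cutoff = $advanced_param.st_param.fc_cutoff,
gene_cutoff_reg = $advanced_param.st_param.gene_cutoff_reg,
fc_cutoff_reg = $advanced_param.st_param.fc_cutoff_reg,
UMI_min = $advanced_param.st_param.UMI_min,
UMI_max = $advanced_param.st_param.UMI_max,
counts_MIN = $advanced_param.st_param.counts_MIN,
UMI_min_sigma = $advanced_param.st_param.UMI_min_sigma,
class_df = NULL, # set as default
CELL_MIN_INSTANCE = $advanced_param.st_param.CELL_MIN_INSTANCE,
#if str($advanced_param.st_param.cell_type_names) != "":
# assumes the cell type names are entered as a comma-separated list
cell_type_names = trimws(strsplit('$advanced_param.st_param.cell_type_names', ',')[[1]]),
#end if
MAX_MULTI_TYPES = $advanced_param.st_param.MAX_MULTI_TYPES,
keep_reference = F, # set as default
cell_type_profiles = NULL, # set as default
CONFIDENCE_THRESHOLD = $advanced_param.st_param.CONFIDENCE_THRESHOLD,
DOUBLET_THRESHOLD = $advanced_param.st_param.DOUBLET_THRESHOLD
)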

myRCTD &lt;- run.RCTD(
myRCTD,
doublet_mode = '$doublet_mode')


# save results
#if str($doublet_mode) == 'doublet':
results &lt;- myRCTD@results

# save the data frame
result_df &lt;- results\$results_df
write.table(result_df, file = 'results/doublet_results_df.tabular', sep = '\t', quote = F, row.names = T)

# RCTD plots
# normalize the cell type proportions to sum to 1.
norm_weights &lt;- normalize_weights(results\$weights)
cell_type_names &lt;- myRCTD@cell_type_info\$info[[2]] # list of cell type names
spatialRNA &lt;- myRCTD@spatialRNA

resultsdir &lt;- 'figures'

# make the plots
# Plots the confident weights for each cell type as in full_mode (saved as 'figures/cell_type_weights.pdf')
plot_weights(cell_type_names, spatialRNA, resultsdir, norm_weights)

# Plots all weights for each cell type as in full_mode. (saved as 'figures/cell_type_weights_unthreshold.pdf')
plot_weights_unthreshold(cell_type_names, spatialRNA, resultsdir, norm_weights)

# Plots the weights for each cell type as in doublet_mode. (saved as 'figures/cell_type_weights_doublets.pdf')
plot_weights_doublet(cell_type_names, spatialRNA, resultsdir, results\$weights_doublet, results\$results_df)

# Plots the number of confident pixels of each cell type in 'full_mode'. (saved as 'figures/cell_type_occur.pdf')
plot_cond_occur(cell_type_names, resultsdir, norm_weights, spatialRNA)

# makes a map of all cell types (saved as 'figures/all_cell_types.pdf')
plot_all_cell_types(results\$results_df, spatialRNA@coords, cell_type_names, resultsdir)

# doublets
# obtain a data frame of only doublets
doublets &lt;- results\$results_df[results\$results_df\$spot_class == "doublet_certain",]

# Plots all doublets in space (saved as 'figures/all_doublets.pdf')
plot_doublets(spatialRNA, doublets, resultsdir, cell_type_names)

# Plots all doublets in space for each cell type (saved as 'figures/all_doublets_type.pdf')
plot_doublets_type(spatialRNA, doublets, resultsdir, cell_type_names)

# a table of frequency of doublet pairs
doub_occur &lt;- table(doublets\$second_type, doublets\$first_type)
# Plots a stacked bar plot of doublet occurrences (saved as 'figures/doublet_stacked_bar.pdf')
plot_doub_occur_stack(doub_occur, resultsdir, cell_type_names)

# save rds file
#if str($output.rds) == 'True':
saveRDS(myRCTD, file = 'results/rctd_results_doublet.rds')
#end if
#end if

#if str($doublet_mode) == 'full':
results &lt;- myRCTD@results

# RCTD plots
# normalize the cell type proportions to sum to 1.
norm_weights &lt;- normalize_weights(results\$weights)
cell_type_names &lt;- myRCTD@cell_type_info\$info[[2]] # list of cell type names
spatialRNA &lt;- myRCTD@spatialRNA

resultsdir &lt;- 'figures'

# make the plots
# Plots the confident weights for each cell type as in full_mode (saved as 'figures/cell_type_weights.pdf')
plot_weights(cell_type_names, spatialRNA, resultsdir, norm_weights)

# Plots all weights for each cell type as in full_mode. (saved as 'figures/cell_type_weights_unthreshold.pdf')
plot_weights_unthreshold(cell_type_names, spatialRNA, resultsdir, norm_weights)

# Plots the number of confident pixels of each cell type in 'full_mode'. (saved as 'figures/cell_type_occur.pdf')
plot_cond_occur(cell_type_names, resultsdir, norm_weights, spatialRNA)

# save rds file
#if str($output.rds) == 'True':
saveRDS(myRCTD, file = 'results/rctd_results_full.rds')
#end if
#end if
</configfile>
</configfiles>
<inputs>
<param name="sc_count" type="data" format="tabular" label="Single-cell count matrix" help="Digital gene expression (DGE) matrix of untransformed counts. Row names are genes and column names are barcodes/cell names." />
<param name="metadata" type="data" format="tabular" label="Metadata" help="Single-cell annotation table with the columns barcode, annotation, and (optionally) nUMI." />
<param name="sc_umi_input" type="boolean" truevalue="True" falsevalue="False" checked="false" label="nUMI" help="Does the single-cell metadata contain an nUMI column?" />
<param name="st_count" type="data" format="tabular" label="Spatial count matrix" help="Digital gene expression (DGE) matrix of untransformed counts. Row names are genes and column names are barcodes/pixel names." />
<param name="coord" type="data" format="tabular" label="Spatial coordinates" help="Numeric table of spatial pixel locations. Row names are barcodes/pixel names, followed by two columns for the x and y coordinates." />
<param name="doublet_mode" type="select" label="Doublet mode">
<option value="doublet">doublet</option>
<option value="full">full</option>
</param>
<section name="advanced_param" title="Advanced parameters">
<section name="reference_param" title="Reference object parameters">
<param name="n_max_cells" type="integer" min="0" value="10000" label="n_max_cells" help="Maximum number of cells per cell type. Will downsample if this number is exceeded." />
<param name="min_UMI" type="integer" min="0" value="100" label="min_UMI" help="Minimum UMI count for cells to be included in the reference." />
</section>
<section name="st_param" title="RCTD parameters">
<param name="gene_cutoff" type="float" min="0" value="0.000125" label="gene_cutoff" help="Minimum normalized gene expression for genes to be included in the platform effect normalization step." />
<param name="fc_cutoff" type="float" min="0" value="0.5" label="fc_cutoff" help="Minimum log-fold-change (across cell types) for genes to be included in the platform effect normalization step." />
<param name="gene_cutoff_reg" type="float" min="0" value="0.0002" label="gene_cutoff_reg" help="Minimum normalized gene expression for genes to be included in the RCTD step." />
<param name="fc_cutoff_reg" type="float" min="0" value="0.75" label="fc_cutoff_reg" help="Minimum log-fold-change (across cell types) for genes to be included in the RCTD step." />
<param name="UMI_min" type="integer" min="0" value="100" label="UMI_min" help="Minimum UMI per pixel included in the analysis" />
<param name="UMI_max" type="integer" min="0" value="20000000" label="UMI_max" help="Maximum UMI per pixel included in the analysis" />
<param name="counts_MIN" type="integer" min="0" value="10" label="counts_MIN" help="Minimum total counts per pixel of genes used in the analysis." />
<param name="UMI_min_sigma" type="integer" min="0" value="300" label="UMI_min_sigma" help="Minimum UMI per pixel for the choose_sigma_c function" />
<param name="CELL_MIN_INSTANCE" type="integer" min="0" value="25" label="CELL_MIN_INSTANCE" help="Minimum number of cells required per cell type." />
Member

It's only a recommendation, but to stay consistent I would name all variables in lowercase.

<param name="cell_type_names" type="text" label="cell_type_names" help="Cell types from the reference to include. If left empty, all cell types are used." />
<param name="MAX_MULTI_TYPES" type="integer" min="0" value="4" label="MAX_MULTI_TYPES" help="Maximum number of cell types per pixel." />
<param name="CONFIDENCE_THRESHOLD" type="integer" min="0" value="5" label="CONFIDENCE_THRESHOLD" help="The minimum change in likelihood (compared to other cell types) necessary to determine a cell type identity with confidence." />
<param name="DOUBLET_THRESHOLD" type="integer" min="0" value="20" label="DOUBLET_THRESHOLD" help="The penalty weight of predicting a doublet instead of a singlet for a pixel." />
</section>
</section>
<section name="output" title="Output options">
Member

simple select box

<param name="rds" type="boolean" truevalue="True" falsevalue="False" checked="false" label="Save the RCTD object as an RDS file?"/>
</section>
</inputs>
<outputs>
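<!-- A minimal sketch of possible outputs, assuming the file names written by the
     rctd_script configfile above. The output names are hypothetical, and the
     "rds" datatype is assumed to be available on the target Galaxy server. -->
<data name="rctd_results" format="tabular" from_work_dir="results/doublet_results_df.tabular" label="${tool.name} on ${on_string}: doublet results table">
    <filter>doublet_mode == 'doublet'</filter>
</data>
<!-- collect every file written to figures/ into one list collection -->
<collection name="rctd_figures" type="list" label="${tool.name} on ${on_string}: figures">
    <discover_datasets pattern="__name_and_ext__" directory="figures"/>
</collection>
<!-- the RDS file name depends on the selected mode, so one filtered output per mode -->
<data name="rctd_rds_doublet" format="rds" from_work_dir="results/rctd_results_doublet.rds" label="${tool.name} on ${on_string}: RCTD object (doublet mode)">
    <filter>doublet_mode == 'doublet' and output['rds']</filter>
</data>
<data name="rctd_rds_full" format="rds" from_work_dir="results/rctd_results_full.rds" label="${tool.name} on ${on_string}: RCTD object (full mode)">
    <filter>doublet_mode == 'full' and output['rds']</filter>
</data>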

</outputs>
<tests>

</tests>
<help><![CDATA[

Robust Cell Type Decomposition, or **RCTD**, is a statistical method for learning cell types from spatial transcriptomics data.

**Reference object**:

To create the single-cell reference object, the following inputs are used:

* counts: A matrix representing Digital Gene Expression (DGE). **Rownames** should be genes and **colnames** represent barcodes/cell names. Counts should be **untransformed** count-level data.
* cell_types: A named (by cell barcode) factor of cell type for each cell.
* nUMI: Optional. A named (by cell barcode) list of total counts or UMIs appearing at each cell. If not provided, nUMI is assumed to be the total counts appearing on each cell.

**SpatialRNA object**:

To create the SpatialRNA object, the following inputs are required:

* coords: A numeric table representing the spatial pixel locations. **Rownames** are barcodes/pixel names, and there should be two columns for **x** and for **y**.
* counts: A matrix representing Digital Gene Expression (DGE). **Rownames** should be genes and **colnames** represent barcodes/pixel names. Counts should be **untransformed** count-level data.

-----

RCTD has **three** modes:

* **doublet mode**, which assigns 1-2 cell types per spot and is recommended for technologies with high spatial resolution such as Slide-seq and MERFISH.
* **full mode**, which assigns any number of cell types per spot and is recommended for technologies with poor spatial resolution such as 100-micron resolution Visium.
* **multi mode**, an extension of doublet mode that can discover more than two cell types per spot (3-4 cell types) as an alternative option to full mode.

]]></help>

<expand macro="citations" />
</tool>