-
Notifications
You must be signed in to change notification settings - Fork 0
/
interstitial_report4_cluster_attribution.Rmd
357 lines (242 loc) · 10.4 KB
/
interstitial_report4_cluster_attribution.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
---
title: "scRNAseq Interstitial cells : cluster attribution"
author: "Marion Hardy"
date: "`r Sys.Date()`"
output:
html_document:
toc: yes
theme: spacelab
highlight: monochrome
editor_options:
chunk_output_type: console
markdown:
wrap: 72
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE, cache = T, echo = FALSE, warning = F, cache.lazy = F)
knitr::opts_chunk$set(fig.width=6, fig.height=4)
library(tidyverse)
library(openxlsx)
library(Seurat)
library(scCustomize)
```
# Introduction
Data from our collaborator Panagiotis rds file for a
SingleCellExperiment object containing the single cell data for **the
interstitial cells** of *Hydra Vulgaris* during multiples stages of
regeneration after bisection:
<https://www.dropbox.com/scl/fi/dg76sigjnj5u6qr06xk79/sce_interstitial_Juliano.rds?rlkey=f0y3lqt0wdcwq652zisnrpf9t&dl=0>
BUT they mapped it to *Hydra Magnipapillata (105)* "Drop-seq reads from
15 libraries generated for Hydra vulgaris strain AEP were mapped to the
2.0 genome assembly of closely related Hydra vulgaris strain 105
(available at <https://research.nhgri.nih.gov/hydra/>) and processed
using the Hydra 2.0 gene models. Strain Hydra vulgaris 105 was formerly
referred to as Hydra magnipapillata 105."
The coldata of the object contain cell annotation including
- quality metrics: nFeature nCount (not MT percentage interestingly)
- batch info: either 2869 (3162 barcodes), 3113 (10352 barcodes), 3271
(13279 barcodes), 3357 (3875 barcodes)
- originating experiments (head or foot regeneration)
- experimental time points
- pseudo-axis assignment (vals.axis ranging from 0-1, increasing in
the foot-tentacle direction)
- mitotic and apoptotic signatures indices from 0 to 1
The rowdata contains gene annotation, using Entrez-gene identifiers. I
have also noticed that in the sce objects there's
- PCA, tSNE and UMAP coordinates for reduced dimensions + corrected
for batch values
- assay metafeatures hold gene_id, product, gene, is.rib.prot.gene
(T/F), HypoMarkers (T/F), ccyle (T/F), apopt (T/F) etc
I converted the sce objects 6.6gb into a seurat object 2.2 gb and
checked that all cited parameters could be found in it.
# Data loading
Which dataset set should I use? Batch regressed looks weird but might be interesting depending on how timepoints end up clustering...
Let's make a UMAP with timepoints as labels.
I am using the scaled and batch regressed data.
```{r}
# seurat objects
seurat_f = readRDS("data_output/interstitial/seurat_filtered.rds")
# 105 genome annotations
annot = read.xlsx("./data/mcbi_dataset_MH_annotated.xlsx")
annot$Symbol_updated = as.character(annot$Symbol_updated)
# set up to split the object per head or foot
footcond = c("REG_FOOT_t0","REG_FOOT_t6","REG_FOOT_t12","REG_FOOT_t24",
"REG_FOOT_t48","REG_FOOT_t96")
seurat_f$axis <- ifelse(test = seurat_f$EXP_TIME %in% footcond, yes = "Foot", no = "Head")
```
```{r}
# changing labeled time from t6 to t06 so thqt it orders properly in the plots
Idents(seurat_f) = "EXP_TIME"
seurat_f = RenameIdents(seurat_f, "REG_FOOT_t6" = "REG_FOOT_t06")
seurat_f = RenameIdents(seurat_f, "REG_HEAD_t6" = "REG_HEAD_t06")
```
# Subsetting by head/foot and timepoint for both regressed and unregressed
## Not batch regressed
```{r}
Idents(seurat_f) = "EXP_TIME"
DimPlot(seurat_f, label = FALSE)
# Creating the head and foot data subset
hfootf = subset(seurat_f, subset = EXP == "REG_FOOT")
hheadf = subset(seurat_f, subset = EXP == "REG_HEAD")
```
```{r, fig.width=30, fig.height=6}
# Plotting conditions separately
Idents(hfootf) = [email protected]$SCT_snn_res.0.025
Idents(hheadf) = [email protected]$SCT_snn_res.0.025
DimPlot(hfootf, reduction = "umap", split.by = "EXP_TIME", pt.size = 1)
DimPlot(hheadf, reduction = "umap", split.by = "EXP_TIME", pt.size = 1)
```
# Cluster attribution
## UMAP at 0.025 resolution
```{r, fig.width=8, fig.height=6}
Idents(seurat_f) = "SCT_snn_res.0.025"
DimPlot(seurat_f, label = TRUE)+
labs(title = "Resolution = 0.025")
```
## Finding markers per clusters
```{r, eval=FALSE}
# Find markers for each clusters------------------------------------------------
# With the ` FindAllMarkers()` function we are comparing
# each cluster against all other clusters to identify potential marker genes.
# The cells in each cluster are treated as replicates, and essentially a
# differential expression analysis is performed with some statistical test.
# By default, it's a wilcoxon rank sum test
# Seurat lab suggested that DE analysis on obj@assay$RNA@data, which is normalized data (not scaled). So you are right, you should NormalizeData and then FindMarkers. However, Seurat 4 (not sure which small version exactly) starts to promotes DE analysis on obj@[email protected], which is person residual. My experience is to try both and take the one that fits your goal because Seurat said both are not incorrect.
# DO NOT RUN THIS COMMAND IF YOU'VE ALREADY DONE IT ONCE, THIS CHUNK IS EVAL = FALSE
markers = FindAllMarkers(object = seurat_f,
logfc.threshold = 0.5,
assay = "SCT",
slot = "scale.data")
```
```{r}
markers = read.csv("./data_output/interstitial/markers_diff_per_cluster.csv")%>%
filter(p_val_adj < 0.05)
# write.csv(markers, "./data_output/interstitial/markers_diff_per_cluster.csv")
markers$gene = as.character(markers$gene)
```
### Markers that seem specific to cluster identity.
```{r}
head(markers)
```
```{r, , fig.width=15, fig.height=10}
target = markers %>%
group_by(cluster) %>%
top_n(n = 5, wt = avg_diff)
target =
target %>%
filter(!duplicated(gene))
target = target[!duplicated(target$gene),]
# Sooo turns out i'm gonna have to manually give these important genes symbols
# so they become readable on a graph
dp = DotPlot_scCustom(seurat_f, features = target$gene,
colors_use = viridis_plasma_dark_high,
x_lab_rotate = T)+
labs(title = "Dotplot for top 5 differentially expressed genes per cluster")
dp
```
### Theoretical markers
```{r, warning=TRUE, fig.width=5, fig.height=4}
theo = read.xlsx("./data/mcbi_dataset_MH_annotated.xlsx", sheet = "celltype_markers")
dp = DotPlot_scCustom(seurat_f, features = theo$Symbol,
colors_use = viridis_plasma_dark_high,
x_lab_rotate = T)+
labs(title = "Dotplot for theoretical markers")
dp
```
### Combined
```{r, fig.height=10, fig.width=8}
p1 = Clustered_DotPlot(seurat_f, features = target$gene, k = 12)
# print(p1[[2]])
```
```{r, fig.height=25, fig.width=15}
target <- c("NSP4","midasin","mini-COL8","SP-D-like","FH20-X3","CAII",
"CELA3B","zinc-carboxypeptidase","ANO39","TYN1","H2A.2.2","nas2-X2",
"Lwamide-X1","DMRT1","TUBA4A-X1","DMRT1","HTRA3","polRF-X1",
"nop58-X1","OTP","MEP1A","rhammosyl-O-methyltransferase","hywnt3",
"PPOD1","ks1","hyAlx","ELAV2","POU4","MUC2", "ec3", "grm1","myc",
"myc1","wnt3","ec2","ec1B","ec4","ec1A","en1","en1_NDF1_DANRE")
FeaturePlot(object = seurat_f,
features = target,
reduction = "umap",
order = TRUE,
min.cutoff = 'q10',
label = TRUE,
repel = TRUE,
pt.size = 0.75)
```
## Cluster labeling
Cell type attribution was obtained through cross validation of my markers with the https://research.nhgri.nih.gov/HydraAEP/SingleCellBrowser/ database.
```{r, echo=TRUE}
mutate(attr_clusters=
ifelse(seurat_clusters == "0", "ISC/earlyNem",
ifelse(seurat_clusters == "1", "StenoNB", # based on midasin
ifelse(seurat_clusters == "2", "NB.1",
ifelse(seurat_clusters == "3", "GranG/ZymoG", # based on G009335 zinc carboxipeptidase
ifelse(seurat_clusters == "4", "NB.2",
ifelse(seurat_clusters == "5", "unknown",
ifelse(seurat_clusters == "6", "ec1A/B",
ifelse(seurat_clusters == "7", "Ec3N/En1N",
ifelse(seurat_clusters == "8", "IsoNB", # based on CAII
ifelse(seurat_clusters == "9", "ec2",
ifelse(seurat_clusters == "10", "Ec3N/En1N", # based on Lwamide G018128
ifelse(seurat_clusters == "11", "OTP+MEP1A+",
ifelse(seurat_clusters == "12", "ec4",
NA))))))))))))))
# midasin G006924 StenoNB I_ISC I_FemGC
# zinc carboxipeptidase G009335 GranG ZymoG
# Lwamide G018128 Ec3N En1N
# CAII G023665 isoNB
# MEP1A G019984 En_tentacle En_Foot
# OTP G025356 En_Head Ec_Head I_GranGI
```
```{r, fig.width=8, fig.height=6}
Idents(seurat_f) = "attr_clusters"
DimPlot(seurat_f, label = TRUE)+
labs(title = "Attributed clusters")
```
# Transcription marker expression
Celina provided me with a list of markers of interest that I tried to find in the 105 genome.
```{r}
tr = read.xlsx("./data/mcbi_dataset_MH_annotated.xlsx", sheet = "transcription")
head(tr)
```
```{r}
uh = DotPlot_scCustom(seurat_f, features = tr$Symbol_updated,
colors_use = viridis_plasma_dark_high,
x_lab_rotate = T)+
labs(title = "Dotplot for theoretical markers")
uh
```
```{r, fig.height=10, fig.width=15}
FeaturePlot(object = seurat_f,
features = tr$Symbol_updated,
reduction = "umap",
order = TRUE,
min.cutoff = 'q10',
label = TRUE,
repel = TRUE,
pt.size = 0.75)
```
```{r}
Idents(seurat_f) = "EXP_TIME"
DimPlot(seurat_f, label = FALSE)
# Creating the head and foot data subset
hfootf = subset(seurat_f, subset = EXP == "REG_FOOT")
hheadf = subset(seurat_f, subset = EXP == "REG_HEAD")
```
```{r, fig.width=30, fig.height=6, eval = F}
# Plotting conditions separately
Idents(hfootf) = [email protected]$attr_clusters
Idents(hheadf) = [email protected]$attr_clusters
DimPlot(hfootf, reduction = "umap", split.by = "EXP_TIME", pt.size = 1)
DimPlot(hheadf, reduction = "umap", split.by = "EXP_TIME", pt.size = 1)
```
```{r, eval = FALSE}
write_rds(seurat_f, "./data_output/interstitial/seurat_filtered.rds")
```
# Session info
```{r}
sessionInfo()
```