-
Notifications
You must be signed in to change notification settings - Fork 0
/
report.Rmd
241 lines (154 loc) · 5.88 KB
/
report.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
---
title: "BVDE929_report"
author: "Marion Hardy"
date: "2023-03-16"
output:
html_document:
toc: true
theme: spacelab
highlight: monochrome
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE, cache = TRUE, echo = FALSE, warning = F, cache.lazy = F)
knitr::opts_chunk$set(fig.width=10, fig.height=15)
```
```{r data loading, include=FALSE}
library(tidyverse)
library(readxl)
data = read_xlsx("./data/BVDE_929_data.xlsx", sheet = "PeptideGroups")
```
# Introduction
Cells were treated with H2O2 (or ctrl) and proteins were digested, tagged with TMT and loaded onto the LC-MS/MS.
Oxidized methionins were conserved.
We want to know if oxidization might have happened at ionization, in which case the retention time for a non-oxidized
peptide should be the same as the one for an oxidized peptide.
We have 3 excel files
- PSMs
- PeptideGroups
- (ProteingGroups)
# Data clean up
```{r, echo = T}
# Filtering out contaminants
dim(data)
data =
data %>%
filter(Contaminant != "TRUE")
dim(data)
```
```{r, echo=TRUE}
# Reduce the number of rows to keep only the ones of interest
data = data[,c(3:5,9,10,16:23,29:44)]
colnames(data)
colnames(data)[10:13] = c("F1_H2O2","F1_ctrl","F2_H2O2","F2_ctrl")
# Filter out peptides which appear in <4 conditions
cols = sapply(data[14:29], grepl, pattern = "Not.*{2,}")
data = data[rowSums(cols) >= 3, ]
dim(data)
# Filter out PSM <1
data =
data %>%
filter(!`# PSMs` < 1)
```
We have 7690 peptides which appeared in less than four of the samples and none that had PSM = 0.
# Analysis
## Frequency of peptide oxidation
```{r}
pepox =
data %>%
filter(grepl("xidation", Modifications)) # in case it's dioxidated
pepctrl=
data %>%
filter(!grepl("xidation", Modifications))
data =
data %>%
mutate(state = case_when(
grepl("xidation", Modifications) ~ "Oxidated",
!grepl("xidation", Modifications) ~ "Not oxidated"))
```
We have 1541 peptides identified as oxidated and 18 545 as not oxidated.
```{r, echo=TRUE}
dim(pepox)
head(pepox)
dim(pepctrl)
head(pepctrl)
```
```{r, fig.height=4, fig.width=8}
# Add annotations for which samples is which conditions
ldata =
data %>%
pivot_longer(c(10:13),
names_to = "Treatment",
values_to = "Abundance")
table(ldata$Treatment, ldata$state)
ldata %>%
filter(Treatment %in% c("F2_ctrl", "F2_H2O2")) %>%
ggplot(aes(x = Treatment,
y = Abundance,
color = state))+
geom_violin(position = position_dodge(width = 1))+
geom_boxplot(position = position_dodge(width = 1))+
theme_bw()
ldata %>%
ggplot(aes(x = Abundance,
color = state, fill = state))+
geom_histogram(alpha = 0.7, binwidth = 5)+
theme_bw()+
facet_wrap(~Treatment)
ldata %>%
ggplot(aes(x = Abundance,
color = state, fill = state))+
geom_histogram(alpha = 0.7, binwidth = 0.1)+
theme_bw()+
facet_wrap(~Treatment, scales = 'free_y')
```
# Match the retention times to the right peptides
```{r}
data1 = read_xlsx("./data/BVDE_929_data.xlsx", sheet = "PSMs")
data1 = data1[,c(5:9,27,33:38)]
data1 =
data1 %>%
mutate(state = case_when(
grepl("xidation", Modifications) ~ "Oxidated",
!grepl("xidation", Modifications) ~ "Not oxidated")) %>%
mutate(Sequence = toupper(`Annotated Sequence`))
all = left_join(data, data1, by = c("Annotated Sequence" = "Sequence", "state" = "state"))
```
I merged the PSM file which contains the retention times with the PeptideGroup data I filtered before.
Then I removed the unduplicated peptide sequences to be rid of any peptide not oxidized and unoxidized.
However, this filtering is not the best since it would keep two peptides oxidized in different places. But the idea is that the overall population comparison will compare peptides that were oxidized to peptide not oxidized.
```{r, fig.height=4, fig.width=8, echo=TRUE}
colnames(all)[36] = "Retention_time"
table(duplicated(all$`Annotated Sequence`))
all = all[duplicated(all$`Annotated Sequence`),]
lall =
all %>%
pivot_longer(c(10:13),
names_to = "Treatment",
values_to = "Abundance")
```
I removed the violin plots because I found a way to filter more stringently the peptides per condition and state.
## Filtering peptides
### Only the oxidated in H2O2 and the matching unoxidated in ctrl
Filtering peptides to keep only the peptides appearing in the two conditions and oxidated in H2O2 treatment and unoxidated in the ctrl treatment.
```{r}
oxrows = lall %>% filter(state == "Oxidated", Treatment %in% c("F1_H2O2", "F2_H2O2")) %>%
select(`Annotated Sequence`, Modifications.x, Abundance, state, Treatment, Retention_time) %>%
drop_na(Abundance)
unoxrows = lall %>% filter(state == "Not oxidated", Treatment %in% c("F1_ctrl", "F2_ctrl")) %>%
select(`Annotated Sequence`, Modifications.x, Abundance, state, Treatment, Retention_time) %>%
drop_na(Abundance)
```
```{r, fig.height=8, fig.width=10}
targetpep = inner_join(oxrows, unoxrows, by = c("Annotated Sequence"), suffix = c("_H2O2", "_ctrl"))
lall %>%
filter(`Annotated Sequence`%in% targetpep$`Annotated Sequence`,
Treatment%in%c("F2_H2O2","F2_ctrl")) %>%
ggplot(aes(x = `Annotated Sequence`,
y = Retention_time,
color = state, fill = state))+
geom_point(position=position_dodge(width = 1))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
Il y a 49 peptides qui sont oxidés en H2O2 et non oxidés en ctrl mais on ne sait pas à quelle abondance ils sont! Il faut donc aller checker leur delta d'abondance et garder ceux qui sont oxidés et plus abondants en H2O2.