AnnotationByFeature - get non-overlapping "pure" features // Granges #205

Closed
bri2020 opened this issue Jul 8, 2022 · 1 comment


bri2020 commented Jul 8, 2022

Dear Altuna Akalin, Dear others,

I would like to know whether you have an idea or strategy for working with the information in an AnnotationByFeature object generated by annotateWithGeneParts or annotateWithFeatureFlank. My aim is to extract the methylation status for specific features.

How could I extract the information for promoters, exons, and introns as coordinates or GRanges? Is there a way to generate this? I saw that you mentioned this in another issue, #186.
So basically I could do XYZ_exons = subsetByOverlaps(XYZ_gr, gene.parts$exons)

However, I like the idea of the priority promoter > exon > intron, and the membership data frame could be useful for extracting only "pure" features. How can I keep this priority when subsetting the data with subsetByOverlaps, so that each position is assigned to a single feature?

  • If I do ....
> getTargetAnnotationStats(XYZ_annot, percentage =FALSE,precedence = TRUE)
  promoter       exon     intron intergenic 
      3072       4253       7613      17987 

.... I would like to get the same numbers for

YZ__gr_exons =subsetByOverlaps(unite_meth_grl,gene.parts$exons)
YZ__gr_promoters =subsetByOverlaps(unite_meth_grl,gene.parts$promoters)
YZ_gr_introns =subsetByOverlaps(unite_meth_grl,gene.parts$introns)
YZ__gr_tss =subsetByOverlaps(unite_meth_grl,gene.parts$TSSes)

> length(YZ__gr_exons) #4514
[1] 4514
> length(YZ__gr_promoters) #3072
[1] 3072
> length(YZ_gr_introns)#7923
[1] 7923
> length(YZ__gr_tss)#0
[1] 0

... but that's because some positions overlap several features, of course.

Questions

  1. Are you aware of any package or function to exclude overlapping GRanges, e.g. setdiff?

a. from YZ__gr_exons, I would remove all promoter positions
b. from YZ_gr_introns, I would remove all exon and promoter positions

Correct?
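
A minimal sketch of the removal steps in (a) and (b), using only GenomicRanges. The toy coordinates and all object names (cpgs, promoters, exons, introns, pure_*) are hypothetical stand-ins for unite_meth_grl and gene.parts, and strand is ignored because every range is unstranded. Instead of setdiff, it builds one logical vector per feature with overlapsAny and combines them, which should correspond to the promoter > exon > intron precedence:

```r
library(GenomicRanges)

# Toy data (hypothetical): five CpG positions and three feature sets
cpgs      <- GRanges("chr1", IRanges(start = c(50, 150, 250, 350, 950), width = 1))
promoters <- GRanges("chr1", IRanges(start = 1,   end = 200))
exons     <- GRanges("chr1", IRanges(start = 100, end = 300))
introns   <- GRanges("chr1", IRanges(start = 301, end = 400))

# One logical value per CpG, in the same order as `cpgs`
in_prom   <- overlapsAny(cpgs, promoters)
in_exon   <- overlapsAny(cpgs, exons)
in_intron <- overlapsAny(cpgs, introns)

# Precedence promoter > exon > intron: a CpG is only counted for a
# feature if it did not already hit a higher-priority feature
pure_promoters <- cpgs[in_prom]
pure_exons     <- cpgs[in_exon & !in_prom]
pure_introns   <- cpgs[in_intron & !in_exon & !in_prom]
```

The CpG at position 150 lies in both a promoter and an exon, so it ends up in pure_promoters only. If the real annotation uses the same overlap settings, the lengths of the three objects would be expected to match the counts from getTargetAnnotationStats(..., precedence = TRUE), but that is worth checking on the actual data.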

  2. How do you calculate intergenic in your package? Is there a way to extract this as GRanges?

I guess your code only counts the positions that have not received any assignment, correct? Here:
intergenic = 100*sum(rowSums(memb)==0)/nrow(memb) )

That means I could also generate a GRanges of everything that is not in YZ__gr_exons, YZ__gr_promoters, or YZ_gr_introns.
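
A small sketch of that idea, again with hypothetical toy data and names (cpgs, gene_parts, annotated, intergenic): pool all annotated regions and keep the positions that overlap none of them, which mirrors the rowSums(memb)==0 condition quoted above.

```r
library(GenomicRanges)

# Toy data (hypothetical): four CpG positions and a list of gene parts
cpgs <- GRanges("chr1", IRanges(start = c(50, 250, 350, 950), width = 1))
gene_parts <- GRangesList(
  promoters = GRanges("chr1", IRanges(start = 1,   end = 200)),
  exons     = GRanges("chr1", IRanges(start = 100, end = 300)),
  introns   = GRanges("chr1", IRanges(start = 301, end = 400))
)

# Pool every annotated region into a single GRanges
annotated <- unlist(gene_parts, use.names = FALSE)

# Intergenic = CpGs that overlap none of the pooled regions
intergenic <- cpgs[!overlapsAny(cpgs, annotated)]
```

Here only the CpG at position 950 remains. subsetByOverlaps(cpgs, annotated, invert = TRUE) should give the same result in one call.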

OR another idea:

  3. What is the order of rows in an annotation object (from annotateWithFeatureFlank and/or annotateWithGeneParts)? Is it the same order as the target?

a) It would already be helpful to know the order, so that I can extract the methylation matrix, merge the membership matrix with it, and write a little loop to assign only one feature per position. Something like df = cbind(getAssociationWithTSS(methyl_filtered_annot), as.data.frame(getMembers(methyl_filtered_annot))), which I would then add to a methylKit object.

Can you help me read your code regarding the order? Am I looking at the right place? Do both parts keep the order of the target?

  1. geneparts:
    annotatGrWithGeneParts <- function(gr, prom, exon, intron, strand=FALSE){
      memb = data.frame(matrix(rep(0,length(gr)*3),ncol=3) )
      colnames(memb)=c("prom","exon","intron")
      memb[countOverlaps(gr,prom) > 0,1] = 1
      memb[countOverlaps(gr,exon) > 0,2] = 1
      memb[countOverlaps(gr,intron) > 0,3] = 1
  2. annotateWithFeatureFlank:
    setMethod( "annotateWithFeatureFlank",
      memb[countOverlaps(target,feature)>0,1]=1

b) Can I extract the position (chromosome and start) for this membership matrix?
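
A sketch of how the positions could be attached, assuming the membership matrix has one row per target range in the order of the target (which is how countOverlaps returns its counts). It rebuilds a membership table the same way as the excerpt above rather than calling the package itself, and target, prom, exon, intron, annotated and the feature column are hypothetical names:

```r
library(GenomicRanges)

# Toy target (hypothetical), deliberately unsorted to show that order is kept
target <- GRanges("chr1", IRanges(start = c(950, 50, 350, 150), width = 1))
prom   <- GRanges("chr1", IRanges(start = 1,   end = 200))
exon   <- GRanges("chr1", IRanges(start = 100, end = 300))
intron <- GRanges("chr1", IRanges(start = 301, end = 400))

# countOverlaps() returns one count per query range, in query order,
# so each row of `memb` belongs to the target range with the same index
memb <- data.frame(
  prom   = as.integer(countOverlaps(target, prom)   > 0),
  exon   = as.integer(countOverlaps(target, exon)   > 0),
  intron = as.integer(countOverlaps(target, intron) > 0)
)

# Bind chromosome and start directly onto the membership table
annotated <- cbind(
  data.frame(chr = as.character(seqnames(target)), start = start(target)),
  memb
)

# Collapse to a single label with promoter > exon > intron precedence
annotated$feature <- with(annotated,
  ifelse(prom == 1, "promoter",
  ifelse(exon == 1, "exon",
  ifelse(intron == 1, "intron", "intergenic"))))
```

With a real annotation object, as.data.frame(getMembers(...)) would take the place of memb, provided the target passed to the annotation function has not been reordered or filtered in between.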

I am sorry for thinking out loud ;D, but maybe you can enlighten me a bit more.
Kind regards,
Thanks a lot, Britta Meyer


bri2020 commented Jul 14, 2022

I think I have answered most of the questions myself in the meantime: I checked, and in both cases (annotateWithFeatureFlank and annotateWithGeneParts) the overlap matrix is calculated with countOverlaps, so the order of the query object is kept. Knowing that the counts/overlaps follow this order, I could also easily extract the positions.
However, I would be happy about any input or doubts that come up at a later stage.
Thanks, Britta

@bri2020 bri2020 closed this as completed Jul 14, 2022