copy number assessment subtree proposal (SO, EFO) #1404

Closed
mbaudis opened this issue Jan 5, 2022 · 13 comments
@mbaudis

mbaudis commented Jan 5, 2022

This request is for a subtree which probably should reside in SO and has been submitted for consideration there. However, advice on this, or placement elsewhere, is welcome :-)

AFAIK EFO's copy number variation is aimed at the measurement, not at representing the resulting qualities.

There is a definite need for a reference set of "relative genomic copy number" classes as discussed in GA4GH VRS and ELIXIR hCNV.

Preferred term label

(genomic) copy number assessment

Synonyms

Copy number variation analysis

Textual definition

Result of genomic copy number assessment of a genomic element or region

Proposed term hierarchy

This is a copy of the tree documented in the SO issue.

id: SO:nnnn01
label: copy number assessment
  |
  |-id: SO:nnnn02
  | label: base ploidy
  |   |
  |   |-id: SO:nnnn04
  |     label: copy-neutral loss of heterozygosity
  |
  |-id: SO:nnnn03
    label: copy number variation
      |
      |-id: SO:nnnn05
      | label: copy number loss
      |   |
      |   |-id: SO:nnnn07
      |   | label: low-level copy number loss
      |   |
      |   |-id: SO:nnnn08
      |     label: complete genomic deletion
      |
      |-id: SO:nnnn06
        label: copy number gain
          |
          |-id: SO:nnnn09
          | label: low-level copy number gain
          |
          |-id: SO:nnnn10
            label: genomic amplification
            note: commonly but not consistently used for >=5 copies on a bi-allelic genome region

Attribution

0000-0002-9903-4248

@paolaroncaglia
Collaborator

Dear @mbaudis ,

I see that there are no recent comments in the related SO ticket, so I suppose that you'd like the terms added in EFO as placeholders. Please let me know if this is not (or no longer) the case. I'm just back from sick leave and haven't looked into the SO thread closely yet, but if desired, I'll be happy to try to add the terms above in time for the next EFO release. We could fine-tune them if needed before the release after that.

Best wishes,
Paola

@paolaroncaglia paolaroncaglia self-assigned this Jan 10, 2022
@mbaudis
Author

mbaudis commented Jan 10, 2022

@paolaroncaglia From my POV it is important to go ahead to have this represented as a set of concepts which can then be expanded upon if necessary; we've had a lot of discussions in GA4GH Beacon & VRS and having these terms would be very helpful in supporting some upcoming developments.
(pinging @ahwagner ...)

@paolaroncaglia
Collaborator

Thanks @mbaudis for confirming. I'll work on this ticket this week, and will post updates here when ready.

@mbaudis
Author

mbaudis commented Jan 14, 2022

@paolaroncaglia I have updated the term tree slightly, w/ one addition and some re-labeling:

  • Addition: focal genome amplification
  • re-labeling:
    • genomic amplification => high-level copy number gain
    • base ploidy => regional base ploidy
  • on the SO ticket added notes about the expected allele count this is referring to:
    • autosomal chromosome in human germline: 2
    • X-chromosome in human male: 1
    • triploid cancer cell line: 3
    i.e. a region with 2 alleles in a triploid cell line would correspond to a low-level copy number loss

id: SO:nnnn01
label: copy number assessment
  |
  |-id: SO:nnnn02
  | label: regional base ploidy
  |   |
  |   |-id: SO:nnnn04
  |     label: copy-neutral loss of heterozygosity
  |
  |-id: SO:nnnn03
    label: copy number variation
      |
      |-id: SO:nnnn05
      | label: copy number loss
      |   |
      |   |-id: SO:nnnn07
      |   | label: low-level copy number loss
      |   |
      |   |-id: SO:nnnn08
      |     label: complete genomic deletion
      |
      |-id: SO:nnnn06
        label: copy number gain
          |
          |-id: SO:nnnn09
          | label: low-level copy number gain
          |
          |-id: SO:nnnn10
            label: high-level copy number gain
            note: commonly but not consistently used for >=5 copies on a bi-allelic genome region
              |
              |-id: SO:nnnn11
                label: focal genome amplification
                note: >-
                  commonly used for localized multi-copy genome amplification events where the
                  region does not extend >3Mb (varying 1-5Mb) and may exist in a large number of
                  copies
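
The allele-count notes and the term tree above can be read as a simple rule: compare the observed copy number of a region with the expected baseline for that region. The following is a minimal Python sketch under stated assumptions, not part of the proposal. The function name `classify_copy_number`, the `high_level_extra` cutoff (one possible way to generalize the ">=5 copies on a bi-allelic region" note to other baselines), and the 3 Mb `focal_max_bp` limit are all illustrative choices; the notes themselves say these thresholds are not used consistently.

```python
def classify_copy_number(observed, baseline=2, *, region_length_bp=None,
                         loh=False, high_level_extra=3,
                         focal_max_bp=3_000_000):
    """Map an observed copy number to one of the proposed term labels.

    observed / baseline: integer copy counts for the region.
    high_level_extra: ASSUMPTION; copies above baseline that count as
        "high-level" (3 reproduces ">=5 copies" on a bi-allelic region).
    focal_max_bp: ASSUMPTION; the tree's note says ~3 Mb, varying 1-5 Mb.
    """
    if observed < 0 or baseline <= 0:
        raise ValueError("observed must be >= 0 and baseline must be > 0")

    if observed == baseline:
        # Same copy number as expected; LOH needs separate allelic evidence.
        if loh:
            return "copy-neutral loss of heterozygosity"
        return "regional base ploidy"

    if observed < baseline:
        if observed == 0:
            return "complete genomic deletion"
        return "low-level copy number loss"

    if observed - baseline >= high_level_extra:
        if region_length_bp is not None and region_length_bp <= focal_max_bp:
            return "focal genome amplification"
        return "high-level copy number gain"
    return "low-level copy number gain"


# Example from the notes: 2 alleles in a triploid cell line.
print(classify_copy_number(2, baseline=3))
```

Passing the baseline explicitly keeps the "relative" nature of the classes visible: the same observed count of 2 is base ploidy on a human autosome but a low-level loss in a triploid line.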

@paolaroncaglia
Collaborator

@mbaudis thanks. FYI, I plan to add the branch under EFO IAO:0000030 'information entity', which is broader than 'measurement'.

@paolaroncaglia
Collaborator

@mbaudis also, I can't label the new term for SO:nnnn03 as 'copy number variation' as it would clash with the existing EFO:0004798. I can't easily relabel EFO:0004798 because it's used by GWAS, so I'll label the new term as 'copy number variation information' (with a related synonym "copy number variation"). All of this may not be ideal, but it's the best I can come up with if you'd like all terms to be in the same branch (e.g., EFO has 'ploidy' under 'quality'...).

@mbaudis
Author

mbaudis commented Jan 14, 2022

@paolaroncaglia Alternatives:

  • "genomic copy number variation"
  • "relative copy number variation"
  • "observed copy number variation"

... ? ("Relative" would be the most specific for what this wants to express ...). I'm not sure that "information" would convey that some type of CNV was observed.

@paolaroncaglia
Collaborator

@mbaudis I'll go with
label 'relative copy number variation',
exact synonym "observed copy number variation"
related synonyms "copy number variation" and "genomic copy number variation".
I'll let you know as soon as the pull request is ready so you may take a look before I commit if you wish. I'm not adding definitions for now, but I added all of your notes as rdfs:comments. Thanks.
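
As an illustration of the label and synonym choices above, here is a minimal Python sketch of how a client could distinguish label, exact-synonym and related-synonym matches. The `Term` class and its `match` method are hypothetical helpers, not an EFO or OLS API; the term ID is the one listed later in this thread for 'relative copy number variation'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical record for one ontology term and its synonyms."""
    term_id: str
    label: str
    exact_synonyms: tuple = ()
    related_synonyms: tuple = ()

    def match(self, text):
        """Return 'label', 'exact', 'related' or None (case-insensitive)."""
        key = text.strip().lower()
        if key == self.label.lower():
            return "label"
        if key in {s.lower() for s in self.exact_synonyms}:
            return "exact"
        if key in {s.lower() for s in self.related_synonyms}:
            return "related"
        return None


RELATIVE_CNV = Term(
    term_id="EFO_0030066",  # ID as reported later in the thread
    label="relative copy number variation",
    exact_synonyms=("observed copy number variation",),
    related_synonyms=("copy number variation",
                      "genomic copy number variation"),
)

print(RELATIVE_CNV.match("copy number variation"))
```

Keeping "copy number variation" as a related rather than exact synonym lets a search still find the new term without confusing it with the existing EFO:0004798 label.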

paolaroncaglia added a commit that referenced this issue Jan 14, 2022
@paolaroncaglia
Collaborator

@mbaudis diff here if you wish to take a look and/or add further info: https://github.com/EBISPOT/efo/pull/1421/files
Otherwise, I'll commit in the early afternoon (I might attempt simple definitions so I can add your ORCID to all terms, not just the top one).

@paolaroncaglia
Collaborator

(EFO release is scheduled for Monday morning...)

@mbaudis
Author

mbaudis commented Jan 14, 2022

@paolaroncaglia Great - thanks!

One suggestion: Adding a synonym "homozygous deletion" to EFO_0030069 "complete genomic deletion".

paolaroncaglia added a commit that referenced this issue Jan 14, 2022
@paolaroncaglia
Collaborator

@mbaudis sure, will do later today. And will then update ticket with new term IDs, for the record.

paolaroncaglia added a commit that referenced this issue Jan 14, 2022
@paolaroncaglia
Collaborator

paolaroncaglia commented Jan 14, 2022

@mbaudis I've added basic definitions and your ORCID to all the new terms, plus the suggested synonym, as per (bottom part of) https://github.com/EBISPOT/efo/pull/1424/files. They'll be public with the next EFO release scheduled for Monday January 17th. If further changes are desired, please feel free to open a new ticket for the following EFO release.

While the checks run on the final pull request, here are the new terms' IDs and labels:

EFO_0030063 (copy number assessment)
EFO_0030064 (regional base ploidy)
EFO_0030065 (copy-neutral loss of heterozygosity)
EFO_0030066 (relative copy number variation)
EFO_0030067 (copy number loss)
EFO_0030068 (low-level copy number loss)
EFO_0030069 (complete genomic deletion)
EFO_0030070 (copy number gain)
EFO_0030071 (low-level copy number gain)
EFO_0030072 (high-level copy number gain)
EFO_0030073 (focal genome amplification)

Best,
Paola
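
For readers who want to use the new IDs programmatically, the following is a minimal Python sketch of the branch as a parent map. It assumes the EFO branch mirrors the parent/child structure of the proposed tree earlier in this thread; the released ontology is the authoritative source. `TERMS`, `ancestors` and `is_a` are illustrative names, not part of any EFO tooling.

```python
# term ID -> (label, parent ID); parents follow the proposed tree (ASSUMPTION).
TERMS = {
    "EFO_0030063": ("copy number assessment", None),
    "EFO_0030064": ("regional base ploidy", "EFO_0030063"),
    "EFO_0030065": ("copy-neutral loss of heterozygosity", "EFO_0030064"),
    "EFO_0030066": ("relative copy number variation", "EFO_0030063"),
    "EFO_0030067": ("copy number loss", "EFO_0030066"),
    "EFO_0030068": ("low-level copy number loss", "EFO_0030067"),
    "EFO_0030069": ("complete genomic deletion", "EFO_0030067"),
    "EFO_0030070": ("copy number gain", "EFO_0030066"),
    "EFO_0030071": ("low-level copy number gain", "EFO_0030070"),
    "EFO_0030072": ("high-level copy number gain", "EFO_0030070"),
    "EFO_0030073": ("focal genome amplification", "EFO_0030072"),
}


def ancestors(term_id):
    """Return the chain of parent IDs from nearest parent up to the root."""
    chain = []
    parent = TERMS[term_id][1]
    while parent is not None:
        chain.append(parent)
        parent = TERMS[parent][1]
    return chain


def is_a(term_id, ancestor_id):
    """True if term_id equals ancestor_id or descends from it."""
    return term_id == ancestor_id or ancestor_id in ancestors(term_id)


# A query for any "copy number gain" should also match focal amplifications.
print(is_a("EFO_0030073", "EFO_0030070"))
```

A lookup like `is_a` is what makes the hierarchy useful in practice: a query for the broader class can return records annotated with any of its more specific children.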
