'At-A-Glance' (AAG) Edge Annotations for high-level EPC information #10

Open
mbrush opened this issue Jan 30, 2023 · 1 comment

Comments

mbrush commented Jan 30, 2023

The 'At-A-Glance' (AAG) idea refers to a set of 4-5 edge properties that provide a high-level summary of EPC (evidence, provenance, and confidence), allowing users to make a first-pass assessment of confidence and relevance for a given KG Edge (or a 'Result' that maps to a single asserted or predicted KG Edge).

There is a long history of proposals for this type of thing, coming from different perspectives and stakeholders (summarized here). These proposals have been aligned and refined over the past year. IMO we are at a point where we need to move toward implementing it.

This issue proposes an initial set of AAG properties to implement, and can serve as a place to discuss how to move this from idea to practice. Separate tickets will be created for proposals/discussion around developing and implementing each proposed property.

Initial discussions focused on the following five types of information that AAG properties could provide:

  1. Knowledge Level/Type: the level/type of knowledge that is reported in an edge, based on how the knowledge was produced, the strength of evidence supporting it, or our confidence in its validity. (see Define the 'Knowledge Level/Type' AAG Property #11)
    a. e.g. ‘Knowledge Assertion’, ‘Logical Entailment’, ‘Prediction’, ‘Statistical Association’, etc.

  2. Agent Type: the type of agent that generated the statement expressed in an edge (see Defining the 'Agent Type' AAG Property #12)
    a. e.g. 'Manual Agent', 'Automated Agent', 'Computational Model', 'Text-Mining Agent', etc.

  3. Supporting Evidence Type(s): the types of information / data that were used as evidence in generating the statement expressed in an Edge
    a. e.g. ‘experimental data’, ‘clinical data’, ‘sequence similarity data’, ‘mutant phenotype data’, etc.

  4. Supporting Methodologies: reasoning, analytical, or experimental methodologies that were applied in generating the stated knowledge, and/or the evidence supporting it. It is t.b.d. whether we want to report these at the type level, at the instance level, or as linkouts to free-text descriptions.
    a. examples of type level method info: 'rule-based graph inference', 'unsupervised machine learning', 'chi-squared analysis', 'hidden Markov model', 'electron microscopy', 'yeast-two-hybrid assay', etc.
    b. examples of instance level method info: 2015 ACMG Variant Interpretation Guidelines, ClinGen SOP for Gene Validity Curation, ARAGORN Rule-Mining Prediction algorithm, ICEES correlation analysis pipeline
    c. examples of descriptions: see content of Translator Resource Wiki Pages, e.g. for Improving Agent

  5. Edge Confidence Score(s): qualitative terms and/or quantitative values reflecting how confident an agent is in the veracity of the specific statement expressed in an Edge
    a. qualitative scores may include things like 'definitive', 'possible', 'unlikely', or 'high confidence', 'medium confidence', 'low confidence'
    b. quantitative scores will likely be scaled between 0 and 1 (e.g. '0.998', '0.032')
    c. t.b.d. if/how we will normalize confidence scores, and whether scores for Statements of different Knowledge Types will be directly comparable, evaluated on separate scales, or compared only to other statements in the same category.
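
To make the five items above more concrete, here is a minimal Python sketch of how they might sit together as one record per Edge. Everything in it is an assumption for discussion: the class names (`KnowledgeLevel`, `AgentType`, `AAGSummary`), the field names, and the choice to model evidence types and methodologies as free-text lists are all hypothetical, and the enum values are just the examples listed above, not a settled enumeration. The 0-to-1 range check mirrors item 5b only; it does not address the open normalization question in 5c.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class KnowledgeLevel(Enum):
    # Values copied from the examples in item 1; not a final list.
    KNOWLEDGE_ASSERTION = "Knowledge Assertion"
    LOGICAL_ENTAILMENT = "Logical Entailment"
    PREDICTION = "Prediction"
    STATISTICAL_ASSOCIATION = "Statistical Association"


class AgentType(Enum):
    # Values copied from the examples in item 2; not a final list.
    MANUAL_AGENT = "Manual Agent"
    AUTOMATED_AGENT = "Automated Agent"
    COMPUTATIONAL_MODEL = "Computational Model"
    TEXT_MINING_AGENT = "Text-Mining Agent"


@dataclass
class AAGSummary:
    """Hypothetical container for the five proposed AAG properties of one Edge."""

    knowledge_level: KnowledgeLevel
    agent_type: AgentType
    evidence_types: List[str] = field(default_factory=list)
    methodologies: List[str] = field(default_factory=list)
    confidence_term: Optional[str] = None     # qualitative, e.g. 'high confidence'
    confidence_score: Optional[float] = None  # quantitative, assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        # Reject quantitative scores outside the 0-1 range suggested in item 5b.
        score = self.confidence_score
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError("confidence_score must be between 0 and 1")


# Illustrative values only, drawn from the examples in the list above.
example = AAGSummary(
    knowledge_level=KnowledgeLevel.PREDICTION,
    agent_type=AgentType.COMPUTATIONAL_MODEL,
    evidence_types=["clinical data"],
    methodologies=["unsupervised machine learning"],
    confidence_score=0.998,
)
```

Using enums for the first two properties reflects the idea of constraining permissible values; evidence types and methodologies are left as plain strings here only because their level of reporting is still t.b.d.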


Notes:

  • These AAG properties would be implemented as Association Slots (aka Edge Properties) in the Biolink Model.
  • Where relevant, enumerations would be created in Biolink as well, to constrain permissible values for consistent data entry, and provide a central location to clearly define each value.
  • More detailed representation of EPC metadata will also be supported by the Biolink model - to complement / extend the superficial view provided by the AAG fields.
  • In TRAPI, this information could be captured using Edge Attributes keyed on these Biolink edge properties, alongside other Edge metadata. However, we might consider defining dedicated named properties that hang directly from an Edge object in the TRAPI schema - which would reflect their importance, and promote their visibility / parsability.
  • In the UI, these AAG properties would be prominently displayed for each Edge or Result returned to users, providing a high-level understanding of supporting EPC, and allowing filtering / navigation down to Results of most interest/relevance. More detailed EPC would also be reported where provided by sources, and accessible to users upon deeper exploration of selected answers.
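
As a rough illustration of the Edge Attribute option in the TRAPI note above, the sketch below encodes AAG values as attribute objects keyed on Biolink slot CURIEs, reads them back out next to unrelated metadata, and filters edges the way a UI might. The attribute shape (`attribute_type_id` plus `value`) follows the TRAPI Attribute object; the slot CURIEs in `AAG_SLOTS`, the helper names, and the `EX:` identifiers are hypothetical placeholders, since none of these are fixed by this issue.

```python
from typing import Any, Dict, List

# Hypothetical Biolink slot CURIEs, used for illustration only.
AAG_SLOTS = (
    "biolink:knowledge_level",
    "biolink:agent_type",
    "biolink:supporting_evidence_type",
    "biolink:supporting_methodology",
    "biolink:edge_confidence",
)


def to_edge_attributes(aag: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Encode AAG values as TRAPI-style attributes keyed on slot CURIEs."""
    return [
        {"attribute_type_id": slot, "value": aag[slot]}
        for slot in AAG_SLOTS
        if slot in aag
    ]


def aag_view(edge: Dict[str, Any]) -> Dict[str, Any]:
    """Extract only the AAG attributes from an edge, ignoring other metadata."""
    return {
        attr["attribute_type_id"]: attr["value"]
        for attr in edge.get("attributes", [])
        if attr["attribute_type_id"] in AAG_SLOTS
    }


def filter_by_knowledge_level(
    edges: List[Dict[str, Any]], level: str
) -> List[Dict[str, Any]]:
    """Keep edges whose AAG knowledge level equals the requested value."""
    return [e for e in edges if aag_view(e).get("biolink:knowledge_level") == level]


# Placeholder identifiers and values, not real data.
edge = {
    "subject": "EX:subject",
    "predicate": "biolink:related_to",
    "object": "EX:object",
    "attributes": to_edge_attributes(
        {
            "biolink:knowledge_level": "Prediction",
            "biolink:agent_type": "Computational Model",
            "biolink:edge_confidence": 0.998,
        }
    )
    + [{"attribute_type_id": "EX:other_metadata", "value": "unrelated"}],
}

predictions = filter_by_knowledge_level([edge], "Prediction")
```

The scan inside `aag_view` is the cost of mixing AAG values in with other attributes; dedicated named properties on the Edge object would replace it with a direct lookup, which is the visibility / parsability trade-off raised above.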

mbrush commented Dec 9, 2023

Examples of how AAG properties can tell a high-level story about a given Edge: https://docs.google.com/document/d/1ESnpiPx_J2EmpsR8K6Q8IVROGfahbFOPF1XLTaM1YNQ/edit#heading=h.90lu494u3a11
