Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate get_most_severe_consequence_for_summary in favor of more flexible get_most_severe_csq_from_multiple_csq_lists #714

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

jkgoodrich
Copy link
Contributor

Also adds:

  • filter_to_most_severe_consequences, which is used by get_most_severe_csq_from_multiple_csq_lists
  • Adds loftee_labels and no_lof_flags parameters to filter_vep_transcript_csqs_expr for filtering by loftee labels and flags.

Depends on #713

updates_to_get_most_severe_consequence_for_summary.html.zip testing to make sure the same results are returned for get_most_severe_consequence_for_summary

jkgoodrich added 21 commits May 7, 2024 21:25
  - Allow lof_flags to be missing in addition to lof_flags == "" for lof == "HC" to have no flag penalty
  - Pass `csq_order` to `add_most_severe_consequence_to_consequence`
…ble to use in `default_generate_gene_lof_matrix`
…sq_lists and make use of it in process_consequences
… into jg/fix_process_consequences

# Conflicts:
#	gnomad/utils/vep.py
…s` to `filter_to_most_severe_consequences` and clean them up
…f `get_most_severe_consequence_for_summary`
Base automatically changed from jg/make_expr_version_of_filter_vep_transcript_csqs to main July 9, 2024 15:54
tests/utils/test_vep.py Outdated Show resolved Hide resolved
tests/utils/test_vep.py Outdated Show resolved Hide resolved
tests/utils/test_vep.py Outdated Show resolved Hide resolved
tests/utils/test_vep.py Outdated Show resolved Hide resolved
gnomad/utils/vep.py Outdated Show resolved Hide resolved
gnomad/utils/vep.py Outdated Show resolved Hide resolved
- most_severe_consequence: Most severe consequence for variant.
- lof: Whether the variant is a loss-of-function variant.
- no_lof_flags: Whether the variant has any LOFTEE flags (True if no flags).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returns either True or None right? why not True or False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes, so I'm not sure what the result was before, but now in my test on 2 partitions of the exomes result HT, I get {False: 21, True: 580, None: 78785}. Let me know if you are still seeing no False

)

# Test the function
result = get_most_severe_csq_from_multiple_csq_lists(vep_expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also tested with setting prioritize_loftee_no_flags to True or False -- does it make sense that got the same result regardless of how set this param was set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current vep_expr, yes. I added some changes to it, and a test for it

)

# Build the case expression to determine the most severe consequence.
ms_csq_expr = hl.case(missing_false=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use case expressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to? If you have a clearer solution, I'm happy to make modifications, just let me know what your thinking

)

# Initialize the lof struct with missing values.
lof_expr = hl.struct(lof=hl.missing(hl.tstr), no_lof_flags=hl.missing(hl.tbool))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so lof and no_lof_flags will always be None if prioritize_loftee/prioritize_loftee_no_flags are False, even if lof flags are present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this a bit, let me know if the changes make sense

@jkgoodrich jkgoodrich requested a review from klaricch October 24, 2024 13:19
@pytest.mark.parametrize(
"prioritize_protein_coding, prioritize_loftee, prioritize_loftee_no_flags, additional_order_field, additional_order, expected_most_severe_csq, expected_polyphen_prediction",
[
(False, False, False, None, None, None, None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only set expected_most_severe_csq if it's "stop_gained"?

else polyphen_prediction
),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Define csq, protein_coding, lof, no_lof_flags, and polyphen_prediction.

additional_order=additional_order,
)

expected_dict = hl.Struct(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how were the values for the expected dict decided?

+ (["lof"] if prioritize_loftee else [])
+ (
["no_lof_flags"]
if prioritize_loftee_no_flags or prioritize_loftee
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find it confusing when no_lof_flags is True in cases where prioritize_loftee is True and prioritize_loftee_no_flags is False and there are flags present

@@ -777,86 +780,96 @@ def filter_to_most_severe_consequences(
:return: ArrayExpression with of the consequences that match the most severe
consequence.
"""
# Get the dtype of the csq_expr ArrayExpression elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Get the dtype of the csq_expr ArrayExpression elements
# Get the dtype of the csq_expr ArrayExpression elements.

(True, True, False, *polyphen_params, None, "possibly_damaging"),
(True, False, True, *polyphen_params, None, "possibly_damaging"),
(False, True, True, *polyphen_params, "stop_gained", None),
# Need to figure out class too large error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also got this error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants