[CRISPR module] Support IUPAC notation in PAMs #280

dgruano · 2024-10-03T15:44:03Z

Streptococcus pyogenes Cas9 has an NGG PAM, but Staphylococcus aureus Cas9 has an NNGRRT PAM. N means any nucleotide, while R means A or G. We should support this notation so users can enter a human-readable PAM sequence in IUPAC notation without having to write the corresponding regex themselves.

Additional note: I would also change the current notation .GG to NGG for the same purpose.

BjornFJohansson · 2024-10-06T06:30:39Z

Good idea. Is there a solution out there that we can use?

I found this online: https://www.biostars.org/p/298791/

and this: https://benchling.engineering/building-a-regex-search-engine-for-dna-e81f967883d3

dgruano · 2024-10-06T07:59:07Z

Given that right now we use regex to find possible enzyme cuts (and that sequences are relatively short compared to a large database), I created a small function to parse a PAM with ambiguous nucleotides and build a regex. This regex is then fed to the one used for search.

And Biopython has the dictionaries with the ambiguous nucleotides, so I'm taking advantage of that!
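
For readers following along, here is a minimal sketch of what such a helper could look like. It is not the code from this repository: the name `pam_to_regex` and the `IUPAC_DNA` table are hypothetical, and the table is hard-coded so the example runs with the standard library only. Biopython exposes an equivalent mapping as `Bio.Data.IUPACData.ambiguous_dna_values`, which could presumably be used in its place.

```python
import re

# IUPAC ambiguity codes for DNA (hypothetical local table).
# Biopython provides an equivalent dict, Bio.Data.IUPACData.ambiguous_dna_values;
# it is hard-coded here only to keep the sketch free of third-party imports.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM such as 'NNGRRT' into a regex fragment.

    Hypothetical helper: unambiguous bases are emitted as-is, ambiguous
    codes become a character class. Non-IUPAC characters raise KeyError.
    """
    parts = []
    for code in pam.upper():
        bases = IUPAC_DNA[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


# The fragment can then be embedded in whatever larger pattern does the search.
ngg = pam_to_regex("NGG")        # "[ACGT]GG"
sa_pam = pam_to_regex("NNGRRT")  # "[ACGT][ACGT]G[AG][AG]T"
hit = re.search(ngg, "TTAGGC")   # matches "AGG" starting at index 2
```

Building explicit character classes, rather than using `.` for N, means the pattern only matches canonical bases, which is one way to make the suggested switch from `.GG` to `NGG` behave predictably.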

BjornFJohansson · 2024-10-06T11:44:07Z

OK, that seems like a good solution.
