[CRISPR module] Support IUPAC notation in PAMs #280

Open
dgruano opened this issue Oct 3, 2024 · 3 comments

Comments

@dgruano
Contributor

dgruano commented Oct 3, 2024

Streptococcus pyogenes Cas9 has an NGG PAM, but Staphylococcus aureus Cas9 has an NNGRRT PAM. N means any nucleotide, while R means A or G. We should support this notation so users can enter a human-readable PAM sequence in IUPAC notation without having to write the corresponding regex themselves.

Additional note: I would also change the current notation .GG to NGG for the same purpose.

@BjornFJohansson
Collaborator

BjornFJohansson commented Oct 6, 2024

Good idea. Is there a solution out there that we can use?

I found this online: https://www.biostars.org/p/298791/

and this: https://benchling.engineering/building-a-regex-search-engine-for-dna-e81f967883d3

@dgruano
Contributor Author

dgruano commented Oct 6, 2024

Given that we currently use regex to find possible enzyme cuts (and that the sequences are relatively short compared to a large database), I created a small function that parses a PAM containing ambiguous nucleotides and builds a regex from it. That regex is then inserted into the pattern used for the search.

And Biopython has the dictionaries with the ambiguous nucleotides, so I'm taking advantage of that!
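
For illustration only, here is a minimal sketch of what such a PAM-to-regex helper could look like. It is not the function from the actual change: the name `pam_to_regex` is hypothetical, and the fallback table is there only so the snippet runs without Biopython installed. `Bio.Data.IUPACData.ambiguous_dna_values` is the Biopython dictionary that maps each IUPAC code to the bases it stands for.

```python
import re

try:
    # Biopython's table of IUPAC codes, e.g. "R" -> "AG"
    from Bio.Data.IUPACData import ambiguous_dna_values as IUPAC_DNA
except ImportError:
    # Fallback with the same mapping, so the sketch runs without Biopython
    IUPAC_DNA = {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
        "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
    }


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM such as 'NNGRRT' into a regex string.

    Hypothetical helper: unambiguous bases are emitted as-is, ambiguous
    codes become a character class. Unknown symbols raise KeyError.
    """
    parts = []
    for code in pam.upper():
        # Sort so the output does not depend on the table's base order
        bases = "".join(sorted(IUPAC_DNA[code]))
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


# Example: search a short sequence for a SaCas9-style PAM
pattern = pam_to_regex("NNGRRT")
match = re.search(pattern, "TTACGGAGTCC")
```

With this approach `NGG` becomes `[ACGT]GG`, which matches the same sites as the current `.GG` on plain A/C/G/T sequences while staying readable for users.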

@BjornFJohansson
Collaborator

OK, that seems like a good solution.
