Not out of the box, no.
There are approaches that use embedding distance from pLMs for remote homology detection, such as the following (not a full list, just an excerpt that came to mind, with the latter being from our group (disclaimer)):
You can use the code base provided in the first link to align proteins, and/or you can use the recipe described in the latter link to find remote homologs. My 2 cents: if you really want to align proteins, I do not think that embeddings will give you a speed-up (at least, I am not aware of an implementation that would a) generate embeddings and b) align them to some DB in less time than MMseqs2/Diamond). What embeddings might give you is some fast pre-filter if you have your DB already pre-computed (see second link for details).
But I would probably just use Foldseek (potentially together with predicted 3Di if you care about speed; disclaimer #2: also from us --> https://github.com/mheinzinger/ProstT5/tree/main/scripts ).
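To make the pre-filter idea a bit more concrete, here is a minimal sketch, not the recipe from the linked work. It assumes the reference-side (DB) embeddings were computed once and stored, and that only the query is embedded at search time. The names `cosine` and `prefilter`, the three-dimensional toy vectors, and the choice of cosine similarity are all illustrative assumptions; real pLM embeddings have far more dimensions and the linked recipe may use a different distance.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def prefilter(query_vec, db, k=2):
    """Return the k reference IDs whose embeddings are most similar to the query.

    `db` maps reference ID -> pre-computed embedding (hypothetical layout).
    """
    scored = [(ref_id, cosine(query_vec, vec)) for ref_id, vec in db.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

# Toy stand-ins for pre-computed per-protein embeddings of the reference DB.
db = {
    "ref_A": [1.0, 0.0, 0.0],
    "ref_B": [0.0, 1.0, 0.0],
    "ref_C": [0.7, 0.7, 0.0],
}

# Toy stand-in for the embedding of one query protein.
query = [0.9, 0.1, 0.0]

# Only these candidates would be passed on to a proper aligner.
candidates = prefilter(query, db, k=2)
print(candidates)
```

The candidates that survive the pre-filter would still need a real alignment step (e.g. MMseqs2/Diamond or Foldseek) if you want alignments; the sketch only narrows down which references to align against.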
> What embeddings might give you is some fast pre-filter if you have your DB already pre-computed
I'll look into your paper for more details, but I'm a bit confused. Let's say you have a model that uses protein embeddings for UniRef50. When you say having the DB pre-computed, are you referring to the query proteins, the reference proteins, or both?
I'm looking for a faster alternative to Diamond for aligning proteins to UniRef50 so I can map identifiers to de novo proteins.
Can I use this tool to accomplish this task?