Fixing cases where mutations are introduced although they do not pass the pssm_threshold #56
The following line:
probs_masked+=probs*0.001
may introduce mutations that are below the PSSM threshold.
For example, when probs[i,j] =~ 1 (close to 1) and probs[i,k] = 0 (for k != j), but pssm_log_odds_mask[i,j] = 0, the forbidden aa may still be introduced, since now probs_masked[i,j] =~ 0.001 and probs_masked[i,k] = 0 for k != j.
Then, after normalization occurs:
probs = probs_masked/torch.sum(probs_masked, dim=-1, keepdim=True) #[B, 21]
probs[i,j] = 1 now, although residue j doesn't cross the PSSM threshold at position i.
Is that a bug or a feature? :-D
Meaning, if pssm_log_odds_mask[i,j] = 0, then probs_masked[i,j] = 0 too, right?
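
To make the case above concrete, here is a minimal sketch for a single position i. It uses plain Python lists instead of tensors so it runs without PyTorch, shortens the row to four entries instead of 21, and sets probs[i,j] to exactly 1 rather than close to 1. The helper name mask_and_renormalize is hypothetical, and the first step (probs_masked = probs * pssm_log_odds_mask) is an assumption about what happens just before the line in question.

```python
# Illustrative sketch only: plain lists stand in for one row of the [B, 21] tensors.
EPS = 0.001  # the factor from `probs_masked += probs * 0.001`


def mask_and_renormalize(probs, mask, eps=EPS):
    """Hypothetical helper: mask, add eps * probs, then renormalize one row."""
    # Assumption: probs_masked starts out as probs * pssm_log_odds_mask.
    probs_masked = [p * m for p, m in zip(probs, mask)]
    # The line in question: probs_masked += probs * 0.001
    probs_masked = [pm + p * eps for pm, p in zip(probs_masked, probs)]
    # Normalization: probs = probs_masked / sum(probs_masked)
    total = sum(probs_masked)
    return [pm / total for pm in probs_masked]


# Position i: all of the probability mass sits on residue j = 0 ...
probs_i = [1.0, 0.0, 0.0, 0.0]
# ... but the PSSM mask forbids j = 0 and allows every other residue.
mask_i = [0.0, 1.0, 1.0, 1.0]

result = mask_and_renormalize(probs_i, mask_i)
print(result)  # the forbidden residue ends up with probability 1.0
```

Note that with this input, dropping the 0.001 term entirely would leave the row summing to zero, so the normalization would divide by zero. Any fix therefore needs to decide what to sample when every residue with non-zero probability is masked out.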