Unintended behaviour in dseq.__init__? #253

Open
manulera opened this issue Aug 12, 2024 · 2 comments

@manulera
Collaborator

Hi @BjornFJohansson, I was wondering whether we want to support this kind of behaviour for Dseq, or whether it is unintended.

from pydna.dseq import Dseq
from pydna.utils import rc

seq1 = "ACGGCAGCCCGT"
seq2 = rc(seq1)

seq1_padded = "aaa" + seq1 + "aaa"
seq2_padded = "ccc" + seq2 + "ccc"

dseq1 = Dseq(seq1_padded, seq2_padded)

print(repr(dseq1))

This gives a Dseq with mismatches at both ends:

Dseq(-18)
aaaACGGCAGCCCGTaaa
cccTGCCGTCGGGCAccc

I wonder if we should restrict the representation to have no mismatches (e.g. use terminal_overlap instead of common_substrings)? Or raise an error if a sequence like this comes up?
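
To make the question concrete, here is a minimal sketch of how the mismatched columns in a drawn duplex could be located. It is plain Python and not pydna's implementation; the function name `mismatch_columns` and its `offset` convention (the column, counted from the first watson base, at which the reversed crick strand starts) are assumptions made for this illustration only.

```python
# Hypothetical helper, not part of pydna: finds columns where the two
# strands of a drawn duplex face each other without being complementary.
_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_columns(watson, crick, offset=0):
    """Return the columns (relative to watson's first base) with mismatches.

    Both strands are given 5'->3'. `offset` is the column where the
    reversed crick strand begins; it may be negative if crick sticks
    out to the left of watson.
    """
    bottom = crick[::-1]  # crick as drawn under watson, 3'->5'
    start = max(0, offset)
    end = min(len(watson), offset + len(bottom))
    return [
        col
        for col in range(start, end)
        if _PAIRS.get(watson[col].upper()) != bottom[col - offset].upper()
    ]


# The blunt example from this issue: three mismatched columns at each end.
print(mismatch_columns("aaaACGGCAGCCCGTaaa", "cccACGGGCTGCCGTccc"))
# -> [0, 1, 2, 15, 16, 17]
```

A check like this returns an empty list for a fully paired duplex, so a constructor could raise or warn whenever the list is non-empty.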

@BjornFJohansson
Collaborator

BjornFJohansson commented Sep 6, 2024

This was by design. It is there so that we can make staggered sequences like so:

from pydna.dseq import Dseq
from pydna.utils import rc

seq1 = "ACGGCAGCCCGT"
seq2 = rc(seq1)

seq1_padded = "aaa" + seq1
seq2_padded = "ccc" + seq2

dseq1 = Dseq(seq1_padded, seq2_padded)

print(repr(dseq1))

Dseq(-18)
aaaACGGCAGCCCGT
   TGCCGTCGGGCAccc

Does this create problems in other use cases? Maybe a warning would be appropriate.

@manulera
Collaborator Author

manulera commented Sep 10, 2024

Hi @BjornFJohansson, in my example the returned sequence has mismatches at both ends; that's the problematic bit.

If you are manually typing both strands, you may make a mistake when typing one of them, and you may want to get an error in that case.

You can create a sequence with both mismatches and stagger by passing the overhang explicitly, but I'm not sure the automatic detection of overhangs should return sequences with mismatches. In general, I'd guess most pydna functions will behave unexpectedly if there are mismatches, so I think an error would be good.
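
As a rough illustration of the stricter behaviour proposed here, the sketch below infers the stagger from the longest stretch shared by the watson strand and the reverse complement of the crick strand, then rejects the pair if any facing bases in the overlap disagree. It uses only the standard library (`difflib.SequenceMatcher`) and is not how pydna finds overhangs; `strict_stagger` and its return convention (the watson index where the reverse-complemented crick strand starts) are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string, upper-cased."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strict_stagger(watson, crick):
    """Infer the stagger between two 5'->3' strands, refusing mismatches.

    Hypothetical sketch: returns the watson index at which
    reverse_complement(crick) starts, or raises ValueError.
    """
    top = watson.upper()
    expected = reverse_complement(crick)  # what watson should read in the overlap
    match = SequenceMatcher(None, top, expected, autojunk=False).find_longest_match(
        0, len(top), 0, len(expected)
    )
    if match.size == 0:
        raise ValueError("strands share no complementary region")
    offset = match.a - match.b
    # Compare every position where both strands are present, not just the match.
    start = max(0, offset)
    end = min(len(top), offset + len(expected))
    bad = [i for i in range(start, end) if top[i] != expected[i - offset]]
    if bad:
        raise ValueError(f"mismatched bases at watson positions {bad}")
    return offset


# Staggered and fully paired: accepted.
print(strict_stagger("aaaACGGCAGCCCGT", "cccACGGGCTGCCGT"))

# Padded at both ends with non-complementary bases: rejected.
try:
    strict_stagger("aaaACGGCAGCCCGTaaa", "cccACGGGCTGCCGTccc")
except ValueError as err:
    print("rejected:", err)
```

Swapping the `raise` for `warnings.warn` would give the softer warning-only variant suggested earlier in the thread.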
