TSDs are not identical #174

Open
ala98412 opened this issue Jul 8, 2024 · 3 comments
ala98412 commented Jul 8, 2024

Hi,

I am trying to determine the insertion target site of my LTR/Gypsy elements. Using 10:22687853..22691116_INT as an example, I assume that:

scaffold name = 10
start_position = 22687853
end_position = 22691116
intact = INT
According to the manual, target site duplications (TSD) should be the same at both the 5' and 3' ends. However, in my genome, the 5' TSD and 3' TSD are not the same. I have noticed some patterns, but I believe they should be identical.

Did I misunderstand something? Sorry for my naive question.

This is my Python3 script:
seq[start_position - 1 - 5:start_position - 1], seq[end_position:end_position + 5]
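
The slicing above can be wrapped in a small helper to make the coordinate conversion explicit. This is a minimal sketch that assumes the coordinates in the ID are 1-based and inclusive and that `seq` holds the scaffold sequence as a plain Python string; the function name `flanks` and the toy sequence are made up for illustration.

```python
def flanks(seq, start_position, end_position, width=5):
    """Return the (upstream, downstream) flanks of a 1-based inclusive interval."""
    # 0-based index of the first base of the interval
    left_end = start_position - 1
    # max() keeps the slice from wrapping around near the start of the scaffold
    upstream = seq[max(0, left_end - width):left_end]
    # end_position (1-based, inclusive) equals the 0-based index just past the interval
    downstream = seq[end_position:end_position + width]
    return upstream, downstream


# Toy sequence (invented): the interval 6..10 covers the "GGGGG" block.
toy = "AAAAA" + "GGGGG" + "TTTTT"
print(flanks(toy, 6, 10))
```

With `width=5` this returns the same two slices as the one-line expression, except that it clips the upstream flank instead of returning an unexpected slice when the interval starts within 5 bp of the scaffold start.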

Here is my result table:

5'TSD	3'TSD	Repeat_ID
TGACA	TGTAA	10:22687853..22691116_INT
TTACA	TGTCA	13:11754704..11757508_INT
TGACA	TGTAA	13:23094695..23096177_INT
TGACA	TGTAA	13:3475695..3477002_INT
TGACA	TGTAA	20:17957399..17960425_INT
CCACA	TGTAG	3:1605728..1611282_INT
CTACA	TGTGG	3:3264571..3270749_INT
TTACA	TGTGG	3:37194336..37198451_INT
TGACA	TGTCA	9:31434801..31436809_INT
TAACA	TGTAA	22:18312247..18314782_INT
TAACA	TGTCA	1:1966816..1970089_INT
TGACA	TGTAA	1:1249479..1253739_INT
TGACA	TGTCA	4:30995067..30999235_INT
TGACA	TGTCA	13:3164753..3167290_INT
CCACA	TGTAG	8:12600685..12607139_INT
TTACA	TGTCA	8:16012374..16016235_INT
CTACA	TGTAA	9:23014621..23021192_INT
GTACA	TGTGG	8:7106864..7111228_INT
AAACA	TGTTA	2:37860875..37866860_INT
TTACA	TGTCA	18:28467404..28470784_INT
TAACA	TGTCA	23:7707104..7709903_INT
GTACA	TGTGG	15:14569922..14574417_INT
ATACA	TGTTA	9:23918263..23924418_INT
TGACA	TGTCA	19:22608920..22612458_INT
CCACA	TGTAG	5:22942199..22948802_INT
TTACA	TGTAT	24:9397720..9401585_INT
CTTCA	TGTTG	3:40824394..40831408_INT
GTACA	TGTAG	6:34329395..34335570_INT
AAACA	TGTTA	2:37841501..37847533_INT
TTACA	TGTAA	12:17059040..17065533_INT
TAACA	TGTGA	1:21012603..21019343_INT
CAACA	TGTGG	3:2895580..2901609_INT
TGACA	TGTCA	17:26914882..26918649_INT
CTACA	TGTGG	17:1316150..1322707_INT
TGACA	TGTCA	5:3943017..3947283_INT
GCACA	TGTAT	13:30604096..30607812_INT
TGACA	TGTAA	7:19929823..19931152_INT
CTACA	TGTAA	7:9337970..9344181_INT
TTACA	TGTCA	1:32688615..32692545_INT
TTACA	TGTAG	5:764594..771458_INT
TTACA	ACTGT	16:22850998..22854184_INT
TTACA	TGTTA	2:15229994..15233935_INT
TTACA	TGTAA	10:12217720..12223558_INT
CTACA	AAGAT	4:6220653..6227592_INT
ACACA	TGTTA	10:5580976..5585395_INT
TGACA	TGTCA	3:7444737..7448764_INT
TGACA	TGTTA	2:12416092..12419661_INT
GCATG	AAAAA	23:8880995..8883358_INT
TGACA	TGTCA	10:12631883..12635627_INT
GCACA	TGTAC	8:7133855..7139321_INT
GGACA	TGTAG	20:22927000..22934272_INT
TTACA	TGTAA	17:2738792..2743075_INT
TTACA	TGTAA	8:12102948..12109802_INT
TAACA	TGTAA	8:17981899..17985045_INT
CTACA	TGTAA	7:11248909..11251237_INT
TTACA	TGTAA	6:20537116..20543571_INT
TTACA	TGTAG	17:14704587..14710662_INT
TGCCA	TGTTT	3:174059..179427_INT
TTACA	AGTGT	7:24423067..24429244_INT
TGACA	TGTCA	8:16576498..16579370_INT
CCACA	TGTAG	4:17854264..17857441_INT
TGACA	TGTCA	3:38208553..38212350_INT
CAACA	TGTCG	7:631159..636493_INT
ATCCA	TGTTA	1:1123017..1129006_INT
AGCCA	TGTAA	3:26336380..26348605_INT
TTACA	TGTCA	11:9892059..9893863_INT
CTACA	TGTGA	3:44276676..44282169_INT
CAACA	TGTAG	4:32696919..32701358_INT
TTACA	TGTCA	19:24017563..24019609_INT
TGACA	TGTCA	13:3224365..3227310_INT
TAACA	TGTCA	11:29824664..29826386_INT
CTACA	TGTGA	16:14646338..14652607_INT
TAACA	TGTCA	20:9902191..9906967_INT
CTACA	TGTTG	3:3130423..3134376_INT
TCACA	TGTAA	2:17612423..17618953_INT
TAACA	TGTCA	4:31576990..31580084_INT
TTACA	TGTGA	14:15475907..15481394_INT
TTACA	TGTTA	18:18699994..18703723_INT
TTACA	TGTAA	7:21620972..21626540_INT
TAACA	TGTCA	21:20045206..20048733_INT
TTACA	TGTAA	9:22109920..22116800_INT
TTACA	TGTGA	3:8418439..8424555_INT
CTACA	TGTTG	13:3964337..3967749_INT
TAACA	TGTCA	21:7028931..7030389_INT
TCACA	TGTAA	22:13751478..13754925_INT
TTACA	TGTAT	12:16121309..16125551_INT
TAACA	TTTGT	2:44761746..44767043_INT
CTACA	TGTAG	7:21033025..21039338_INT
TTACA	TGTGG	1:46125613..46129869_INT
CGACA	TGTTG	20:4090440..4099600_INT
TTACA	TGTAG	5:1168753..1174941_INT
TGACA	TGTAA	5:3743494..3747596_INT
TGACA	TGTAA	13:16891538..16894937_INT
TTACA	TGTGG	20:7399794..7403248_INT
TTACA	TGTAA	10:14750388..14756880_INT
TTACA	TGTAC	6:11915047..11920736_INT
TAACA	TGTTA	2:19450898..19452802_INT
TAACA	TGTAA	13:26158929..26163373_INT
CCACA	TGTAG	5:23026949..23031309_INT
CTACA	TGTAA	24:2312688..2318290_INT
CTTCA	TGTTG	2:11742344..11751068_INT
TAACA	TGTCA	8:15321620..15325915_INT
CAACA	TGAAG	6:13246814..13254187_INT
TGACA	TGTAA	5:23115747..23119460_INT
CTACA	TGTAG	15:14781989..14788545_INT
CCACA	TGTAG	6:32507841..32512069_INT
CAACA	TGTTA	13:23069820..23074981_INT
TCACA	TGTAG	15:14833322..14837442_INT
CAACA	TGTAG	3:3059958..3064368_INT
TTACA	TGTAA	6:11708287..11712617_INT
TGACA	TGTAA	12:19731181..19732808_INT
TCACA	TGTAA	11:13889865..13896492_INT
TTACA	TGTAA	14:15100784..15107507_INT
TGACA	TGTAA	4:16952220..16954047_INT

Best,
Jui-Hung

@oushujun
Owner

Hi Jui-Hung,

Sorry for the delayed response. "_INT" sequences are the internal sequences of LTR retrotransposons. TSDs flank the whole LTR element, so there won't be TSDs flanking "_INT" sequences. You will see that "_INT" sequences are flanked by CA and TG dinucleotides, because these are motifs of the LTR regions. An intact LTR element looks like:
TSD-TG...(LTR)...CA----INT----TG...(LTR)...CA-TSD.
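
To illustrate the layout above, here is a minimal sketch that builds a toy element and extracts flanks at two places: around the internal region and around the whole element. All sequences, names, and coordinates are invented for the example, and coordinates are assumed to be 1-based and inclusive; it is not taken from the tool's own code.

```python
def flanks(seq, start, end, width=5):
    """Return (upstream, downstream) flanks of a 1-based inclusive interval."""
    left_end = start - 1
    return seq[max(0, left_end - width):left_end], seq[end:end + width]


# Hypothetical building blocks (made up for illustration).
prefix = "GGGGGGGG"
suffix = "CCCCCCCC"
tsd = "CTGAT"                  # 5-bp target site duplication
ltr = "TGTAACCGGTGACA"         # LTR starting with TG and ending with CA
internal = "ATATATATATAT"      # stands in for the "_INT" region

# TSD-TG...(LTR)...CA----INT----TG...(LTR)...CA-TSD
genome = prefix + tsd + ltr + internal + ltr + tsd + suffix

# 1-based inclusive coordinates of the whole element and of the internal region
elem_start = len(prefix) + len(tsd) + 1
elem_end = elem_start + 2 * len(ltr) + len(internal) - 1
int_start = elem_start + len(ltr)
int_end = int_start + len(internal) - 1

int_flanks = flanks(genome, int_start, int_end)     # LTR ends, not TSDs
elem_flanks = flanks(genome, elem_start, elem_end)  # the two TSD copies

print("INT flanks:    ", int_flanks)
print("element flanks:", elem_flanks)
```

In this toy case the flanks of the internal region end in CA and start with TG, while the flanks of the full element are the two identical TSD copies.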

Let me know if you have more questions!
Shujun

@ala98412
Author

Hi Oushujun,

Thank you for your reply.

I’m interested in studying target sites, similar to the research in this study (https://academic.oup.com/plcell/article/15/8/1771/6010085).
Could I directly use the TSD positions from the pass list to search for patterns in the adjacent sequences?

Thank you.

Best,
Jui-Hung

@oushujun
Owner

oushujun commented Aug 26, 2024 via email
