Direct RNA-Seq with 3'-end custom barcodes: barcodes are trimmed automatically #1212

Open
YOUZhen93 opened this issue Jan 9, 2025 · 4 comments
Labels
barcode Issues related to barcoding

Comments

@YOUZhen93

Hi, we are trying to add custom barcode sequences right next to the RT primer for our dRNA-Seq library with the RNA004 kit. The barcode sequence is directly ligated to the RT primer sequence.
Here's the structure of our sequence:
-------reads-------AAAAAA--barcodes--primer--adapter
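To make the layout above concrete, here is a minimal sketch that splits a toy read string at its last poly(A) run, so everything downstream (barcode, primer, adapter) is isolated. It assumes an ACGT alphabet, a 5'->3' string, and a hypothetical 10-A minimum for calling the tail; `split_at_polya` is an illustrative helper, not part of Dorado.

```python
import re

# Hypothetical threshold: a run of at least this many A's is treated as the poly(A) tail.
MIN_POLYA = 10


def split_at_polya(read: str, min_polya: int = MIN_POLYA):
    """Split a 5'->3' read into (insert, polyA, downstream) at its last poly(A) run.

    Returns None when no run of at least `min_polya` A's is present.
    """
    runs = list(re.finditer("A{%d,}" % min_polya, read))
    if not runs:
        return None
    tail = runs[-1]
    return read[:tail.start()], tail.group(), read[tail.end():]


# Toy read: insert + poly(A) + made-up barcode/primer/adapter bases.
read = "GGCTTCAGCT" + "A" * 15 + "CACAGTTAGCTAGCTCTT" + "ACTGACTG"
insert, polya, downstream = split_at_polya(read)
print(len(insert), len(polya), downstream)  # 10 15 CACAGTTAGCTAGCTCTTACTGACTG
```

This only helps if the 3' bases are still present in the basecalled read.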

We are using Dorado for basecalling. Adapters, primers, and barcodes are all trimmed, and the Dorado documentation says this trimming is mandatory for cDNA and RNA libraries. The command I used for basecalling is:

    dorado basecaller -x auto --no-trim --barcode-arrangement bc-arrangement.toml --barcode-sequences bc.fa <model> ./pod5/ > test.bam

This doesn't work, because the barcodes are all trimmed and there is no barcode information in the output BAM file.
Below is my arrangement file:
    [arrangement]
    name = "custom_barcode"
    kit = "BC"

    mask1_front = "CACA"
    mask1_rear = "TCTT"
    mask2_front = "ACAG"
    mask2_rear = "TCGA"

    # Barcode sequences
    barcode1_pattern = "BC%02i"
    barcode2_pattern = "BC%02i"
    first_index = 1
    last_index = 96

    # Scoring options
    [scoring]
    min_soft_barcode_threshold = 0.2
    min_hard_barcode_threshold = 0.2
    min_soft_flank_threshold = 0.3
    min_hard_flank_threshold = 0.3
    min_barcode_score_dist = 0.1

The mask sequences are the four bases flanking each barcode, since my barcodes are long enough.
Below is my barcode sequence file:

    >BC01
    CACAxxxxxxxxxxxTCTT
    >BC02
    ACAGxxxxxxxxxxxTCGA

I expected to see demultiplexed reads separated by the two barcode sequences at the 3' end of the reads, but after basecalling the barcodes are trimmed and the reads are not classified by the barcode information I provided.
Does Dorado support demultiplexing on 3'-end barcodes only? Why does Dorado also trim the barcode sequences instead of trimming only the primer and sequencing adapter? Does Dorado detect the poly(A) signal and trim the remaining sequence after the poly(A) tail?

Dorado version: 0.7.0+71cc744
Thanks!

@malton-ont
Collaborator

Hi @YOUZhen93,

For custom barcodes only at the 3' end, you need to include the setting:

    [arrangement]
    rear_only_barcodes = true

in the arrangement TOML file.
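To make the idea of a rear-only barcode search concrete, here is a simplified stand-in, not Dorado's scoring. It assumes a plain base string and hypothetical barcode sequences. Only a window at the 3' end of the read is scanned, and each candidate is scored by its best semi-global edit distance within that window. The read is assigned to the best candidate only if that distance is small and clearly better than the runner-up. The window size and both thresholds are made-up values. Like any base-space approach, it only applies if the barcode bases are actually present in the basecalls.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Minimum edit distance between `pattern` and any substring of `text`.

    Semi-global alignment: leading and trailing bases of `text` are free.
    """
    prev = [0] * (len(text) + 1)  # empty pattern aligns anywhere at no cost
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match / mismatch
                         prev[j] + 1,         # pattern base missing from text
                         cur[j - 1] + 1)      # extra base in text
        prev = cur
    return min(prev)


def classify_rear(read: str, barcodes: dict[str, str], window: int = 60,
                  max_dist: int = 3, min_gap: int = 2) -> str:
    """Assign `read` to a barcode by scanning only its last `window` bases."""
    tail = read[-window:]
    scores = sorted((best_substring_distance(seq, tail), name)
                    for name, seq in barcodes.items())
    best_dist, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else float("inf")
    if best_dist <= max_dist and runner_up - best_dist >= min_gap:
        return best_name
    return "unclassified"


# Hypothetical barcodes: the real flanks around made-up cores.
BARCODES = {"BC01": "CACAGTTCAGGTCATCTT", "BC02": "ACAGCTAGTCCAAGTCGA"}
read = "ACGTTGCA" * 10 + "A" * 20 + "CACAGTTCAGCTCATCTT" + "GGTTAACC"
print(classify_rear(read, BARCODES))  # BC01 (one mismatch)
```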

Are your barcode/primer bases RNA or DNA? For RNA basecalling, dorado automatically trims any DNA signal at the 3' end since the RNA basecall model is highly unlikely to give accurate basecalls on DNA.

@malton-ont malton-ont added the barcode Issues related to barcoding label Jan 9, 2025
@YOUZhen93
Author

Thanks @malton-ont! Yes, we did use DNA oligos for barcoding. Is it possible to basecall with the RNA model and a DNA model separately to get the RNA sequence and the DNA barcode sequence? If so, do you have a recommended DNA model for this?

@malton-ont
Collaborator

Hi @YOUZhen93,

This isn't supported, and I doubt it would work correctly given that the DNA models are trained specifically for the corresponding pore and chemistry.

@patbohn

patbohn commented Jan 9, 2025

Hi @YOUZhen93, this is unfortunately a bit more involved than simply basecalling with different basecallers (which doesn't work, as there's no DNA model for RNA pores), but you can check out ADAPTed (made by my amazing colleague Wiep van der Toorn) to extract the DNA portion of your signal (including the variable signal of your barcodes), and then explore some form of clustering or classification based on the raw signal. This is also how we preprocess the data for our dRNA multiplexing method, WarpDemuX.
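As a toy illustration of classifying barcodes from raw signal, here is a nearest-template classifier built on dynamic time warping (DTW). DTW compares a query segment with one reference segment per barcode while tolerating differences in translocation speed. This is not the ADAPTed or WarpDemuX implementation. The templates, the query, the z-normalization step, and the nearest-template rule are all hypothetical choices for the sketch.

```python
import math


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def znorm(x: list[float]) -> list[float]:
    """Scale to zero mean and unit variance, so offset/scale shifts between reads cancel."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mean) / sd for v in x]


def classify(query: list[float], templates: dict[str, list[float]]) -> str:
    """Return the template name with the smallest DTW distance to the query."""
    q = znorm(query)
    return min(templates, key=lambda name: dtw_distance(q, znorm(templates[name])))


# Hypothetical per-barcode reference signals (arbitrary current levels).
TEMPLATES = {
    "BC01": [1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 4.0, 4.0],
    "BC02": [4.0, 4.0, 2.0, 2.0, 3.0, 3.0, 1.0, 1.0],
}
# BC01-like levels with uneven dwell times and an offset/scale shift (2 * level + 10).
query = [12.0, 12.0, 12.0, 16.0, 16.0, 14.0, 18.0, 18.0, 18.0]
print(classify(query, TEMPLATES))  # BC01
```

On real data the templates would come from labelled reads, and clustering could replace the fixed templates when no labels exist.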
