I've been running fastp as part of a larger third-party pipeline (i.e. not written or maintained by me), and noticed that it was specifying the same adapter sequences multiple times on the command line (see the first fastp command under "Steps to reproduce").
I tried seeing what fastp would do without the duplicate arguments, expecting to get the same results. Instead, I found that in some cases my read lengths were now different: sometimes only r1 was affected, sometimes only r2, sometimes both. The adapter sequences being specified don't even appear in the fastqs in this case, so I expected them to have no effect.
Steps to reproduce:
# GiaB test data
wget https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_005_BH814YADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R{1,2}_001.fastq.gz
# fastp 0.23.4
wget http://opengene.org/fastp/fastp.0.23.4
chmod u+x fastp.0.23.4
ln -s fastp.0.23.4 fastp
# proof that the adapter sequences are absent in the fastqs - so surely should have no effect?
for f in U0a_CGATGT_L001_R*; do echo $f; for a in CTGTCTCTTATACACATCT AGATGTGTATAAGAGACAG; do zcat $f | grep -c $a; done; done
# subset to a minimal example of 3 reads known to be affected
zcat U0a_CGATGT_L001_R1_001.fastq.gz | grep -E '^@HWI-D00360:5:H814YADXX:1:1101:(3756:2236|7206:2194|5147:4880)' -A 3 --no-group-separator | head -n 12 | gzip -c > minimal_r1.fastq.gz
zcat U0a_CGATGT_L001_R2_001.fastq.gz | grep -E '^@HWI-D00360:5:H814YADXX:1:1101:(3756:2236|7206:2194|5147:4880)' -A 3 --no-group-separator | head -n 12 | gzip -c > minimal_r2.fastq.gz
# run fastp with/without duplicated --adapter_sequence args
fastp -i minimal_r1.fastq.gz -I minimal_r2.fastq.gz -o r1_trimmed.fastq.gz -O r2_trimmed.fastq.gz \
--thread 8 \
--adapter_sequence CTGTCTCTTATACACATCT \
--adapter_sequence AGATGTGTATAAGAGACAG \
--adapter_sequence AGATGTGTATAAGAGACAG \
--adapter_sequence CTGTCTCTTATACACATCT
fastp -i minimal_r1.fastq.gz -I minimal_r2.fastq.gz -o r1_trimmed_nodup.fastq.gz -O r2_trimmed_nodup.fastq.gz \
--thread 8 \
--adapter_sequence CTGTCTCTTATACACATCT \
--adapter_sequence AGATGTGTATAAGAGACAG
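To see which reads differ between the two runs, a small helper along these lines can be used. This is only a sketch: the function names `read_lengths` and `compare_lengths` are made up for illustration, and it assumes strict 4-line FASTQ records with read IDs that match between the two files.

```shell
# Print "<read id> <sequence length>" for every record in a gzipped FASTQ.
# Assumes 4 lines per record (no wrapped sequence lines).
read_lengths() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1 } NR % 4 == 2 { print id, length($0) }'
}

# Print "<read id> <length in file 1> <length in file 2>" for every read
# present in both files whose sequence length differs between them.
compare_lengths() {
    a=$(mktemp)
    b=$(mktemp)
    read_lengths "$1" > "$a"
    read_lengths "$2" > "$b"
    # First pass stores lengths from file 1; second pass reports mismatches.
    awk 'NR == FNR { len[$1] = $2; next }
         ($1 in len) && len[$1] != $2 { print $1, len[$1], $2 }' "$a" "$b"
    rm -f "$a" "$b"
}

# Example usage with the outputs produced above:
#   compare_lengths r1_trimmed.fastq.gz r1_trimmed_nodup.fastq.gz
#   compare_lengths r2_trimmed.fastq.gz r2_trimmed_nodup.fastq.gz
```

Reads are matched by ID rather than by position, so the comparison still works if one run filtered out a read that the other kept.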
The above example consists of three reads, which were each affected in the same way in both the minimal fastqs above and the full size ones:
Do you know what could be causing this? Is it an expected use-case to specify the same adapter sequence multiple times?