Error during quantification of FASTQ files #118

dBenedek · 2021-05-06T11:15:03Z

Hello,

I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

And then approached to run the quantification:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185

Is this error related to the different format of my FASTQ files?

My FASTQ files look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD

Thanks,
Benedek

The text was updated successfully, but these errors were encountered:

timbitz · 2021-05-06T13:42:20Z

Hi @dBenedek-- Yes Whippet only accepts as input the standard four-line FASTQ file (described here: https://support.illumina.com/bulletins/2016/04/fastq-files-explained.html). The extra name id after the '+' is the problem. This is a standard expectation, but I'll update the documentation to reflect this.

dBenedek · 2021-05-06T13:44:29Z

Thanks for the quick reply!

MiqG · 2021-12-24T01:38:30Z

Thanks for posting the issue @dBenedek, I got quite lost when I got this error...

As this weird third row is quite common in my datasets, In case someone is interested, I managed to bring the third row to Illumina's standard third line and piped it into whippet-quant adding a small awk code (it replaces every 3 lines by "+").

In your example, this should work:

R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <(zcat $R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <(zcat $R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect

ptnaimelmm · 2022-02-17T16:18:15Z

Thanks for posting the issue @dBenedek, I got quite lost when I got this error...

As this weird third row is quite common in my datasets, In case someone is interested, I managed to bring the third row to Illumina's standard third line and piped it into whippet-quant adding a small awk code (it replaces every 3 lines by "+").

In your example, this should work:
R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <($R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <($R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect

Hi MiqG,

I tried your solution here, but failed to proceed whippet. Have you resolved it? Could you post your detailed solution here? Thanks.

Best regards

ptnaimelmm · 2022-02-17T16:19:12Z

Hello,

I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

And then approached to run the quantification:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185

Is this error related to the different format of my FASTQ files?

My FASTQ files look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD

Thanks, Benedek

Hi Benedek,

Have you resolved this issue? Thanks.

Best regards,

Mingming

ptnaimelmm · 2022-02-17T19:25:34Z

Hi @dBenedek-- Yes Whippet only accepts as input the standard four-line FASTQ file (described here: https://support.illumina.com/bulletins/2016/04/fastq-files-explained.html). The extra name id after the '+' is the problem. This is a standard expectation, but I'll update the documentation to reflect this.

Hi,

Do you have any suggestions on how to remove redundant information in the third line in the Fastq files?

Best regards,

ML

dBenedek · 2022-02-17T19:37:20Z

Hello,
I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

And then approached to run the quantification:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185

Is this error related to the different format of my FASTQ files?
My FASTQ files look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD

Thanks, Benedek

Hi Benedek,

Have you resolved this issue? Thanks.

Best regards,

Mingming

Hello,

Yes, sorry, I forgot to answer.
But after correcting the FASTQ files everything works fine.

Best wishes,
Benedek

ptnaimelmm · 2022-02-17T19:50:12Z

Hello,
I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

And then approached to run the quantification:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185

Is this error related to the different format of my FASTQ files?
My FASTQ files look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD

Thanks, Benedek

Hi Benedek,
Have you resolved this issue? Thanks.
Best regards,
Mingming

Hello,

Yes, sorry, I forgot to answer. But after correcting the FASTQ files everything works fine.

Best wishes, Benedek

Hi Benedek,

Would you please share me how to correct the FASTQ files to remove the redundant information in the third line? I got the same problem, but failed to find a solution to modify the FASTQ files. Thanks so much in advance.

Best regards,

Mingming

dBenedek · 2022-02-17T19:52:47Z

Hello,
I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

And then approached to run the quantification:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185

Is this error related to the different format of my FASTQ files?
My FASTQ files look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD

Thanks, Benedek

Hi Benedek,
Have you resolved this issue? Thanks.
Best regards,
Mingming

Hello,
Yes, sorry, I forgot to answer. But after correcting the FASTQ files everything works fine.
Best wishes, Benedek

Hi Benedek,

Would you please share me how to correct the FASTQ files to remove the redundant information in the third line? I got the same problem, but failed to find a solution to modify the FASTQ files. Thanks so much in advance.

Best regards,

Mingming

Use sed.
Something like this:

xargs -P14 -I % sh -c 'zcat fastq/%_1.fastq.gz | sed "s/^+SRR.*/+/g" >fastq_whippet/%_1.fastq && gzip fastq_whippet/%_1.fastq;'

and the same for the _2 files.

MiqG · 2022-02-17T22:55:47Z

Thanks for posting the issue @dBenedek, I got quite lost when I got this error...
As this weird third row is quite common in my datasets, In case someone is interested, I managed to bring the third row to Illumina's standard third line and piped it into whippet-quant adding a small awk code (it replaces every 3 lines by "+").
In your example, this should work:
R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <($R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <($R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect
Hi MiqG,

I tried your solution here, but failed to proceed whippet. Have you resolved it? Could you post your detailed solution here? Thanks.

Best regards

Hi, sorry I forgot adding the "zcat" command to start reading and piping the fastq files. I have edited my previous answer to avoid it. Now the code looks like this:

R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <(zcat $R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <(zcat $R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect

I hope it works for you now.

Cheers!

ptnaimelmm · 2022-02-18T22:46:38Z

Thanks for posting the issue @dBenedek, I got quite lost when I got this error...
As this weird third row is quite common in my datasets, In case someone is interested, I managed to bring the third row to Illumina's standard third line and piped it into whippet-quant adding a small awk code (it replaces every 3 lines by "+").
In your example, this should work:
R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <($R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <($R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect
Hi MiqG,
I tried your solution here, but failed to proceed whippet. Have you resolved it? Could you post your detailed solution here? Thanks.
Best regards
Hi, sorry I forgot adding the "zcat" command to start reading and piping the fastq files. I have edited my previous answer to avoid it. Now the code looks like this:
R1=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
R2=/home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl \
                <(zcat $R1 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                <(zcat $R2 | awk -v count="0" '++count==3{{$0="+";count=-1}} 1') \
                -x /home/bd1/research_mds/data/whippet_index/whippet.jls \
                -o test \
                --biascorrect
I hope it works for you now.

Cheers!

Thanks @MiqG for your help. It works successfully.

Best regards

ptnaimelmm · 2022-02-18T22:47:25Z

Thanks @dBenedek for your help.

dBenedek closed this as completed May 6, 2021

timbitz mentioned this issue Oct 12, 2022

Issue with BioSequences #135

Closed

timbitz reopened this Dec 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error during quantification of FASTQ files #118

Error during quantification of FASTQ files #118

dBenedek commented May 6, 2021

timbitz commented May 6, 2021

dBenedek commented May 6, 2021

MiqG commented Dec 24, 2021 •

edited

Loading

ptnaimelmm commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

dBenedek commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

dBenedek commented Feb 17, 2022

MiqG commented Feb 17, 2022

ptnaimelmm commented Feb 18, 2022

ptnaimelmm commented Feb 18, 2022

Error during quantification of FASTQ files #118

Error during quantification of FASTQ files #118

Comments

dBenedek commented May 6, 2021

timbitz commented May 6, 2021

dBenedek commented May 6, 2021

MiqG commented Dec 24, 2021 • edited Loading

ptnaimelmm commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

dBenedek commented Feb 17, 2022

ptnaimelmm commented Feb 17, 2022

dBenedek commented Feb 17, 2022

MiqG commented Feb 17, 2022

ptnaimelmm commented Feb 18, 2022

ptnaimelmm commented Feb 18, 2022

MiqG commented Dec 24, 2021 •

edited

Loading