error running in paired end mode #6

Open
chklopp opened this issue Dec 14, 2020 · 2 comments

Comments

chklopp commented Dec 14, 2020

I'm running FabFos on one of my samples:

python ~/src/FabFos-master/src/FabFos.py -r BifidoMiSeq.S09/fwd -2 BifidoMiSeq.S09/rev -m BifidoMiSeq.S09_miffed.csv -b backbone.fasta -p pe -t F --force -f ./BifidoMiSeq.S09
FabFos version 1.9
Software versions used:
blastn 2.2.31+
bwa 0.7.12-r1039
makeblastdb 2.2.31+
samtools 1.11
spades v3.14.1
trimmomatic.jar 0.39
Output directory for BifidoMiSeq.S09 already exists with files. Overwrite? [y|n]y
Overwriting BifidoMiSeq.S09
Processing 'BifidoMiSeq.S09'. Outputs will written to ./BifidoMiSeq.S09/BifidoMiSeq_S09/BifidoMiSeq.S09/
Finding number of reads in fastq files... done.
Number of raw reads = 269000
Filtering off-target reads...
Unable to find BWA index for backbone.fasta. Indexing now... done.
Resuming alignment... done.
Converting BAM file to FastQ format... done.
Finding number of reads in fastq files... done.
53490 reads removed by filtering background (19.8848%).
Trimming reads... done.
Finding number of reads in fastq files... Traceback (most recent call last):
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 2409, in <module>
    fabfos_main(sys.argv[1:])
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 2381, in fabfos_main
    sample.qc_reads(args.background, args.parity, fos_father.adaptor_trim, executables, args.threads)
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 284, in qc_reads
    self.read_stats.num_reads_assembled = find_num_reads(trimmed_reads)
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 1526, in find_num_reads
    fq = pyfastx.Fastq(file_name=fastq, build_index=False)
RuntimeError: ./BifidoMiSeq.S09/BifidoMiSeq_S09/BifidoMiSeq.S09//BifidoMiSeq.S09_trim_se.2.fq is not plain or gzip compressed fastq formatted file

The problem seems to come from a check made on the single-end (se) files during paired-end processing.

Here are the files created:
-rw-r--r-- 1 cklopp miat 121936302 déc. 14 16:17 BifidoMiSeq.S09.sam
-rw-r--r-- 1 cklopp miat 21914558 déc. 14 16:17 BifidoMiSeq.S09_sorted.bam
-rw-r--r-- 1 cklopp miat 63 déc. 14 16:17 samtools.stdout
drwxr-xr-x 2 cklopp miat 4096 déc. 14 16:17 filtered
-rw-r--r-- 1 cklopp miat 0 déc. 14 16:17 BifidoMiSeq.S09_trim_se.2.fq
-rw-r--r-- 1 cklopp miat 418 déc. 14 16:17 BifidoMiSeq.S09_trim_se.1.fq
-rw-r--r-- 1 cklopp miat 45101276 déc. 14 16:17 BifidoMiSeq.S09_trim_pe.2.fq
-rw-r--r-- 1 cklopp miat 45103398 déc. 14 16:17 BifidoMiSeq.S09_trim_pe.1.fq
-rw-r--r-- 1 cklopp miat 1785 déc. 14 16:17 BifidoMiSeq.S09_trim_stdout.txt

The second single-end file (BifidoMiSeq.S09_trim_se.2.fq) is empty (0 bytes), so the script fails when it tries to count the reads in it.
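To confirm which outputs are zero-byte before the read count is attempted, a short standard-library check is enough. This is only a sketch: `split_empty_fastqs` is a hypothetical helper name, not part of FabFos, and it assumes that "empty" means a size of 0 bytes on disk, as in the listing above.

```python
import os
from typing import Iterable, List, Optional, Tuple


def split_empty_fastqs(paths: Iterable[Optional[str]]) -> Tuple[List[str], List[str]]:
    """Partition FASTQ paths into (non_empty, empty) by their size on disk.

    Hypothetical helper for illustration; None and "" entries are ignored.
    """
    non_empty: List[str] = []
    empty: List[str] = []
    for path in paths:
        if not path:
            # Skip missing entries such as None or an empty string
            continue
        if os.path.getsize(path) > 0:
            non_empty.append(path)
        else:
            empty.append(path)
    return non_empty, empty
```

Passing the four `_trim_pe` and `_trim_se` paths to it would put any zero-byte file in the second list, which can then be logged instead of being handed to the FASTQ parser.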

How can I bypass this?

chklopp commented Dec 14, 2020

The same process with an interleaved FASTQ file stops with another error, again caused by an empty FASTQ file.

python ~/src/FabFos-master/src/FabFos.py -r BifidoMiSeq.S09/interleave/ -m BifidoMiSeq.S09_miffed.csv -b backbone.fasta -p pe -t F --force -f ./BifidoMiSeq.S09
FabFos version 1.9
Software versions used:
blastn 2.2.31+
bwa 0.7.12-r1039
makeblastdb 2.2.31+
samtools 1.11
spades v3.14.1
trimmomatic.jar 0.39
Output directory for BifidoMiSeq.S09 already exists with files. Overwrite? [y|n]y
Overwriting BifidoMiSeq.S09
Processing 'BifidoMiSeq.S09'. Outputs will written to ./BifidoMiSeq.S09/BifidoMiSeq_S09/BifidoMiSeq.S09/
Finding number of reads in fastq files... done.
Number of raw reads = 269000
Filtering off-target reads... done.
Converting BAM file to FastQ format... done.
Finding number of reads in fastq files... Traceback (most recent call last):
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 2409, in <module>
    fabfos_main(sys.argv[1:])
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 2381, in fabfos_main
    sample.qc_reads(args.background, args.parity, fos_father.adaptor_trim, executables, args.threads)
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 274, in qc_reads
    self.read_stats.num_filtered_reads = find_num_reads(filtered_reads)
  File "/home/cklopp/src/FabFos-master/src/FabFos.py", line 1526, in find_num_reads
    fq = pyfastx.Fastq(file_name=fastq, build_index=False)
RuntimeError: ./BifidoMiSeq.S09/BifidoMiSeq_S09/BifidoMiSeq.S09/filtered/BifidoMiSeq.S09.1.fastq is not plain or gzip compressed fastq formatted file

chklopp commented Dec 14, 2020

I've modified the find_num_reads function so that it only reads non-empty files:

import logging
import os

import pyfastx


def find_num_reads(file_list: list) -> int:
    """
    Function to count the number of reads in all FASTQ files in file_list

    :param file_list: A list of FASTQ files
    :return: integer representing the number of reads in all FASTQ files provided
    """
    logging.info("Finding number of reads in fastq files... ")
    num_reads = 0
    for fastq in file_list:
        if not fastq:
            continue
        # pyfastx raises RuntimeError on a zero-byte file, so only open non-empty ones
        if os.stat(fastq).st_size != 0:
            fq = pyfastx.Fastq(file_name=fastq, build_index=False)
            for _ in fq:
                num_reads += 1
        else:
            logging.info("file " + fastq + " is empty.\n")
    logging.info("done.\n")
    return num_reads

FabFos now runs to completion.
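For anyone who wants to try the same guard without pyfastx installed, here is a minimal standard-library sketch. It is not the FabFos implementation: `count_fastq_records` is a hypothetical name, it replaces the pyfastx parser with plain line counting, and it assumes strict four-line FASTQ records (no wrapped sequence lines) and that gzip files end in `.gz`.

```python
import gzip
import os
from typing import Iterable, Optional


def count_fastq_records(paths: Iterable[Optional[str]]) -> int:
    """Count reads across FASTQ files, skipping missing entries and zero-byte files.

    Assumption: every record is exactly four lines, so reads = lines // 4.
    """
    total = 0
    for path in paths:
        # Same guard as above: ignore None/"" entries and empty files
        if not path or os.path.getsize(path) == 0:
            continue
        # Assumption: gzip-compressed inputs are recognisable by their extension
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            n_lines = sum(1 for _ in handle)
        total += n_lines // 4
    return total
```

With a list containing one populated file and one zero-byte file, the empty file contributes nothing to the total and no exception is raised.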

Tony-xy-Liu added a commit that referenced this issue Jun 21, 2024
Conda channel name change