When only SRA accessions are provided in metadata input file, what happens if both SE and PE fastq reads are downloaded for particular accession? #65

Closed
masudermann opened this issue Apr 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@masudermann
Contributor

Description of the bug

The pipeline may already handle this, but what happens if both single-end (SE) and paired-end (PE) reads are available for a particular strain? Camilo's mixed euk dataset (mixed.csv) contains an SRA accession, SRR10432277, that fits this scenario.

I am getting an error when the pipeline reaches the read alignment step. It tries to use both the PE and SE reads as inputs, and I think this causes the failure.

Command used and terminal output

# An example of the error I see:

[83/984875] NOTE: Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)` terminated with an error exit status (1) -- Execution is retried (1)
ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)'

Caused by:
  Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)` terminated with an error exit status (1)

Command executed:

  INDEX=`find -L ./ -name "*.amb" | sed 's/\.amb$//'`
  
  bwa mem \
      -M \
      -t 16 \
      $INDEX \
      SRR10432277_1_subset.fastq.gz SRR10432277_2_subset.fastq.gz SRR10432277_subset.fastq.gz \
      | samtools view  --threads 16 -o GCA_031834405_1_PHW726_fox_matthiolae.bam -
  
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM":
      bwa: $(echo $(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
         -y INT        seed occurrence for the 3rd round seeding [20]
         -c INT        skip seeds with more than INT occurrences [500]
         -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
         -W INT        discard a chain if seeded bases shorter than INT [0]
         -m INT        perform at most INT rounds of mate rescues for each read [50]
         -S            skip mate rescue
         -P            skip pairing; mate rescue performed unless -S also in use
  
  Scoring options:
  
         -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
         -B INT        penalty for a mismatch [4]
         -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
         -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
         -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
         -U INT        penalty for an unpaired read pair [17]
  
         -x STR        read type. Setting -x changes multiple parameters unless overridden [null]
                       pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
                       ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0  (Oxford Nanopore 2D-reads to ref)
                       intractg: -B9 -O16 -L5  (intra-species contigs to ref)
  
  Input/output options:
  
         -p            smart pairing (ignoring in2.fq)
         -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
         -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
         -o FILE       sam file to output results to [stdout]
         -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
         -5            for split alignment, take the alignment with the smallest coordinate as primary
         -q            don't modify mapQ of supplementary alignments
         -K INT        process INT input bases in each batch regardless of nThreads (for reproducibility) []
  
         -v INT        verbosity level: 1=error, 2=warning, 3=message, 4+=debugging [3]
         -T INT        minimum score to output [30]
         -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
         -a            output all alignments for SE or unpaired PE
         -C            append FASTA/FASTQ comment to SAM output
         -V            output the reference FASTA header in the XR tag
         -Y            use soft clipping for supplementary alignments
         -M            mark shorter split hits as secondary
  
         -I FLOAT[,FLOAT[,INT[,INT]]]
                       specify the mean, standard deviation (10% of the mean if absent), max
                       (4 sigma from the mean if absent) and min of the insert size distribution.
                       FR orientation only. [inferred]
  
  Note: Please read the man page for detailed description of the command line and options.
  
  [main_samview] fail to read the header from "-".

Work dir:
  /home/marthasudermann/pathogensurveillance/work/a0/44cd71b6fd13a59faa6d3786d5301e

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

@masudermann masudermann added the bug Something isn't working label Apr 24, 2024
@masudermann masudermann changed the title from "When only SRA accessions are provided, what happens if both SE and PE fastq reads are downloaded for particular accession?" to "When only SRA accessions are provided in metadata input file, what happens if both SE and PE fastq reads are downloaded for particular accession?" Apr 24, 2024
@zachary-foster
Contributor

I ran into this error as well and fixed it by using only the paired-end reads when this happens.
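
For anyone hitting the same thing: `bwa mem` takes at most two read files after the index (`<in1.fq> [in2.fq]`), so the three files in the command above make it print its usage text and exit, and `samtools view` then has no header to read. Below is a minimal sketch of the "prefer paired-end" idea. It is not the pipeline's actual code. The helper name `select_reads` and the file-name pattern (`_1`/`_2` mate suffixes, optional `_subset`) are assumptions based only on the file names visible in the log.

```python
import re

# Hypothetical helper, not taken from the pipeline: matches mate files such as
# SRR10432277_1_subset.fastq.gz or SRR10432277_2.fq (naming assumed from the log).
_MATE = re.compile(r"_([12])(?:_subset)?\.f(?:ast)?q(?:\.gz)?$")


def select_reads(fastq_files):
    """Return only the two paired-end mates if both exist, else the input unchanged."""
    mates = {}
    for name in fastq_files:
        match = _MATE.search(name)
        if match:
            # Remember which file is mate 1 and which is mate 2.
            mates[match.group(1)] = name
    if "1" in mates and "2" in mates:
        # Both mates found: drop any extra single-end file.
        return [mates["1"], mates["2"]]
    # No complete pair: leave the file list as it was.
    return list(fastq_files)
```

With the files from the failing command, `select_reads` would keep `SRR10432277_1_subset.fastq.gz` and `SRR10432277_2_subset.fastq.gz` and drop `SRR10432277_subset.fastq.gz`, so the aligner receives two read files. The single-end reads are discarded in this sketch rather than aligned separately and merged.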
