Normal Sample #33

Open
LucaMannino opened this issue Aug 21, 2024 · 3 comments

Comments

@LucaMannino

Hello,

I have a question regarding the use of normal samples in ScanNeo2.

According to the information provided in the data section of the wiki, it states:
"In addition, normal allows to specify normal samples but is not used currently. Multiple normal samples can be separated by spaces."

How does the software ensure that the identified mutations are somatic if variant calling for normal DNA sequencing is not being performed?

I recently completed a test run with my data, and it appears that no output was generated for the normal DNA data. I suspect that either the normal DNA data is not being used or I have misconfigured the config file. Below is how I filled in the data section of the config file. Could you please confirm whether this is correct?

data:
  name: bN4
  dnaseq:
    dna_tumor: Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_1.fq.gz Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_2.fq.gz
  rnaseq:
    rna_tumor: Path/to/File/bN4_1.fastq.gz Path/to/File/bN4_2.fastq.gz
  normal:
    dna_normal1: Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_1.fq.gz Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_2.fq.gz
    dna_normal2: Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_1.fq.gz Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_2.fq.gz

@riasc
Collaborator

riasc commented Aug 22, 2024

Hi,
When no normal sample is provided, we depend to some extent on the output of the tools used. For example, GATK has a tumor-only mode, and we use FilterMutectCalls, which includes a Panel of Normals to remove false positives.

In any case, it should generate an output (but also SNVs/indels, as the other sources are generated from transcriptomic data). Your config looks good. The only thing is that you could specify normal: dna_normal1 dna_normal2 to tell it that these are the normal samples. These are, for example, excluded in the genotyping.

However, it should produce some output. Does it not generate any output at all, such as the alignment?
Also, what mode have you specified in the indel module?

mode: BOTH # DNA, RNA or BOTH -

This needs to be set to BOTH or DNA; otherwise it will only be active for the RNA samples.

@LucaMannino
Author

Hi,
thank you for the prompt reply.
On line 75 of config.yaml I have selected mode: BOTH.
It does generate output, but only for the tumor samples; it doesn't include any alignment for the normal samples. It looks as if it is currently not using the normal samples to filter out the germline mutations, but only the Panel of Normals to remove false positives.

/dnaseq/reads$ ls
dna_tumor_preproc_failed.fq.gz dna_tumor_preproc_report.json dna_tumor_R1_preproc_unpaired.fq.gz dna_tumor_R2_preproc_unpaired.fq.gz
dna_tumor_preproc_report.html dna_tumor_R1_preproc.fq.gz dna_tumor_R2_preproc.fq.gz

dnaseq/align$ ls
dna_tumor_aligned_BWA.bam dna_tumor_final_BWA.bam dna_tumor_final_BWA.bam.bai dna_tumor_final_BWA_split

Could it be that I need to rewrite the data portion of the config.yaml file as follows?

data:
  name: bN4
  dnaseq:
    dna_tumor: Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_1.fq.gz Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_2.fq.gz
  rnaseq:
    rna_tumor: Path/to/File/bN4_1.fastq.gz Path/to/File/bN4_2.fastq.gz
  normal: dna_normal1 Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_1.fq.gz Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_2.fq.gz dna_normal2 Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_1.fq.gz Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_2.fq.gz

Did I interpret this part of your reply correctly: "The only thing is that you could specify normal: dna_normal1 dna_normal2 to tell it that these are the normal samples."?

@riasc
Collaborator

riasc commented Aug 22, 2024

Hi,

Ah, I think I misread your post before. Sorry about that. In normal, you only need to provide the identifiers (e.g., dna_normal1) exactly as they have been defined. The normal samples themselves, with their file paths, have to be defined under dnaseq.

# General settings
reference:
  release: 111
  nonchr: false
threads: 30
mapq: 30  # overall required mapping quality
basequal: 20  # overall required base quality 

data:
  name:  bN4
  dnaseq: 
    dna_normal1: Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_1.fq.gz  Path/to/File/bN4DNAbl_EKDN230058820-1A_222T7MLT4_L1_2.fq.gz
    dna_normal2: Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_1.fq.gz Path/to/File/bN4DNAbl_EKDN230058820-1A_H27YKDSXC_L4_2.fq.gz
    dna_tumor: Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_1.fq.gz Path/to/File/bN4DNA_EKDN230058818-1A_H2CMHDSXC_L3_2.fq.gz
  rnaseq:
    rna_tumor: Path/to/File/bN4_1.fastq.gz Path/to/File/bN4_2.fastq.gz
  normal: dna_normal1 dna_normal2
  
  custom:
    variants:
    hlatyping:
      MHC-I:
      MHC-II:

For example, when we do the indel calling, ScanNeo2 searches for the keys under rnaseq/dnaseq and uses the data that is defined under them.

Like here:

group=list(config['data']['dnaseq'].keys()))

Since the normal samples were defined under normal, on the same level as dnaseq rather than under it, they were probably missed. I will think about some routines to catch this. Thanks for the hint.
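
As a rough illustration of why the normal samples were skipped, and of what such a check could look like, here is a minimal Python sketch. It is not part of ScanNeo2: the function name undefined_normals and the shortened file names are hypothetical, and the config is written as plain dictionaries rather than loaded from config.yaml.

```python
def undefined_normals(config):
    """Return identifiers listed under data.normal that are not defined
    as sample groups under data.dnaseq or data.rnaseq.

    Hypothetical helper for illustration; not part of ScanNeo2.
    """
    data = config.get("data", {})
    defined = set()
    for section in ("dnaseq", "rnaseq"):
        defined.update((data.get(section) or {}).keys())

    normal = data.get("normal")
    if normal is None:
        return []
    if isinstance(normal, dict):
        # Samples were nested under normal instead of dnaseq/rnaseq.
        listed = list(normal.keys())
    else:
        # Expected form: identifiers separated by spaces.
        listed = str(normal).split()
    return [name for name in listed if name not in defined]


# Layout from the first post (placeholder file names).
original = {
    "data": {
        "name": "bN4",
        "dnaseq": {"dna_tumor": "tumor_1.fq.gz tumor_2.fq.gz"},
        "rnaseq": {"rna_tumor": "rna_1.fastq.gz rna_2.fastq.gz"},
        "normal": {
            "dna_normal1": "normal1_1.fq.gz normal1_2.fq.gz",
            "dna_normal2": "normal2_1.fq.gz normal2_2.fq.gz",
        },
    }
}

# Corrected layout: normals defined under dnaseq, referenced by name in normal.
fixed = {
    "data": {
        "name": "bN4",
        "dnaseq": {
            "dna_normal1": "normal1_1.fq.gz normal1_2.fq.gz",
            "dna_normal2": "normal2_1.fq.gz normal2_2.fq.gz",
            "dna_tumor": "tumor_1.fq.gz tumor_2.fq.gz",
        },
        "rnaseq": {"rna_tumor": "rna_1.fastq.gz rna_2.fastq.gz"},
        "normal": "dna_normal1 dna_normal2",
    }
}

# The groups are taken from the keys under dnaseq only.
print(list(original["data"]["dnaseq"].keys()))  # → ['dna_tumor']
print(list(fixed["data"]["dnaseq"].keys()))     # → ['dna_normal1', 'dna_normal2', 'dna_tumor']

print(undefined_normals(original))  # → ['dna_normal1', 'dna_normal2']
print(undefined_normals(fixed))     # → []
```

With the original layout, only dna_tumor appears as a group, which matches the tumor-only output reported above. A check of this kind could run before the workflow starts and warn about identifiers that are not defined under dnaseq or rnaseq.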

In the final prioritization file (mhc-I_neoepitopes.txt), you should then also find neoantigens from the dna_normal groups (in the group field). If you redo the analysis, you might need to delete some intermediate files, such as those in the annotation/variants directories, because Snakemake works bottom-up and only checks whether a file is present, regardless of how it was generated (in your case, only for the tumor samples).
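
Before deleting anything, it may help to see which intermediate files already exist. The following Python sketch only lists files and deletes nothing. The function name list_intermediates is hypothetical, and the subdirectory names "annotation" and "variants" are assumptions taken from the sentence above; the actual layout of the results directory may differ.

```python
from pathlib import Path


def list_intermediates(results_dir, subdirs=("annotation", "variants")):
    """Return all files found under the given subdirectories of results_dir.

    Nothing is deleted; the list is meant for manual review.
    The default subdirectory names are assumptions, not ScanNeo2 constants.
    """
    root = Path(results_dir)
    found = []
    for name in subdirs:
        folder = root / name
        if folder.is_dir():
            # Collect regular files recursively, in a stable order.
            found.extend(sorted(p for p in folder.rglob("*") if p.is_file()))
    return found


if __name__ == "__main__":
    # "results/bN4" is a placeholder path; adjust it to your output directory.
    for path in list_intermediates("results/bN4"):
        print(path)
```

Files in this list that were produced by the tumor-only run are candidates for removal, so that Snakemake regenerates them with the normal samples included.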

Let me know if this helps,
Thanks
