Finding microdeletions #36
Comments
Hi Greeshma, I'm not sure exactly what size of microdeletions you are looking for. WISECONDOR was originally written to target fairly long but barely deviating CNVs. I have found that the latest version was able to find a CNV between 3 and 4 Mb, but I'd be hesitant to take such short CNV results as truth without further testing. To answer your questions directly:
A z-score above 5.08 means a duplication; a z-score below -5.08 means a deletion. Anything between -5.08 and 5.08 is considered unaffected.
That's the effect size: the estimated percentage of copy number change for that particular region. If it says 100, it found twice as many DNA fragments as expected; 5 means it found 5% more DNA fragments than expected.
That is correct, it is
Those were implemented in an older version of WISECONDOR (as described in the paper). If you wish to use that version, you can find it in the legacy branch: If you really aim to find small CNVs, the input data may be a bit limiting: it seems you have ~4 million reads. If you are unable to find known short CNVs, I'd suggest trying more than ~10 million reads and using a fairly large set of training samples. Additionally, I believe this fork of WISECONDOR could be of interest to you, as it should contain several improvements over my work: Let me know if something is still unclear.
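The z-score rule and the effect size reading given in this reply can be sketched in a few lines of Python. This is only an illustration of the stated rule, not WISECONDOR's own code; the function names `classify_call` and `effect_to_ratio` are made up for this example, and the four value pairs are the rows from the output posted in the question.

```python
# Hypothetical helpers illustrating the rule described in the reply;
# they are not part of WISECONDOR.
Z_THRESHOLD = 5.08  # the "Z-Score used" value from the posted output


def classify_call(z_score, threshold=Z_THRESHOLD):
    """Label a region by its z-score: beyond +/- threshold is a call."""
    if z_score > threshold:
        return "duplication"
    if z_score < -threshold:
        return "deletion"
    return "unaffected"


def effect_to_ratio(effect_percent):
    """Turn an effect size in percent into an observed/expected ratio.

    An effect of 100 gives 2.0 (twice as many fragments as expected),
    an effect of 5 gives 1.05 (5% more fragments than expected).
    """
    return 1.0 + effect_percent / 100.0


# (z-score, effect) pairs copied from the "Test results" rows in the question
for z, effect in [(6.19, 2.07), (-6.26, -6.52), (5.14, 6.60), (-6.86, -3.11)]:
    print(classify_call(z), round(effect_to_ratio(effect), 4))
```

Note that a region can pass the z-score threshold while its effect size is only a few percent, which is why the two columns are worth reading together.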
Thank you so much Roy Straver, Thank you |
Training samples should preferably be without any CNVs. However, it's pretty much impossible to ensure that is true and if you use many reference samples (i.e. hundreds) and a few (one or two) have the same CNV, I highly doubt it's going to influence your sensitivity much (if at all) as it's not really systematic behaviour.
Anything from 5 to 20 million should be fine. There is no need to be very stringent unless you go to very small bin sizes; if you do, make sure you have enough coverage per bin.
The master branch does not use the sliding-window approach; it has been replaced by a segmentation step. That step gives a Stouffer's z-score for a region of any possible length, making sure the z-score is the (absolute) maximum possible for that region.
It's a trade-off: smaller bins mean less data per bin, but more bins to use as reference bins. Smaller bins certainly need more time per sample and may increase erratic behaviour at low coverage, but with enough training data they may also give good results on small CNVs.
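To make the Stouffer's z-score mentioned above concrete, here is a minimal sketch. Stouffer's method combines n z-scores as their sum divided by the square root of n. The brute-force search over all contiguous runs is an assumption chosen for readability and is not how WISECONDOR's segmentation step is implemented; the per-bin values are invented.

```python
import math


def stouffer_z(z_scores):
    """Combine per-bin z-scores with Stouffer's method: sum(z) / sqrt(n)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def best_segment(z_scores):
    """Return (start, end, z) for the contiguous run with the largest |z|.

    Illustrative brute force over every run; `end` is exclusive.
    """
    # Prefix sums let us get the sum of any run in constant time.
    prefix = [0.0]
    for z in z_scores:
        prefix.append(prefix[-1] + z)

    best = (0, 1, z_scores[0])
    for start in range(len(z_scores)):
        for end in range(start + 1, len(z_scores) + 1):
            z = (prefix[end] - prefix[start]) / math.sqrt(end - start)
            if abs(z) > abs(best[2]):
                best = (start, end, z)
    return best


# Invented per-bin z-scores: no single bin passes 5.08 on its own.
bins = [0.3, -0.5, -2.9, -3.1, -2.7, -3.3, 0.4, 0.1]
start, end, z = best_segment(bins)
print(start, end, round(z, 2))
```

In this toy input every bin stays well inside the threshold of 5.08, yet the combined score of the four consecutive negative bins goes beyond it. That is the sense in which a long, barely deviating region can be detected while a single bin cannot.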
Hi @rstraver , |
Assuming you are talking about the effect size, that value would mean it measured 8.54% fewer fragments than it expected to find, which could indicate a microdeletion that is much smaller than the bin, or one that is only found in a subset of the cells analysed (mosaicism, in our case of cell-free DNA).
For a significant microdeletion, how large should the effect size be?
I'm afraid that is not within my knowledge; I never aimed to find microdeletions and I never tested for them. I suggest you set up some experiments to test the reliability of various thresholds. WISECONDOR mostly uses a z-score threshold instead of an effect-size-based one, because a not-so-reliable reference set can cause a quite high effect size, and the z-score takes that into account. Also, you may find spikes in a few bins or in single bins that often turn out to be meaningless, so be careful with those.
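As a rough back-of-the-envelope check on the mosaicism reading, the arithmetic can be written out as below. This is a simplified model and not something WISECONDOR computes: it assumes a diploid region, a deletion of one copy spanning the whole reported region, and no measurement noise. The function names are hypothetical.

```python
def expected_effect(dna_fraction, copies_lost=1, normal_copies=2):
    """Expected effect size (%) if `dna_fraction` of the DNA carries the deletion.

    Assumes the deletion spans the whole region and removes `copies_lost`
    of `normal_copies` copies (simplified, noise-free model).
    """
    return -100.0 * dna_fraction * copies_lost / normal_copies


def implied_fraction(effect_percent, copies_lost=1, normal_copies=2):
    """Fraction of the DNA that would have to carry the deletion
    to produce the observed effect size under the same assumptions."""
    return -effect_percent / 100.0 * normal_copies / copies_lost


print(expected_effect(1.0))                # one copy lost in all of the DNA
print(round(implied_fraction(-8.54), 4))   # the -8.54% discussed above
```

Under these assumptions a one-copy deletion carried by all of the DNA would give an effect of -50%, so a value of -8.54% would correspond to roughly 17% of the DNA carrying it. A deletion covering only part of the reported region would dilute the effect in the same way.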
Yes. |
Hi,
I have single-end sequenced maternal blood data.
I ran WISECONDOR and got a result like this:
`# BAM information: #
Reads mapped: 4021312
Reads unmapped: 24027
Reads nocoord: 24027
Reads rmdup: 546798
Reads lowqual: 132340
RETRO filtering:
Reads in: 4676255
Reads removed: 1356445
Reads out: 3319810
Z-Score checks:
Z-Score used: 5.08
AvgStdDev: 6.23%
AvgAllStdDev: 31.68%
Test results:
z-score  effect  mbsize  location
   6.19    2.07   80.75  2:34250000-115000000
  -6.26   -6.52   11.25  7:52500000-63750000
   5.14    6.60    7.00  7:75250000-82250000
  -6.86   -3.11   49.50  19:9750000-59250000
`
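For working with these rows programmatically, a small parser can be sketched as follows. It assumes the location column has the form chromosome:start-end, which the mbsize column is consistent with (end minus start, in megabases, matches for all four rows). The `Call` type and `parse_call` function are hypothetical and not part of WISECONDOR.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One row of the "Test results" table (hypothetical container)."""
    z_score: float
    effect: float
    mbsize: float
    chrom: str
    start: int
    end: int


def parse_call(line):
    """Parse a whitespace-separated result row into a Call.

    Assumes the location field looks like "chrom:start-end".
    """
    z, effect, mbsize, location = line.split()
    chrom, span = location.split(":")
    start, end = span.split("-")
    return Call(float(z), float(effect), float(mbsize), chrom, int(start), int(end))


# Rows copied from the output above
rows = """\
6.19 2.07 80.75 2:34250000-115000000
-6.26 -6.52 11.25 7:52500000-63750000
5.14 6.60 7.00 7:75250000-82250000
-6.86 -3.11 49.50 19:9750000-59250000
"""

calls = [parse_call(line) for line in rows.splitlines()]
for call in calls:
    # Sanity check: mbsize should equal the span length in megabases.
    assert abs((call.end - call.start) / 1e6 - call.mbsize) < 1e-9
    print(call.chrom, call.start, call.end, call.mbsize)
```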
Does a z-score above the threshold of 5.08 mean a duplication, and a z-score below the negative threshold a deletion?
What is the effect field for?
Does the location indicate the chromosome and the start-end base positions?
I read that there are different methods, such as:
Single bin, bin test
Single bin, aneuploidy test
Windowed, bin test
Windowed, aneuploidy test
Chromosome wide, aneuploidy test
How do I choose between these methods?
I followed the steps described at https://github.com/VUmcCGP/wisecondor
Please describe the steps to find microdeletions along the chromosomes.
Looking forward to hearing from you.
Thanks in advance
Greeshma