How is valid_coverage calculated? Different results between total reads and subsampled reads #292
Comments
Hello @DelphIONe, The |
It's not clear for me sorry. |
Hello @DelphIONe,
Yes this is correct. My point in the previous comment is that if you estimate the pass threshold from the first set (0% modification) you may see a different value than if you estimate the threshold from the second set.
With the default settings, Modkit will estimate a pass threshold for which the 10% lowest confidence base modification calls are discarded. This is what I meant by "dynamic threshold estimation": the threshold value is estimated from the input data. There are more details here. It does not use the quality of the reads or the mapping. To your original question, when you subsample the reads, it is possible that you could have higher |
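To make the effect concrete, here is a minimal toy sketch in plain Python. It is not Modkit's implementation: the helper names `estimate_threshold` and `valid_coverage`, the confidence values, and the simple percentile rule are all assumptions chosen for illustration. It only shows how a threshold estimated from the data itself can let a subsample keep more calls at one position than the full read set does.

```python
def estimate_threshold(confidences, discard_percent=10):
    """Toy percentile rule (assumption, not Modkit's code): return the
    confidence value below which `discard_percent` % of calls fall."""
    ordered = sorted(confidences)
    index = (len(ordered) * discard_percent) // 100
    return ordered[min(index, len(ordered) - 1)]


def valid_coverage(position_confidences, threshold):
    """Count the calls at one position that pass the threshold."""
    return sum(1 for c in position_confidences if c >= threshold)


# Hypothetical call confidences at the position of interest (all reads).
position_full = [0.55, 0.60, 0.65, 0.90, 0.95, 0.97]
# Hypothetical calls elsewhere in the genome, all high confidence.
elsewhere_full = [0.99] * 24

# The subsample drops one read at the position and most reads elsewhere.
position_sub = [0.55, 0.60, 0.65, 0.90, 0.95]
elsewhere_sub = [0.99] * 5

# Each threshold is estimated from its own input data.
threshold_full = estimate_threshold(position_full + elsewhere_full)
threshold_sub = estimate_threshold(position_sub + elsewhere_sub)

coverage_full = valid_coverage(position_full, threshold_full)
coverage_sub = valid_coverage(position_sub, threshold_sub)

print(f"full:      threshold={threshold_full}, valid calls at position={coverage_full}")
print(f"subsample: threshold={threshold_sub}, valid calls at position={coverage_sub}")
```

In this toy data the full set discards 3 of 30 calls and the subsample discards 1 of 10, so both drop 10%, yet the estimated thresholds differ and the subsample ends up with more passing calls at the position. Estimating the threshold once and applying the same fixed value to both runs would remove this source of difference.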
I have mapped reads on my genome. I used modkit pileup and dmr pair using the IVT sample. If I sub-sample these reads and re-run modkit pileup and then dmr I get strange valid_coverage. For example, for one position, the valid_coverage on my sub-sample of reads is higher than the same position in the result obtained by modkit dmr with the total reads (with the same IVT reads). How is this possible? How is valid_coverage calculated?
Thanks a lot for your help,