-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible to adjust quality score median/mean of simulated reads? #22
Comments
This is a great question. Are you looking to increase the quality or decrease the quality? The quality scores (and identity scores that correlate to quality scores) can be adjusted to a certain degree by tuning the following parameters
If you look at the squigulator paper, there are some plots showing how increasing the dwell-std reduced the quality and reducing it increased the quality. For amp-noise, there is a sweet spot that gives the best quality and going away from this makes the quality low. Note that the paper has benchmarks from R9, but should equally apply to R10. Also, keep in mind that the best quality is capped by the quality of the pore model. That means if the pore model is of high quality and accurately represents the real data, we can get simulated reads as close as the real data. For R9 this is the case, but not so for R10 at the moment. |
I was looking into increasing the quality, but perhaps the issue is as you pointed out the quality of the R10 pore model, and further, maybe it has something to do with the random 7Kbp sequence I chose to simulate the reads from. I tried making my own Pore model by the following: count 9-mers from reads using BBMap/BBTools kmercountexact.sh kmercountexact.sh k=9 threads=54 -Xmx300g in=~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam out=~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers.txt Convert from the format cat ~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers.txt|paste - - |perl -pe "s/>(\d+)\t(\w*)/\2\t\1/g" |sort > ~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers2.txt Turns out from my human WGS data, there are quite a few missing potential 9-mers (I only have 131072 9-mers out of a possible 262144), and your k-mer model input requires all possible 4-base 9-mers, so I used column1 (i.e., the k-mers) from https://raw.githubusercontent.com/nanoporetech/kmer_models/refs/heads/master/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt wget https://raw.githubusercontent.com/nanoporetech/kmer_models/refs/heads/master/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt
cut -f 1 9mer_levels_v1.txt > 9-mers.txt and then the following julialang script to get fill in the table. using DataFrames, CSV
# Read the input files
test = DataFrame(CSV.File("/home/jelber43/genomics_server/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers2.txt", header=false))
test2 = DataFrame(CSV.File("/home/jelber43/genomics_server/hapmers/9-mers.txt", header=false))
# Create test3 DataFrame
test3 = DataFrame("k-mer" => test2[:, 1], "number" => fill(0, nrow(test2)))
# Update test3 with values from test
for row in eachrow(test)
idx = findfirst(==(row[1]), test3[!, "k-mer"])
if !isnothing(idx)
test3[idx, "number"] = row[2]
end
end
# Sort test3 by k-mer
test4 = sort(test3, "k-mer")
# Write the output file
CSV.write("/home/jelber43/genomics_server/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers3.txt", test4, delim='\t', header=false) Python script to get 9-mers with counts transformed by a z-score import pandas as pd
import numpy as np
# Read the input file
input_file = '~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers3.txt'
kmer_df = pd.read_csv(input_file, sep='\t', header=None, names=['kmer', 'count'])
# Calculate mean and standard deviation
mean_count = kmer_df['count'].mean()
std_count = kmer_df['count'].std()
# Calculate expected level in standard units
kmer_df['expected_level'] = (kmer_df['count'] - mean_count) / std_count
# Create result table
result_table = kmer_df[['kmer', 'expected_level']].reset_index(drop=True)
# Write output file without header
output_file = '~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers4.txt'
result_table.to_csv(output_file, sep='\t', header=False, index=False) Edit output file so that the header has #k\t9 and that there is a fake level standard deviation of 0 for all k-mers. ## format is:
k-mer\tz-score\t0
## with the following code cat <(echo -e "#k\t9") <(perl -pe "s/(\w+)\t(.+)/\1\t\2\t0/g" ~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers4.txt) > ~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers5.txt
cd ~/pixi/blue-crab Simulate the reads with squigulator and the new k-mer model ~/bin/squigulator-v0.4.0/squigulator --sample-rate 5000.0 --ont-friendly=yes --seed 1 --kmer-model ~/hapmers/WGS_HG002_EZ1_25kb.pod5.bam.9mers5.txt -o ~/pixi/blue-crab/test2.blow5 -t 8 -f 10 <(echo -e ">1\nTACAACATGCACATAACTTTTCCATACATAATAAAATGATGCAAAGAAGATGTACATTCTAACCATAGCTGAAATCGGGGCCATTTGTACAAGATTAATTATTAACCACATAAGCCAAGAATTACTAATAAAATGTACTGCAAAATAGCTGAAAAACAATTGCATGATTGCAGCCAATCCAAGTACATAGAAAAACCTAGTGAAAAGAATATATGCCAAAAACCACTCTGCAACTAAGCCAAAAGCAGTTAAATCCCATTTAAAAGATGAAATGGTAATTTGTATAGTTTCTAAAGAAGGATAGGTGTCTAAAGAATCTAAACCACTAAGACAAACACTACAAGGTATAGAACCAGTACAGTAGGTTGCAATAGTGACATTAGTAGAGTTCAAATAGCCTTCTCTGTAACCAGTACAGTAAGAAGGCATGCCTAAATTAGACATTAAAACACCTAAAGCAGCGGTTGAGTAGATTAAAGAACCTAGGCAAACACTTAATAGTAAAAACCAAATTATAATATTTATCAGTTTAGAAAAATTAGGTGACTTCAAATAATTAAATGAAGCCTCTAGACAAAATTTACCGACACTCTTAACAGTATTCTTTGCTATAGTAGTCGGCATAGATGCTTTAATTCTAGAATTTGTACTTCTAGTAAAAGTACACAATTGTAGCAATAAAGTAAAGAAATAAGGCATATAATTAGTACAAACACGGTTTAAACACCGTGTAACTATGTTAGTAGTTGTACTAACAACTTTGTTAAGAAAAGGCTTAGCATAATTAGCTATAGTATCCCAAGGGACACTATTAACAGCAGCTAAACCATGAGTAGCAAGGGTTTTCAAACCTAATACTCTAGATAATTCATTAGGTTTCTTAATAGTAAGACTAGAATTGTCTACATAAGCAGCCATTAGATCTGTGTGGCCAACCTCTTCTGTAATTTTTAAACTATTATTTGCTGGTTTAAGTATAATGTCTCCTACAACTTCGGTAGTTTTCACATTACACTCAAGAACGTCTTTCTGTATGGTAGGATTTTCCACTACTTCTTCAGAGACTGGTTTTAGATCTTCGCAGGCAAGATTATCCATTCCCTGCGCGTCCTCTGACTTCAGTACATCAAACGAATTTGATGTTTCAACTGGTTTTGTGCTCCAAAGACAACGTATACACCAGGTATTTGGTTTATACGTGGCTTTATTAGTTGCATTGTTAACATGCCAAACAATAGGTTTATGTAACAATTTAGCTCCTTTCTTAAAAGAGGGTGTGTAGTGTTTATAATCAATAGCCACCACATCACCATTTAAGTCAGGGAAAAATGTAACTTTAAGCTCTCTTGAAGCAGGTTTCTTATAACCAGTTAACTGGTTTAAATCATCAGCAAATTTGATATTATCACATACAAACTTAAAATTATCGAAGCTTGCGTTTGGATATGGTTGGTTTGGTACAAGATCAATTGGTTGCTCTGTGAAATAAGAATTGTCTTTCTTATAATAATTGTCCAACTTAGGGTCAATTTCTGTACAAACAACACCATCCAATTTATAAGTAACTGGTTTTATGGTTGTTGTGTAACTGTTTTCTTTGTAGAAAACATCCGTAATAGGACCTTTGTATTCTGAGGACTTTGTAAGTAAAGCACCGTCTATGCAATACAAAGTTTCTTTAGAAGTTATATGTTTATAGTGACCACACTGGTAATTACCAGTGTACTCACTAGCACAAGTAAATGTACCATGCTTAAGTTCATACTGAGCAGGTGGTGCTGACATCATAACAAAAGGTGACTCCTGTTGTACTAGATATTTTGTAGCTTGTTTACCACACGTACAAGGTATCTGAACACCTTTCTTAAATTGTTCATAAGAAAGTGTGCCCATGTACATAACAGCTTCTACACCCTTAAGGGTTGTCTGCTGTTGTCCACAAGTTTTACACACCACGTTCAAGACTCTTTTGCAAGAATCTAAATTGGCATGTTGAAACAAGTAACTCATTGTTTCTCTAACATCACCTAACTCACCTACTGTCTTATTACAGTAGGCTAAGATAAGTGCACAAAAGTTAGCAGCTTCACCAGCCCTTGCTCTGTAATAAGCATCTTGTAGAGCAGGTGGATTAAACTTCAACTCTATTTGTTGGAGTGTTAACAATGCAGTGGCAAGATAACAGTTGTTATCTGCCCATTTAATAGAAGTTAAACCATTAACTTGTGGGTATTTCCACTTTTTAGTGTGATTTAATGCTGACATGTACCTACCCAGAAAACTAGGATCAGTTGTGTGGTAGTACTCAAAAGCCTCAACACGTAGAGTGTCATCATTAGGTAAAACATAAAATGTTTTACCTTCATGTGAATTATGAGGTTTTATTTTAGTAACATCAGCTCCATCCAAATAAGTTGGACCAAACTGTTGTCCATATGTCATTGACATGTCCACAACTTGCGTGTGGAGGTTAATGTTGTCTACTGTTGTAAACACCTTAATAGTCCTCACTTCTCTCAAAGAAAGAAGTGTCTTAAGATTGTCAAAGGTGATAACTTCACCATCTAGGTGGAATGTGGTAGGATTACTAGTGTAATATACACTTTTATCACCTCTCTTAAGAAATTCTATACCTAGTTGTGTAGATTGTCCAGAATAGGACCAATCTTTATAGGAACCAGCAAGTGAGATGGTTTCAATAAAATGTTCTTCAGGTGTTTTAGAAGAAGAAGTAAGATAACCATTATACGCTGTAACAGCATCAGGTGAAGAAACAGAAACTGTAGCTGGCACTTTGAGAGATCTCATATACCGAGCAGCTTCTTCCAAATTTAAGCCATGTGTTACATAGCCAAGTGGCATTGTAACAAGAGTTTCATTTAGATCGTTAAGTGTGTTGATAAGTGACGCTACAGTTGTTTTACTGGTGTAAAAGTAAAATCTAGCACCATAATCAACCACACCCTCTTGTATTTTAATACCCTTATATTTACGCTGTATAGTTGAAACTATGGCTTTAGTTTCCACACAGACAGGCATTAATTTGCGTGTTTCTTCTGCATGTGCAAGCATTTCTCGCAAATTCCAAGAAACAGTTCCAAGAATTTCTTGCTTCTCATTAGAGATAATAGATGGTAGAATGTAAAAGGCACTTTTACACTTTTTAAGCACTGTCTTTGCCTCCTCTACAGTGTAACCATTTAAACCCTGACCCGGGTAAGTGGTTATATAATTGTCTGTTGGCACTTTTCTCAAAGCTTTCGCTAGCATTTCAGTAGTGCCACCAGCCTTTTTAGTAGGTATAACCACAGCAGTTAAAACACCCTCTTGAACAACATCACCCACTATATATGGAGCATCTTTCTTTAAGAAAGTGATGTCAATGTCACTAACAAGAGTGGCAGAATCTGGATGAAGATTGCCATTAATGTCAATATAAAGTAACAAGTTTTCTGTGAGGAACTTAGTTTCTTCCAGAGTTGTTGTAACTTCTTCAACACAAGCTTTGATTTTCTTATCATCTTGTTTTCTCTGTTCAACTGAAGGTTTACTTTCAGTTATAAATGGCTTAACTTCCTCTTTAGGAATCTCAGCGATCTTTTGTTCAACTTGCTTTTCACTCTTCATTTCCAAAAAGCTTGAAACAAGTTTGTCATAGAGATTTTTATCAAAGACAGCTAAGTAGACATTTGTGCGAACAGTATCTACACAAACTCTTAAAGAATGTATAGGGTCAGCACCAAAAATACCAGCTGATAATAATGGTGCAAGTAGAACTTCGTGCTGATTAAAATTTTCATAAGCACTCTTAAGAAGTTGAATGTCTTCACCTTTGTTAACATTTGGGCCGACAACATGAAGACAGTGTTTAGCAAGATTGTGTCCGCTTAAAACACAACTACCACCCACTTTAAGTGGTCCATTAGTAGCTATGTAATCATCAGATTCAACTTGCATGGCATTGTTAGTAGCCTTATTTAAGGCTCCTGCAACACCTCCTCCATGTTTAAGGTAAACATTGGCTGCATTAACAACCACTGTTGGTTTTACCTTTTTAGCTTCTTCCACAATGTCTGCATTTTTAATGTATACATTGTCAGTAAGTTTTAAATAACCACTAAAACTATTCACTTCAATAGTCTGAACAACTGGTGTAAGTTCCATCTCTAATTGAGGTTGAACCTCAACAATTGTTTGAATAGTAGTTGTCTGATTGTCCTCACTGCCGTCTTGTTGACCAACAGTTTGTTGACTATCATCATCTAACCAATCTTCTTCTTGCTCTTCTTCAGGTTGAAGAGCAGCAGAAGTGGCACCAAATTCCAAAGGTTTACCTTGGTAATCATCTTCAGTACCATACTCATATTGAGTTGATGGCTCAAACTCTTCTTCTTCACAATCACCTTCTTCTTCATCCTCATCTGGAGGGTAGAAAGAACAATACATATGTGAAGCCAATTTAAACTCACCAGACTCATCAAATAAGTAGTATGTAGCCATACTCCACTCATCTAAATCAATGCCCAGTGGTGTAAGTAATTCAGATACTGGTTGCAAAGTTTTTATGACAGCATCTGCCACAACACAGGCGAACTCATTTACTTCTGTACCGAGTTCAACTGTATAGGCAGAGCACTTCTCATTAAGTACTTTATCAATCCTTTCATCAAGTTCAAAAGTGATATTCACACTCTTGTAACCTTGCACTTCTATCACAGTGTCATCACCAAAAGTAACCTTTGTTGGTGCACCGCCTTTGAGTGTGAAGGTATTGTTTGTTACCATCATATTAGGTGCAAGGGCACAGTACTTTTCTGTGTCTTTGATTTCGAGCAACATAAGCCCGTTAATACAAACTGGTGTACCAACCAATGGAGCTTCAACAGCTTCACTAGTAGGTTGTTCTAATGGTTGTAAATCACCAGTTTTCAAGACAACTTCCTCTGTTAACACTTCTGTGGGAAGTGTTTCTCCCTCTAAGAAGATAATTTCTTTTGGGGCTTTTAGAGGCATGAGTAGGCCAGTTTCTTCTCTGGATTTAACACACTTTCTGTACAATCCCTTTGAGTGCGTGACAAATGTTTCACCTAAATTCAAGGCTTTAAGTTTAGCTCCACCAATAATGATAGAGTCAGCACACAAAGCCAAAAATTTATTTACAAGCTTAAAGAATGTCTGAACACTCTCCTTAATTTCCTTTGCACAGGTGACAATTTGTCCACCGACAATTTCACAAGCACAGGTTGAGATAAATTTAACAATTTCCCAACCGTCTCTAAGAAACTCTACACCTTCCTTAAACTTCTCTTCAAGCCAATCAAGGACGGGTTTGAGTTTTTCATAAACAGTGCCAAAGATGTTAGTTAGCCACTGCGAAGTCAACTGAACAACACCACCTGTAATGTAGGCCATTACAACTAGATTGTTAGTAGCCAAATCAGATGTGAACATCATAGCATCAATGAGTCTCAGTGAATACTGTGAAATTCCATCTAGTATTGTTATAGCGGCCTTCTGTAAAACACGCACAGAATTTTGAGCAGTTTCAAGAGTGCGGGAGAAAATTGATCGTACAACACGAGCAGCCTCTGATGCAAATGCATAAAGAGGACTCAGTATTGATTTCTGTTCACCAATATTCCAGGCACCTTTTTTAGCTTTTCCTTTTGTAACTTTAAAATTACCACAGGATTCAACAATTTGTTTGAATGCTTTATAATCCAAACCTTTCACAGTTTCCACAAAAGCACTTGTGGAAGCAGAAAAAGATGCCAAAATAATGGCGATCTCTTCATTAAGTTTAAAGTCACCAACAATATTGATGTTGACTTTCTCTTTTTGGAGTATTTCAAGAAGGTTGTCATTAAGACCTTCGGAACCTTCTCCAACAACACCTGTATGGTTACAACCTATGTTAGCGCTAGCACGTGGAACCCAATAGGCACACTTGTTATGGCAACCAACATAAGAGAACACACAGCCTCCAAAGGCAATAGTGCGACCACCCTTACGAAGAATGGTTTTCAAGCCAGATTCATTATGGTATTCGGCAAGACTATGCTCAGGTCCTACTTCTGAATTGTGACATGCTGGACAATAAATTTTAACAACAGCATTTTGGGGTAAGTAACCACAAGTAGTGGCACCTTCTTTAGTCAAATTCTCAGTGCCACAAAATTCGCAAGTGGCTTTAACAAAATCGCCCGTCTGCCATGAAGTTTCACCACAATGATCACACTTCATGAGAGTTGAAAGGCACATTTGGTTGCATTCATTTGGTGACGCAACTGGATAGACAGATCGAATTCTACCCATAAAGCCATCAAGCTTTTTCTTTTCAACCCTTGGTTGAATAGTCTTGATTATGGAATTTAAGGGAAATACAAAATTTGGACATTCCCCATTGAAGGTGTCAAATTTCTTTGCCAATTTAATTTCAAAAGGTGTCTGCAATTCATAGCTCTTTTCAGAACGTTCCGTGTACCAAGCAATTTCATGCTCATGTTCACGGCAGCAGTATACACCCCTCTTAGTGTCAATAAAGTCCAGTTGTTCGGACAAAGTGCATGAAGCTTTACCAGCACGTGCTAGAAGGTCTTTAATGCACTCAAGAGGGTAGCCATCAGGGCCACAGAAGTTGTTATCGACATAGCGAGTGTATGCCCCTCCGTTAAGCTCACGCATGAGTTCACGGGTAACACCACTGCTATGTTTAGTGTTCCAGTTTTCTTGAAAATCTTCATAAGGATCAGTGCCAAGCTCGTCGCCTAAGTCAAATGACTTTAGATCGGCGCCGTAACTATGGCCACCAGCTCCTTTATTACCGTTCTTACGAAGAAGAACCTTGCGGTAAGCCACTGGTATTTCGCCCACATGAGGGACAAGGACACCAAGTGTCTCACCACTACGACCGTACTGAATGCCTTCGAGTTCTGCTACCAGCTCAACCATAACATGACCATGAGGTGCAGTTCGAGCATCCGAACGTTTGATGAACACATAGGGCTGTTCAAGTTGAGGCAAAACGCCTTTTTCAACTTCTACTAAGCCACAAGTGCCATCTTTAAGATGTTGACGTGCCTCTGATAAGACCTCCTCCACGGAGTCTCCAAAGCCACGTACGAGCACGTCGCGAACCTGTAAAACAGGCAAACTGAGTTGGACGTGTGTTTTCTCGTTGAAACCAGGGACAAGGCTCTCCATCTTACCTTTCGGTCACACCCGGACGAAACCTAGATGTGCTGATGATCGGCTGCAACACGGACGAAACCGTAAGCAGCCTGCAGAAGATAGACGAGTTACTCGTGTCCTGTCAACGACAGTAATTAGTTATTAATTATACTGCGTGAGTGCACTAAGCATGCAGCCGAGTGACAGCCACACAGATTTTAAAGTTCGTTTAGAGAACAGATCTACAAGAGATCGAAAGTTGGTTGGTTTGTTACCTGGGAAGGTATAAACCTTTAAT") Convert blow5 to pod5 pixi shell
blue-crab s2p test2.blow5 -o test2.pod5 Basecall with dorado-0.8.0 SUP ~/bin/dorado-0.8.0-linux-x64/bin/dorado download
~/bin/dorado-0.8.0-linux-x64/bin/dorado basecaller [email protected] test2.pod5 > test2.pod5.bam bam to fastq and quality check samtools fastq test2.pod5.bam > test2.fastq
~/bin/seqtk/seqtk fqchk test2.fastq|head
min_len: 1228; max_len: 3901; avg_len: 2785.86; 2 distinct quality values
POS #bases %A %C %G %T %N avgQ errQ %low %high
ALL 19501 32.0 17.6 16.5 33.9 0.0 2.6 3.0 100.0 0.0
1 7 14.3 0.0 0.0 85.7 0.0 2.6 3.0 100.0 0.0
2 7 14.3 42.9 0.0 42.9 0.0 2.7 3.0 100.0 0.0
3 7 42.9 0.0 0.0 57.1 0.0 2.7 3.0 100.0 0.0
4 7 0.0 0.0 14.3 85.7 0.0 2.6 3.0 100.0 0.0
5 7 0.0 42.9 14.3 42.9 0.0 2.6 3.0 100.0 0.0
6 7 57.1 0.0 0.0 42.9 0.0 2.6 3.0 100.0 0.0
7 7 14.3 0.0 0.0 85.7 0.0 2.6 3.0 100.0 0.0 I guess I am not surprised the results are so bad with my k-mer model with an average quality score of 2.6 as I pointed out that there were so many missing k-mers. Also I just put a fake standard deviation level of 0 for all of the z-scores that I had, so I have no idea what the values should look like. |
Hmmm... I guess you have based your k-mer models on the legacy k-mer models, and the above I wrote has gone with the assumption of the newer models that does not contain the electrical current information for a k-mer, rather the abundance of that k-mer. |
I will play around with your R10 k-mer model and stop wasting time trying to make my own as that does not seem possible. |
Hello. Sorry for taking a bit of time to respond, I was iterating through the other repositories and work. Great effort in deducing the k-mers levels. Generating nucleotide k-mer levels is a bit of a headache for a number of reasons.
I have not documented how exactly I generated the squigulator model, but I have documented it for f5c which you can see here https://hasindu2008.github.io/f5c/docs/r10train#nucleotide-model. The steps are quite related, for instance, the conversion of the ONT's score normalised value to the pico-ampere levels using a linear transform. From on top of my head (might be not 100% correct, got to see my logs).
I believe the standard deviations that I have got through the above training are not great. Also, I believe some of the ONT mean levels are not as good as they should be. @hiruna72 has been working on an independent pore model generation method:
@skovaka has also been working on an independent pore model generation method: While these methods focus on generating k-mer models for alignment, I believe the methods could be tweaked to generate k-mer models for simulation as well. While testing these approaches has been on my todo list, have not been able to find some time to go through this model improvement carefully. |
Thank you for your super detailed and helpful response! I could really only afford the limited time I spent on deducing the k-mer abundances from the reads in making a custom k-mer model. Now that I realize that I would have to take into account the actual raw signal converted to pA available from converting pod5 to s/blow5, I could not afford the time to do this entire process. We have whole-genome amplification data for human cells (I would need to double-check), so presumably that would be modification-free r10.4.1 data, but I am afraid there will be too many missing k-mers based on k-mer abundances I got from the whole-genome sequencing that I had analyzed here. Thanks for all of your help. Feel free to close this or leave it as a discussion. |
EDIT I guess it would help if I were to read the publication to see especially the figures on time and amplitude domains' effects on base identity.
Hi, thank you very much for developing this tool. This issue is perhaps an enhancement, but not necessarily as it may be tied completely to the underlying k-mer model and only changeable through making a new k-mer model (I am not sure).
So here it goes:
Run squigulator
Convert blow5 to pod5 with blue-crab
Basecall with dorado-0.8.0, but first download the SUPv5.0.0 model
BAM to FASTQ
samtools fastq test.pod5.bam > test.fastq
Use seqtk fastq check to estimate the average quality score
It appears as though I am getting average quality scores of 16.3 or so based on running
seqtk fqchk
.My question is whether it is possible to adjust this mean (or if the median might be adjustable) or whether that can only happen based on the k-mer values provided by Oxford Nanopore themselves? Thus, if I would like to do this, I would somehow need to make a new k-mer frequency file based on the data quality that I desire. Is it as simple as counting the 9-mers from an exemplary data set?
The text was updated successfully, but these errors were encountered: