CfdnaPattern

Pattern Recognition for Cell-free DNA

Predict whether a FASTQ file is cfDNA or not

# predict a single file
python predict.py <single_fastq_file>

# predict multiple files
python predict.py <fastq_file1> <fastq_file2> ... 

# predict files with wildcard
python predict.py *.fq

Warning: this tool does not work on trimmed FASTQ files.

Prediction output

For each file given on the command line, this tool prints one line of the form <prediction>: <filename>, for example:

cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
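If the output is consumed by another script, each line can be split back into its two fields. The following is a minimal sketch, assuming the output format is exactly `<prediction>: <filename>` as shown above; the function names `parse_prediction_line` and `parse_predictions` are hypothetical helpers and are not part of CfdnaPattern.

```python
from typing import List, Tuple


def parse_prediction_line(line: str) -> Tuple[str, str]:
    """Split one '<prediction>: <filename>' line into (prediction, filename)."""
    # Split on the first ': ' only, so any later colon in the path is preserved.
    prediction, sep, filename = line.strip().partition(": ")
    if not sep:
        raise ValueError(f"not a prediction line: {line!r}")
    return prediction, filename


def parse_predictions(text: str) -> List[Tuple[str, str]]:
    """Parse every non-empty line of captured predict.py output."""
    return [parse_prediction_line(l) for l in text.splitlines() if l.strip()]


# Two lines copied from the example output above.
example_output = """\
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
"""

for prediction, filename in parse_predictions(example_output):
    print(prediction, filename)
```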

Add -q or --quite to enable quiet output mode, in which the tool only reports files whose prediction conflicts with their filename:

a file whose name contains cfdna, but whose prediction is not-cfdna
a file whose name does not contain cfdna, but whose prediction is cfdna
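The two conflict cases above amount to a single comparison between what the filename says and what the model predicts. The sketch below illustrates that rule only; it is not the tool's actual implementation. It assumes the check is a case-insensitive substring match on the base filename, and `is_conflict` is a hypothetical name.

```python
import os


def is_conflict(prediction: str, filename: str, flag: str = "cfdna") -> bool:
    """Return True when the filename label and the prediction disagree."""
    # Assumption: only the base filename is inspected, case-insensitively.
    named_cfdna = flag in os.path.basename(filename).lower()
    predicted_cfdna = prediction == "cfdna"
    # A conflict is exactly the case where the two booleans differ.
    return named_cfdna != predicted_cfdna


results = [
    ("cfdna", "/fq/20160220-cfdna-001_S1_R1_001.fastq.gz"),
    ("cfdna", "/fq/20160220-gdna-002_S2_R1_001.fastq.gz"),
]
for prediction, filename in results:
    if is_conflict(prediction, filename):
        print(f"{prediction}: {filename}")
```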

Train a model

This tool ships with a pre-trained model (cfdna.model) that can be used for prediction directly, but you can also train your own model:

Prepare or link all your FASTQ files in one folder.
For files from cfDNA, include cfdna (case-insensitive) in the filename, like 20160220-cfdna-015_S15_R1_001.fq
For files from genomic DNA, include gdna (case-insensitive) in the filename, like 20160220-gdna-002_S2_R1_001.fq
For files from FFPE DNA, include ffpe (case-insensitive) in the filename, like 20160123-ffpe-040_S0_R1_001.fq
Run:

python train.py /fastq_folder/*.fq
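The filename convention above can be checked before training with a short script. This is a sketch under stated assumptions, not the code used by train.py: it assumes a case-insensitive substring match on the base filename and semicolon-separated flag lists with the defaults cfdna and gdna;ffpe. `label_from_filename` is a hypothetical helper.

```python
import os
from typing import Optional


def label_from_filename(
    path: str, cfdna_flag: str = "cfdna", other_flag: str = "gdna;ffpe"
) -> Optional[int]:
    """Return 1 for cfDNA files, 0 for other files, None if no flag matches."""
    name = os.path.basename(path).lower()
    # Flags are semicolon-separated; empty entries are ignored.
    cfdna_flags = [f for f in cfdna_flag.lower().split(";") if f]
    other_flags = [f for f in other_flag.lower().split(";") if f]
    if any(f in name for f in cfdna_flags):
        return 1
    if any(f in name for f in other_flags):
        return 0
    return None  # unlabeled file: rename it before training


for fq in [
    "20160220-cfdna-015_S15_R1_001.fq",
    "20160220-gdna-002_S2_R1_001.fq",
    "20160123-ffpe-040_S0_R1_001.fq",
]:
    print(fq, label_from_filename(fq))
```

Files for which the helper returns None match neither flag list and would need renaming to follow the convention.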

Citation

If you use CfdnaPattern in a publication, please cite: https://doi.org/10.1109/TCBB.2017.2723388

Full options:

python train.py <fastq_files> [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODEL_FILE, --model=MODEL_FILE
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classification,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files
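When training is driven from another Python script, the command line can be assembled from the options listed above. The following is a minimal sketch: `build_train_command` is a hypothetical helper, it uses only the long option names shown in the help text, and it assumes train.py is in the current directory and that python is on the PATH.

```python
from typing import List, Optional, Sequence


def build_train_command(
    fastq_files: Sequence[str],
    model_file: Optional[str] = None,
    algorithm: Optional[str] = None,
    cfdna_flags: Optional[Sequence[str]] = None,
    other_flags: Optional[Sequence[str]] = None,
    passes: Optional[int] = None,
    no_cache_check: bool = False,
) -> List[str]:
    """Build an argument list for 'python train.py <fastq_files> [options]'."""
    cmd = ["python", "train.py", *fastq_files]
    if model_file is not None:
        cmd.append(f"--model={model_file}")
    if algorithm is not None:
        cmd.append(f"--algorithm={algorithm}")
    if cfdna_flags:
        # Multiple filename flags are joined with semicolons.
        cmd.append("--cfdna_flag=" + ";".join(cfdna_flags))
    if other_flags:
        cmd.append("--other_flag=" + ";".join(other_flags))
    if passes is not None:
        cmd.append(f"--passes={passes}")
    if no_cache_check:
        cmd.append("--no_cache_check")
    return cmd


print(build_train_command(["a.fq", "b.fq"], algorithm="rf", passes=5))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with the semicolons in the flag lists. Wildcards such as *.fq are not expanded in that case, so the file list would need to be built first, for example with `glob.glob`.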
