There are a variety of informative tutorials for vg, such as the Main Wiki and the Workshop in Portuguese. However, when we do not know the right vg subcommand and options for a task, they are not easy to find in these tutorials. To address this, we have organized vg command-line examples in terms of what we want to do.
The version is v1.9.0 "Miglionico". The static binary and Docker image are available from here.
vg construct -r ref.fa -v variant.vcf > graph.vg
vg msga -f multi.fa > graph.vg
vg view graph.vg > graph.gfa # Convert vg to GFA
vg view -j graph.vg > graph.json # Convert vg to JSON
- Once converted to JSON, the genome graph can be visualized with a genome graph browser such as MoMIG.
- Example: MoMIG
- Note: path information is mandatory.
vg view -Fv hoge.gfa > graph.vg
- A bug in the GFA parser with overlaps in v1.9.0 was fixed (vgteam/vg#1765)
- When using assembly graph for downstream analysis, this
- If you use minia as an assembler, you can use here
grep -v ^P assembly_graph_with_scaffolds.gfa | vg view -Fv - | vg mod -X 1000 - > graph.vg
- The above example was confirmed with SPAdes v3.11.1
vg view -d graph.vg | dot -Tpng -o vis.png # Convert vg format to dot format
vg view -dnp graph.vg | dot -Tpng -o vis.png # Highlight each path on the vg graph
vg index -x index.xg graph.vg
vg view -a mapped.gam > mapped.json
vg stats -l graph.vg
vg stats -z graph.vg
vg view graph.vg | grep ^P | wc -l
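The path count above relies on GFA output having one record per line, with the record type in the first column. A minimal sketch of that counting on a hand-written toy GFA (the data and file are made up for illustration, and no vg binary is needed):

```shell
# Toy GFA 1.0 file (made-up data): three segments, three links, two paths
gfa=$(mktemp)
printf 'H\tVN:Z:1.0\n'           >  "$gfa"
printf 'S\t1\tACGT\n'            >> "$gfa"
printf 'S\t2\tG\n'               >> "$gfa"
printf 'S\t3\tTTA\n'             >> "$gfa"
printf 'L\t1\t+\t2\t+\t0M\n'     >> "$gfa"
printf 'L\t2\t+\t3\t+\t0M\n'     >> "$gfa"
printf 'L\t1\t+\t3\t+\t0M\n'     >> "$gfa"
printf 'P\tchr1\t1+,2+,3+\t*\n'  >> "$gfa"
printf 'P\talt1\t1+,3+\t*\n'     >> "$gfa"

# Same idea as "grep ^P | wc -l": count the path (P) records
n_paths=$(grep -c '^P' "$gfa")
echo "paths: $n_paths"

# Count segment, link and path records in a single pass over the file
summary=$(awk -F'\t' '{ n[$1]++ } END { printf "S=%d L=%d P=%d", n["S"], n["L"], n["P"] }' "$gfa")
echo "$summary"

rm -f "$gfa"
```

On real data, the same awk command could be applied to the output of vg view graph.vg instead of the toy file.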
vg find -n 10 -P chr1 -x index.xg # Find the coordinate of node ID 10 on the path chr1
vg mod -X 1000 graph.vg > graph.1000.vg # Divide each node into pieces of 1000 bases or less
vg mod -u graph.vg > merged.graph.vg
vg mod -c graph.vg > fixed.graph.vg
vg find -n 5 -c 10 -x index.xg > node5.dis10.vg # Extract the graph around node ID 5, out to nodes at a distance of 10
Extracting a graph consisting of nodes that are less than N bases away from a user-specified node
vg find -n 5 -c 10 -L -x index.xg > node5.dis10.vg # Extract the graph consisting of nodes that are less than 10 bp away from node 5
Extracting a graph consisting of nodes whose number is less than or equal to N from the specified path, e
vg find -n 5 -c 10 -p chr1:50000-55520 -x index.xg > chr1:50000-55520.vg # Extract the graph of chr1:50000-55520 and the nodes up to 10 away from it
vg ids -j 1.vg 2.vg # Put the node IDs of 1.vg and 2.vg into a joint ID space
cat 1.vg 2.vg > merged.vg
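The ID step matters because two independently built graphs usually both start numbering nodes at 1, so concatenating them directly would mix up unrelated nodes. The sketch below only illustrates the idea on made-up ID lists by shifting the second graph's IDs past the largest ID of the first. It is an assumption for illustration, not a description of how vg ids is implemented.

```shell
# Made-up node ID lists for two graphs whose IDs overlap
ids1="1 2 3"
ids2="1 2"

# Find the largest node ID used by the first graph
max1=0
for i in $ids1; do
    if [ "$i" -gt "$max1" ]; then
        max1=$i
    fi
done

# Shift every ID of the second graph past that maximum
shifted=""
for i in $ids2; do
    shifted="$shifted $((i + max1))"
done
shifted=${shifted# }   # drop the leading space

echo "graph 1 IDs: $ids1"
echo "graph 2 IDs: $shifted"
```

After such a shift the two ID sets are disjoint, which is the property the merged graph needs.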
vg augment -a direct graph.vg aln.gam > aug.vg
- From v1.10.0 onwards, is the default of the -a option direct instead of pileup? → Reference
- Unlike vg mod -i, it does not add path information. For the difference between these two, please refer here
vg index -g index.gcsa -k 16 -b . graph.vg # Option -b specifies the directory where the temporary file is to be placed
# When memory consumption is too large,
vg prune graph.vg > prune.vg # Firstly simplifying the graph
vg index -g index.gcsa -k 16 -b . prune.vg # Then the foregoing command can be executed with less memory
rm prune.vg
# It is assumed that the xg and gcsa files exist
vg map -x index.xg -g index.gcsa -t 1 -f 1.fq -f 2.fq > mapped.gam
vg pack -x index.xg -g mapped.gam -d > coverage.tsv
# If you want something like pileup, set the -e flag
vg pack -x index.xg -g mapped.gam -d -e > coverage.edit.tsv
vg view -a mapped.gam | jq -cr 'select(.score > 0)' | vg view -aJG - > filtered.gam
vg view -a mapped.gam | jq -cr 'select(.identity >= 0.95)' | vg view -aJG - > filtered.id95.gam
vg stats -a mapped.gam graph.vg
- The graph is not used for the calculation, but the positional argument is required
vg surject -x index.xg -t 1 -b mapped.gam > mapped.bam
#You can extract only the mappings for the path specified with -p option
vg surject -x index.xg -t 1 -s -p chr1 mapped.gam > mapped.sam
vg inject -x index.xg -t 1 mapped.sam > mapped.gam
To put the gene annotation as a path on the vg graph, first you need to convert the gene annotation into an alignment for the genome graph, then create a path and merge it into the vg graph.
vg annotate -b input.bed -x index.xg > annotation.gam
vg annotate -g input.gff -x index.xg > annotation.gam
vg mod -P --include-aln annotation.gam graph.vg > mod.vg
# If you want to split nodes at the annotation boundaries, remove the -P option
vg mod --include-aln annotation.gam graph.vg > mod.vg
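The annotation steps above can be strung together in a small wrapper. The following is a hedged sketch, not an official vg workflow: the function names and the DRY_RUN variable are hypothetical, and only the vg flags already shown above (-b, -g, -x, -P, --include-aln) are used. By default it only prints the commands so the sequence can be checked before running it.

```shell
# DRY_RUN=1 (the default in this sketch) prints the commands instead of running vg
DRY_RUN=${DRY_RUN:-1}

# Choose the vg annotate input flag from the annotation file extension
annotate_flag() {
    case "$1" in
        *.bed) echo "-b" ;;
        *.gff) echo "-g" ;;
        *) echo "unsupported annotation file: $1" >&2; return 1 ;;
    esac
}

# Print the two steps: annotation -> GAM, then GAM -> path labels on the graph
# $1: annotation file, $2: xg index, $3: input graph
annotation_commands() {
    flag=$(annotate_flag "$1") || return 1
    echo "vg annotate $flag $1 -x $2 > annotation.gam"
    echo "vg mod -P --include-aln annotation.gam $3 > mod.vg"
}

cmds=$(annotation_commands input.bed index.xg graph.vg)
if [ "$DRY_RUN" = 1 ]; then
    echo "$cmds"
else
    echo "$cmds" | sh -e
fi
```

Setting DRY_RUN=0 would pass the printed commands to sh, which requires vg on the PATH and the input files to exist.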
Please be aware that there are uncertain points at present
vg snarls -m 1000 -r list.st graph.vg > snarls.pb
vg view -E list.st | jq '.visit[1:-1][].node_id | select(. != null) | tonumber' | sort -n | uniq > node_list_in_ultra_bubble.txt
# Showing nodes in core regions (i.e. hub structures in a graph)
vg view graph.vg | grep ^S | cut -f 2 | grep -vwf node_list_in_ultra_bubble.txt > node_list_of_core_region.txt
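The core-region step is a set difference: all node IDs in the graph minus the node IDs found inside bubbles. A minimal sketch on made-up data shows why the -w flag of grep matters here. Without it, the pattern 2 would also remove node 12.

```shell
workdir=$(mktemp -d)

# Toy GFA segment lines (made-up data): node IDs 1, 2, 3, 4 and 12
printf 'S\t1\tA\nS\t2\tC\nS\t3\tG\nS\t4\tT\nS\t12\tAC\n' > "$workdir/graph.gfa"

# Hypothetical list of node IDs that lie inside bubbles
printf '2\n3\n' > "$workdir/in_bubble.txt"

# -f reads the patterns from a file, -w matches whole words only,
# -v keeps the lines that match none of the patterns
core=$(grep '^S' "$workdir/graph.gfa" | cut -f 2 | grep -vwf "$workdir/in_bubble.txt")
echo "$core"

rm -r "$workdir"
```

Here nodes 1, 4 and 12 remain, since only the exact IDs 2 and 3 are excluded.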
- A Snarl is a generalization of the superbubble which is a subgraph of a genome graph. For the definition of terms, please refer to Paten et al.
- Note: there is an inconsistency (as of Aug 27, 2018): according to the help message, the -m option of vg snarls only computes traversals for snarls with <= N nodes, but the comparison is < according to the source code
- Story of variant call