One of the key innovations described in the publication "tmVar: A text mining approach for extracting sequence variants in biomedical literature" is a method for normalizing extracted variant mentions to unique identifiers (dbSNP RSIDs). However, it is unclear how to use this feature, and running the tmVar model out of the box does not produce this behaviour. To normalize extracted variants, GNormPlus must first be run on the input data, and its output must then be fed into tmVar.
- Download GNormPlus from the NCBI's website and decompress the folder.
- Download tmVar from the NCBI's website and extract it into the same directory as GNormPlus.
project
│
├─── gnormplus_input
├─── gnormplus_output
├─── tmvar_output
│
├─── tmVar
│    │   corpus
│    │   CRF
│    │   ...
│
└─── GNormPlus
     │   Corpus
     │   CRF
     │   ...
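The three data folders in this layout can be created by hand. As a convenience, here is a minimal Python sketch that creates them and reports which tool folders are still missing. The function name `prepare_project` and the constants are hypothetical helpers, not part of tmVar or GNormPlus; the folder names are taken from the layout above.

```python
from pathlib import Path

# Folder names taken from the project layout; the helper itself is hypothetical.
WORK_DIRS = ("gnormplus_input", "gnormplus_output", "tmvar_output")
TOOL_DIRS = ("tmVar", "GNormPlus")


def prepare_project(project):
    """Create the three data folders and return the tool folders that are missing."""
    project = Path(project)
    for name in WORK_DIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / name).mkdir(parents=True, exist_ok=True)
    # The tools must be downloaded manually, so only report their absence.
    return [name for name in TOOL_DIRS if not (project / name).is_dir()]
```

An empty return value means both tool folders are in place. The input documents still have to be copied into `gnormplus_input` before anything is run.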
Run GNormPlus first, then run tmVar on its output:
java -Xmx10G -Xms10G -jar GNormPlus.jar gnormplus_input gnormplus_output setup.txt
java -Xmx10G -Xms10G -jar tmVar.jar gnormplus_output tmvar_output
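The two steps can also be chained from a script, in the order described above (GNormPlus first, then tmVar on its output). The following is a rough sketch, not an official wrapper. It assumes that each jar sits in its own tool folder, that `setup.txt` sits in the GNormPlus folder, and that each tool should be started from inside its own folder with absolute paths to the data folders. None of this is confirmed by the tools' documentation here, and `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_commands(project, heap="10G"):
    """Return (argv, working_directory) pairs for the two steps, in run order."""
    project = Path(project).resolve()
    gnorm_in = project / "gnormplus_input"
    gnorm_out = project / "gnormplus_output"
    tmvar_out = project / "tmvar_output"
    java = ["java", f"-Xmx{heap}", f"-Xms{heap}", "-jar"]
    return [
        # Step 1: GNormPlus annotates the raw input documents.
        (java + ["GNormPlus.jar", str(gnorm_in), str(gnorm_out), "setup.txt"],
         project / "GNormPlus"),
        # Step 2: tmVar reads the GNormPlus output and writes the final result.
        (java + ["tmVar.jar", str(gnorm_out), str(tmvar_out)],
         project / "tmVar"),
    ]


def run_pipeline(project):
    """Run both steps; check=True aborts if a step exits with an error."""
    for argv, cwd in build_commands(project):
        # Assumption: each tool is launched from inside its own folder.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_pipeline("project")` would then launch both tools in sequence. Because `check=True` raises on a non-zero exit code, tmVar is not started if GNormPlus fails.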
- Thanks to Chih-Hsuan Wei for clarifying this process.