Run SRILM with different options to find the best language model(LM) given the training and test data.
Directory Structure: src/ Scripts run/ Experiment Folder /demo Sample experiment
How to run?
- Set SRILM path in run/Makefile
- Copy your LM training file and name it as train.tok.gz
- Copy your test file and name it as test.tok.gz (this file is used to calculate perplexity)
- Edit the vocabulary threshold according to your needs
- run cd run/demo && make ppl.out
- To run on 10 CPU run cd run/demo && make ppl.out NCPU=10
- Each line of ppl.out will corresponds to an LM and its perplexity ppl.out format: training-set test-set vocabulary-thr ngram discount option prob ppl ppl2 time
- Each LM has its own folder in the experiment folder (i.e., run/demo/).
- In order to run experiments on different datasets create a folder under run and copy the Makefile in demo to this new directory and perform the above steps
Supported SRILM options:
- An experiment will run the following discountings: ndiscount wbdiscount ukndiscount kndiscount
- All discountings will be run with/without interpolation
- All experiments will run for 4 and 5 grams
- If you want to edit these settings please edit the src/lm-args.pl