RUBER

Uses pre-trained word2vec embeddings to initialize a bidirectional RNN.

Steps to run:

Download a dataset and a word2vec file of your choice (convert the word2vec file to .txt). In my case, I cloned ParlAI and downloaded the dataset with the command below, and downloaded a word2vec bin file separately.

python examples/display_data.py --task convai2 --datatype train
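
The .txt conversion mentioned above is not spelled out, so here is a minimal sketch using only the standard library. It assumes the usual word2vec binary layout (a "vocab_size dim" header line, then each word followed by a space and dim little-endian float32 values). The function name word2vec_bin_to_txt and the file names are hypothetical, not part of this repo.

```python
import struct


def word2vec_bin_to_txt(bin_path, txt_path):
    """Convert a word2vec binary file to the plain-text format."""
    with open(bin_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <dim>"
        vocab_size, dim = (int(x) for x in src.readline().split())
        dst.write(f"{vocab_size} {dim}\n")
        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space.
            chars = bytearray()
            while True:
                ch = src.read(1)
                if not ch:
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # entries may be separated by a newline
                    chars.extend(ch)
            # The vector is exactly dim float32 values (assumed little-endian).
            vector = struct.unpack(f"<{dim}f", src.read(4 * dim))
            word = chars.decode("utf-8", errors="replace")
            dst.write(word + " " + " ".join(f"{v:.6f}" for v in vector) + "\n")


# Hypothetical usage:
# word2vec_bin_to_txt("vectors.bin", "vectors.txt")
```

If gensim is installed, loading the file with KeyedVectors.load_word2vec_format(..., binary=True) and saving it with save_word2vec_format(..., binary=False) should give an equivalent result.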

Run data_helpers.py to create queries.txt and replies.txt, plus a vocab file and an embedding file for each.

python data_helpers.py
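
For orientation, here is a rough sketch of what building a vocab and picking out matching word2vec vectors can look like. This is not the code in data_helpers.py. The function names, whitespace tokenization and in-memory format are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(sentences):
    """Return the distinct whitespace tokens, most frequent first."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return [word for word, _ in counts.most_common()]


def select_embeddings(word2vec_lines, vocab):
    """Map each vocabulary word found in the word2vec text to its vector."""
    wanted = set(vocab)
    vectors = {}
    for line in word2vec_lines:
        parts = line.split()
        if len(parts) == 2:
            continue  # header "vocab_size dim" (assumes dim > 1)
        if parts and parts[0] in wanted:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Hypothetical usage with tiny in-memory inputs:
vocab = build_vocab(["hi there", "hi you"])
vectors = select_embeddings(["3 2", "hi 0.1 0.2", "cat 1 2", "you 0.5 0.5"], vocab)
```

Words missing from the word2vec file (here "there") get no vector in this sketch. How the actual script handles them is not covered in this README.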

Run hybrid_evaluation.py to train the model for the unreferenced metric. For this step, comment out the code block after the "Getting scores" print statement in hybrid_evaluation.py so that only training runs.

python hybrid_evaluation.py
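
This README does not describe the training objective. In the RUBER paper, the unreferenced scorer is trained with negative sampling and a margin-based ranking loss, so the true reply to a query should score higher than a randomly sampled reply by at least a margin. Here is a minimal sketch of that loss, assuming this repo follows the paper. The function name and the margin value are illustrative, not taken from unreferenced_metric.py.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=0.5):
    """Mean hinge loss: max(0, margin - s(q, r) + s(q, r_neg)) per pair."""
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need two non-empty score lists of equal length")
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# First pair already satisfies the margin (loss 0); second pair does not.
example_loss = margin_ranking_loss([0.9, 0.2], [0.1, 0.6], margin=0.5)
```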

Create files with the sentences to score, named some_string_queries.txt.sub, some_string_replies.txt.sub, and some_string_replies.txt.true.sub. Run hybrid_evaluation.py to score them. For this step, comment out the code block after the "train" print statement in hybrid_evaluation.py.

python hybrid_evaluation.py
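
The three files for the scoring step can be written with a small helper like the one below. It assumes one sentence per line, aligned by line number across the files, with replies.txt.sub holding the generated replies and replies.txt.true.sub holding the ground-truth replies. Those are my reading of the file names, and write_scoring_files is a hypothetical helper, not part of this repo.

```python
import os


def write_scoring_files(prefix, queries, generated, ground_truth, out_dir="."):
    """Write three line-aligned files named <prefix>_... as described above."""
    if not (len(queries) == len(generated) == len(ground_truth)):
        raise ValueError("all three lists must have the same length")
    files = {
        f"{prefix}_queries.txt.sub": queries,
        f"{prefix}_replies.txt.sub": generated,          # assumed: model output
        f"{prefix}_replies.txt.true.sub": ground_truth,  # assumed: reference
    }
    paths = []
    for name, sentences in files.items():
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            for s in sentences:
                # Collapse whitespace so every sentence stays on one line.
                f.write(" ".join(s.split()) + "\n")
        paths.append(path)
    return paths


# Hypothetical usage:
# write_scoring_files("some_string", ["how are you ?"], ["fine"], ["i am good"])
```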

To create synthesized replies, either use a dialogue generation model or shuffle the lines of your ground-truth replies file, like so:

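import random

# Pair each query with a random reply by shuffling the reply lines.
with open('replies.txt') as f:
    lines = f.read().splitlines()
random.shuffle(lines)
with open('replies_scrambled.txt', 'w') as f:
    f.write('\n'.join(lines) + '\n')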
