This project is based on the paper "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov et al.
-
Create a virtual environment using Anaconda (install Anaconda2 if you do not have it installed)
$conda create -n <env-name> python=2
-
Activate the virtual environment
$source activate <env-name>
-
Install the required packages
$conda install --file requirements.txt
-
Install the corpus using the NLTK downloader
$ipython
>>> import nltk
>>> nltk.download()
-
Run the script word2vec.py to compute the word representations
$python word2vec.py
-
The word representations are stored in a dictionary where each key is a word (string) and each value is its vector representation (NumPy array); the dictionary is saved as a pickle file.
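As a sketch of how the saved representations can be used, the snippet below builds a toy word-to-vector dictionary, pickles it, and loads it back. The filename "word_vectors.pkl" and the example words are assumptions for illustration; the actual filename and vocabulary are produced by word2vec.py.

```python
import pickle

import numpy as np

# Toy dictionary mapping words to vectors, in the same shape as the
# output described above (word string -> NumPy array).
# The words and 3-dimensional vectors here are made up for illustration.
vectors = {
    "king": np.array([0.1, 0.3, -0.2]),
    "queen": np.array([0.2, 0.4, -0.1]),
}

# Save the representations as a pickle file
# ("word_vectors.pkl" is an assumed name).
with open("word_vectors.pkl", "wb") as f:
    pickle.dump(vectors, f)

# Load the pickle file back and look up a word's vector.
with open("word_vectors.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded["king"])  # the NumPy array for "king"
```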