Sentiment-specific word embeddings

Class SentimentModel trains word embeddings from tweets or other documents, exploiting the semantic similarity of words or phrases that appear with similar polarity.

One trains such a model with a command like this:

nlpnet-train.py sslm -w 3 -n 20 -l 0.1 -e 50 --gold tweets.tsv --data data
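
Here tweets.tsv is the annotated training file passed via --gold. Its exact layout is not documented on this page; as an assumption only, a tab-separated file pairing each tweet with a polarity label could look like:

I love this phone	positive
worst service ever	negative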

The full command syntax is the following:

nlpnet-train.py sslm [-h] [-w WINDOW] [-f NUM_FEATURES]
                        [--load_features] [--load_network] [-e ITERATIONS]
                        [-l LEARNING_RATE] [--lf LEARNING_RATE_FEATURES]
                        [--lt LEARNING_RATE_TRANSITIONS] [-a ACCURACY]
                        [-n HIDDEN] [-v] --gold GOLD --data DATA
                        [--variant VARIANT] [--dict_size DICT_SIZE]
                        [--ngrams NGRAMS] [--alpha ALPHA]

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOW, --window WINDOW
                        Size of the word window (default 5)
  -f NUM_FEATURES, --num_features NUM_FEATURES
                        Number of features per word (default 50)
  --load_features       Load previously saved word type features (overrides -f
                        and must also load a dictionary file)
  --load_network        Load previously saved network
  -e ITERATIONS, --epochs ITERATIONS
                        Number of training epochs (default 100)
  -l LEARNING_RATE, --learning_rate LEARNING_RATE
                        Learning rate for network weights (default 0.001)
  --lf LEARNING_RATE_FEATURES
                        Learning rate for features (default 0.01)
  --lt LEARNING_RATE_TRANSITIONS
                        Learning rate for transitions (default 0.01)
  -a ACCURACY, --accuracy ACCURACY
                        Desired accuracy per tag.
  -n HIDDEN, --hidden HIDDEN
                        Number of hidden neurons (default 200)
  -v, --verbose         Verbose mode
  --gold GOLD           File with annotated data for training.
  --data DATA           Directory to save new models and load partially
                        trained ones
  --variant VARIANT     If "polyglot" use Polyglot case conventions; if
                        "senna" use SENNA conventions.
  --dict_size DICT_SIZE
                        Size of embeddings dictionary (default 100000)
  --ngrams NGRAMS       Length of ngrams to consider (default 1)
  --alpha ALPHA         Weight of syntactic loss (default 0.5)
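
As a fuller example, a run that also uses bigram features and a smaller embeddings dictionary, combining only the options listed above (the values are purely illustrative, not recommended settings), could be invoked as:

nlpnet-train.py sslm -w 5 -n 200 -e 100 --ngrams 2 --dict_size 50000 -v --gold tweets.tsv --data data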

To tag a document formatted with one token per line and sentences separated by an empty line, use:

nlpnet-tag.py pos data

where data is the directory containing the trained model.
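
For reference, an input file with two sentences in this format (the sentences themselves are invented for illustration) would look like:

I
love
this
phone

Not
so
great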
