Sentiment-specific word embeddings

Class SentimentModel trains word embeddings from tweets or other documents, exploiting the semantic similarity of words or phrases that appear with similar polarity.

One trains such a model with a command like this:

nlpnet-train.py sslm -w 3 -n 20 -l 0.1 -e 50 --gold tweets.tsv --data data
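
Here tweets.tsv is the annotated training file passed via --gold. Its exact layout is not documented on this page; as an assumption only, a tab-separated file pairing each tweet with a polarity label could look like:

I love this phone	positive
worst service ever	negative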

The full command syntax is the following:

nlpnet-train.py sslm [-h] [-w WINDOW] [-f NUM_FEATURES]
                        [--load_features] [--load_network] [-e ITERATIONS]
                        [-l LEARNING_RATE] [--lf LEARNING_RATE_FEATURES]
                        [--lt LEARNING_RATE_TRANSITIONS] [-a ACCURACY]
                        [-n HIDDEN] [-v] --gold GOLD --data DATA
                        [--variant VARIANT] [--dict_size DICT_SIZE]
                        [--ngrams NGRAMS] [--alpha ALPHA]

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOW, --window WINDOW
                        Size of the word window (default 5)
  -f NUM_FEATURES, --num_features NUM_FEATURES
                        Number of features per word (default 50)
  --load_features       Load previously saved word type features (overrides -f
                        and must also load a dictionary file)
  --load_network        Load previously saved network
  -e ITERATIONS, --epochs ITERATIONS
                        Number of training epochs (default 100)
  -l LEARNING_RATE, --learning_rate LEARNING_RATE
                        Learning rate for network weights (default 0.001)
  --lf LEARNING_RATE_FEATURES
                        Learning rate for features (default 0.01)
  --lt LEARNING_RATE_TRANSITIONS
                        Learning rate for transitions (default 0.01)
  -a ACCURACY, --accuracy ACCURACY
                        Desired accuracy per tag.
  -n HIDDEN, --hidden HIDDEN
                        Number of hidden neurons (default 200)
  -v, --verbose         Verbose mode
  --gold GOLD           File with annotated data for training.
  --data DATA           Directory to save new models and load partially
                        trained ones
  --variant VARIANT     If "polyglot" use Polyglot case conventions; if
                        "senna" use SENNA conventions.
  --dict_size DICT_SIZE
                        Size of embeddings dictionary (default 100000)
  --ngrams NGRAMS       Length of ngrams to consider (default 1)
  --alpha ALPHA         Weight of syntactic loss (default 0.5)
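
As a fuller example, a run that also uses bigram features and a smaller embeddings dictionary, combining only the options listed above (the values are purely illustrative, not recommended settings), could be invoked as:

nlpnet-train.py sslm -w 5 -n 200 -e 100 --ngrams 2 --dict_size 50000 -v --gold tweets.tsv --data data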

To tag a document formatted with one token per line and sentences separated by an empty line, use:

nlpnet-tag.py pos data

where data is the directory containing the trained model.
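
For reference, an input file with two sentences in this format (the sentences themselves are invented for illustration) would look like:

I
love
this
phone

Not
so
great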
