Sentiment specific word embeddings
attardi edited this page Dec 28, 2014
Class SentimentModel allows training word embeddings from tweets or other documents, exploiting the semantic similarity of words or phrases that appear with similar polarity.
One trains such a model with a command like this:
```
nlpnet-train.py sslm lm -w 3 -n 20 -l 0.1 -e 50 --gold tweets.tsv --data data
```
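The `--gold` file holds the annotated training data. Its exact layout is not documented on this page; assuming a simple tab-separated format with a polarity label and the tweet text (a hypothetical layout, shown only for illustration — check the corpus reader before relying on it), such a file could be produced like this:

```python
# Sketch: write a tab-separated gold file of (label, tweet) pairs.
# NOTE: this layout is an assumption, not the documented nlpnet format.
tweets = [
    ("positive", "I love this movie"),
    ("negative", "Worst service ever"),
]
with open("tweets.tsv", "w") as f:
    for label, text in tweets:
        f.write(label + "\t" + text + "\n")
```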
The full command syntax is the following:
```
nlpnet-train.py sslm [-h] [-w WINDOW] [-f NUM_FEATURES]
                     [--load_features] [--load_network] [-e ITERATIONS]
                     [-l LEARNING_RATE] [--lf LEARNING_RATE_FEATURES]
                     [--lt LEARNING_RATE_TRANSITIONS] [-a ACCURACY]
                     [-n HIDDEN] [-v] --gold GOLD --data DATA
                     [--variant VARIANT] [--dict_size DICT_SIZE]
                     [--ngrams NGRAMS] [--alpha ALPHA]

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOW, --window WINDOW
                        Size of the word window (default 5)
  -f NUM_FEATURES, --num_features NUM_FEATURES
                        Number of features per word (default 50)
  --load_features       Load previously saved word type features (overrides -f
                        and must also load a dictionary file)
  --load_network        Load previously saved network
  -e ITERATIONS, --epochs ITERATIONS
                        Number of training epochs (default 100)
  -l LEARNING_RATE, --learning_rate LEARNING_RATE
                        Learning rate for network weights (default 0.001)
  --lf LEARNING_RATE_FEATURES
                        Learning rate for features (default 0.01)
  --lt LEARNING_RATE_TRANSITIONS
                        Learning rate for transitions (default 0.01)
  -a ACCURACY, --accuracy ACCURACY
                        Desired accuracy per tag
  -n HIDDEN, --hidden HIDDEN
                        Number of hidden neurons (default 200)
  -v, --verbose         Verbose mode
  --gold GOLD           File with annotated data for training
  --data DATA           Directory to save new models and load partially
                        trained ones
  --variant VARIANT     If "polyglot", use Polyglot case conventions; if
                        "senna", use SENNA conventions
  --dict_size DICT_SIZE
                        Size of embeddings dictionary (default 100000)
  --ngrams NGRAMS       Length of ngrams to consider (default 1)
  --alpha ALPHA         Weight of syntactic loss (default 0.5)
```
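The `--alpha` option trades off two training objectives: a syntactic (language-model) loss, which rewards predicting plausible word contexts, and a sentiment loss, which rewards predicting the document's polarity. A minimal sketch of that weighting (variable and function names are illustrative, not taken from the nlpnet code):

```python
def combined_loss(syntactic_loss, sentiment_loss, alpha=0.5):
    """Weighted sum of the two objectives: alpha weights the
    syntactic term (the --alpha option), and 1 - alpha weights
    the sentiment term."""
    return alpha * syntactic_loss + (1.0 - alpha) * sentiment_loss
```

With `alpha=0.5` (the default) both terms contribute equally, while `alpha=1.0` would reduce training to a plain language-model objective.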
To tag a document, consisting of one token per line with sentences separated by an empty line, use:

```
nlpnet-tag.py pos data
```

where `data` is the directory containing the trained model.
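The expected input format — one token per line, with an empty line between sentences — can be produced like this (the file name `input.txt` and the token lists are just examples):

```python
# Write a document in the format nlpnet-tag.py reads:
# one token per line, with an empty line between sentences.
sentences = [["I", "love", "this", "movie", "."],
             ["It", "was", "great", "."]]
with open("input.txt", "w") as f:
    for sentence in sentences:
        f.write("\n".join(sentence) + "\n\n")
```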