*********************************************************************
* A simple C++ library for maximum entropy classification *
* *
* Version 3.0 *
*********************************************************************
1.0 Overview
------------
This is a simple C++ library for maximum entropy classification
(also known as "multinomial logistic regression").
The main features of this library are:
- L1/L2 regularization [1,2]
- fast parameter estimation algorithms (LBFGS [3], OWLQN [4],
  and SGD [5])
- support for real-valued features
- saving/loading the model to/from a file
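
For reference, a maximum entropy classifier models the conditional
probability of a label y given an input x as (in LaTeX notation):

  p(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
  \quad Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)

where the f_i are (possibly real-valued) feature functions and the
\lambda_i are the parameters estimated from the training data.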
2.0 How to Use
--------------
1. Compile the programs
% make
- if you encounter compilation errors related to hash_map, try
  commenting out the line
    #define USE_HASH_MAP
  in "maxent.h".
2. Run the examples
% ./bicycle
- this is a toy example of binary classification
% ./postagging
- a more realistic example (part-of-speech tagging)
3. Build your own program
   - see "bicycle.cpp" and "postagging.cpp" for complete examples;
     a minimal sketch is shown below.
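
The following is a minimal sketch of a program using this library.
It assumes the ME_Model / ME_Sample interface used by the bundled
examples (add_feature(), add_training_sample(), train(), and
classify()); consult "bicycle.cpp" for the authoritative usage.

  #include "maxent.h"
  #include <iostream>

  int main()
  {
    ME_Model model;

    // a training sample is a label plus a set of features
    ME_Sample s1;
    s1.label = "GOOD";
    s1.add_feature("sunny");
    model.add_training_sample(s1);

    ME_Sample s2;
    s2.label = "BAD";
    s2.add_feature("rainy");
    model.add_training_sample(s2);

    model.train();

    // classify() fills in the predicted label of the sample
    ME_Sample t;
    t.add_feature("sunny");
    model.classify(t);
    std::cout << t.label << std::endl;
  }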
3.0 Tips
--------
o Use the L1 or L2 regularizer to avoid overfitting.
  e.g.) model.use_l1_regularizer(1.0);
  - The argument specifies the coefficient of the regularizer.
    In general, it should be tuned on development data, but
    setting it to 1.0 is a good rule of thumb.
  - I personally prefer L1 regularization since it produces a
    sparse model, i.e. the training results in a smaller number
    of active features (hence you save disk space).
o You can use a portion of the training data as held-out data
to see whether overfitting is happening.
e.g.) model.set_heldout(1000); // last 1000 samples as heldout
o Try Stochastic Gradient Descent (SGD) if you have a very large
  number of training samples. It is often much quicker than OWLQN.
  e.g.) model.use_SGD(30); // iterate 30 times
  A sketch combining these settings is shown below.
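
The tips above combine into a typical training setup like the
following (a sketch; use_l1_regularizer(), set_heldout(), and
use_SGD() are the calls quoted above, the rest follows the
bundled examples):

  ME_Model model;
  // ... add training samples with model.add_training_sample() ...
  model.use_l1_regularizer(1.0); // L1 regularization, coefficient 1.0
  model.set_heldout(1000);       // last 1000 samples as held-out data
  // model.use_SGD(30);          // alternative optimizer: SGD, 30 iterations
  model.train();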
4.0 References
--------------
[1] Jun'ichi Kazama and Jun'ichi Tsujii. 2003. Evaluation and Extension
of Maximum Entropy Models with Inequality Constraints, In
Proceedings of EMNLP, pp. 137-144.
[2] Stanley F. Chen and Ronald Rosenfeld. 1999. A Gaussian Prior for
Smoothing Maximum Entropy Models, Technical Report CMU-CS-99-108,
Computer Science Department, Carnegie Mellon University.
[3] Jorge Nocedal. 1980. Updating Quasi-Newton Matrices with Limited
Storage, Mathematics of Computation, Vol. 35, No. 151, pp. 773-782.
[4] Galen Andrew and Jianfeng Gao. 2007. Scalable training of
L1-regularized log-linear models, In Proceedings of ICML.
[5] Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2009.
    Stochastic Gradient Descent Training for L1-regularized Log-linear
    Models with Cumulative Penalty, In Proceedings of ACL-IJCNLP,
    pp. 477-485.
5.0 Change history
------------------
o 2005 Jul. 8 version 1.2.2
- initial public release
o 2005 Sep. 13 version 1.3
- requires less memory in training
o 2005 Sep. 13 version 1.3.1
- update README
o 2005 Oct. 28 version 1.3.2
- fix for overflow (thanks to Ming Li)
o 2005 Dec. 26 version 2.0
- support real-valued features
o 2006 Aug. 21 version 2.1.1
- reduce memory consumption in training
- speedup
o 2011 Jul. 14 version 3.0
- support OWLQN and SGD
- change interface
------------------------------------------------
Yoshimasa Tsuruoka ([email protected])