GENIA Sentence Splitter
TsujiiLaboratory/geniass
Sentence Splitter using SS MaxEnt

* How to use

1) make
2) ./geniass arg1 arg2

   arg1 is the input file to split.
   arg2 is the output file name.
   You need to run geniass in the directory that contains
   EventExtracter.rb, Classifying2Splitting.rb, and model1-1.0.

If you want a stand-off format file, run

3) ruby sentence2standOff.rb arg1 arg2 arg3

   arg1 and arg2 are the same as in 2).
   arg3 is the output stand-off file name.

------------

SS MaxEnt

This is a simple C++ class library for maximum entropy classifiers. If you are familiar with C++ and the STL, you will easily understand how to use the library by looking at the sample code.

The main features of this library are:

- fast parameter estimation using the BLMVM algorithm (Benson and More, 2001)
- smoothing with a Gaussian prior (Chen and Rosenfeld, 1999)
- modelling with inequality constraints (Kazama and Tsujii, 2003)
- saving/loading the model to/from a file
- integration of the model data into your source code

* How to use

1) make
   - If you get compile errors related to the hash map, try commenting out
     #define USE_HASH_MAP in "maxent.h".
2) ./a.out
3) See sample.cpp and maxent.h.

* Tips

1) If you have many training samples, use a portion of the data as held-out data to check whether overfitting is happening.
   ex.) model.set_heldout(1000);
2) If you see overfitting, try one of the following:
   - feature cut-off          ex.) model.train(3);
   - Gaussian prior           ex.) model.train(0, 1000, 0);
   - inequality constraints   ex.) model.train(0, 0, 1.0);
   * I like the third one because it produces a compact model and performs as well as the Gaussian prior.
3) If you want to integrate the generated model file into your code, see model2c.cpp.

* References

[1] Jun'ichi Kazama and Jun'ichi Tsujii, Evaluation and Extension of Maximum Entropy Models with Inequality Constraints, In Proceedings of EMNLP 2003, pp. 137-144.
[2] Steven J. Benson and Jorge J. More, A Limited-Memory Variable-Metric Method for Bound-Constrained Minimization, Preprint ANL/MCS-P909-0901, http://www-unix.mcs.anl.gov/~benson/blmvm/
[3] Stanley F. Chen and Ronald Rosenfeld, A Gaussian Prior for Smoothing Maximum Entropy Models, Technical Report CMU-CS-99-108, Computer Science Department, Carnegie Mellon University, 1999.

* History

2005 Jul. 8   version 1.2.2 - initial public release
2005 Sep. 13  version 1.3   - requires less memory in training
2005 Sep. 13  version 1.3.1 - update README
2005 Oct. 28  version 1.3.2 - fix for overflow (thanks to Ming Li)

-------------------------------------------------------------------------
Yoshimasa Tsuruoka ([email protected])
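Step 3) of the geniass instructions turns the split sentences into a stand-off file, i.e. character offsets into the original text. The C++ sketch below illustrates only that idea; it is not a port of sentence2standOff.rb, and its output format may differ from the real script's. It assumes that every output sentence appears verbatim and in order in the input file, and it reports half-open [begin, end) spans. The function name standoff_spans is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of geniass): find each split sentence in the
// original text and return its half-open character span [begin, end).
// Sentences are searched left to right, so a repeated sentence maps to its
// next occurrence. Returns an empty vector if any sentence is not found.
std::vector<std::pair<std::size_t, std::size_t>>
standoff_spans(const std::string& original,
               const std::vector<std::string>& sentences) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    std::size_t cursor = 0;  // where the search for the next sentence starts
    for (const std::string& sentence : sentences) {
        const std::size_t begin = original.find(sentence, cursor);
        if (begin == std::string::npos) {
            return {};  // the splitter changed the text; offsets are unknown
        }
        const std::size_t end = begin + sentence.size();
        assert(end <= original.size());  // a match always lies inside the text
        spans.emplace_back(begin, end);
        cursor = end;
    }
    return spans;
}
```

For the input "Cells grew. They died." split into two sentences, this yields the spans [0, 11) and [12, 22). Offsets count bytes of the std::string, so for UTF-8 input they are byte offsets rather than character offsets.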
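SS MaxEnt is described as a library for maximum entropy classifiers. For readers new to the model, the sketch below shows how such a classifier turns feature weights into class probabilities, p(y|x) = exp(sum of active weights for y) / Z(x). It is a standalone illustration, not the library's code: the dense weights[feature][class] layout and the name maxent_probs are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of conditional maximum-entropy prediction with binary features:
//   p(y | x) = exp(sum_f weights[f][y]) / Z(x), summing over active features f.
// weights[f][y] is the (hypothetical) weight of feature f paired with class y;
// active lists the ids of the features that fire for input x.
std::vector<double> maxent_probs(const std::vector<std::vector<double>>& weights,
                                 const std::vector<std::size_t>& active,
                                 std::size_t num_classes) {
    assert(num_classes > 0);
    // Unnormalized log-score of each class.
    std::vector<double> score(num_classes, 0.0);
    for (std::size_t f : active) {
        assert(f < weights.size() && weights[f].size() == num_classes);
        for (std::size_t y = 0; y < num_classes; ++y) {
            score[y] += weights[f][y];
        }
    }
    // Subtract the largest score before exponentiating so exp() cannot overflow.
    const double max_score = *std::max_element(score.begin(), score.end());
    double z = 0.0;
    for (double& s : score) {
        s = std::exp(s - max_score);
        z += s;
    }
    for (double& s : score) {
        s /= z;  // z >= 1 because the top class contributes exp(0) = 1
    }
    return score;
}
```

With one active feature whose weights are log(3) for class 0 and 0 for class 1, the result is p = {0.75, 0.25}; with no active features every class gets the same probability.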
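The first remedy in Tip 2), feature cut-off, discards rare features before training. The sketch below shows one plain way to do this: count each feature over the training samples and keep those above a threshold. It is not the library's implementation. Whether model.train(3) keeps features seen more than 3 times or at least 3 times is not stated here; the strict ">" below is an assumption, and the name features_above_cutoff is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical feature cut-off: count how often each feature string occurs
// in the training samples and keep only features seen more than `cutoff`
// times (the ">" convention is an assumption of this sketch).
std::vector<std::string> features_above_cutoff(
        const std::vector<std::vector<std::string>>& samples, int cutoff) {
    assert(cutoff >= 0);
    std::map<std::string, int> counts;
    for (const auto& sample : samples) {
        for (const auto& feature : sample) {
            ++counts[feature];
        }
    }
    std::vector<std::string> kept;
    for (const auto& entry : counts) {
        if (entry.second > cutoff) {
            kept.push_back(entry.first);
        }
    }
    return kept;  // in sorted order, since std::map iterates by key
}
```

Raising the threshold shrinks the feature set, which reduces the number of parameters the model can overfit with.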
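The second remedy, smoothing with a Gaussian prior (Chen and Rosenfeld, 1999), places a zero-mean Gaussian with variance sigma^2 on each weight. Training then minimizes the negative log-likelihood plus sum_i lambda_i^2 / (2 sigma^2), and each gradient component gains lambda_i / sigma^2. The sketch below adds those two terms to an already computed objective and gradient. How the arguments of model.train(0, 1000, 0) map onto sigma is not stated here, so the function and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Gaussian-prior smoothing: with a zero-mean prior of variance
// sigma^2 on every weight, training minimizes
//   NLL(lambda) + sum_i lambda_i^2 / (2 sigma^2),
// whose gradient gains lambda_i / sigma^2. This adds both terms in place
// to an already computed negative log-likelihood and gradient.
void add_gaussian_prior(const std::vector<double>& lambda, double sigma,
                        double& nll, std::vector<double>& grad) {
    assert(sigma > 0.0 && grad.size() == lambda.size());
    const double inv_var = 1.0 / (sigma * sigma);
    for (std::size_t i = 0; i < lambda.size(); ++i) {
        nll += 0.5 * lambda[i] * lambda[i] * inv_var;
        grad[i] += lambda[i] * inv_var;
    }
}
```

A smaller sigma pulls the weights harder toward zero; as sigma grows, the penalty vanishes and training approaches plain maximum likelihood.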
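The third remedy, inequality constraints (Kazama and Tsujii, 2003), only requires each model feature expectation to match the empirical one within a width A_i. In the dual, each weight is written as lambda_i = alpha_i - beta_i with alpha_i, beta_i >= 0, and A_i * (alpha_i + beta_i) is added to the negative log-likelihood. Those nonnegativity bounds are one reason a bound-constrained optimizer such as BLMVM suits the problem. The sketch below assumes equal upper and lower widths and a precomputed gradient of the plain negative log-likelihood. It is not the library's code, and the names inequality_penalty and SplitGradient are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of the penalized objective with respect to alpha and beta.
struct SplitGradient {
    std::vector<double> d_alpha;
    std::vector<double> d_beta;
};

// Sketch of the inequality-constraint penalty with lambda_i = alpha_i - beta_i
// (alpha_i, beta_i >= 0) and symmetric widths A_i. Given grad_lambda, the
// gradient of the plain negative log-likelihood w.r.t. lambda, it fills `out`
// with the gradients of NLL + sum_i A_i (alpha_i + beta_i) and returns the
// penalty term. Keeping alpha and beta nonnegative is left to the optimizer.
double inequality_penalty(const std::vector<double>& alpha,
                          const std::vector<double>& beta,
                          const std::vector<double>& width,
                          const std::vector<double>& grad_lambda,
                          SplitGradient& out) {
    const std::size_t n = alpha.size();
    assert(beta.size() == n && width.size() == n && grad_lambda.size() == n);
    out.d_alpha.assign(n, 0.0);
    out.d_beta.assign(n, 0.0);
    double penalty = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        penalty += width[i] * (alpha[i] + beta[i]);
        // d lambda_i / d alpha_i = +1 and d lambda_i / d beta_i = -1.
        out.d_alpha[i] = grad_lambda[i] + width[i];
        out.d_beta[i] = -grad_lambda[i] + width[i];
    }
    return penalty;
}
```

At alpha_i = beta_i = 0, both partial derivatives are positive whenever |dNLL/dlambda_i| < A_i, so a bound-constrained minimizer leaves that weight at exactly zero. This is consistent with the remark in Tip 2) that the third option produces a compact model.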