1.1. Baselines
On this page, we report on the performance of the FFA 'out of the box'. Evaluation takes place over a document classification task, which assesses whether our document vectors are of sufficient quality to be accurately categorised into different topics.
All results below should be regarded as 'baselines'. Further work will aim at improving them, using evolutionary algorithms over the generated fruit fly models.
We are using two standard datasets for document classification, 20_newsgroups and Web of Science (links below). We also built our own dataset from Wikipedia pages linked to categories.
- 20_newsgroups: we are using http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz, which has around 18000 documents in 20 classes.
- WoS: we are using the medium-size version of the dataset, which has 11967 documents divided into 35 classes.
- Wikipedia: 30000 documents in 15 classes.
Our aim is to evaluate the FFA's ability to generate good document vectors. Each document in the three datasets above is passed as input to the FFA and hashed. The hashes are then used to classify documents with a multiclass logistic regression classifier (we use the scikit-learn implementation).
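As a minimal sketch of this evaluation step, assuming the FFA hashes have already been computed (the random data below merely stands in for real hashes and topic labels), classification with scikit-learn's multiclass logistic regression might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_docs, hash_len, n_classes = 600, 400, 3
labels = rng.integers(0, n_classes, size=n_docs)

# Toy stand-in for FFA hashes: bias a class-specific subset of bit
# positions so the classification task is learnable.
bias = 0.2 * (labels[:, None] == np.arange(hash_len)[None, :] % n_classes)
hashes = ((rng.random((n_docs, hash_len)) + bias) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    hashes, labels, random_state=0)
clf = LogisticRegression(C=10, max_iter=2000)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)  # classification accuracy on held-out docs
```

In the real pipeline, `hashes` would be the binary document hashes produced by the FFA and `labels` the topic classes of the corresponding dataset.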
We tune the fruit fly model and classifier concurrently, using Bayesian Optimization. We show below the range of values considered for each hyperparameter.
Model hyperparameters:
- Number of Kenyon Cells (Number KC): 3000-9000
- Size of random projections (Proj. size): 2-10
- Percentage of KCs to retain in the final hash (WTA): 2-20
- Number of keywords from document (Num. top words): 10-250
Classification hyperparameters:
- C: 1-100
- Max. iterations: fixed at 2000 for the 20_newsgroups dataset and 50 for the other two datasets
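To make the model hyperparameters above concrete, here is a minimal numpy sketch of a fruit-fly-style hashing step. This is an illustrative reimplementation under our own assumptions, not the project's actual code; the function and variable names are ours.

```python
import numpy as np

def fruit_fly_hash(doc_vec, num_kc=9000, proj_size=10, wta_pct=20, seed=0):
    """Hash a document vector with a fruit-fly-style projection.

    Each Kenyon Cell (KC) sums `proj_size` randomly chosen input
    dimensions; winner-take-all (WTA) then keeps the top `wta_pct`
    percent of KC activations as 1s in the final binary hash.
    """
    rng = np.random.default_rng(seed)
    dim = doc_vec.shape[0]
    # Each row lists the input dimensions that one KC attends to.
    projections = rng.integers(0, dim, size=(num_kc, proj_size))
    activations = doc_vec[projections].sum(axis=1)
    n_keep = max(1, int(num_kc * wta_pct / 100))
    hash_vec = np.zeros(num_kc, dtype=np.int8)
    hash_vec[np.argsort(activations)[-n_keep:]] = 1
    return hash_vec

# E.g. an input vector of weights over the top 250 document keywords.
vec = np.random.default_rng(1).random(250)
h = fruit_fly_hash(vec, num_kc=3000, proj_size=5, wta_pct=10)
```

The hyperparameter ranges tuned above correspond directly to `num_kc` (Number KC), `proj_size` (Proj. size), `wta_pct` (WTA) and the number of keywords kept per document.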
We first report below our results on the validation data, showing the 5 best sets of hyperparameters for each dataset.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8082 | 8853 | 10 | 231 | 15 | 8 |
0.8061 | 8663 | 5 | 194 | 11 | 76 |
0.8055 | 8581 | 9 | 247 | 19 | 84 |
0.8044 | 8685 | 6 | 250 | 11 | 10 |
0.8040 | 8849 | 4 | 230 | 16 | 97 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8094 | 9000 | 2 | 198 | 20 | 1 |
0.8080 | 7881 | 3 | 103 | 6 | 1 |
0.8070 | 9000 | 2 | 165 | 2 | 1 |
0.8053 | 8746 | 2 | 100 | 2 | 1 |
0.8053 | 7724 | 3 | 164 | 18 | 1 |
The average test-set scores for the two sets of 5 settings above are 0.6960 and 0.6976 respectively.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.7712 | 8834 | 10 | 242 | 15 | 93 |
0.7707 | 8662 | 8 | 244 | 19 | 49 |
0.7700 | 8891 | 5 | 250 | 18 | 4 |
0.7700 | 8409 | 5 | 247 | 16 | 98 |
0.7699 | 8015 | 9 | 171 | 8 | 69 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8159 | 8967 | 10 | 250 | 20 | 1 |
0.8145 | 8791 | 8 | 249 | 12 | 1 |
0.8144 | 8968 | 6 | 238 | 9 | 1 |
0.8144 | 7826 | 10 | 250 | 20 | 1 |
0.8137 | 8094 | 5 | 246 | 13 | 2 |
The average test-set scores for the two sets of 5 settings above are 0.7833 and 0.8320 respectively.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.9186 | 8271 | 6 | 241 | 19 | 40 |
0.9179 | 8604 | 8 | 204 | 18 | 35 |
0.9178 | 8808 | 5 | 240 | 11 | 83 |
0.9172 | 8786 | 9 | 243 | 16 | 100 |
0.9171 | 8995 | 7 | 249 | 10 | 80 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.9226 | 8913 | 6 | 159 | 11 | 1 |
0.9221 | 9000 | 2 | 192 | 20 | 1 |
0.9214 | 8695 | 3 | 187 | 14 | 2 |
0.9213 | 8957 | 2 | 159 | 20 | 1 |
0.9211 | 9000 | 3 | 173 | 3 | 2 |
The average test-set scores for the two sets of 5 settings above are 0.9179 and 0.9199 respectively.
The fruit fly gives very decent performance (for comparison, a test score of 0.7820 on the 20_newsgroups dataset is reported in this paper, which uses a much heavier architecture). Nevertheless, we expect it can be improved, both in raw performance and in model size, and this is what the next steps of the project will investigate.
The wordpiece vocabulary currently provided in the repo was extracted from a particular subset of CommonCrawl. In the future, we may investigate whether this vocabulary can be further optimised.