
# 1.1. Baselines


On this page, we report the performance of the FFA 'out of the box'. Evaluation is carried out on a document classification task, which assesses whether our document vectors are of sufficient quality to be accurately categorised into different topics.

All results below should be regarded as 'baselines'. Further work will aim at improving them, using evolutionary algorithms over the generated fruit fly models.

## The datasets

We are using two standard datasets for document classification, 20_newsgroups and Web of Science (links below). We also built our own dataset from Wikipedia pages linked to categories.
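
As a pointer, 20_newsgroups ships with scikit-learn and can be loaded as in the sketch below. Whether our experiments use this loader, and this particular preprocessing, is an assumption of the sketch, not something the page specifies.

```python
from sklearn.datasets import fetch_20newsgroups

# Standard train/test split of 20_newsgroups; headers, footers and
# quotes are stripped so a classifier cannot exploit metadata.
# (Whether our experiments strip them too is not specified here.)
train = fetch_20newsgroups(subset='train',
                           remove=('headers', 'footers', 'quotes'))
test = fetch_20newsgroups(subset='test',
                          remove=('headers', 'footers', 'quotes'))

print(len(train.data), 'training documents,',
      len(set(train.target)), 'topic classes')
```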

## The model

Our aim is to evaluate the FFA's ability to generate good document vectors. Each document in the three datasets above is passed as input to the FFA and hashed. The hashes are then used to classify documents with a multiclass logistic regression classifier (we use the scikit-learn implementation).
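
As a rough illustration (not the repo's actual implementation), the sketch below builds a sparse random projection layer onto the Kenyon Cells, applies winner-take-all to obtain a binary hash, and feeds the hashes to scikit-learn's logistic regression. The toy document vectors and all constants are placeholders; in particular, how the top keywords of a document are selected and weighted is assumed here, not taken from the repo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000   # toy vocabulary size (the repo uses a wordpiece vocabulary)
NUM_KC = 8000       # number of Kenyon Cells
PROJ_SIZE = 6       # vocabulary positions sampled by each KC
WTA = 10            # percentage of KCs retained in the hash

# Sparse random projection layer: each Kenyon Cell reads from
# PROJ_SIZE randomly chosen vocabulary positions.
projection = np.zeros((NUM_KC, VOCAB_SIZE), dtype=np.int8)
for kc in range(NUM_KC):
    projection[kc, rng.choice(VOCAB_SIZE, PROJ_SIZE, replace=False)] = 1

def hash_document(doc_vec):
    """Binary hash of a document vector: project onto the KC layer,
    then keep only the top WTA% of activations (winner-take-all)."""
    activations = projection @ doc_vec
    k = int(NUM_KC * WTA / 100)
    h = np.zeros(NUM_KC, dtype=np.int8)
    h[np.argsort(activations)[-k:]] = 1
    return h

# Toy stand-ins for real document vectors (e.g. weighted counts of
# the top N keywords) and their topic labels.
docs = rng.random((200, VOCAB_SIZE))
labels = rng.integers(0, 4, size=200)
hashes = np.array([hash_document(d) for d in docs])

# Multiclass logistic regression over the binary hashes.
clf = LogisticRegression(C=10, max_iter=2000)
clf.fit(hashes[:150], labels[:150])
print('toy accuracy:', clf.score(hashes[150:], labels[150:]))
```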

## Hyperparameter tuning

We tune the fruit fly model and the classifier jointly, using Bayesian Optimization. The range of values considered for each hyperparameter is shown below, followed by a sketch of the tuning loop.

Model hyperparameters:

- Number of Kenyon Cells (Number KC): 3000-9000
- Size of random projections (Proj. size): 2-10
- Percentage of KCs to retain in the final hash (WTA): 2-20
- Number of keywords from document (Num. top words): 10-250

Classification hyperparameters:

- C (inverse regularisation strength of the logistic regression): 1-100
- Max. iterations: set to 2000 for the 20_newsgroups dataset, 50 for the other two datasets
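
To illustrate the tuning loop, here is a hedged sketch using the `bayesian-optimization` package; the page does not specify which Bayesian Optimization implementation we use, and `train_and_validate` is a hypothetical stand-in for the real pipeline (build a fly with the given settings, hash the documents, fit the classifier, return validation accuracy).

```python
from bayes_opt import BayesianOptimization

def train_and_validate(num_kc, proj_size, top_words, wta, c):
    """Hypothetical helper standing in for the real pipeline:
    build a fruit fly with these settings, hash the documents,
    fit the classifier, and return validation accuracy."""
    return 0.5  # placeholder score

def objective(num_kc, proj_size, top_words, wta, c):
    # The optimizer searches a continuous space, so integer
    # hyperparameters are rounded before use.
    return train_and_validate(int(num_kc), int(proj_size),
                              int(top_words), int(wta), c)

# Search ranges mirror the lists above.
pbounds = {'num_kc': (3000, 9000), 'proj_size': (2, 10),
           'top_words': (10, 250), 'wta': (2, 20), 'c': (1, 100)}

optimizer = BayesianOptimization(f=objective, pbounds=pbounds,
                                 random_state=1)
optimizer.maximize(init_points=10, n_iter=50)
print(optimizer.max)  # best validation score and hyperparameters found
```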

## Results

We first report below our results on the validation data. For each dataset, we show two sets of the 5 best hyperparameter settings.

### 20_newsgroups dataset

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.8082 | 8853 | 10 | 231 | 15 | 8 |
| 0.8061 | 8663 | 5 | 194 | 11 | 76 |
| 0.8055 | 8581 | 9 | 247 | 19 | 84 |
| 0.8044 | 8685 | 6 | 250 | 11 | 10 |
| 0.8040 | 8849 | 4 | 230 | 16 | 97 |

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.8094 | 9000 | 2 | 198 | 20 | 1 |
| 0.8080 | 7881 | 3 | 103 | 6 | 1 |
| 0.8070 | 9000 | 2 | 165 | 2 | 1 |
| 0.8053 | 8746 | 2 | 100 | 2 | 1 |
| 0.8053 | 7724 | 3 | 164 | 18 | 1 |

The average test-set scores for the two sets of 5 settings above are 0.6960 and 0.6976 respectively.

### Web of Science dataset

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.7712 | 8834 | 10 | 242 | 15 | 93 |
| 0.7707 | 8662 | 8 | 244 | 19 | 49 |
| 0.7700 | 8891 | 5 | 250 | 18 | 4 |
| 0.7700 | 8409 | 5 | 247 | 16 | 98 |
| 0.7699 | 8015 | 9 | 171 | 8 | 69 |

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.8159 | 8967 | 10 | 250 | 20 | 1 |
| 0.8145 | 8791 | 8 | 249 | 12 | 1 |
| 0.8144 | 8968 | 6 | 238 | 9 | 1 |
| 0.8144 | 7826 | 10 | 250 | 20 | 1 |
| 0.8137 | 8094 | 5 | 246 | 13 | 2 |

The average test-set scores for the two sets of 5 settings above are 0.7833 and 0.8320 respectively.

### Wikipedia dataset

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.9186 | 8271 | 6 | 241 | 19 | 40 |
| 0.9179 | 8604 | 8 | 204 | 18 | 35 |
| 0.9178 | 8808 | 5 | 240 | 11 | 83 |
| 0.9172 | 8786 | 9 | 243 | 16 | 100 |
| 0.9171 | 8995 | 7 | 249 | 10 | 80 |

| Score | Number KC | Proj. size | Num. top words | WTA | C |
|--------|-----------|------------|----------------|-----|-----|
| 0.9226 | 8913 | 6 | 159 | 11 | 1 |
| 0.9221 | 9000 | 2 | 192 | 20 | 1 |
| 0.9214 | 8695 | 3 | 187 | 14 | 2 |
| 0.9213 | 8957 | 2 | 159 | 20 | 1 |
| 0.9211 | 9000 | 3 | 173 | 3 | 2 |

The average test-set scores for the two sets of 5 settings above are 0.9179 and 0.9199 respectively.

## Discussion

The fruit fly gives very decent performance: for comparison, the paper we compare against reports a test score of 0.7820 on the 20_newsgroups dataset, using a heavier architecture. Nevertheless, we expect the model can be improved, both in raw performance and in size, and this is what the next steps of the project will investigate.

## Notes

The wordpiece vocabulary currently provided in the repo was extracted from a particular subset of CommonCrawl. In the future, we might check whether this vocabulary can be further optimised.