1.1. Baselines
On this page, we report on the performance of the FFA 'out of the box'. Evaluation takes place over a document classification task, which assesses whether our document vectors are of sufficient quality to be accurately categorised into different topics.
All results below should be regarded as 'baselines'. Further work will aim at improving them, using evolutionary algorithms over the generated fruit fly models.
We are using two standard datasets for document classification, 20_newsgroups and Web of Science (links below). We also built our own dataset from Wikipedia pages linked to categories.
- 20_newsgroups: we are using http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz, which has around 18000 documents in 20 classes.
- WoS: we are using the medium-size version of the dataset, which has 11967 documents divided into 35 classes.
- Wikipedia: 30000 documents in 15 classes.
Our aim is to evaluate the FFA's ability to generate good document vectors. Each document in the three datasets above is passed as input to the FFA and hashed. The hashes are then used to classify documents with a multiclass logistic regression classifier (we use the scikit-learn implementation).
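As a minimal sketch of this evaluation step, assuming the FFA hashes have already been computed (the random data below merely stands in for real hashes and topic labels), classification with scikit-learn's multiclass logistic regression might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_docs, hash_len, n_classes = 600, 400, 3
labels = rng.integers(0, n_classes, size=n_docs)

# Toy stand-in for FFA hashes: bias a class-specific subset of bit
# positions so the classification task is learnable.
bias = 0.2 * (labels[:, None] == np.arange(hash_len)[None, :] % n_classes)
hashes = ((rng.random((n_docs, hash_len)) + bias) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    hashes, labels, random_state=0)
clf = LogisticRegression(C=10, max_iter=2000)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)  # classification accuracy on held-out docs
```

In the real pipeline, `hashes` would be the binary document hashes produced by the FFA and `labels` the topic classes of the corresponding dataset.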
We tune the fruit fly model and classifier concurrently, using Bayesian Optimization. We show below the range of values considered for each hyperparameter.
Model hyperparameters:
- Number of Kenyon Cells (Number KC): 3000-9000
- Size of random projections (Proj. size): 2-10
- Percentage of KCs to retain in the final hash (WTA): 2-20
- Number of keywords from document (Num. top words): 10-250
Classification hyperparameters:
- C: 1-100
- Max. iterations: fixed at 2000 for the 20_newsgroups dataset and 50 for the other two datasets
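To make the model hyperparameters above concrete, here is a minimal numpy sketch of a fruit-fly-style hashing step. This is an illustrative reimplementation under our own assumptions, not the project's actual code; the function and variable names are ours.

```python
import numpy as np

def fruit_fly_hash(doc_vec, num_kc=9000, proj_size=10, wta_pct=20, seed=0):
    """Hash a document vector with a fruit-fly-style projection.

    Each Kenyon Cell (KC) sums `proj_size` randomly chosen input
    dimensions; winner-take-all (WTA) then keeps the top `wta_pct`
    percent of KC activations as 1s in the final binary hash.
    """
    rng = np.random.default_rng(seed)
    dim = doc_vec.shape[0]
    # Each row lists the input dimensions that one KC attends to.
    projections = rng.integers(0, dim, size=(num_kc, proj_size))
    activations = doc_vec[projections].sum(axis=1)
    n_keep = max(1, int(num_kc * wta_pct / 100))
    hash_vec = np.zeros(num_kc, dtype=np.int8)
    hash_vec[np.argsort(activations)[-n_keep:]] = 1
    return hash_vec

# E.g. an input vector of weights over the top 250 document keywords.
vec = np.random.default_rng(1).random(250)
h = fruit_fly_hash(vec, num_kc=3000, proj_size=5, wta_pct=10)
```

The hyperparameter ranges tuned above correspond directly to `num_kc` (Number KC), `proj_size` (Proj. size), `wta_pct` (WTA) and the number of keywords kept per document.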
We first report below our results on the validation data, showing the 5 best sets of hyperparameters for each dataset.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8082 | 8853 | 10 | 231 | 15 | 8 |
0.8061 | 8663 | 5 | 194 | 11 | 76 |
0.8055 | 8581 | 9 | 247 | 19 | 84 |
0.8044 | 8685 | 6 | 250 | 11 | 10 |
0.8040 | 8849 | 4 | 230 | 16 | 97 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8094 | 9000 | 2 | 198 | 20 | 1 |
0.8080 | 7881 | 3 | 103 | 6 | 1 |
0.8070 | 9000 | 2 | 165 | 2 | 1 |
0.8053 | 8746 | 2 | 100 | 2 | 1 |
0.8053 | 7724 | 3 | 164 | 18 | 1 |
The average test-set scores for the two sets of 5 settings above are 0.6960 and 0.6976 respectively.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.7712 | 8834 | 10 | 242 | 15 | 93 |
0.7707 | 8662 | 8 | 244 | 19 | 49 |
0.7700 | 8891 | 5 | 250 | 18 | 4 |
0.7700 | 8409 | 5 | 247 | 16 | 98 |
0.7699 | 8015 | 9 | 171 | 8 | 69 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.8159 | 8967 | 10 | 250 | 20 | 1 |
0.8145 | 8791 | 8 | 249 | 12 | 1 |
0.8144 | 8968 | 6 | 238 | 9 | 1 |
0.8144 | 7826 | 10 | 250 | 20 | 1 |
0.8137 | 8094 | 5 | 246 | 13 | 2 |
The average test-set scores for the two sets of 5 settings above are 0.7833 and 0.8320 respectively.
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.9186 | 8271 | 6 | 241 | 19 | 40 |
0.9179 | 8604 | 8 | 204 | 18 | 35 |
0.9178 | 8808 | 5 | 240 | 11 | 83 |
0.9172 | 8786 | 9 | 243 | 16 | 100 |
0.9171 | 8995 | 7 | 249 | 10 | 80 |
Score | Number KC | Proj. size | Num. top words | WTA | C parameter |
---|---|---|---|---|---|
0.9226 | 8913 | 6 | 159 | 11 | 1 |
0.9221 | 9000 | 2 | 192 | 20 | 1 |
0.9214 | 8695 | 3 | 187 | 14 | 2 |
0.9213 | 8957 | 2 | 159 | 20 | 1 |
0.9211 | 9000 | 3 | 173 | 3 | 2 |
The average test-set scores for the two sets of 5 settings above are 0.9179 and 0.9199 respectively.
The fruit fly gives very decent performance (for comparison, a test score of 0.7820 on the 20_newsgroups dataset is reported in this paper, which uses a much heavier architecture). Nevertheless, we expect it can be improved, both in raw performance and in model size, and this is what the next steps of the project will investigate.
The wordpiece vocabulary currently provided in the repo was extracted from a particular subset of CommonCrawl. In the future, we may investigate whether this vocabulary can be further optimised.