To install the package, simply run the following command in your favorite terminal:
$ pip install scandeval[all]
This will install all the model frameworks currently supported (pytorch, spacy, and jax). If you know you only need one of these, you can install a slimmer package like so:
$ pip install scandeval[pytorch]
Lastly, if you are not interested in benchmarking models, but just want to use the package to download datasets, then the following command will do the trick:
$ pip install scandeval
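After installing, it can be handy to confirm which of the optional frameworks are actually importable in the current environment. The following is a minimal sketch using only the Python standard library; it is not part of scandeval, and the mapping from extra name to import name (for instance, that PyTorch is imported as torch) is an assumption of this example.

```python
from importlib.util import find_spec

# Assumed import name for each extra; PyTorch is imported as `torch`.
FRAMEWORK_MODULES = {"pytorch": "torch", "spacy": "spacy", "jax": "jax"}


def installed_frameworks() -> dict[str, bool]:
    """Report which of the optional model frameworks can be imported."""
    # find_spec only locates the module; it does not import it.
    return {
        extra: find_spec(module) is not None
        for extra, module in FRAMEWORK_MODULES.items()
    }


if __name__ == "__main__":
    for extra, available in installed_frameworks().items():
        print(f"{extra}: {'installed' if available else 'missing'}")
```

If a framework you need is reported as missing, reinstalling with the matching extra (or with scandeval[all]) should bring it in.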
The easiest way to benchmark models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:
$ scandeval --model_id <model_id>
Here model_id is the HuggingFace model ID, which can be found on the HuggingFace Hub. By default this will benchmark the model on all eligible datasets. If you want to benchmark on a specific dataset, this can be done via the --dataset flag. For instance, this will evaluate the model on the AngryTweets dataset:
$ scandeval --model_id <model_id> --dataset angry-tweets
We can also separate by language. To benchmark all Danish models, say, use the --language flag, like so:
$ scandeval --language da
Multiple models, datasets and/or languages can be specified by repeating the corresponding argument. Here is an example with two models:
$ scandeval --model_id <model_id1> --model_id <model_id2> --dataset angry-tweets
See all the arguments and options available for the scandeval command by typing:
$ scandeval --help
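When the same command needs to be issued for many models or datasets, it may be easier to assemble it programmatically. The sketch below builds the argument list from Python lists and optionally runs it with subprocess. The helper names build_scandeval_command and run_scandeval are hypothetical, and only the --model_id, --dataset and --language flags shown above are used.

```python
import subprocess


def build_scandeval_command(model_ids=(), datasets=(), languages=()):
    """Build the argument list for one `scandeval` call.

    Only the flags shown above are used; each value gets its own flag.
    """
    command = ["scandeval"]
    for model_id in model_ids:
        command += ["--model_id", model_id]
    for dataset in datasets:
        command += ["--dataset", dataset]
    for language in languages:
        command += ["--language", language]
    return command


def run_scandeval(**kwargs):
    """Run the command; requires the `scandeval` CLI to be installed."""
    return subprocess.run(build_scandeval_command(**kwargs), check=True)


# "<model_id1>" and "<model_id2>" are placeholders, as in the text above.
command = build_scandeval_command(
    model_ids=["<model_id1>", "<model_id2>"], datasets=["angry-tweets"]
)
print(command)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with model IDs that contain special characters.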
In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmark class, and call this benchmark object with your favorite models and/or datasets:
>>> from scandeval import Benchmark
>>> benchmark = Benchmark()
>>> benchmark('<model_id>')
To benchmark on a specific dataset, you simply specify the second argument, shown here with the AngryTweets dataset again:
>>> benchmark('<model_id>', 'angry-tweets')
This would benchmark all Danish models:
>>> benchmark(language='da')
See the documentation for a more in-depth description.
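The call pattern shown in this section can be wrapped in a small loop when several models should be benchmarked one after another. This is only a sketch: benchmark_models is a hypothetical helper, it relies solely on the positional call form shown above, and it makes no assumption about what a Benchmark call returns beyond storing it.

```python
def benchmark_models(benchmark, model_ids, dataset=None):
    """Call `benchmark` once per model and collect whatever it returns.

    `benchmark` is expected to be a `scandeval.Benchmark()` instance, called
    with the positional arguments shown above. The shape of its return value
    is not assumed here.
    """
    results = {}
    for model_id in model_ids:
        if dataset is None:
            results[model_id] = benchmark(model_id)
        else:
            results[model_id] = benchmark(model_id, dataset)
    return results


# Usage, assuming scandeval is installed; the model IDs are placeholders:
# from scandeval import Benchmark
# results = benchmark_models(
#     Benchmark(), ["<model_id1>", "<model_id2>"], "angry-tweets"
# )
```

Taking the benchmark object as a parameter keeps the helper easy to try out with a stand-in callable before running a full benchmark.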
If you are just interested in downloading a dataset rather than benchmarking, this can be done as follows:
>>> from scandeval import load_dataset
>>> X_train, X_test, y_train, y_test = load_dataset('angry-tweets')
Here X_train and X_test will be Pandas dataframes containing the relevant texts, and y_train and y_test will be Pandas dataframes containing the associated labels.
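A quick sanity check after loading is to confirm that texts and labels line up in each split. The sketch below uses a hypothetical helper, split_sizes, which relies only on len() and therefore works on the dataframes described above as well as on any other sized containers.

```python
def split_sizes(X_train, X_test, y_train, y_test):
    """Return the number of rows per split, checking texts match labels."""
    if len(X_train) != len(y_train):
        raise ValueError("training texts and labels differ in length")
    if len(X_test) != len(y_test):
        raise ValueError("test texts and labels differ in length")
    return {"train": len(X_train), "test": len(X_test)}


# Usage, assuming scandeval is installed:
# from scandeval import load_dataset
# sizes = split_sizes(*load_dataset("angry-tweets"))
```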
See the documentation for a list of all the datasets that can be loaded.
The full documentation can be found on ReadTheDocs.
If you want to cite the framework then feel free to use this:
@article{nielsen2021scandeval,
  title={ScandEval: Evaluation of language models on mono- or multilingual Scandinavian language tasks.},
  author={Nielsen, Dan Saattrup},
  journal={GitHub. Note: https://github.com/saattrupdan/ScandEval},
  year={2021}
}
The image used in the logo was created by the amazing Scandinavia and the World team. Go check them out!