ScandEval

Evaluation of language models on mono- or multilingual Scandinavian language tasks.

Installation

To install the package, simply run the following command in your favorite terminal:

$ pip install scandeval[all]

This will install all the model frameworks currently supported (PyTorch, spaCy, and JAX). If you know you only need one of these, you can install a slimmer package like so:

$ pip install scandeval[pytorch]

Lastly, if you are not interested in benchmarking models, but just want to use the package to download datasets, then the following command will do the trick:

$ pip install scandeval
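
In all cases you can confirm the installed version with pip itself:

$ pip show scandeval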

Quickstart

Benchmarking from the Command Line

The easiest way to benchmark models is via the command line interface. After having installed the package, you can benchmark your favorite model like so:

$ scandeval --model_id <model_id>

Here model_id is the HuggingFace model ID, which can be found on the HuggingFace Hub. By default this will benchmark the model on all eligible datasets. If you want to benchmark on a specific dataset, this can be done via the --dataset flag. For instance, this will evaluate the model on the AngryTweets dataset:

$ scandeval --model_id <model_id> --dataset angry-tweets

We can also filter by language. To benchmark all Danish models, for instance, use the --language flag, like so:

$ scandeval --language da

Multiple models, datasets and/or languages can be specified by passing the corresponding flag multiple times. Here is an example with two models:

$ scandeval --model_id <model_id1> --model_id <model_id2> --dataset angry-tweets
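
The same pattern applies to the other flags. For instance, the following would benchmark a model on two datasets at once (the second dataset ID, dane, is an assumption here; the full list is in the help text):

$ scandeval --model_id <model_id> --dataset angry-tweets --dataset dane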

See all the arguments and options available for the scandeval command by typing

$ scandeval --help

Benchmarking from a Script

In a script, the syntax is similar to the command line interface. You simply initialise an object of the Benchmark class and call it with your favorite models and/or datasets:

>>> from scandeval import Benchmark
>>> benchmark = Benchmark()
>>> benchmark('<model_id>')

To benchmark on a specific dataset, you simply specify the second argument, shown here with the AngryTweets dataset again:

>>> benchmark('<model_id>', 'angry-tweets')

This would benchmark all Danish models:

>>> benchmark(language='da')
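
Putting these calls together, a short interactive session might look like this, with <model_id> as a placeholder:

>>> from scandeval import Benchmark
>>> benchmark = Benchmark()
>>> benchmark('<model_id>')                  # all eligible datasets
>>> benchmark('<model_id>', 'angry-tweets')  # a single dataset
>>> benchmark(language='da')                 # all Danish models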

See the documentation for a more in-depth description.

Downloading Datasets

If you are just interested in downloading a dataset rather than benchmarking, this can be done as follows:

>>> from scandeval import load_dataset
>>> X_train, X_test, y_train, y_test = load_dataset('angry-tweets')

Here X_train and X_test will be pandas DataFrames containing the relevant texts, and y_train and y_test will be pandas DataFrames containing the associated labels.
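
As a quick sanity check, the returned objects can be inspected with standard pandas methods. A minimal sketch, noting that the exact columns depend on the dataset:

>>> X_train.shape   # (number of training examples, number of columns)
>>> X_train.head()  # first few rows of the training texts
>>> y_train.head()  # the corresponding labels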

See the documentation for a list of all the datasets that can be loaded.

Documentation

The full documentation can be found on ReadTheDocs.

Citing ScandEval

If you want to cite the framework, feel free to use this:

@article{nielsen2021scandeval,
  title={ScandEval: Evaluation of language models on mono- or multilingual Scandinavian language tasks.},
  author={Nielsen, Dan Saattrup},
  journal={GitHub. Note: https://github.com/saattrupdan/ScandEval},
  year={2021}
}

Remarks

The image used in the logo has been created by the amazing Scandinavia and the World team. Go check them out!
