Ten Thousand German News Articles Dataset

For more information, visit the detailed project page.

Install the required Python packages with pip install -r requirements.txt.
Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
Run python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv to extract the articles.
Run python code/split_articles_into_train_test.py to split the dataset.
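
For orientation, the extract and split steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in this repository: the helper names export_query_to_csv and split_rows, the 10% test fraction, and the fixed seed are all assumptions, and the SQL query is left to the caller because the schema of corpus.sqlite3 is not described here.

```python
import csv
import random
import sqlite3


def export_query_to_csv(db_path, query, csv_path):
    """Write every row returned by `query` to `csv_path` and return the row count.

    Hypothetical helper: the real table and column names in corpus.sqlite3
    are not known here, so the caller has to supply the query.
    """
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(query).fetchall()
    finally:
        connection.close()
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows)


def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of `rows` reproducibly and return (train, test).

    The 10% test fraction and the seed are illustrative defaults, not
    values taken from split_articles_into_train_test.py.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]
```

To try it, call export_query_to_csv("corpus.sqlite3", query, "articles.csv") with a SELECT statement that matches the actual schema, read the CSV back with csv.reader, and pass the rows to split_rows. A plain random split ignores class balance; a stratified split may be preferable if the categories are uneven.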

License

All code in this repository is licensed under the MIT License.