
maa989/covid19-forecast-hub-de

 
 


German and Polish COVID-19 Forecast Hub

A collaborative forecasting project

A description in German is available here.

Note: This project is now largely synchronized with the European COVID-19 Forecast Hub (website and GitHub repository), which is run by the European Centre for Disease Prevention and Control and the London School of Hygiene and Tropical Medicine. Further development now mainly takes place in that new repository. All forecasts except regional-level forecasts are also shown in the European Forecast Hub.

Website: https://kitmetricslab.github.io/forecasthub/

Preprint: https://www.medrxiv.org/content/10.1101/2020.12.24.20248826v2

Old version of visualization incl. evaluation scores: https://jobrac.shinyapps.io/app_forecasts_de/

The new visualization, built by the Signale Team at RKI, lives in a separate repository: https://github.com/KITmetricslab/forecasthub

Study protocol: https://osf.io/cy937/registrations

Reference: Bracher J, Wolffram D, Deuschel J, Görgen K, Ketterer J, Gneiting T, Schienle M (2020): The German and Polish COVID-19 Forecast Hub. https://github.com/KITmetricslab/covid19-forecast-hub-de.

Web tool to visualize submission files: https://jobrac.shinyapps.io/app_check_submission/

Web tool to explore forecast evaluations (still in development): https://jobrac.shinyapps.io/app_evaluation/

Contact: [email protected]

Purpose

This repository assembles forecasts of cumulative and incident COVID-19 deaths and cases in Germany and Poland in a standardized format. The repository is run by members of the Chair of Econometrics and Statistics at Karlsruhe Institute of Technology and the Computational Statistics Group at Heidelberg Institute for Theoretical Studies, see below.

An interactive visualization and additional information on our project can be found on our website here.

We are running a pre-registered evaluation study covering the months of October through March to assess the performance of different forecasting methods. You can find the protocol here.

The effort parallels the US COVID-19 Forecast Hub run by the UMass-Amherst Influenza Forecasting Center of Excellence based at the Reich Lab. We are in close exchange with the Reich Lab team and follow the general structure and data format defined there, see this wiki entry for more details. We also re-use software provided by the ReichLab (see below).

If you are generating forecasts for COVID-19 cases, hospitalizations or deaths in Germany and would like to contribute to this repository, do not hesitate to get in touch.

Forecast targets

Deaths

We collect 1 through 4 week ahead forecasts of incident and cumulative deaths by reporting date in Germany and Poland (national level), the German states (Bundesländer) and Polish voivodeships, with a special focus on short horizons 1 and 2 week ahead. This wiki entry contains details on the definition of the targets. There is no obligation to submit forecasts for all suggested targets and it is up to teams to decide what they feel comfortable forecasting.

Our definition of targets parallels the principles outlined here for the US COVID-19 Forecast Hub.

Up to 14 December we treated the ECDC data available here and here, in a processed form, as our ground truth for the national-level death forecasts. As of 19 December, we use data we process directly from Robert Koch Institute and the Polish Ministry of Health (see below). These agree with the ECDC data up to 14 Dec.

Cases

We collect 1 through 4 week ahead forecasts of incident and cumulative confirmed cases by reporting date in Germany and Poland (national level), German states (Bundesländer) and Polish voivodeships, see the wiki entry. The respective truth data from RKI and the Polish Ministry of Health can be found here and here.

Contents of the repository

The main contents of the repository are currently the following (see also this wiki page):

  • data-processed: forecasts in a standardized format
  • data-truth: truth data from JHU and ECDC in a standardized format
  • data-raw: the forecast files as provided by various teams on their respective websites
  • The interactive visualization, which has been implemented by members of the Signale Team at RKI, is maintained in a separate repository.

Guide to submission

For new teams we recommend direct submission to the European COVID-19 Forecast Hub unless they produce regional-level forecasts. They should follow the (slightly different) instructions there.

Submission for actively contributing teams is based on pull requests. Our wiki contains a detailed guide to submission. Forecasts should be updated weekly. If possible, new forecasts should be uploaded on Mondays. Uploads until Tuesday, 3 pm Berlin/Warsaw time, are acceptable. Note that we also accept additional updates on other days of the week (not more than one per day), but will not include these in visualizations or ensembles. If no new forecast was provided on a Monday we will, however, use forecasts from the preceding Sunday, Saturday or Friday.
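The fallback rule for picking the forecast that enters visualizations and ensembles can be sketched as follows. This is a minimal illustration, not the hub's actual code: the function name `select_weekly_forecast` is hypothetical, and the sketch assumes that the most recent eligible day is preferred when several fallback days are available, which the text above does not state explicitly.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def select_weekly_forecast(forecast_dates: Iterable[date], monday: date) -> Optional[date]:
    """Pick the forecast date used for the week of `monday` (hypothetical helper).

    Prefers a Monday forecast; otherwise falls back to the preceding
    Sunday, Saturday or Friday. Assumption: the most recent eligible
    day wins. Returns None if no eligible forecast exists.
    """
    if monday.weekday() != 0:  # date.weekday() returns 0 for Monday
        raise ValueError("`monday` must fall on a Monday")
    available = set(forecast_dates)
    # days_back = 0 is Monday itself, 1 Sunday, 2 Saturday, 3 Friday
    for days_back in range(4):
        candidate = monday - timedelta(days=days_back)
        if candidate in available:
            return candidate
    return None


# A team that submitted on Saturday and Friday, but not on Monday
submitted = [date(2020, 10, 9), date(2020, 10, 10)]
chosen = select_weekly_forecast(submitted, monday=date(2020, 10, 12))
print(chosen)
```

Forecasts dated Thursday or earlier are never selected under this rule, so a team that misses the Friday to Monday window is simply absent from that week's ensemble.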

We moreover actively collect forecasts from a number of public repositories in accordance with the respective license terms and after having contacted the respective authors.

We strongly encourage teams to visually inspect their final forecasts prior to submission. We created a Shiny app to help you in this process.

We try to provide direct support to new teams to help overcome technical difficulties. Do not hesitate to get in touch.

Data format

We store point and quantile forecasts in a long format, including information on forecast dates and location, see this wiki entry for details. This format is largely identical to the one outlined for the US Hub here.
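To make the long format concrete, here is a small sketch of what a few rows could look like and how they could be sanity-checked before submission. The column names, the location code `GM`, the target string and all numbers are assumptions chosen for illustration, loosely modeled on the US Hub convention; the wiki entry remains the authoritative definition.

```python
import csv
import io

# Assumed column names; check the wiki entry for the authoritative list.
COLUMNS = ["forecast_date", "target", "target_end_date",
           "location", "type", "quantile", "value"]

# Illustrative rows only: one point forecast plus two quantiles, one row each.
rows = [
    {"forecast_date": "2020-10-12", "target": "1 wk ahead inc death",
     "target_end_date": "2020-10-17", "location": "GM",
     "type": "point", "quantile": "NA", "value": "100"},
    {"forecast_date": "2020-10-12", "target": "1 wk ahead inc death",
     "target_end_date": "2020-10-17", "location": "GM",
     "type": "quantile", "quantile": "0.025", "value": "60"},
    {"forecast_date": "2020-10-12", "target": "1 wk ahead inc death",
     "target_end_date": "2020-10-17", "location": "GM",
     "type": "quantile", "quantile": "0.975", "value": "150"},
]


def check_row(row):
    """Return a list of problems found in one long-format row."""
    problems = []
    if row["type"] == "point":
        if row["quantile"] != "NA":
            problems.append("point forecasts must not carry a quantile level")
    elif row["type"] == "quantile":
        try:
            level = float(row["quantile"])
        except (TypeError, ValueError):
            problems.append("quantile level is not a number")
        else:
            if not 0 < level < 1:
                problems.append("quantile level must lie strictly between 0 and 1")
    else:
        problems.append("type must be 'point' or 'quantile'")
    if float(row["value"]) < 0:
        problems.append("counts cannot be negative")
    return problems


def to_csv(records):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


all_problems = [p for r in rows for p in check_row(r)]
csv_text = to_csv(rows)
print(csv_text)
```

Storing one row per (target, location, quantile level) keeps files from different teams directly stackable, which is what makes ensemble building and evaluation across models straightforward.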

Data license and reuse

The forecasts assembled in this repository have been created by various independent teams, most of which provided a license with their forecasts. These licenses can be found in the respective subfolders of data-processed. Parts of the processing, analysis and validation code have been taken or adapted from the US COVID-19 Forecast Hub, where they were provided under an MIT license. All code contained in this repository is likewise under the MIT license. If you want to re-use materials from this repository, please get in touch with us.

Truth data

Data on observed numbers of deaths and several other quantities are compiled here and come from the following sources:

  • European Centre for Disease Prevention and Control. This used to be our preferred source for national level counts, but ECDC has switched to weekly reporting intervals on 14 Dec 2020.
  • Polish Ministry of Health. We pull these data from this Google Sheet run by Michal Rogalski. This is our preferred source for Polish voivodeship level counts. The data are coherent with the national level data from ECDC up to 14 Dec. To align with the ECDC time scale we have shifted them by one day, see here.
  • Robert Koch Institute. Note that these data are subject to some processing steps (see here) and are in part based on manual data extraction performed by IHME. This is our preferred source for German Bundesland level counts. The data are coherent with the national level data from ECDC up to 14 Dec.
  • Johns Hopkins University. These data are used by a number of teams generating forecasts. Currently (August 2020) the agreement with ECDC is good, but in the past there have been larger discrepancies. This is the main data source for the European COVID-19 Forecast Hub.
  • DIVI Intensivregister. These data are currently not yet used for forecasts, but we may extend our activities in this direction.

Details can be found in the respective README files in the subfolders of data-truth.

Teams generating forecasts

Currently we assemble forecasts from the following teams (the truth data source used and the forecast reuse license are given in brackets). Note that not all teams are using the same ground truth data.

Forecast evaluation and ensemble building

One of the goals of this forecast hub is to combine the available forecasts into an ensemble prediction, see here for a description of the current unweighted ensemble approach. Note that we only started generating ensemble forecasts each week on 17 August 2020. Ensemble forecasts from earlier weeks have been generated retrospectively to assess performance. As the ensemble is only a simple average of other models, this should not affect the behaviour of the ensemble forecasts. The commit dates of all forecasts can be found here. Starting from 2020-09-21, our main ensemble is the median rather than the mean ensemble, as the former showed better performance in evaluations.
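The difference between the mean and the median ensemble can be illustrated with a short sketch. It assumes that member forecasts are combined separately at each quantile level with equal weights; the function name `build_ensemble`, the model names and all numbers are hypothetical and do not reproduce the hub's actual implementation.

```python
from statistics import mean, median


def build_ensemble(member_forecasts, combine=median):
    """Combine member quantile forecasts level by level (illustrative sketch).

    member_forecasts maps a model name to {quantile_level: predicted_value}.
    Only levels provided by every member are combined (an assumption).
    """
    if not member_forecasts:
        raise ValueError("at least one member forecast is required")
    shared_levels = set.intersection(*(set(f) for f in member_forecasts.values()))
    return {
        level: combine([f[level] for f in member_forecasts.values()])
        for level in sorted(shared_levels)
    }


# Made-up forecasts from three hypothetical models; model_c is an outlier.
members = {
    "model_a": {0.025: 10, 0.5: 20, 0.975: 40},
    "model_b": {0.025: 12, 0.5: 26, 0.975: 50},
    "model_c": {0.025: 20, 0.5: 50, 0.975: 200},
}

median_ensemble = build_ensemble(members, combine=median)
mean_ensemble = build_ensemble(members, combine=mean)
print(median_ensemble)
print(mean_ensemble)
```

In this toy example the single outlying model pulls every level of the mean ensemble upward, while the median ensemble stays with the middle model at each level. That robustness to individual extreme forecasts is one plausible reason a median ensemble can score better.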

At a later stage we intend to generate more data-driven ensembles, which requires evaluating different forecasts, both those submitted by teams and those generated using different ensembling techniques. We want to emphasize, however, that this is not a competition, but a collaborative effort. The forecast evaluation method which will be applied is described in this preprint.

Forecast hub team

The following persons have contributed to this repository, either by assembling forecasts or by conceptual work in the background (in alphabetical order):

Related efforts

Scientific papers and preprints

Members of our group have contributed to the following papers and preprints on collaborative COVID-19 forecasting:

Acknowledgements

The Forecast Hub project is part of the SIMCARD Information & Data Science Pilot Project funded by the Helmholtz Association. We moreover wish to acknowledge the Alexander von Humboldt Foundation whose support facilitated early interactions and collaboration with the Reich Lab and the US COVID-19 Forecast Hub.

The content of this site is solely the responsibility of the authors and does not necessarily represent the official views of KIT, HITS, the Humboldt Foundation or the Helmholtz Association.
