Epidemiology Data

Datasets on Cases, Deaths, Transmission, Responses...


Join the WhatsApp Chat for deep dive discussions:
"Data" CoronavirusTechHandbook WhatsApp Chat

Link Drop

Streetbees Report- Report on how habits and attitudes relating to coronavirus have impacted certain markets. Requires entering email to access.

Comorbidities data

🔗httpss.com/Joseph_GL_Lee/status/1245695311747649536
Map of US reductions in travel (red is less reduction in average travel distance)

Google release their COVID-19 Community Mobility Reports

"We have decided that the best way to track coronavirus in the U.S. is by looking at what’s happening in metro areas."

FT Letter: “Intelligent sharing of data can save lives.” By Sylvie Delacroix (one of the workshop keynotes) and Neil Lawrence.

Covid-19 time series data

COV-CLEAR: International Open Data Standard for COVID-19: Community Case Reporting / Surveys; Symptom Trackers; Testing Services

Data Visualization Society: pairing interested data/viz volunteer consultants with practitioners in the health community, particularly interested in supporting community organizations and social service agencies looking for ways to make COVID-19 data more local for their communities 

Discourse Data Against Covid: community of data scientists available to do data analysis or answer technical queries for other researchers/scientists working on covid-response 

JOGL (Just One Giant Lab): the Quantified Flu project has launched its first prototype, which collects wearable data to help differentiate COVID-19 from other infections

ODI Leeds: #OpenDataSavesLives initiative with Covid-19 specific resources, events, Slack channel, etc 

Slack channel for Cambridge-based information sharing

Google searches can help us find emerging COVID-19 outbreaks- Searches for Loss of Smell align closely with the number of positive cases. The inability to smell could be an early warning sign.

Data and Information Sources on Black People/People of African Descent and COVID19
Maintained by Professor Kim Gallon at Purdue University.

South Korea released medical history for all #COVID19 patients based on their insurance claims for the past five years: https://hira-covid19.net


Datasets We Would Like

If you'd like to request any datasets, please add to this list. If you’re aware of a dataset on this list, please add it under Existing Datasets with a link and a description, and delete it from the list here. The datasets below had not been found as of 24th March 2020.

💡Representative samples of populations tested regardless of symptoms (suggested here; drive through testing in South Korea of tens of thousands early on; 4% of 1097 randomly tested hospital staff infected with coronavirus in Brabant, the Netherlands)

💡Hospitalisation rate by age and comorbidity (a Vox article from 23rd March summarising infection and fatality rates by age covers some of this)

💡Case fatality rate by age and comorbidity at the same time

💡Audio recordings of coughs- We would like to build an app that can record a cough and identify if it is likely caused by Covid19 - This will also hopefully help us track the spread of the virus.

💡Regular daily deaths (e.g. 2019) vs. current daily deaths per country (totals), irrespective of cause. For the Netherlands, for example, CBS (Centraal Bureau voor de Statistiek) publishes official statistics, and at first glance total daily deaths look similar for 2019 and 2020: https://opendata.cbs.nl/statline/#/CBS/nl/dataset/70895ned/table?ts=1585415204928 It is unclear how accurate https://www.cia.gov/library/publications/the-world-factbook/rankorder/2066rank.html is, but it gives a rough death rate per 1,000 population; Italy's baseline is very high (10.40) compared with Singapore's (3.40). Why do this? Daily death counts have never received as much attention as they do today, and reporting them without the usual baseline figures for comparison risks biasing readers towards misplaced fear.
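
The comparison requested here amounts to subtracting a baseline year's daily all-cause deaths from the current year's. A minimal sketch in Python, assuming the figures have already been loaded into dictionaries keyed by (month, day); the function name and the sample numbers are hypothetical and are not taken from CBS or any other source:

```python
from typing import Dict, Tuple

DayKey = Tuple[int, int]  # (month, day)


def excess_deaths(
    baseline: Dict[DayKey, int], current: Dict[DayKey, int]
) -> Dict[DayKey, int]:
    """Return current minus baseline deaths for every day present in both series."""
    return {
        day: current[day] - baseline[day]
        for day in sorted(current)
        if day in baseline
    }


# Hypothetical numbers, for illustration only.
baseline_2019 = {(3, 1): 400, (3, 2): 410, (3, 3): 395}
current_2020 = {(3, 1): 405, (3, 2): 450, (3, 3): 500}

diff = excess_deaths(baseline_2019, current_2020)
total_excess = sum(diff.values())
print(diff, total_excess)
```

Days missing from either series are skipped rather than treated as zero, so gaps in reporting do not show up as spurious drops or spikes.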

💡Police arrest data (in regions with enforcement). General numbers and breakdowns by race/gender/class

Repositories of Existing Datasets

These are links to other repositories of datasets. If you are looking for a dataset with a particular characteristic, we recommend you check through these first. They have descriptions of the dataset and a link.

library of data collaboratives

COVID-19 Data Providers by Amass Insights- Datasets categorised by geographical focus, type of data, data features, status (complete?), date last updated, date added.

Living Systematic Map of the Evidence- Categorises datasets by topic (e.g. diagnosis, health impacts, economic impacts). In particular, it identifies data which is about other viruses and / or is not primary. Updates on a weekly basis.

Table of Possible Covid Datasets- Table includes the source of the data, notable features, a short description, how to access the data and a link. [c.95 datasets as of 21st March 2020]

#DATA4COVID-19  — Data Collaboratives in Response to COVID-19. A repository of live data projects.

Context for Data

For documents which help with interpreting the data.

WHO List of Protocols for Laboratory Testing by Country

Testing & case reporting protocols for each country (and how they change over time) are crucial for understanding epidemiological data

It’s all about definitions
It is extremely dangerous to make any assessments whatsoever about spreading rates because each country is using different testing strategies. In an ideal world, we would know exactly what percent of tests are positive, but since we are unlikely to get that, we need some kind of proxy. I’m aware that there is anecdotal evidence about what countries are not testing, but perhaps a good first step would be to create categories: 

  1. Testing done on suspected cases
  2. Testing done on probable cases

Or perhaps another way to break things down could be:

  1. Testing on self-quarantined people
  2. Testing on general practice patients
  3. Testing at hospital only

Research and Papers Data

A Free, Open Resource for the Global Research Community- over 33,000 papers

COVID-19 Research Database (provided by the WHO)

LitCovid is a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus. It is the most comprehensive resource on the subject, providing central access to 2170 (and growing) relevant articles in PubMed.

Microsoft Academic Research on COVID-19

COVID-19 Open Patent Dataset (hosted by Lens.org)

COVID-19 Literature Review Collection (hosted by Cochrane Library)

🌏🔗 https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext
Symptoms of People admitted to hospital

Google search data for Coronavirus. Search data has been used by digital epidemiologists to understand public information needs during previous outbreaks (eg Zika, Ebola). An active project led by @lampos is working on a model to track COVID-19 using search data

COVID-19 Open Research Dataset

Existing Datasets by Features

Case Data

🔗https://covid-19.datasettes.com/ (Johns Hopkins daily reports loaded into Datasette, so you can run SQL queries against it and export JSON or CSV - info here)
Johns Hopkins dataset for cases, deaths, recoveries. Regular updates and data from the 14 largest government agencies. 
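
Datasette instances expose a JSON API that accepts SQL through a `sql` query parameter. The sketch below only builds such a URL; the database name `covid` is an assumption and should be checked against the site's index page before use:

```python
from urllib.parse import urlencode

BASE_URL = "https://covid-19.datasettes.com"
DATABASE = "covid"  # assumption: the database name on the live site may differ


def datasette_query_url(
    sql: str, base_url: str = BASE_URL, database: str = DATABASE
) -> str:
    """Build a URL asking Datasette to run `sql` and return rows as a JSON array."""
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url}/{database}.json?{params}"


url = datasette_query_url("select 1 as one")
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen` and decoded with `json.load`; replacing `.json` with `.csv` in the path requests CSV instead.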

Processed from JHU data and augmented with ISO3 country codes, WHO region designations and World Bank Income country groups

COVID-19 Global Pandemic data

Global data- BNO News

Cases per capita- Maps updated every few days with cases per 100,000.
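
For anyone reproducing a per-capita figure like this from raw counts, the arithmetic is cases divided by population, scaled to 100,000. A small Python sketch; the function name and the sample figures are made up for illustration:

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Return confirmed cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Hypothetical region: 500 cases in a population of one million.
rate = cases_per_100k(500, 1_000_000)
print(rate)
```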

A dataset of 1000 early cases, e.g. First confirmed imported COVID-19 pneumonia patient in Shenzhen (from Wuhan): male, 66, Shenzhen residence, visited relatives in Wuhan on 12/29/2019, symptoms onset on 01/03/2020, returned to Shenzhen and sought medical care on 01/04/2020, hospitalized on 01/11/2020, sample sent to China CDC for testing on 01/18/2020, confirmed on 01/19/2020. 8 others under medical observation, contact tracing ongoing.

Day-Level COVID-19 Dataset (hosted on Kaggle)

Ingredients & uncertainties involved in infectious disease modeling. Case & death projections differ hugely between models due to different assumptions feeding them.

climate and local COVID-19 transmission: 'summary: we shouldn't assume transmission will decline substantially during the summer'

Graphs of confirmed cases/recoveries/deaths by country, using data from JHU CSSE

Projections and Forecasts

Produces live counts of latest data and projections.

Time taken for case numbers to double (Wikipedia)
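
Doubling time can also be estimated directly from two case counts, under the simplifying assumption of constant exponential growth between them: T = t · ln 2 / ln(N_end / N_start). A minimal Python sketch; the function name and the sample counts are illustrative only:

```python
import math


def doubling_time(count_start: float, count_end: float, days_elapsed: float) -> float:
    """Estimate doubling time in days, assuming constant exponential growth."""
    if count_start <= 0 or count_end <= count_start or days_elapsed <= 0:
        raise ValueError("need positive, growing counts over a positive interval")
    growth_rate = math.log(count_end / count_start) / days_elapsed  # per day
    return math.log(2) / growth_rate


# Hypothetical example: cases grow from 100 to 400 over 6 days (two doublings).
t = doubling_time(100, 400, 6)
print(round(t, 2))
```

Because reported counts depend on testing volume and reporting delays, an estimate like this describes growth in confirmed cases rather than in infections.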

The COVID-19 forecasting project at Oxford curates a database of all sufficiently large global containment and mitigation measures focused on reducing transmission (i.e. excluding measures like economic stimulus).

Healthcare and Hospital Data

🌏🔗Hospital Beds per 1000 People (OECD) (Wikipedia)
Hospital Beds Per 1000 People (OECD)

The Global Healthsites Mapping Project is building an open data commons of health facility data with OpenStreetMap. We believe that by leaning on the methods and infrastructure of OpenStreetMap, baseline health facility data can be maintained.

Global Dataset on ICU, Ventilators and Critical Care Capacity

Trial Data

Dimensions COVID-19 publications, data sets, clinical trials



🌏🔗https://www.openehr.org/close m/
Clinical knowledge data

Tests and Self Reported Data

Number of tests performed by each country

Number of tests performed by each country- Tests performed and tests per million performed. Combination of official and estimated figures. [Last updated 9th March 2020 - updated 24th March 2020.]

Self-reported Data- Many people have some symptoms, but choose not to go to a doctor. There are a rising number of digital tools that allow individuals to self-report symptoms.

Data, testing, age distribution: a thread about some of the issues that arise when comparing numbers from different countries/populations.

Policy and Government Responses Data

COVID-19 Policy Dataset (provided by Overton)

🌏🔗 https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset
The COVID-19 Government Measures Dataset puts together all the measures implemented by governments worldwide in response to the Coronavirus pandemic. Data collection includes secondary data review. The information falls into five categories: social distancing, movement restrictions, public health measures, social and economic measures, and human rights implications. Each category is broken down into several types of measures. ACAPS consulted sources from governments, media, the United Nations, and other organisations.

Working with data that may lead to unintended consequences?

Humanitarian Data

COVID-19 Pandemic in Locations with a Humanitarian Response

Search Data

Further search term variation data on coronavirus (i.e., what people google when they google “coronavirus”). Geographical scope: Various languages and countries (including Italy, Netherlands, France, Germany, Belgium, US, Mexico, Brazil, Colombia, Norway, Switzerland, UK)
Data has been collected on a near-daily basis since 20/01/2020

CORD-19 Search Engine (provided by Verizon Media)

Travel Data

Global Travel Restrictions - WFP Division of Emergencies

IATA travel center updates (text)- Live free collection of information on travel restrictions worldwide

Existing Datasets by Country


Australia Government Department of Health

Coronavirus (COVID-19) health alert - Department of Health

Media hub - coronavirus disease (COVID-19)

Coronavirus COVID-19 - Victoria Australia - Data on public exposure sites.

NSW Department of Health - COVID-19 Statistics- New South Wales

Current status and contact tracing alerts - Queensland Health


Belgium data

Belgium Data


Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Health Ministry (currently offline, 03/20/2020)

This repository has a list of data sources and projects about Brazilian data.


COVID-19 Canadian Open Data Handbook Draft

🇨🇦🔗COVID-19: A data perspective
COVID-19 data from Canada's statistical agency. Includes economic dashboard and interactive case map.

US and Canada data


🗣 https://twitter.com/warmspeakers
Geographical scope: China
Language: Chinese & English
Currently, people in China use VPNs (Virtual Private Networks) to bypass censorship, and Google info about the coronavirus (COVID-19). This website compiles these Google searches, word for word, in real-time.

National Health Commission of the People’s Republic of China (NHC)

🇨🇳🔗 http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
China CDC (CCDC)

Zhejiang Government

🇨🇳🔗WHO Daily Situation Reports
Daily figures for countries and cities in China about confirmed cases, deaths and transmission classifications.

Travel History of Confirmed Cases on Public Transportation in China (from Dec 21, 2019)

Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study

Diabetes Comorbidity China

National Health Commission of the People's Republic of China


Cases in the Czech Republic - total, by region


European Centre for Disease Prevention and Control

European Centre for Disease Prevention and Control- Situational update worldwide

ECDC-Europe data. Situational update for the EU/EEA and the UK by the ECDC

EU Open Data portal on COVID-19. Direct access to the dataset of the latest available public data (daily situation)


Cases in the Republic of Estonia


🇫🇮🔗 https://korona.kans.io/
Source: Helsingin Sanomat (media) JSON API



Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

🇫🇷🔗 https://github.com/opencovid19-fr/data
Source: French Gov. Open data platform

Dataviz, French case numbers, updated on a daily basis
Source: Santé publique France

Coronavirus tracking in France (SUIVI DU CORONAVIRUS EN FRANCE) - Last updated: 31/03/2020


Germany data

Robert Koch institute’s numbers of cases in Germany, by state

Robert Koch institute's numbers of cases in Germany as CSV timeline

German Case Numbers


Ghana Health Service site for COVID-19


Greek case numbers, from the ministry of health press releases, updated every day

🇭🇰Hong Kong

Hong Kong Department of Health

Demographic details of confirmed cases, including gender, age and case classification.
Daily confirmed cases. 


Hungarian case numbers and maps, updated every day

Another data source with daily updates and a rudimentary SIR forecast


Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Ministry of Health and Family Welfare | India


Indonesia - Requires log-in.


Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Italy Ministry of Health

"Initial estimates of the effects of the lockdown on the mobility and proximity of Italians"

This is a joint effort between the ISI Foundation, a research institute based in Turin, Italy, and Cuebiq Inc., a location intelligence company.


Tracking of COVID-19 in Italy | Google Sheets

Italian Outbreak Dashboard - WFP Emergency Division

🇮🇹🔗 https://github.com/pcm-dpc/COVID-19
Data Release by Italian Government

Report on the characteristics of COVID-19 positive patients who died in Italy (Report sulle caratteristiche dei pazienti deceduti positivi a COVID-19 in Italia). Based on updated data; includes information on victims' previous illnesses. In Italian.

🇮🇹🔗 https://www.epicentro.iss.it/coronavirus/bollettino/Bollettino-sorveglianza-integrata-COVID-19_12-marzo-2020.pdf
COVID-19 epidemic: national update, 12 March 2020, 16:00 (Epidemia COVID-19 Aggiornamento nazionale). Information on casualties by gender, with 70% being men. In Italian.


Official data from the Department of Civil Protection and Emergency Management in dashboard

Official daily status reports


Caribbean Public Health Agency (CARPHA): Situational Reports


Japan Ministry of Health: CoViD-19 information page (Japanese)

Japan Ministry of Health: CoViD-19 information page (English)

Japan Ministry of Health: CoViD-19 infection count/status reports, by date (Japanese)

Tokyo CoViD-19 Task Force


COVID-19 - Kosovo 


Macau Government data


Coronavirus (COVID-19) situation in Macedonia (Состојба со корона вирус (COVID - 19) во Македонија)


MG🔗 https://docs.google.com/spreadsheets/d/1DBcm9Y6ZCOoh61RNeCjROpXTJ76RiFWh_cVABTaDRe8/edit?usp=sharing
Coronavirus search term variation data in French for Madagascar from Answer The Public (as at 6 April 2020)


🇳🇱🔗 https://www.rivm.nl/nieuws/actuele-informatie-over-coronavirus
Current Information about Coronavirus

Tracking of COVID-19 in The Netherlands | Google Sheets

🇳🇿New Zealand

COVID-19 (novel coronavirus) | New Zealand Ministry of Health COVID-19 status updates and advice

🇳🇿🔗 https://statisticsnz.shinyapps.io/trade_dashboard/
New Zealand Trade Dashboard | Stats NZ 

🇳🇿🔗 https://www.stats.govt.nz/experimental/provisional-indications-effects-of-coronavirus-outbreak-on-new-zealand-trade-with-china
Provisional indications - effects of coronavirus outbreak on New Zealand trade with China | Stats NZ 

🇳🇿🔗 https://www.rnz.co.nz/national/programmes/checkpoint/audio/2018738700/covid-19-medicine-vaccine-supply-disruptions-likely-pharmac
Coronavirus: Medicine, vaccine supply disruptions likely warns Pharmac | RNZ 


Data from Pakistan


🔗Paraguay: repository extracting data from official sources


Portugal Cases

🇱🇨Saint Lucia

Caribbean Public Health Agency (CARPHA): Situational Reports


Ministry of Health Singapore (MOH)

Singapore MOH 

covid-19 SG Dashboard


Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Spanish Health Ministry page on the virus

Spain csv data


Presence of coronavirus (COVID-19) in Slovenia and measures taken (Prisotnost koronavirusa (COVID-19) v Sloveniji in ukrepi)

COVID-19 in Slovenia (COVID-19 v Sloveniji)


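Coronavirus in Slovakia (Koronavírus na Slovensku) - Verified facts on the status, prevention, and fight against coronavirus in Slovakia, monitored by the National Health Information Centre (Národné centrum zdravotníckych informácií) in cooperation with the Government Office of the Slovak Republic.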

🇿🇦South Africa

National Institute for Communicable Diseases: Situational Reports

🇿🇦🔗 https://github.com/dsfsi/covid19za
Repo for machine readable data and a dashboard

🇰🇷South Korea

Press releases, including updates on cases with detailed data on fatalities


Taiwan National Health Insurance

Taiwan CDC

Taiwan Centers for Disease Control: Home

🇹🇹Trinidad & Tobago

Caribbean Public Health Agency (CARPHA): Situational Reports

🇬🇧United Kingdom

Official Covid-19 related datasets
Total UK cases/deaths numbers: UTLA-level case numbers for England

Potential COVID-19 symptoms reported through NHS Pathways and 111 online

ONS data on deaths where Covid-19 reported on death certificate, includes deaths outside hospitals

Devolved administration official data:
Total cases in Northern Ireland
Scotland Cases by Region
Total Cases in Wales  

Dashboards, maps etc of official data
Coronavirus Live UK - live map showing confirmed cases of coronavirus in the UK with facility to report your own self-isolation.

LGA dashboard of cases by local authority, with comparators

Dashboard with map, charts and prediction

Compiled versions of official data
collated dataset of tests, cases, deaths from UK public health bodies (England, Scotland, Wales, NI)

Archive of the PHE Cases by Local Authority files - Each file from March 16th archived and available to download.

Google sheet tracking testing, cases and deaths by day

Number of tests carried out in the UK by date (table made with Wayback Machine)

An open API that includes real-time and historical data from official channels

Related health datasets
UK Govt Number of People going to Emergency Departments.

Search-related data
🗣 https://twitter.com/samgilb
Search term variation data on coronavirus - that is, what people google when they google “coronavirus”
Geographical scope: UK only

Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Miscellaneous other links posted here
UK Healthcare data -- note: appears to be lists of NHS organisations and performance indicators; provenance unknown

The largest set of publicly available real time urban data in the UK.

🇺🇸United States

COVID tracking Project. Live data graded for all 50 states.

Long lists of coronavirus keywords with estimated search volumes, sourced from the search engine marketing tool SEMrush.

Centers for Disease Control and Prevention

US CDC Coronavirus data

Testing in the US

US and Canada data

US state level data

Information on morbidity of Covid-19

Alabama GIS map by Department of Emergency Management at Jacksonville State University (JSU)

Live visualisation of COVID-19 case data

Florida Cases

Massachusetts Cases

New Jersey

New York Cases

New York City Cases by Zip Code

Ohio Cases

Oregon Cases

Pennsylvania Cases

Tennessee Cases

Texas Cases


Archive of daily reports from WA:
Washington Cases


Article from Reichlab on Influenza-Like Illness Data

State by State data posted on Reddit

🇺🇸🔗 https://github.com/reichlab
Reichlab’s Github

Economic Datasets

Year on Year Restaurant Sales 

Citymapper Mobility Index- % of people moving in cities compared to usual.

Year-on-year and week-to-week comparison by geographic region/venue type

Project Ideas (add your own!)

💡Requirements for a dataset to be “high quality”:

  1. Taken from source, e.g. PHE, not a re-publisher like JHU CSSE.
  2. Validated against multiple sources if available. Can then flag when other data providers have errors, thus improving data quality of all providers.
  3. Historical data. Some providers only give “today’s count”, e.g. PHE by region.
  4. Tidy data. JHU CSSE’s data is forcing many to reshape it from wide to long, time that could be better spent on improved modelling.
  5. Localised as much as possible. COVID-19 spreads locally within communities. Good localised data may be extremely useful for researchers evaluating efficacy of different strategies.
  6. Automated ETL with auto-validation. CSSE has a huge number of data errors now on their published datasets (314 open issues at time of writing). Other data sources rely on manual updates, which is fragile: updates may stop if the maintainer contracts COVID-19.
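
The wide-to-long reshape mentioned in item 4 can be sketched with the Python standard library alone. The column layout below (four identifying columns followed by one column per date) is an assumption about the JHU CSSE time series files, and the sample row is invented:

```python
import csv
import io
from typing import Dict, Iterator

# Assumed identifying columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(wide_csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield one record per (location, date) from a wide, one-column-per-date CSV."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    for row in reader:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            yield {
                "province_state": row["Province/State"],
                "country_region": row["Country/Region"],
                "date": column,
                "count": value,
            }


# Invented sample in the assumed layout.
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",Exampleland,0.0,0.0,1,3\n"
)
long_rows = list(wide_to_long(sample))
for record in long_rows:
    print(record)
```

Values are left as strings here; a validation step (item 6) would be the natural place to parse dates and counts and to reject malformed cells.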

💡Coronavirus activity on Kaggle
[Evaluated as of 24th March 2020] Search - Other than the CORD-19 dataset below, data is mostly out of date or country breakdowns of the CSSE/Hopkins data which can be found here.

💡🔗 COVID-19 Open Research Dataset Challenge (CORD-19)
An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House
This project looks well worth every available effort: 30,000 studies as data, urgently in need of analysis.

💡Knowledge Graph
🗣 Searchity.ai
Working with key multilaterals on a knowledge graph with curated subsets for reliable information and optimised resource allocation. This is extending the UN AI data commons initiative to a knowledge commons.

💡Connect any data set & monitor in real-time with OpenFn
The OpenFn integration platform offers enterprise-grade, scalable, and secure infrastructure for organizations to connect ANY app, automate data sharing, & setup real-time disease surveillance. OpenFn is offering free setup help & zero-margin licenses for projects related to the urgent COVID-19 response. 

Relevant real-world case studies: 

  1. SwissTPH uses OpenFn to forward CommCare case updates to a national DHIS2 registry for real-time monitoring of child diagnoses in Nigeria.
  2. International Rescue Committee used OpenFn in the DRC to connect multiple Kobo Toolbox data collection sources to a central Case Tracker for ongoing monitoring of reported suspected Ebola cases.