Data
Data collection began in May 2013 and ended in November 2020. We had to stop because the NEA changed its website design in a way that makes data collection very difficult. To promote research and communal participation, we hope that the NEA will publish detailed non-aggregated data in machine-friendly formats.
It has been a good 7 years. Thank you for your support! We will continue to publish the existing data here.
The dengue dataset has two parts:
- PDF files in a public Google Drive folder. There are two types of files — "clusters" record the location and size of dengue clusters, whereas "cases" show the daily and weekly reports of new dengue cases.
- PDF files converted to CSV on a best-effort basis. Disclaimer: we cannot guarantee perfect conversion because of dirty data, e.g. wrong, misspelled or duplicate addresses.
Each file is a snapshot of the National Environment Agency's (NEA) webpage taken on a certain date. Hence, each filename contains the date in YYMMDD format to denote when the data was collected (e.g. 12 August 2014 is written as 140812). This file-naming convention allows the files to be sorted in chronological order of data capture.
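As a minimal sketch of how the YYMMDD convention can be used, the Python snippet below pulls the capture date out of a filename and sorts files chronologically. The filenames and the `snapshot_date` helper are hypothetical; the only assumption about the real files is that each name contains a single six-digit YYMMDD run.

```python
import re
from datetime import date, datetime

# Assumption: the filename contains exactly one standalone 6-digit YYMMDD run.
_DATE_RE = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def snapshot_date(filename: str) -> date:
    """Return the capture date encoded as YYMMDD in a snapshot filename."""
    match = _DATE_RE.search(filename)
    if match is None:
        raise ValueError(f"no YYMMDD date found in {filename!r}")
    # %y maps two-digit years 00-68 to 2000-2068, which covers this dataset.
    return datetime.strptime(match.group(1), "%y%m%d").date()


# Made-up filenames, for illustration only.
names = ["clusters-140812.csv", "clusters-130517.csv"]
ordered = sorted(names, key=snapshot_date)
```

Because the date is written year first, a plain alphabetical sort gives the same order for files that share a prefix; parsing the date is mainly useful when you need to filter by period or compute gaps between snapshots.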
Data was collected twice a week from May 2013 (except for a gap in October 2013). To date, we have collected 250+ snapshots. We are sharing this dataset because detailed historical data is not available on NEA's website (only current information is shown).
The dataset is also available in a machine-friendly format known as Comma-Separated Values (CSV). Every PDF file is converted into CSV format. Each row in the CSV file represents a single location where dengue cases are reported.
To plot dengue locations on a map, the CSV files provide the latitude and longitude, which are not available in the PDF file. The following table shows the CSV schema (sample CSV file). For enquiries, please contact admin (at) sgcharts (dot) com.
- Number Of Cases: Number of reported dengue cases at this location
- Street Address: Street address where dengue cases are reported (down to the apartment block level)
- Latitude: Latitude of the street address
- Longitude: Longitude of the street address
- Cluster Number: Every dengue cluster is labelled with a serial number. However, this serial number cannot be used as a unique identifier because (1) the serial number is reused in other snapshots and (2) the serial number will change throughout the cluster's lifetime.
- Recent Cases In Cluster**: Number of dengue cases with onset in the last 2 weeks
- Total Cases In Cluster: Total number of dengue cases reported in this cluster
- Date: Date string in YYMMDD format
- Month Number: Index number of the month, where 1=January and 12=December
**NEA published the count of recent cases per cluster only from December 2013 onwards. For prior data, this field is substituted with a placeholder value of -1.
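The sketch below shows one way to read rows that follow the schema above, in Python. It is illustrative only: the short field names, the column order, the absence of a header row, and the sample rows are all assumptions, so check them against the sample CSV file before use. It converts the -1 placeholder to `None`, and pairs the snapshot date with the cluster number, since the cluster number alone is not a unique identifier.

```python
import csv
import io

# Assumed column order, following the schema list; these short names are
# hypothetical and are supplied explicitly in case there is no header row.
FIELDS = [
    "cases", "address", "lat", "lon", "cluster_no",
    "recent_cases", "total_cases", "date", "month",
]


def parse_row(raw: dict) -> dict:
    """Convert one CSV row of strings into typed values."""
    recent = int(raw["recent_cases"])
    return {
        "cases": int(raw["cases"]),
        "address": raw["address"],
        "lat": float(raw["lat"]),
        "lon": float(raw["lon"]),
        # Cluster numbers are reused across snapshots, so key on both.
        "cluster_key": (raw["date"], int(raw["cluster_no"])),
        # -1 marks snapshots where recent cases were not published.
        "recent_cases": None if recent == -1 else recent,
        "total_cases": int(raw["total_cases"]),
    }


# Made-up sample rows, for illustration only.
SAMPLE = (
    "3,Example Street (Block 1),1.3000,103.8000,1,-1,10,130801,8\n"
    "2,Example Avenue (Block 2),1.3100,103.8100,1,4,12,140812,8\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDS)
rows = [parse_row(r) for r in reader]
```

Keeping the date as a string preserves the YYMMDD form exactly as stored. The (date, cluster number) pair identifies a cluster only within one snapshot; it does not track the same cluster over time.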
If you would like to use this dataset, please ensure proper attribution to the National Environment Agency's website (https://www.nea.gov.sg/dengue-zika/dengue/dengue-clusters). Acknowledgement of SG Outbreak with a link back to this site is appreciated :)