The dengue dataset is a collection of PDF files in a public Google Drive folder. There are two types of files: "clusters" files record the location and size of dengue clusters, whereas "cases" files show the daily and weekly reports of new dengue cases.

Each file is a snapshot of the National Environment Agency's (NEA) webpage that was taken on a certain date. Hence, filenames begin with the YYMMDD datestamp to denote when data was collected (e.g. 12 August 2014 would be written as 140812). Such a file-naming convention allows the files to be sorted in chronological order of data capture.
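
As a minimal sketch of how the datestamp prefix could be used, the Python snippet below parses the first six characters of a filename into a date and shows that a plain alphabetical sort already yields chronological order. The filenames are hypothetical and made up for illustration; only the YYMMDD prefix is taken from the convention described above.

```python
from datetime import date, datetime


def snapshot_date(filename: str) -> date:
    """Parse the leading YYMMDD datestamp of a snapshot filename."""
    return datetime.strptime(filename[:6], "%y%m%d").date()


# Hypothetical filenames, for illustration only; the real names may differ
# in everything except the YYMMDD prefix.
names = ["140812-clusters.pdf", "130601-cases.pdf", "131225-clusters.pdf"]

print(snapshot_date(names[0]))  # 2014-08-12

# Sorting the names as plain strings orders them by capture date,
# because the datestamp comes first and is zero-padded.
print(sorted(names))
```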

Data has been collected twice a week since May 2013 (except for a gap in October 2013). To date, there are more than a hundred snapshots. We decided to share this dataset because historical data is not available on NEA's website (only current information is shown).

The dataset is also available in a machine-friendly format known as Comma-Separated Values (CSV). Every PDF file is converted into CSV format. Each row in the CSV file represents a single location where dengue cases are reported.

For the convenience of plotting dengue locations on a map, the CSV files provide the latitude and longitude, which are not available in the PDF file. The following table shows the CSV schema (sample CSV file). For enquiries, please contact admin (at) sgcharts (dot) com.

Field: Description
Number Of Cases: Number of reported dengue cases at this location
Street Address: Street address where dengue cases are reported (down to the apartment block level)
Latitude: Latitude of the street address
Longitude: Longitude of the street address
Cluster Number: Every dengue cluster is labelled with a serial number. However, this serial number cannot be used as a unique identifier because (1) the serial number is reused in other snapshots and (2) the serial number will change throughout the cluster's lifetime.
Recent Cases In Cluster**: Number of dengue cases with onset in the last 2 weeks
Total Cases In Cluster: Total number of dengue cases reported in this cluster
Date: Date string in YYMMDD format
Month Number: Index number of the month, where 1=January and 12=December
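
The schema above could be read along the following lines. This is only a sketch: the exact column headers in the CSV files are assumed to match the field names listed here, and the sample rows are invented for illustration rather than taken from the dataset. Because the cluster number is reused across snapshots, the sketch groups locations by the pair (date, cluster number), on the assumption that the number is unique within a single snapshot.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows. The header names are assumed to match the schema
# described above; check them against a real CSV file before use.
SAMPLE = """\
Number Of Cases,Street Address,Latitude,Longitude,Cluster Number,Recent Cases In Cluster,Total Cases In Cluster,Date,Month Number
3,Example Ave 1 (Blk 10),1.3000,103.8000,1,5,12,140812,8
2,Example Ave 1 (Blk 12),1.3010,103.8010,1,5,12,140812,8
4,Sample Rd (Blk 7),1.3500,103.9000,1,4,4,140815,8
"""


def load_rows(text):
    """Parse CSV text into dicts with the numeric fields converted."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "cases": int(rec["Number Of Cases"]),
            "address": rec["Street Address"],
            "lat": float(rec["Latitude"]),
            "lon": float(rec["Longitude"]),
            "cluster": int(rec["Cluster Number"]),
            "date": rec["Date"],  # kept as a YYMMDD string
        })
    return rows


def group_by_cluster(rows):
    """Group locations by (snapshot date, cluster number)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["date"], row["cluster"])].append(row)
    return dict(groups)


rows = load_rows(SAMPLE)
clusters = group_by_cluster(rows)
for key, members in sorted(clusters.items()):
    # Print each (date, cluster) key with its summed case count.
    print(key, sum(m["cases"] for m in members))
```

In the invented sample, cluster number 1 appears in two snapshots, and the composite key keeps the two apart. It does not link a cluster across snapshots, since the serial number can change during the cluster's lifetime.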

**NEA published the count of recent cases per cluster only from December 2013 onwards. For prior data, this field is substituted with a placeholder value of -1.
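
When aggregating this field, the -1 placeholder should be treated as missing rather than as a count, otherwise sums and averages are skewed. One possible way to do that in Python is sketched below; the function names are hypothetical and the input is assumed to be the raw string read from the CSV.

```python
from typing import Iterable, Optional

PLACEHOLDER = -1  # value used for snapshots before December 2013


def recent_cases(raw: str) -> Optional[int]:
    """Convert the raw field to an int, or None if it is the placeholder."""
    value = int(raw)
    return None if value == PLACEHOLDER else value


def total_recent(raw_values: Iterable[str]) -> Optional[int]:
    """Sum the known counts; return None if every value is a placeholder."""
    known = [v for v in map(recent_cases, raw_values) if v is not None]
    return sum(known) if known else None


print(recent_cases("-1"))               # None
print(total_recent(["-1", "3", "4"]))   # 7
```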

If you would like to use this dataset, please ensure proper attribution to the National Environment Agency's website. Acknowledgement of SG Outbreak with a link back to this site is appreciated :)