Eliciting Disease Data from Wikipedia Articles

Geoffrey Fairchild, Lalindra De Silva, Sara Y. Del Valle, Alberto M. Segre


Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have prompted a shift toward internet-based disease surveillance systems. This study examines the use of Wikipedia article content for disease surveillance. We demonstrate how a named-entity recognizer can be trained to tag case, death, and hospitalization counts in the article text. We also show that the articles contain detailed time series data that are consistently updated and closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven, open-source system for emerging disease detection, monitoring, and data repository.
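To make concrete what tagging counts in article text involves, the following is a minimal, hypothetical sketch of one common way to prepare training data for a sequence-labeling named-entity recognizer: converting labeled token spans into BIO (begin/inside/outside) tags. The abstract does not specify which recognizer or tag format the study used, so the label names CASES and DEATHS, the helper to_bio, and the example sentence are illustrative assumptions, not the authors' annotation scheme or data.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label).
# Label names are hypothetical; the study's actual tag set may differ.
Span = Tuple[int, int, str]


def to_bio(tokens: List[str], spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert labeled token spans into (token, BIO tag) pairs.

    Tokens outside every span get "O"; the first token of a span gets
    "B-<label>" and any following tokens in the span get "I-<label>".
    Overlapping spans are not handled: a later span overwrites earlier tags.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))


# Hypothetical sentence, not taken from any Wikipedia article.
sentence = "By 5 May , 120 cases and 15 deaths had been reported ."
tokens = sentence.split()
# Token 4 is "120" and token 7 is "15"; only the numbers are tagged here.
spans = [(4, 5, "CASES"), (7, 8, "DEATHS")]

if __name__ == "__main__":
    for token, tag in to_bio(tokens, spans):
        print(f"{token}\t{tag}")
```

Tagging only the numeric token, rather than the whole phrase "120 cases", is one design choice among several; a recognizer trained this way learns to use the surrounding words ("cases", "deaths") as context for deciding which kind of count a number represents.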

DOI: https://doi.org/10.5210/ojphi.v8i1.6526

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org