Construction and Validation of Synthetic Electronic Medical Records

Linda Moniz, Anna L. Buczak, Lang Hung, Steven Babin, Michael Dorko, Joseph Lombardo


There is a pressing need for a test bed of electronic medical records (EMRs) to ensure consistent development, validation and verification of public health-related algorithms that operate on EMRs. However, access to full EMRs is limited, and they are generally not available to the academic algorithm developers who support the public health community. This paper describes a set of algorithms that produce synthetic EMRs using real EMRs as a model. The algorithms were used to generate a pilot set of over 3000 synthetic EMRs that are currently available on the CDC’s Public Health grid. The properties of the synthetic EMRs were validated both for the aggregate data set as a whole and for individual (synthetic) patients. We describe how the algorithms can be extended to produce records beyond the initial pilot data set.

Online Journal of Public Health Informatics * ISSN 1947-2579