


Abstract submitted to: The Center for Open Source in Government

Conference: Open Source for National and local eGovernment Programs in the U.S. and EU

January 2, 2003

Title: Open Source Confidentiality Methods

Scientific progress requires the free exchange of research data. Because medical research is often conducted using confidential records, medical researchers have historically refused to share their primary data, thus denying other scientists the opportunity to use these data sets for further research.

Pressured by federal regulations restricting the use of identified medical records (HIPAA and the Common Rule), and by recent data-sharing proposals from NIH and from publishers, researchers have devised a variety of innovative technical solutions that permit them to obtain and share large data sets derived from medical records without breaching patient confidentiality. Some of the methods used are: one-way hashing of patient identification fields (such as name and social security number), data scrubbing (removing private information from free text), threshold splitting (dividing text into multiple files, any one of which can be shared and used for scientific purposes without breaching confidentiality), and data ambiguation (ensuring non-uniqueness of records). Using these methods, large medical data sets can be safely used for research without obtaining patient consent and can be shared by the scientific community. These methods and their available open source implementations will be discussed.
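
As a rough illustration of three of the methods named above (one-way hashing, data scrubbing, and ensuring non-uniqueness of records), the following Python sketch may be useful. It is not one of the open source implementations referred to in this abstract. The function names, the SSN pattern, the placeholder tokens, and the use of a keyed hash (HMAC-SHA-256, chosen here because unkeyed hashes of short identifiers can be guessed by hashing every candidate value) are all assumptions made for the example.

```python
import hashlib
import hmac
import re
from collections import Counter


def one_way_hash(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hypothetical helper).

    The same identifier and key always give the same digest, so records for
    one patient can still be linked without exposing the identifier itself.
    """
    # Normalize case and whitespace so trivial variants hash identically.
    normalized = " ".join(identifier.lower().split())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumed pattern for U.S. social security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text: str, known_names: list[str]) -> str:
    """Remove SSN-like strings and a supplied list of names from free text."""
    text = SSN_PATTERN.sub("[SSN]", text)
    for name in known_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, "[NAME]", text, flags=re.IGNORECASE)
    return text


def non_unique_records(records: list[tuple], minimum: int = 2) -> list[tuple]:
    """Keep only records whose field combination occurs at least `minimum` times."""
    counts = Counter(records)
    return [record for record in records if counts[record] >= minimum]


if __name__ == "__main__":
    key = b"replace-with-a-secret-kept-by-the-data-holder"
    print(one_way_hash("John Smith", key))
    print(scrub("Pt John Smith, SSN 123-45-6789, seen today.", ["John Smith"]))
    print(non_unique_records([("F", 60), ("F", 60), ("M", 71)]))
```

A list-based scrubber such as this one only removes names it has been given; real free-text reports would need a much broader approach. The non-uniqueness check simply drops records that are unique on the chosen fields, which is one simple way to approximate the goal described above.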

Page last modified December 18, 2012