Sharing Pathology Data: Confidentiality and Data Representation Issues
Jules Berman, Cancer Diagnosis Program, DCTD, NCI, NIH
It is now technically feasible to analyze large datasets containing hundreds of thousands of surgical pathology reports, each linked to clinical data residing in integrated hospital information systems. Such datasets could support many types of research, including data mining, hypothesis generation, marker development and validation, and outcomes analysis. Because surgical pathology reports are linked to archived tissue blocks, large institutional pathology datasets could also serve as locators for archived tissue samples.
In April 2001, the NCI will begin work on the Shared Pathology Informatics Network. The objective of this initiative is to create a model Web-based system for accessing data related to archived human specimens at multiple institutions. The data will be derived from existing medical databases. Automated access to information in medical databases is the first step toward the long-term goal of developing informatics systems that support NCI's efforts to improve researchers' access to human specimens and clinical data.
Many factors have limited the usefulness of hospital-based pathology archives. Two of these are 1) the confidential nature of medical information, which restricts the ways data can be accessed and shared, and 2) the free-text format of pathology reports, which makes it difficult to prepare common data elements representing the conceptual information contained in the electronic medical record. Approaches to solving both problems will be discussed.