Subject: Shared Pathology Informatics Network Concept

The Concept for a Shared Pathology Informatics Network was approved at the Nov 8-9 NCI Board of Scientific Advisors meeting. Please keep in mind that approval of a Concept does not always mean that an RFA will follow. The Concept is a public document and can be distributed freely.

Jules J. Berman, Ph.D., M.D.

Shared Pathology Informatics Network

1. Background:

The objective of this initiative for a Shared Pathology Informatics Network is to create a Web-based model system that can request and receive data from existing medical databases at multiple institutions. The system could facilitate a wide variety of research efforts. It will be able to automatically identify and obtain the requested data for cases that meet defined search criteria and that have archived tissue specimens. The identity of the patient and other identifying information will be encrypted or otherwise modified to protect patient confidentiality. The system will enable researchers to quickly review the characteristics of large numbers of archived specimens in order to plan marker or assay validation studies. The ability to automatically access information from medical databases is the first step toward the long-term goal of developing informatics systems to support NCI's efforts to improve researchers' access to human specimens and clinical data.

The technology needed to develop the Shared Pathology Informatics Network already exists. Hospitals and medical institutions have developed sophisticated informatics systems that can be searched to meet their internal needs and to provide electronic data to third party payers and to federal agencies. Increasingly sophisticated software tools are available to facilitate communications among systems with different architecture and search strategies. The Shared Pathology Informatics Network will take advantage of the availability of, and rapid advances in, Internet technology to access pathology data residing in multiple institutions.

The Network will create the systems needed to access pathology data related to stored specimens. Vast numbers of specimens currently exist in pathology archives. Many pathology laboratories never discard archived tissues, and some have collections that date from the 19th century. The National Bioethics Advisory Commission reported that in the U.S. there are at least 282 million stored specimens. Most pathology departments have, for at least the last 10 years, stored patient data related to specimens in electronic form. Collecting and updating the clinical data stored in centralized databases is a major cost of creating and maintaining useful specimen resources for clinical research. The need for individual tissue banks would be greatly reduced if systems were developed to take maximal advantage of the availability of tissue and linked electronic data from pathology practices. This proposed Network will create a practical system to access pathology data. Future initiatives could apply that system to create effective, efficient virtual resources to make the vast archives of tissue and data accessible for research.

2. Purpose of RFA/PA:

The purpose of the initiative is to request applications from institutions interested in developing a web-based Shared Pathology Informatics Network to access pathology data linked to tissue specimens from pathology databases in multiple institutions. We plan to fund 7 to 10 institutions for a period of 5 years at approximately $300,000 per year. Over the 5 years, the Network will develop and test the communications protocols needed to access data simultaneously from every participating institution. The data to be searched will include patient demographics, diagnostic information, vital status, clinical history, outcome data, and, when available, information related to recurrence and treatment. The data returned by participating institutions will be collated and returned to the requestor as a structured report.

The application of the shared pathology data system to create a tissue resource will require that institutions other than those participating in this Network be connected. One of the efforts of this Network will be to define procedures for connecting new institutions. Since the query system will interact with existing institutional databases, new institutions can be added to the Network after customizing the data interface software to meet their institutional requirements. We anticipate that this process will get progressively easier once the Network has fully defined the key data elements and gained experience with the operating characteristics of various commercial pathology data systems.

The RFA will require applicants to demonstrate efforts and expertise in pathology-related informatics and to provide evidence of a cooperative relationship between pathologists and hospital information systems managers. Institutions responding to the RFA must have existing information systems that store and retrieve patient information, including demographics and pathology reports, and must have access to archived specimens. Institutions with access to hospital clinical data and cancer registry data, in addition to pathology data, will have a distinct advantage in the competition.

The NCI Office of Informatics, which is responsible for coordinating NCI informatics efforts, has developed software to support distributed Internet queries. They have agreed to collaborate with the Shared Pathology Informatics Network and to make their Internet query software available. Their participation in this endeavor will ensure that Network systems build on past NCI informatics activities and are consistent and compatible with current and future efforts.

There are ample reasons to believe that pathology departments will want to participate in a Shared Pathology Informatics Network and the related initiatives that might follow. Many pathology departments already have a strong interest in pathology informatics and have established informatics programs. There is even considerable momentum toward the development of a sub-specialty in Pathology Informatics. The Network efforts will provide real incentives for pathology departments to develop standard data formats. The Network will encourage collaborations between pathologists and researchers who want access to their pathology data and specimens. It will help pathology departments serve researchers at their institutions by facilitating access to very large numbers of specimens with clinical data. It will also improve access to rare tumors or uncommon presentations of common tumors. In addition, institutions participating in the Network will be well positioned to participate in future tissue resources that utilize the Network systems. Finally, Network participants will be able to efficiently identify the specimens and data that they need to create standardized and specialized tissue microarrays.

The Shared Pathology Informatics Network will be developed in three overlapping phases:

Organization Phase (Years 1 & 2)

Form a Coordinating Committee to oversee the Network. The Coordinating Committee will meet in person approximately 3 times per year and will hold frequent telephone conference calls to plan and implement the program and to assess progress toward meeting its goals.

Identify and agree on the names of standard data elements. We expect the Network to have access to extensive datasets that encompass a variety of data systems and data types. The Coordinating Committee will examine the institutional data dictionaries to determine which data elements are available and whether the institutions use common terminology. They will then identify the key data elements that should be made available for Network searches. Some of the key data elements will probably be available from some Network institutions but not from others. Participation of the NCI Office of Informatics will aid these efforts since they were responsible for development of common data elements to support sharing of data among the NCI oncology groups and are involved in similar efforts with the SPOREs and other NCI programs. Providing a broad set of search terms, well beyond the common dataset, will enable searchers to identify appropriate cases that are only available from a subset of Network institutions. This feature distinguishes the Shared Pathology Informatics Network from networks that can only search for data elements common to all of the participating institutions.
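
The distinction between the common dataset and the broader set of search terms can be illustrated with a minimal Python sketch. The institution names, the data element names, and both helper functions are hypothetical and are not part of the Concept; the sketch only assumes that each institution publishes the set of data elements it can search.

```python
from typing import Dict, List, Set

# Hypothetical data dictionaries: institution -> searchable data elements.
DATA_DICTIONARIES: Dict[str, Set[str]] = {
    "institution_a": {"diagnosis", "age", "sex", "vital_status"},
    "institution_b": {"diagnosis", "age", "sex", "tumor_size"},
    "institution_c": {"diagnosis", "age", "sex", "vital_status", "tumor_size"},
}


def common_elements(dictionaries: Dict[str, Set[str]]) -> Set[str]:
    """Data elements available at every institution (the common dataset)."""
    sets = list(dictionaries.values())
    return set.intersection(*sets) if sets else set()


def institutions_supporting(
    dictionaries: Dict[str, Set[str]], requested: Set[str]
) -> List[str]:
    """Institutions whose dictionaries contain every requested element."""
    return sorted(
        name for name, elements in dictionaries.items() if requested <= elements
    )


if __name__ == "__main__":
    print(sorted(common_elements(DATA_DICTIONARIES)))
    print(institutions_supporting(DATA_DICTIONARIES, {"diagnosis", "tumor_size"}))
```

Under these assumptions, a query that includes tumor_size is routed only to the institutions that record it, rather than being rejected because the element is absent from the common dataset.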

Agree on coding rules to automatically translate free text pathology reports into a standard nomenclature. Awardees will evaluate the quality of their free-text reports and determine rules for converting free text into standard nomenclature. The Coordinating Committee will need to agree on the specific translation rules to convert text terms to coded nomenclature and to develop procedures for determining whether the derived codes accurately represent the original free-text diagnosis. This effort will be complicated by the current lack of reporting specifications for important characteristics such as tumor size. This aspect alone will stimulate development of better structured pathology reports, which would in itself represent a major advance in pathology informatics. Public domain text translation software is available, but additional programming will be required to optimize its performance at each Network Institution. This task will continue throughout the 5-year grant period and will require close cooperation between the pathologists and the programmers.
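
As a rough illustration of what a translation rule might look like, the following Python sketch maps phrases in a free-text report to codes with regular expressions. The rule table and the codes (EXAMPLE-0001, EXAMPLE-0002) are placeholders invented for this example and do not belong to any real nomenclature; the rules actually adopted would be decided by the Coordinating Committee.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (pattern, placeholder code). Not real nomenclature codes.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\binfiltrating ductal carcinoma\b", re.IGNORECASE), "EXAMPLE-0001"),
    (re.compile(r"\badenocarcinoma\b", re.IGNORECASE), "EXAMPLE-0002"),
]


def code_report(text: str) -> List[str]:
    """Return the placeholder codes whose patterns occur in the report text."""
    return [code for pattern, code in RULES if pattern.search(text)]


def needs_review(text: str) -> bool:
    """Flag reports that no rule matched so a pathologist can check them."""
    return not code_report(text)


if __name__ == "__main__":
    report = "Breast, left, biopsy: Infiltrating ductal carcinoma."
    print(code_report(report))
    print(needs_review("Benign fibroadipose tissue."))
```

This sketch does not handle negation ("no evidence of adenocarcinoma"), misspellings, or synonyms, which is one reason the derived codes would need to be checked against the original free-text diagnosis.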

Agree on a standard format for replies to requests for information (query replies). Development of a standard query reply format is needed in order for the search broker to merge the multiple replies from several institutions into a single coherent report.
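
One way to picture the role of a standard reply format is the Python sketch below, in which every institution returns the same structure and the broker concatenates the cases into one report. The QueryReply fields and the merge_replies function are assumptions made for illustration; the Concept leaves the actual format to the Coordinating Committee.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class QueryReply:
    """Hypothetical standard reply returned by one institution."""
    institution: str
    query_id: str
    cases: List[Dict[str, str]] = field(default_factory=list)


def merge_replies(query_id: str, replies: List[QueryReply]) -> Dict[str, Any]:
    """Merge replies to one query into a single report, labeling each case."""
    matching = [r for r in replies if r.query_id == query_id]
    merged: List[Dict[str, str]] = []
    for reply in matching:
        for case in reply.cases:
            row = dict(case)  # copy so the original reply is not modified
            row["institution"] = reply.institution
            merged.append(row)
    return {
        "query_id": query_id,
        "institutions": sorted({r.institution for r in matching}),
        "case_count": len(merged),
        "cases": merged,
    }


if __name__ == "__main__":
    replies = [
        QueryReply("institution_a", "q1", [{"diagnosis": "carcinoma", "age": "61"}]),
        QueryReply("institution_b", "q1", [{"diagnosis": "carcinoma", "age": "47"}]),
    ]
    print(merge_replies("q1", replies))
```

Because every reply carries the same fields, the broker needs no institution-specific logic at the merge step; that logic stays in the interface software at each site.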

Develop strategies to preserve patient confidentiality. Identify those data elements that must be encrypted, truncated or deleted to protect patient identity.
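
The three treatments named above (encrypt, truncate, delete) might be applied per data element roughly as in the following Python sketch. The field names and the policy tables are hypothetical. In place of encryption, the sketch substitutes a keyed hash (HMAC-SHA256) to turn an identifier into a stable pseudonym; this is a different technique, chosen here only because it is available in the Python standard library.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical policy; the real lists would be set by the Network.
DELETE = {"patient_name", "address"}
TRUNCATE = {"date_of_birth": 4}            # keep only the year of an ISO date
PSEUDONYMIZE = {"medical_record_number"}   # replace with a keyed hash


def deidentify(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Return a copy of the record with identifying fields removed or altered."""
    out: Dict[str, str] = {}
    for name, value in record.items():
        if name in DELETE:
            continue
        if name in TRUNCATE:
            out[name] = value[: TRUNCATE[name]]
        elif name in PSEUDONYMIZE:
            digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
            out[name] = digest.hexdigest()
        else:
            out[name] = value
    return out


if __name__ == "__main__":
    record = {
        "patient_name": "Jane Doe",
        "medical_record_number": "12345",
        "date_of_birth": "1950-03-14",
        "diagnosis": "carcinoma",
    }
    print(deidentify(record, secret_key=b"institution-held secret"))
```

If the key is held only by the originating institution, the same patient maps to the same pseudonym across queries, while the original identifier cannot be recovered from the pseudonym without the key.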

Component Selection, Development and Implementation Phase (Years 2-3.5) (much of the work described in this section will be pursued concurrently)

Select and develop mechanisms to distribute queries to Network Institutions via the Internet. A single computer server, the Network Server, using the search broker software described below, will provide the interface for all communications between researchers and the Network Institutions. Other software on this server will provide the security function that authenticates queries, supplying the information that institutions need in order to allow queries through their firewalls. The authentication of queries will prevent unauthorized access to databases in the Network Institutions.

Select and implement the search broker software. The search broker software, which resides on the Network Server, handles all communications between researchers and Network Institutions. Each query and query reply is formatted by the search broker. Each communication sent by the search broker is tagged with encrypted information to assure Network Institutions that the request is legitimate. The search broker software also ensures that reports to requesters are complete, properly formatted and exclude confidential information. The Network server is an Internet domain, available to receive and handle queries every minute of every day, and it will require an Internet connection and informatics experts capable of operating the Internet site.
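
The Concept does not say how each communication would be tagged. As one possible reading, the Python sketch below attaches a message authentication tag (HMAC-SHA256) computed with a key shared between the broker and an institution, so the institution can check that a query came from the broker and was not altered. The function names, the message layout, and the shared-key arrangement are all assumptions for illustration.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_query(query: Dict[str, Any], shared_key: bytes) -> Dict[str, str]:
    """Serialize the query and attach an HMAC tag (hypothetical message layout)."""
    body = json.dumps(query, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(shared_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_query(message: Dict[str, str], shared_key: bytes) -> bool:
    """Recompute the tag at the institution and compare in constant time."""
    expected = hmac.new(
        shared_key, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


if __name__ == "__main__":
    key = b"key shared by broker and institution"
    message = sign_query({"query_id": "q1", "diagnosis": "carcinoma"}, key)
    print(verify_query(message, key))
```

A tag of this kind shows only that the sender holds the key; it does not hide the query contents, so a real deployment would still need an encrypted channel between the Network Server and each institution.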

The selection and implementation of the search broker may be the single most expensive undertaking of the Informatics Network. Funds for this development will be restricted and may only be used for the Broker software. Options for this development include using a commercial contractor, an existing NCI informatics contract or modifying existing NCI software. These efforts will require close consultation with the NCI Office of Informatics and coordination with other NCI informatics efforts. Costs may be substantially reduced if the Network can adapt the Public Domain software developed by the NCI Office of Informatics. Program staff will oversee the process to ensure that the funds are spent wisely, federal contracting policy observed, and conflicts of interest avoided.

Develop "handshaking software" to interface between each institutional information system and the search broker. Handshaking software is needed to establish connections between the Network Server and data systems at each participating institution. This is the link that permits queries to cross firewalls and to enter institutional databases as though they originated within the institution. Consultation with suppliers of institutional pathology information systems is required to develop the handshaking software, and these costs are included in the budget estimate.

Obtain approval from institutional IRBs for access to patient information. Implementation of the system will require IRB approval of plans to access patient data. The Network will develop protocols to securely transmit query results over the Internet and protocols to ensure patient confidentiality using automatic encryption, deletion or truncation of sensitive or identifying information.

Testing and Validation Phase (Years 3.5-5)

Initial testing of the system to demonstrate that all components work at each institution. Correct software and hardware errors and improve performance of components. Develop operation and repair protocols for the system.

Alpha test phase. Test queries will be developed from actual inquiries to NCI tissue resources, such as the Cooperative Breast Cancer Tissue Resource. A parallel test of SEER data may also be initiated. The performance of the Informatics Network will be evaluated by the Coordinating Committee. In particular, the Committee will ask whether query replies identify all of the available cases with the requested diagnosis. This can be assessed by comparing the query response data from each institution with the information obtained by searching locally (bypassing the translation and handshaking software). Free-text pathology reports will also be reviewed to determine whether relevant cases were missed. The system will be iteratively improved and retested.
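
The comparison between Network replies and local searches amounts to a set comparison of case identifiers, which could be sketched in Python as follows. The function name, the identifiers, and the report layout are hypothetical; the sketch assumes that both searches return identifiers that can be matched one to one.

```python
from typing import Any, Dict, Iterable


def evaluate_reply(
    network_ids: Iterable[str], local_ids: Iterable[str]
) -> Dict[str, Any]:
    """Compare cases returned through the Network with a direct local search."""
    network, local = set(network_ids), set(local_ids)
    found = network & local
    return {
        # Fraction of locally found cases that the Network also returned.
        "recall": len(found) / len(local) if local else 1.0,
        "missed": sorted(local - network),      # in local search only
        "unexpected": sorted(network - local),  # in Network reply only
    }


if __name__ == "__main__":
    result = evaluate_reply(
        network_ids=["case-1", "case-2", "case-4"],
        local_ids=["case-1", "case-2", "case-3"],
    )
    print(result)
```

Cases listed as missed point to problems in the translation or handshaking software, while cases listed as unexpected suggest that a coding rule matched a report it should not have.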

Beta test phase. Access to the Network will be by invitation and password protected. Queries will be solicited from program staff, participants and selected members of the research community. The effectiveness and ease of use of the system will be evaluated and the system optimized.

3. Current Portfolio Analysis:

While these activities are closely related to efforts by the Office of Informatics to develop systems to exchange data in support of a variety of NCI activities, they are complementary and not duplicative. The Office of Informatics is actively developing common data elements for clinical trials with the NCI clinical cooperative groups and SPOREs and other programs. They have agreed to act as consultant to the Network Coordinating Committee and to provide technical assistance and access to software that they have already developed. To avoid overlap and to promote integration of NCI informatics systems, the Shared Pathology Informatics Network will coordinate its activities closely with the complementary activities of the NCI Office of Informatics.

The NCI Office of Science Policy is overseeing the Scientific Information System (SIS) Project, the NCI Cancer Information Gateway, based on the Gazebo Gateway from the National Computational Science Alliance (NCSA). They are developing tools to support network queries distributed over multiple databases. However, they are not addressing queries of pathology data systems. We expect the Shared Pathology Informatics Network to have access to the SIS network query tools developed by NCI.

Major informatics efforts are also supported by other government agencies such as the NLM, CDC, and AHCPR. Their main purpose is to create useful networks of electronic medical information for use by clinicians. These efforts, while unrelated, may result in new standards or technologies that could be adapted for use by the Shared Pathology Informatics Network.

4. Justification for Use of RFA or PA Mechanism:

This initiative involves creating a Shared Pathology Informatics Network to access confidential data housed in hospitals or large commercial laboratories. These activities will require cooperation among academic pathologists with a thorough knowledge of terminology and nomenclatures and informatics specialists with access to and specialized knowledge of their institutional databases. The Network Coordinating Committee will make operational decisions, including the choice of nomenclature, standard data format, security measures, and factors related to current community standards. These complex decisions cannot be specified in a Statement of Work. The project must respond rapidly to changing technologies and to emerging legal and ethical paradigms and regulations. These intrinsically complex, fluid, professional activities cannot be conducted via the contract mechanism. An RFA is needed to provide set-aside funds, and it is unlikely that investigators would respond to such a request without assurance of available funds and of an NCI review.

5. Justification of Use of Cooperative Agreement (if applicable):

This project will require extensive coordination by NCI staff. Use of the Cooperative Agreement mechanism will allow program staff to coordinate a highly complex project that involves cooperative efforts by the Network awardees and consultation with the NCI Office of Informatics. The Program Director will participate as a voting member of the Network Coordinating Committee and must closely coordinate the efforts of Network awardees and those of the Office of Informatics to develop software that securely sends and receives confidential information over the Internet. This will assure comparability between the Shared Pathology Informatics Network and related NCI efforts. Applications for Cooperative Agreements can only be received in response to an announcement from NIH institutes or centers.