Subject: Shared Pathology Informatics Network Concept
The Concept for a Shared Pathology Informatics Network was approved at the Nov
8-9 NCI Board of Scientific Advisors meeting. Please keep in mind that approval
of a Concept does not always mean that an RFA will follow. The Concept is a
public document and can be distributed freely.
Jules J. Berman, Ph.D., M.D.
Shared Pathology Informatics Network
1. Background:
The objective of this initiative for a Shared Pathology Informatics Network is
to create a Web-based model system that can request and receive data from
existing medical databases at multiple institutions. The system could
facilitate a wide variety of research efforts. It will be able to automatically
identify and obtain the requested data for cases that meet defined search
criteria and that have archived tissue specimens. The identity of the patient
and other identifying information will be encrypted or otherwise modified to
protect patient confidentiality. The system will enable researchers to quickly
review the characteristics of large numbers of archived specimens in order to
plan marker or assay validation studies. The ability to automatically access
information from medical databases is the first step toward the long-term goal
of developing informatics systems to support NCI's efforts to improve
researchers' access to human specimens and clinical data.
The technology needed to develop the Shared Pathology Informatics Network
already exists. Hospitals and medical institutions have developed sophisticated
informatics systems that can be searched to meet their internal needs and to
provide electronic data to third-party payers and to federal agencies.
Increasingly sophisticated software tools are available to facilitate
communications among systems with different architectures and search strategies.
The Shared Pathology Informatics Network will take advantage of the availability
of, and rapid advances in, Internet technology to access pathology data residing
in multiple institutions.
The Network will create the systems needed to access pathology data related to
stored specimens. Vast numbers of specimens currently exist in pathology
archives. Many pathology laboratories never discard archived tissues, and some
have collections that date from the 19th century. The National Bioethics
Advisory Commission reported that in the U.S. there are at least 282 million
stored specimens. Most pathology departments have, for at least the last 10
years, stored patient data related to specimens in electronic form. Collecting
and updating the clinical data stored in centralized databases is a major cost
of creating and maintaining useful specimen resources for clinical research.
The need for individual tissue banks would be greatly reduced if systems were
developed to take maximal advantage of the availability of tissue and linked
electronic data from pathology practices. This proposed Network will create a
practical system to access pathology data. Future initiatives could apply that
system to create effective, efficient virtual resources to make the vast
archives of tissue and data accessible for research.
2. Purpose of RFA/PA:
The purpose of the initiative is to request applications from institutions
interested in developing a web-based Shared Pathology Informatics Network to
access pathology data linked to tissue specimens from pathology databases in
multiple institutions. We plan to fund 7 to 10 institutions for a period of 5
years at approximately $300,000 per year. Over five years the Network will
develop and test the communications protocols needed to access data
simultaneously from every participating institution. The data to be searched
will include patient demographics, diagnostic information, vital status,
clinical history, outcome data, and when available, information related to
recurrence and treatment. The data returned by participating institutions will
be collated and returned to the requestor as a structured report.
The application of the shared pathology data system to create a tissue resource
will require that institutions other than those participating in this Network be
connected. One of the efforts of this Network will be to define procedures for
connecting new institutions. Since the query system will interact with existing
institutional databases, new institutions can be added to the Network after
customizing the data interface software to meet their institutional
requirements. We anticipate that this process will get progressively easier
once the Network has fully defined the key data elements and gained experience
with the operating characteristics of various commercial pathology data systems.
The RFA will require applicants to demonstrate efforts and expertise in
pathology-related informatics and to provide evidence of a cooperative
relationship between pathologists and hospital information systems managers.
Institutions responding to the RFA must have existing information systems that
store and retrieve patient information, including demographics and pathology
reports, and must have access to archived specimens. Institutions with access to
hospital clinical data and cancer registry data, in addition to pathology data,
will have a distinct advantage in the competition.
The NCI Office of Informatics, which is responsible for coordinating NCI
informatics efforts, has developed software to support distributed Internet
queries. They have agreed to collaborate with the Shared Pathology Informatics
Network and to make their Internet query software available. Their
participation in this endeavor will ensure that Network systems build on past
NCI informatics activities and are consistent with and compatible with current
and future efforts.
There are ample reasons to believe that pathology departments will want to
participate in a Shared Pathology Informatics Network and the related
initiatives that might follow. Many pathology departments already have a strong
interest in pathology informatics and have established informatics programs.
There is even considerable momentum toward the development of a sub-specialty in
Pathology Informatics. The Network efforts will provide real incentives for
pathology departments to develop standard data formats. The Network will
encourage collaborations between pathologists and researchers who want access to
their pathology data and specimens. It will help pathology departments serve
researchers at their institutions by facilitating access to very large numbers
of specimens with clinical data. It will also improve access to rare tumors or
uncommon presentations of common tumors. In addition, institutions participating
in the Network will be well positioned to participate in future tissue resources
that utilize the Network systems. Finally, Network participants will be able to
efficiently identify the specimens and data that they need to create
standardized and specialized tissue microarrays.
The Shared Pathology Informatics Network will be developed in three overlapping
phases:
Organization Phase (Years 1 & 2)
Form a Coordinating Committee to oversee the Network. The Coordinating Committee
will meet in person approximately 3 times per year and will hold frequent
telephone conference calls to plan and implement the program and to assess
progress toward meeting its goals.
Identify and agree on the names of standard data elements. We expect the
Network to have access to extensive datasets that encompass a variety of data
systems and data types. The Coordinating Committee will examine the
institutional data dictionaries to determine which data elements are available
and whether the institutions use common terminology. They will then identify
the key data elements that should be made available for Network searches. Some
of the key data elements will probably be available from some Network
institutions but not from others. Participation of the NCI Office of
Informatics will aid these efforts since they were responsible for development
of common data elements to support sharing of data among the NCI oncology groups
and are involved in similar efforts with the SPOREs and other NCI programs.
Providing a broad set of search terms, well beyond the common dataset, will
enable searchers to identify appropriate cases that are only available from a
subset of Network institutions. This feature distinguishes the Shared Pathology
Informatics Network from networks that can only search for data elements common
to all of the participating institutions.
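The distinction between the common dataset and the broader set of search terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the institution names, the data element names, and the functions common_elements and institutions_for_query are hypothetical and are not part of any NCI software.

```python
from typing import Dict, List, Set

# Hypothetical data dictionaries: institution -> data elements it can search.
DATA_DICTIONARIES: Dict[str, Set[str]] = {
    "institution_a": {"diagnosis", "age", "vital_status", "tumor_size"},
    "institution_b": {"diagnosis", "age", "vital_status"},
    "institution_c": {"diagnosis", "age", "recurrence"},
}


def common_elements(dictionaries: Dict[str, Set[str]]) -> Set[str]:
    """Return the elements that every institution supports (the common dataset)."""
    if not dictionaries:
        return set()
    return set.intersection(*dictionaries.values())


def institutions_for_query(
    requested: Set[str], dictionaries: Dict[str, Set[str]]
) -> List[str]:
    """Return the institutions whose dictionaries cover every requested element."""
    return sorted(
        name for name, elements in dictionaries.items() if requested <= elements
    )
```

In this sketch, a query on tumor_size is still answerable even though tumor_size is outside the common dataset; it is simply routed to the subset of institutions that hold that element.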
Agree on coding rules to automatically translate free text pathology reports
into a standard nomenclature. Awardees will evaluate the quality of their
free-text reports and determine rules for converting free text into standard
nomenclature. The Coordinating Committee will need to agree on the specific
translation rules to convert text terms to coded nomenclature and to develop
procedures for determining whether the derived codes accurately represent the
original free-text diagnosis. This effort will be complicated by the current
lack of reporting specifications for important characteristics such as tumor
size. This aspect alone will stimulate development of better structured
pathology reports, which would in itself represent a major advance in pathology
informatics. Public domain text translation software is available, but
additional programming will be required to optimize its performance at each
Network Institution. This task will continue throughout the 5-year grant period
and will require close cooperation between the pathologists and the programmers.
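A rule-based translation of the kind described above might look like the following minimal Python sketch. The rules, the placeholder codes (CODE-0001 and so on), and the function code_report are hypothetical; the actual nomenclature and translation rules would be chosen by the Coordinating Committee, and this sketch does not represent the public domain translation software mentioned above.

```python
import re
from typing import List, Tuple

# Hypothetical translation rules: (pattern for a free-text term, placeholder code).
# The codes are illustrative and do not come from any real nomenclature.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\binvasive ductal carcinoma\b", re.IGNORECASE), "CODE-0001"),
    (re.compile(r"\bductal carcinoma in situ\b", re.IGNORECASE), "CODE-0002"),
    (re.compile(r"\badenocarcinoma\b", re.IGNORECASE), "CODE-0003"),
]


def code_report(text: str) -> List[str]:
    """Return the placeholder codes whose patterns occur in a free-text report."""
    codes: List[str] = []
    for pattern, code in RULES:
        if pattern.search(text) and code not in codes:
            codes.append(code)
    return codes
```

A sketch this simple would also code a phrase such as "no evidence of adenocarcinoma", which is one reason the derived codes need to be checked against the original free-text diagnosis.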
Agree on a standard format for replies to requests for information (query
replies). Development of a standard query reply format is needed in order for
the search broker to merge the multiple replies from several institutions into a
single coherent report.
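To illustrate why a standard reply format matters, the Python sketch below merges replies that share one structure into a single report. The QueryReply fields and the merge_replies function are assumptions made for illustration; the actual reply format would be defined by the Coordinating Committee.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryReply:
    """Hypothetical standard reply returned by one institution."""
    institution: str
    query_id: str
    cases: List[Dict[str, str]] = field(default_factory=list)


def merge_replies(query_id: str, replies: List[QueryReply]) -> Dict[str, object]:
    """Merge the replies to one query into a single report."""
    matching = [reply for reply in replies if reply.query_id == query_id]
    merged: List[Dict[str, str]] = []
    for reply in matching:
        for case in reply.cases:
            # Record which institution holds each case.
            merged.append({"institution": reply.institution, **case})
    return {
        "query_id": query_id,
        "institutions_reporting": sorted({reply.institution for reply in matching}),
        "case_count": len(merged),
        "cases": merged,
    }
```

Because every reply has the same fields, the merge needs no institution-specific logic; an institution that reports no matching cases still appears in the list of institutions reporting.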
Develop strategies to preserve patient confidentiality. Identify those data
elements that must be encrypted, truncated or deleted to protect patient
identity.
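The three treatments named above (encrypt, truncate, delete) can be sketched as follows in Python. The field names and the choice of which fields receive which treatment are hypothetical. Note also that this sketch substitutes a keyed hash (HMAC-SHA-256) for encryption, which yields a stable pseudonym that cannot be reversed; whether the Network would use reversible encryption instead is not stated in the Concept.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical policy: which fields are deleted, truncated, or pseudonymized.
DELETE = {"patient_name", "street_address"}
TRUNCATE = {"date_of_birth": 4, "postal_code": 3}  # keep only the first N characters
PSEUDONYMIZE = {"medical_record_number"}


def deidentify(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Return a copy of the record with identifying fields removed or masked."""
    cleaned: Dict[str, str] = {}
    for name, value in record.items():
        if name in DELETE:
            continue  # drop the field entirely
        if name in TRUNCATE:
            cleaned[name] = value[: TRUNCATE[name]]
        elif name in PSEUDONYMIZE:
            # Keyed hash: the same input always maps to the same pseudonym.
            cleaned[name] = hmac.new(
                secret_key, value.encode("utf-8"), hashlib.sha256
            ).hexdigest()
        else:
            cleaned[name] = value
    return cleaned
```

With this policy, a date of birth written as YYYY-MM-DD is reduced to the year, and the record number is replaced by a value that still allows records for the same patient to be linked within one institution.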
Component Selection, Development and Implementation Phase (Years 2-3.5)
(much of the work described in this section will be pursued concurrently)
Select and develop mechanisms to distribute queries to Network Institutions via
the Internet. A single computer server, the Network Server, using the search
broker software described below, will provide the interface for all
communications between researchers and the Network institutions. Other software
on this server provides the security function that authenticates queries,
providing information needed by institutions to allow queries through their
firewalls. The authentication of queries will prevent unauthorized access to
databases in the Network institutions.
Select and implement the search broker software. The search broker software,
which resides on the Network Server, handles all communications between
researchers and Network Institutions. Each query and query reply is formatted
by the search broker. Each communication sent by the search broker is tagged
with encrypted information to assure Network Institutions that the request is
legitimate. The search broker software also ensures that reports to requesters
are complete, properly formatted and exclude confidential information. The
Network Server is an Internet domain, available to receive and handle queries
every minute of every day, and it will require an Internet connection and
informatics experts capable of operating the Internet site.
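One way to tag each query so that an institution can confirm it came from the search broker is a message authentication code. The Python sketch below uses HMAC-SHA-256 with a key shared between the broker and an institution. This is an assumed technique chosen for illustration; the Concept says only that communications are tagged with encrypted information, and the function names sign_query and verify_query are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def sign_query(query: Dict[str, str], shared_key: bytes) -> Dict[str, object]:
    """Attach an authentication tag computed over the query contents."""
    payload = json.dumps(query, sort_keys=True).encode("utf-8")
    tag = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()
    return {"query": query, "tag": tag}


def verify_query(message: Dict[str, object], shared_key: bytes) -> bool:
    """Recompute the tag at the institution and compare in constant time."""
    payload = json.dumps(message["query"], sort_keys=True).encode("utf-8")
    expected = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(message.get("tag", "")))
```

An institution would accept a query only when verify_query returns True, so a query that was altered in transit, or that was signed with a different key, is rejected.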
The selection and implementation of the search broker may be the single most
expensive undertaking of the Informatics Network. Funds for this development
will be restricted and may only be used for the Broker software. Options for
this development include using a commercial contractor, an existing NCI
informatics contract or modifying existing NCI software. These efforts will
require close consultation with the NCI Office of Informatics and coordination
with other NCI informatics efforts. Costs may be substantially reduced if the
Network can adapt the public domain software developed by the NCI Office of
Informatics. Program staff will oversee the process to ensure that the funds
are spent wisely, federal contracting policy observed, and conflicts of interest
avoided.
Develop "handshaking software" to interface between each institutional
information system and the search broker. Handshaking software is needed to
establish connections between the Network Server and data systems at each
participating institution. This is the link that permits queries to cross
firewalls and to enter institutional databases as though they originated within the
institution. Consultation with suppliers of institutional pathology information
systems is required to develop the handshaking software and these costs are
included in the budget estimate.
Obtain approval from institutional IRBs for access to patient information.
Implementation of the system will require IRB approval of plans to access
patient data. The Network will develop protocols to securely transmit query
results over the Internet and protocols to ensure patient confidentiality using
automatic encryption, deletion or truncation of sensitive or identifying
information.
Testing and Validation Phase (Years 3.5-5)
Initial testing of the system to demonstrate that all components work at each
institution. Correct software and hardware errors and improve performance of
components. Develop operation and repair protocols for the system.
Alpha test phase. Test queries will be developed from actual inquiries to NCI
tissue resources, such as the Cooperative Breast Cancer Tissue Resource. A
parallel test of SEER data may also be initiated. The performance of the
Informatics Network will be evaluated by the Coordinating Committee.
In particular, the Committee will determine whether query replies identify all
of the available cases with the requested diagnosis. This can be assessed by
comparing the query response data
from each institution with the information obtained by searching locally
(bypassing the translation and handshaking software). Free-text pathology
reports will also be reviewed to determine whether relevant cases were missed.
The system will be iteratively improved and retested.
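The comparison between Network query results and a local search can be expressed as a simple completeness measure. The Python sketch below is illustrative only: the case identifiers and the functions missed_cases and recall are hypothetical, and the local search is assumed to be the reference result.

```python
from typing import Set


def missed_cases(network_ids: Set[str], local_ids: Set[str]) -> Set[str]:
    """Cases found by the local search but absent from the Network reply."""
    return local_ids - network_ids


def recall(network_ids: Set[str], local_ids: Set[str]) -> float:
    """Fraction of locally identified cases that the Network query also returned."""
    if not local_ids:
        return 1.0  # nothing to find, so nothing was missed
    return len(network_ids & local_ids) / len(local_ids)
```

The set returned by missed_cases gives the specific cases whose free-text reports would be worth reviewing to learn why the translation or handshaking software did not return them.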
Beta test phase. Access to the Network will be by invitation and password
protected. Queries will be solicited from program staff, participants and
selected members of the research community. The effectiveness and ease of use
of the system will be evaluated, and the system will be optimized.
3. Current Portfolio Analysis:
While these activities are closely related to efforts by the Office of
Informatics to develop systems to exchange data in support of a variety of NCI
activities, they are complementary and not duplicative. The Office of
Informatics is actively developing common data elements for clinical trials with
the NCI clinical cooperative groups and SPOREs and other programs. They have
agreed to act as consultant to the Network Coordinating Committee and to provide
technical assistance and access to software that they have already developed.
To avoid overlap and to promote integration of NCI informatics systems, the
Shared Pathology Informatics Network will coordinate its activities closely with
the complementary activities of the NCI Office of Informatics.
The NCI Office of Science Policy is overseeing the Scientific
Information System (SIS) Project, the NCI Cancer Information Gateway, based on
the Gazebo Gateway from the National Computational Science Alliance (NCSA).
They are developing tools to support network queries distributed over multiple
databases. However, they are not addressing queries of pathology data systems.
We expect the Shared Pathology Informatics Network to have access to the SIS
network query tools developed by NCI.
Major informatics efforts are also supported by other government
agencies such as the NLM, CDC, and AHCPR. Their main purpose is to create
useful networks of electronic medical information for use by clinicians. These
efforts, while unrelated, may result in new standards or technologies that could
be adapted for use by the Shared Pathology Informatics Network.
4. Justification for Use of RFA or PA Mechanism:
This initiative involves creating a Shared Pathology Informatics Network
to access confidential data housed in hospitals or large commercial
laboratories. These activities will require cooperation among academic
pathologists with a thorough knowledge of terminology and nomenclatures and
informatics specialists with access to and specialized knowledge of their
institutional databases. The Network Coordinating Committee will make
operational decisions, including the choice of nomenclature, standard data
format, security measures, and factors related to current community standards.
These complex decisions cannot be specified in a Statement of Work. The project
must respond rapidly to changing technologies and to emerging legal and ethical
paradigms and regulations. These intrinsically complex, fluid, professional
activities cannot be conducted via the contract mechanism. An RFA
is needed to provide set-aside funds, and it is unlikely that investigators
would respond to such a request without assurance of available funds and
of an NCI review.
5. Justification for Use of Cooperative Agreement (if applicable):
This project will require extensive coordination by NCI staff. Use of the
Cooperative Agreement mechanism will allow program staff to coordinate a highly
complex project that involves cooperative efforts by the Network awardees and
consultation with the NCI Office of Informatics. The Program Director will
participate as a voting member of the Network Coordinating Committee and must
closely coordinate the efforts of Network awardees and those of the Office of
Informatics to develop software that securely sends and receives confidential
information over the Internet. This will assure comparability between the Shared
Pathology Informatics Network and related NCI efforts. Applications for
Cooperative Agreements can only be received in response to an announcement from
NIH institutes or centers.