Berman JJ. Pathology Data Integration with eXtensible Markup Language. Human Pathology 2005, 36(2):139-145.
key words: XML, pathology informatics, data integration, data standards
Abstract
It is impossible to overstate the importance of XML (eXtensible Markup Language) as a data organization tool. With XML, pathologists can annotate all of their data (clinical and anatomic) in a format that can transform every pathology report into a database, without compromising narrative structure. The purpose of this manuscript is to provide an overview of XML basics for pathologists. Examples will demonstrate how pathologists can use XML to annotate the individual data elements and to structure reports in a common format that can be merged with other XML files or queried using standard XML tools. This manuscript gives pathologists a glimpse into how XML annotation can benefit patients, enhance their ability to compete for research funding, and reduce their dependence on centralized, proprietary databases.
Background
These are interesting times for pathologists charged with managing laboratory information. Everyone seems to want pathology data, but federal regulations restrict access to medical records.1-4 The world of information technology encourages open source software solutions, but large medical centers are opting for megamillion dollar commercial information systems that lock hospital data into proprietary systems. Bioinformaticians luxuriate in freely available standards and databases, while the procedure terminologies and disease nomenclatures used by pathologists, surgeons and clinicians are closely guarded intellectual properties.5 Pathologists are told that they must include specific data elements in their cancer-related reports,6 but they have no standard methods to utilize this data (i.e., retrieve the data, integrate the data with related clinical or research datasets, and merge the data from multiple institutions). The Internet and broadband telecommunications permit the rapid transmission of huge amounts of information, but institutional Security Officers seem devoted to ensuring that pathology information never crosses the local firewall.
It seems that two distinct tracks of biomedical data have emerged: the medical track that captures data in proprietary information systems, and the research model that has developed a startling array of free and open methods for data organization, data integration and data sharing.7-15
Many pathologists seem to be unaware of the progress made by biologists in the field of data annotation and data integration.16-22 Data annotation is the very simple concept that every piece of data in a record can be annotated with another set of information that describes the data (so-called metadata). Once data has been annotated, it can be associated with other, related data [data integration], even when the other data is found in a seemingly unrelated database.23 In the past few years, biomedical informatics has transformed into the science that derives biomedical value from computations performed on annotated databases.
The purpose of this manuscript is to review some of the advances in biomedical informatics, identifying specific methods that can be used in a pathology setting. The manuscript will provide an overview of XML (eXtensible Markup Language), the most important advance in data organization since the invention of the book.23-25 XML documents have properties that permit data integration between research databases and pathology data. There are mutual advantages for pathologists who understand and utilize data integrating technologies and for biomedical researchers who require specific pathologic data related to human diseases.
XML (eXtensible Markup Language)
The importance of XML as a data organizing tool cannot be overstated. As a data organizing technology, it is as important as the invention of written language (circa 3000 BC), or the mass-printed book (circa 1450 AD). At its simplest, XML is a method for marking up files so that every piece of data is surrounded by bracketed text that describes the piece of data (e.g., <number>5</number>). Markup allows us to convey any message as XML (a pathology report, a radiology image, a genome database, a software program, an email).
XML markup tags are sets of alphanumeric descriptors enclosed by angle brackets. Each tag is repeated at the beginning and end of the data element, the ending tag demarcated by a slant character "/".
The following are examples of XML markup.
<name_of_patient>John Public</name_of_patient>
<age_of_patient>25 years</age_of_patient>
<gender_of_patient>Male</gender_of_patient>
<birthdate_of_patient>January 1, 1954</birthdate_of_patient>
Inspection of these annotations reveals that XML lets us describe data. When the data element "25 years" is flanked by an <age_of_patient> tag, we can be sure that it's not referring to an anniversary event or a mortgage.
XML tags can appear in narrative text. For example:
<pathologic_diagnosis>Adenocarcinoma of colon</pathologic_diagnosis> extending into the <anatomic_site>muscularis propria</anatomic_site>.
The extra verbiage carried by XML annotations is unsightly, but browsers can easily remove annotation tags from the text made visible on computer screens. To everyone concerned, an XML document can display just like any other text. In fact, I am composing this paper on AbiWord, a free, open source XML editor.26 As I type, the editor adds XML formatting tags to the document file, but all those tags are invisible to me and you.
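The separation between what a reader sees and what a program sees can be sketched in a few lines of Python, using only the standard library. This is a minimal illustration, not a rendering engine: the <sentence> wrapper element is an assumption added here so that the narrative fragment forms a single well-formed document.

```python
import xml.etree.ElementTree as ET

# The narrative sentence from the text, wrapped in a hypothetical <sentence>
# root element so that the fragment is one well-formed XML document.
fragment = (
    "<sentence>"
    "<pathologic_diagnosis>Adenocarcinoma of colon</pathologic_diagnosis> "
    "extending into the <anatomic_site>muscularis propria</anatomic_site>."
    "</sentence>"
)

root = ET.fromstring(fragment)

# What a reader sees: all of the text content, with the tags dropped.
plain_text = "".join(root.itertext())

# What a program sees: each annotated data element, keyed by its tag.
annotations = {child.tag: child.text for child in root}

print(plain_text)
print(annotations)
```

The same file therefore serves as both a readable sentence and a two-field record.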
Generally, XML documents fall into one of two different types: structural or data-centric.
Possibly the earliest published XML specification for surgical pathology was described by Moore and Berman in 2000.19 It is an example of a structural XML document. Excerpts are shown, with clipped text represented by "....":
....
<path_report>
....
<submitting_service> SURGERY </submitting_service>
<pathologist> JOHN Q PATHOLOGIST MD </pathologist>
<patient>
<patient_name> VETERAN,JOHN Q. </patient_name>
<patient_identifier> 123-45-6789 </patient_identifier>
....
<gross_description>
1. THE SPECIMEN IS RECEIVED FRESH, LABELED WITH THE PATIENT'S NAME, AND ADDITIONALLY LABELED "LARYNGECTOMY".
THE SPECIMEN CONSISTS OF A LARYNGECTOMY RESECTION, MEASURING 10.5 X 5.5 X 3.5 CM. THE LARYNX IS EDEMATOUS. THE LARYNX IS OPENED POSTERIORLY, TO REVEAL AN IRREGULARITY OF APPARENT TUMOR, ON THE SURFACE OF THE LEFT TRUE VOCAL CORD, MEASURING 3.0 X 1.5 CM. THE TUMOR DOES NOT APPEAR TO INVOLVE THE SUBGLOTTIS, NOR THE ANTERIOR COMMISSURE. THE SUPERIOR, INFERIOR, ANTERIOR, AND POSTERIOR MARGINS ARE GROSSLY UNINVOLVED BY TUMOR. REPRESENTATIVE SECTIONS OF TUMOR ARE SUBMITTED,
....
</gross_description>
</gross>
<diagnosis>
<diagnosis_number> 1 </diagnosis_number>
<disease_concept> SQUAMOUS CELL CARCINOMA </disease_concept>
....
</path_report>
....
This XML-based pathology report is very easy to read. The XML tags roughly correspond to the familiar sections of a pathology report. This structural XML document can be compared with a data-centric XML document. The following is an excerpt from a tissue microarray data file that conforms to the recently proposed Tissue Microarray Data Exchange Specification developed by the Association for Pathology Informatics:27,28
<record>
<cpctr:IMS_Case_Identifier>1053371588</cpctr:IMS_Case_Identifier>
<cpctr:Location_Code>G61</cpctr:Location_Code>
<cpctr:Race>Caucasian</cpctr:Race>
<cpctr:Year_of_Birth>1926</cpctr:Year_of_Birth>
<cpctr:Year_of_Diagnosis>1991</cpctr:Year_of_Diagnosis>
<cpctr:Year_of_Prostatectomy>1991</cpctr:Year_of_Prostatectomy>
<cpctr:Is_Residual_Carcinoma_Present>Yes</cpctr:Is_Residual_Carcinoma_Present>
<cpctr:Gleason_Primary_Grade>4</cpctr:Gleason_Primary_Grade>
<cpctr:Gleason_Secondary_Grade>3</cpctr:Gleason_Secondary_Grade>
<cpctr:Gleason_Sum_Score>7</cpctr:Gleason_Sum_Score>
<cpctr:Number_of_Nodes_Examined>5</cpctr:Number_of_Nodes_Examined>
<cpctr:Number_of_Nodes_Positive>0</cpctr:Number_of_Nodes_Positive>
<cpctr:pT_Stage>pT3b</cpctr:pT_Stage>
<cpctr:pN_Stage>pN0</cpctr:pN_Stage>
<cpctr:pM_Stage>pMX</cpctr:pM_Stage>
<cpctr:Vital_Status>Alive</cpctr:Vital_Status>
<array_locations>row 9, column 18|row 10, column 4</array_locations>
</record>
This fragment of XML is relatively easy to read, but it does not provide an obvious structure for the data elements. It consists of data flanked by XML tags, and little else. Data-centric XML files are roughly equivalent to databases or spreadsheets. In fact, it is exceedingly easy to port data-centric XML files to and from other data structures.28
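As a rough illustration of how a data-centric record can be ported to a spreadsheet-style structure, the following Python sketch flattens a shortened copy of the record above into rows and writes them as CSV. The <records> wrapper, the placement of the namespace declaration, and the function names are assumptions made for this example. Dropping the namespace prefix from the column names is a simplification that would discard information in a real exchange.

```python
import csv
import io
import xml.etree.ElementTree as ET

CPCTR = "http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"

# A shortened copy of the record above. The <records> wrapper and the
# namespace declaration on it are assumptions added so the fragment parses.
xml_text = f"""<records xmlns:cpctr="{CPCTR}">
  <record>
    <cpctr:Race>Caucasian</cpctr:Race>
    <cpctr:Year_of_Birth>1926</cpctr:Year_of_Birth>
    <cpctr:Gleason_Sum_Score>7</cpctr:Gleason_Sum_Score>
    <array_locations>row 9, column 18|row 10, column 4</array_locations>
  </record>
</records>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree prepends to a tag."""
    return tag.rsplit("}", 1)[-1]


def records_to_rows(text):
    """Turn each <record> into a dict of column name -> stripped text."""
    root = ET.fromstring(text)
    return [
        {local_name(field.tag): (field.text or "").strip() for field in record}
        for record in root.iter("record")
    ]


rows = records_to_rows(xml_text)

# Write the rows as CSV, a format that any spreadsheet program can open.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```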
Why is XML so important if it serves only one of two simple purposes: 1) markup that divides chunks of text, or 2) markup that flanks individual data elements? The value of XML comes from a handful of properties.
The Six Special Properties of XML
XML is endowed with a set of six properties that permit XML files to be self-descriptive (able to describe every aspect of their own content and organization) and "aware" of their own data and the data in the internet universe. These are:
1. Enforced and defined structure (XML rules and schemas)
2. Formal metadata (through the ISO 11179 specification)
3. Namespaces (permits sharing of uniquely identifiable CDEs)
4. Linking of data via the internet (through Uniform Resource Identifiers)
5. Logic and meaning (the Semantic Web)
6. Self-awareness (embedded protocols and commands)
Enforced and defined structure (XML rules and schemas)
Two terms describe XML conformance: well-formedness and validity. A file that contains XML markup is considered an XML file only if it is well-formed. That is, it must have a proper XML header; it must consist of text in a readable form (typically the simple letters and punctuation found on a keyboard), and it must follow the general rules for tagging data. The header can vary somewhat, but it usually looks something like: <?xml version="1.0" ?>. Tags must have a certain form (e.g., spaces are not permitted within a tag name), and tags must be properly nested (i.e., no overlapping). For example, <chapter><chapter_title>Pathologists love XML</chapter_title></chapter> is nicely nested XML, whereas <chapter><chapter_title>Pathologists love XML</chapter></chapter_title> is improperly nested. Most current browsers parse files that have a .xml suffix to determine whether they are well-formed. If a file breaks any of the rules, an error message is generated.
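The nesting and tag-form rules can be checked mechanically. The sketch below uses the XML parser in the Python standard library, which rejects any input that is not well-formed; the helper name is_well_formed and the third test string are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<chapter><chapter_title>Pathologists love XML</chapter_title></chapter>"
overlapping = "<chapter><chapter_title>Pathologists love XML</chapter></chapter_title>"
# Hypothetical example of a tag name that contains spaces.
space_in_tag = "<age of patient>25 years</age of patient>"

for label, text in [("nested", nested),
                    ("overlapping", overlapping),
                    ("space in tag", space_in_tag)]:
    print(label, is_well_formed(text))
```

Only the first string is accepted. Note that this tests well-formedness alone; it says nothing about whether a file conforms to a DTD or schema.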
A well-formed XML file must be structured according to either a DTD (Document Type Definition) or to a schema before it can be considered a valid XML document. DTDs and schemas are blocks of descriptors that specify the structure and content of an XML file. Nothing in the world of XML has engendered as much confusion and acrimony as the issue of how best to specify the descriptor block. Suffice it to say that a variety of schema languages have appeared.
Regarding schemas, the only point worth noting is that when an XML file is described by a schema and parsed by a validating browser (or by a so-called XML parser), you can be certain that the data contained in the file conforms to a specified structure and content. Files using the same schema will have the same data organization, and this greatly facilitates data integration between files. A valid XML file may contain the schema within the file (always near the start of the document). Or, a valid XML file may have a linking tag that contains the unique identification/location of an external schema file.
Formal metadata (through the ISO 11179 specification)
The concept of formalized metadata is quite simple. Unfortunately, formalized metadata is seldom implemented by XML designers.
Consider the seemingly obvious metadata tag, <date>. Pathologists may think of this tag as a calendar day. Farmers may think of this tag as a fruit. Co-eds may think of this as something that is often blind.
Consider the following: <date>09/05/15</date>
An American may think this represents September 5, 1915. An Englishman may think this is May 9, 1915. Others may interpret this date as September 5, 2015 or, reading the year first, May 15, 2009.
In fact, the International Organization for Standardization (ISO) considered the representation of dates and times important enough to create a standard, ISO 8601, covering four metadata tags: date, time, dateTime and timePeriod.29 Incidentally, the standard representation for a calendar date is YYYY-MM-DD.
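The ambiguity of a string such as 09/05/15, and the way the YYYY-MM-DD form removes it, can be demonstrated with the Python standard library. This is a sketch only; the three field orders and their labels are illustrative assumptions about how different readers might parse the string.

```python
from datetime import date, datetime

ambiguous = "09/05/15"

# Three field orders a reader might assume. The labels are informal.
# Note: strptime's %y resolves the two-digit year 15 to 2015, not 1915,
# which is itself one more guess forced by the ambiguous format.
field_orders = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}

readings = {
    label: datetime.strptime(ambiguous, fmt).date().isoformat()
    for label, fmt in field_orders.items()
}
for label, iso in readings.items():
    print(f"{label}: {iso}")

# An ISO 8601 calendar date (YYYY-MM-DD) has exactly one reading.
unambiguous = date.fromisoformat("2015-09-05")
print(unambiguous.year, unambiguous.month, unambiguous.day)
```

One input string yields three different calendar days, whereas the ISO 8601 string yields one.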
The International Organization for Standardization has also created a standard way of defining metadata tags (also known as Common Data Elements or CDEs). This standard, ISO 11179, specifies that metadata should have a qualified name or identifier, an authority that registers the name, a versioning history (allowing for modifications), a language of origin, a statement relating to usage, a data typing statement, and a definition that is unambiguous.16 XML files should always include a pointer to the internet location for the metadata definition file. The creators of the TMA Data Exchange Specification have created a file that lists each of the 80 XML tags used in the specification, along with the ISO 11179 descriptors for each tag.27
This metadata definition file for the TMA Data Exchange Specification currently resides at: [http://12.183.10.150/jjb/tma_cde.htm]. An example is shown for the metadata tag <slide_level>:
slide_level
Identifier: slide_level
Version: 1.0
Registration Authority: Association for Pathology Informatics
Language: English (en)
Obligation: Optional
Datatype: Integer
Maximum Occurrence: Unlimited
Definition: This is an integer corresponding to the level of the block for this slide, e.g. 25.
Comment: Some people may include the level number in the slide_identifier element (as a suffix). If this is the case, they should either redundantly include the suffix in this element or describe how the level may be extracted from the slide_identifier in the block_protocol element.
Without clear definitions for XML metadata tags, the meaning of XML data is virtually nil.
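One way to see why such descriptors matter is to treat a metadata definition as a small record that software can consult before accepting a value. The Python sketch below encodes the slide_level entry shown above. The class name DataElement, its field names, and the conforms function are assumptions made for illustration; they are not part of ISO 11179 or of the TMA specification, and only the Integer datatype is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Descriptors for one metadata tag, modeled on the listing above."""
    identifier: str
    version: str
    registration_authority: str
    language: str
    obligation: str
    datatype: str
    maximum_occurrence: str
    definition: str


slide_level = DataElement(
    identifier="slide_level",
    version="1.0",
    registration_authority="Association for Pathology Informatics",
    language="English (en)",
    obligation="Optional",
    datatype="Integer",
    maximum_occurrence="Unlimited",
    definition="An integer corresponding to the level of the block for this slide.",
)


def conforms(element, value):
    """Check a raw text value against the element's declared datatype.

    Only the 'Integer' datatype is handled in this sketch.
    """
    if element.datatype == "Integer":
        try:
            int(value.strip())
            return True
        except ValueError:
            return False
    raise NotImplementedError(f"no check written for {element.datatype}")


print(conforms(slide_level, "25"))
print(conforms(slide_level, "level three"))
```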
Namespaces (permits sharing of uniquely identifiable metadata tags)
A word may mean different things to different people, and that's why we carefully define metadata (using the ISO 11179 specification). Defining the metadata elements in an XML file ensures that anyone can understand the use of a tag within the file. A problem arises when two different XML files use the same tag name to mean different things. For instance, the farmer's XML file defines the <date> tag as a fruit, while the astronomer's XML file defines the <date> tag as something else entirely. XML deals with this problem by creating protected namespaces for data elements.30
Every data element can be prefixed by a specific namespace (defined metadata collection). Consider the header section for the TMA Data Exchange specification. Within the root element three different namespaces are announced using the xmlns (XML namespace) attribute.
<?xml version="1.0" ?>
<histo xmlns="http://12.183.10.150/jjb/tma_cde.htm"
xmlns:cpctr="http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"
xmlns:dc="http://dublincore.org">
Metadata tags used in the file are taken from three different external sources! These sources are the previously described listing of metadata tags provided by the Tissue Microarray Data Exchange Specification (xmlns="http://12.183.10.150/jjb/tma_cde.htm"), the metadata tags provided by the Cooperative Prostate Cancer Tissue Resource (xmlns:cpctr="http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"), and the metadata tags provided by the Dublin Core, an association of library scientists who have devised a set of standard header elements for XML files (xmlns:dc="http://dublincore.org"). The prefix designated for each source (e.g., "dc" or "cpctr") is conveyed to the metadata tags used in the XML file.
For instance: <dc:creator>CPCTR</dc:creator>. This indicates that the "creator" tag is derived from the "dc" metadata source. Or: <cpctr:slide_stain>FISH</cpctr:slide_stain>. This indicates that the "slide_stain" is derived from the "cpctr" metadata source.
In fact, a single XML file can use the "date" metadata to mean the calendar date or the fruit date, so long as each use of a metadata element is prefixed with the correct namespace.
This simple annotation is a powerful informatics tool. It allows XML creators to choose metadata from a variety of different sources, ensuring that every data element is associated with unambiguous metadata.
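The coexistence of two meanings of "date" in one file can be shown with a short Python sketch. Both namespace URIs, the "cal" and "fruit" prefixes, and the element values are hypothetical placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are hypothetical placeholders for illustration.
doc = """<entry xmlns:cal="http://example.org/calendar"
       xmlns:fruit="http://example.org/fruit">
  <cal:date>2004-06-15</cal:date>
  <fruit:date>Medjool</fruit:date>
</entry>"""

# Map each prefix used in our queries to its namespace URI.
ns = {
    "cal": "http://example.org/calendar",
    "fruit": "http://example.org/fruit",
}

root = ET.fromstring(doc)
calendar_date = root.find("cal:date", ns).text
fruit_date = root.find("fruit:date", ns).text

print(calendar_date)
print(fruit_date)

# Internally the parser stores the full URI, not the prefix, with each tag.
print(root.find("cal:date", ns).tag)
```

Because the parser identifies each element by its full namespace URI rather than by its prefix, two files may choose different prefixes for the same metadata source without causing confusion.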
Linking related data via the internet
Data obtained from tissue microarray studies provide experimental data related to many biopsy cores. The tissue specimen sampled in the tissue microarray core may have been provided by a tissue bank, and the tissue bank may have created an XML file that contains data related to all the tissue specimens in its collection. Snippets from two files may look something like this:
Tissue microarray file:
<cpctr:core>
<gtb:Bank_Identifier>1053371588</gtb:Bank_Identifier>
Tissue banker's XML file:
<tissue_sample>
<gtb:Bank_Identifier>1053371588</gtb:Bank_Identifier>
The same data element, representing the tissue banker's identifier, is present in both files. This means that it is possible for a software agent encountering the tissue bank entry in the tissue microarray file to reach through the internet, locating additional information related to the same tissue bank sample in the tissue banker's file. Tissue banks sometimes update their datasets with treatment or outcome data related to patients with banked tissue samples. The software agent that interrogates the tissue bank XML database may retrieve information that enhances the value of the information contained in the tissue microarray file.
How is this actually accomplished? How does the software agent know where to go? The topic of internet software agents and the methodology for connecting XML files is complex. Suffice it to say that methods for locating and retrieving data from external XML files are widely used.31 All such methods depend on the notion of a Uniform Resource Identifier (URI) for every information document. The type of URI most familiar to web surfers is the web address, also known as the Uniform Resource Locator (URL).
In the Cooperative Prostate Cancer Tissue Resource's implementation of the Tissue Microarray Data Exchange Standard, the protocol used to create the tissue microarray block and the protocol for producing slide sections are both referenced using a URL.
<block_protocol>http://www.pathology.pitt.edu/pdf/cpctr/block.htm</block_protocol>
<slide_sectioning_protocol>http://www.pathology.pitt.edu/pdf/cpctr/section.htm</slide_sectioning_protocol>
The URL naming system employed by the World Wide Web is the underlying standard that permits software to reach data anywhere on the internet.
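The matching step at the heart of the tissue bank example earlier in this section can be sketched without any network access. In the Python example below, both files are held in memory as strings; a real software agent would first retrieve the tissue bank file from its URI. The "gtb" namespace URI, the <histo> and <bank> wrapper elements, the <vital_status> field, the second tissue sample, and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

GTB = "http://example.org/tissue-bank"  # hypothetical namespace URI
CPCTR = "http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"

tma_file = f"""<histo xmlns:cpctr="{CPCTR}" xmlns:gtb="{GTB}">
  <cpctr:core>
    <gtb:Bank_Identifier>1053371588</gtb:Bank_Identifier>
  </cpctr:core>
</histo>"""

# The vital_status field and the second sample are invented for this sketch.
bank_file = f"""<bank xmlns:gtb="{GTB}">
  <tissue_sample>
    <gtb:Bank_Identifier>1053371588</gtb:Bank_Identifier>
    <vital_status>Alive</vital_status>
  </tissue_sample>
  <tissue_sample>
    <gtb:Bank_Identifier>2000000001</gtb:Bank_Identifier>
    <vital_status>Unknown</vital_status>
  </tissue_sample>
</bank>"""

ns = {"cpctr": CPCTR, "gtb": GTB}


def index_bank(text):
    """Map each bank identifier to its <tissue_sample> element."""
    root = ET.fromstring(text)
    index = {}
    for sample in root.iter("tissue_sample"):
        key = sample.find("gtb:Bank_Identifier", ns).text.strip()
        index[key] = sample
    return index


def link_cores(tma_text, bank_index):
    """Pair each core's bank identifier with the bank's vital_status."""
    root = ET.fromstring(tma_text)
    linked = {}
    for core in root.iter(f"{{{CPCTR}}}core"):
        key = core.find("gtb:Bank_Identifier", ns).text.strip()
        sample = bank_index.get(key)
        linked[key] = sample.findtext("vital_status") if sample is not None else None
    return linked


linked = link_cores(tma_file, index_bank(bank_file))
print(linked)
```

The join succeeds only because both files use the same namespaced element for the identifier, which is the point of sharing metadata definitions.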
Logic and meaning (the Semantic Web)
Although the technical methodologies associated with XML can be daunting, the most difficult issues always relate to the meaning of things. A variety of formal approaches have been proposed to reach the level of meaning within the context of XML. By far, the simplest of these is the Resource Description Framework (RDF).23 This model proposes that all pairs of data and associated metadata are about something. If you simply specify a relationship between data, metadata and the subject of the data, you take a giant step toward providing meaning to your XML records.
<rdf:Description rdf:about="http://www.pathology_stuff/report1.htm">
<pathologist>Dr. Tumori</pathologist>
</rdf:Description>
This trivial example demonstrates the basic RDF model. The first line indicates that the RDF description concerns a unique object specified by a filename, report1.htm, located at a specified web address. This is followed by a data/metadata pair indicating that the pathologist associated with the report is Dr. Tumori. In an actual implementation, the pathology report may be associated with many different data/metadata pairs.
The importance of the RDF model is that it binds data and metadata to a unique object with a web location. Consistent use of the RDF model assures that data anywhere on the web can always be connected through unique objects with RDF descriptions. The association of described data with a unique object confers meaning and greatly advances our ability to integrate data over the internet.
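The binding of a data/metadata pair to a unique object can be made concrete by reading the description as a three-part statement: subject, metadata, data. The Python sketch below extracts such statements with the standard library parser. It is not a conforming RDF processor; the <rdf:RDF> wrapper and the function name triples are assumptions added here, and the unqualified <pathologist> tag follows the simplified example in the text rather than strict RDF syntax.

```python
import xml.etree.ElementTree as ET

# The standard namespace URI for RDF syntax elements.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The description from the text, wrapped in <rdf:RDF> with the rdf prefix
# declared so that the fragment parses. The wrapper is an assumption.
rdf_text = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about="http://www.pathology_stuff/report1.htm">
    <pathologist>Dr. Tumori</pathologist>
  </rdf:Description>
</rdf:RDF>"""


def triples(text):
    """Return (subject, metadata tag, data) for every described property."""
    root = ET.fromstring(text)
    result = []
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            result.append((subject, prop.tag, (prop.text or "").strip()))
    return result


statements = triples(rdf_text)
for statement in statements:
    print(statement)
```

Statements gathered from different files can then be merged wherever they share a subject.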
Self-awareness (embedded protocols and commands)
Unlike databases, which are nothing more than structured collections of data, XML documents are not limited to descriptions of data elements. An XML file may contain simple data or it may contain logical assertions, or queries, or program commands in a designated programming language. In fact, anything can be expressed in XML. A variety of methods permit cross-internet communication between XML files. These technologies make it possible for XML files to attain a type of intelligence. When an XML file is capable of displaying autonomous behavior, posing questions to external files, generating replies to received questions and modifying its own content, it is usually referred to as a software agent. Needless to say, this is an exciting area for future work.
Obstacles
It would be misleading to suggest that XML is a panacea. The design of XML files requires great wisdom. Producing a complex data structure of well-described data does not always yield benefit.32 Personally, I tend to abandon XML schemas that I cannot understand. Although software tools can resolve any valid schema, rendering the contents of an XML file as a computer-ready data structure, I tend not to trust abstractions that I cannot grasp.
Many pathologists are unaware that two standards now exist for pathology data. DICOM (Digital Imaging and Communications in Medicine) is an ISO standard for medical images. DICOM was created as a machine standard for the transfer of electronic radiology images. A visible light version of DICOM was created so that microscopists and endoscopists can exchange images according to a standard.33 To my knowledge (based on considerable investigation) no pathology department in the world is currently using the DICOM visible light standard for their histologic images. The standard is written using a technical specification that is too difficult to implement.
The ASTM (American Society for Testing and Materials) has created a standard schema, in the form of a DTD, for pathology reports.34 I have never encountered a pathologist who is even aware of this XML standard. It seems that new standards are created faster than they can be embraced.
Much of the value of XML derives from the common usage of standard XML schemas and standard metadata. Successful data integration requires that colleagues from many different biomedical disciplines adhere to the rules of XML and apply those rules in a manner that makes sense to their interdisciplinary colleagues. This is a difficult task.
Discussion
Many pathologists are uncomfortable with informatics issues. They like to think of themselves as "docs" and take great pride in their special diagnostic skills. The world stands in awe when a pathologist determines a patient's fate by viewing a few cells under a microscope. One of the unintentional by-products of service pathology is lots and lots of data. Pathology reports captured by modern laboratory information systems are richly annotated with such data elements as transaction time-stamps, bar-code values, encryption hashes, patient unique identifiers, message headers, and data element tags. It is tempting to allow others more skilled in these matters to unburden us from data-intensive tasks. The problem is that pathologists are the only people who understand pathology data, and pathologists are the only people who can sensibly integrate pathology data with related clinical and biological data.
Even the most computer-phobic pathologists can play a vital role if they understand the important role of pathology data in the grand scheme of things. Despite the complexities of biomedical data, XML is a simple artifact whose basic purpose (making data understandable) is achievable.
Acknowledgments
This manuscript represents the opinion of the author and does not represent official policy of the NIH or of any other federal agency.
References
1. Department of Health and Human Services. 45 CFR (Code of Federal Regulations), Parts 160 through 164. Standards for Privacy of Individually Identifiable Health Information (Final Rule). Federal Register: December 28, 2000 (Volume 65, Number 250), Pages 82461-82510 [http://aspe.hhs.gov/admnsimp/final/PvcPre01.htm]
2. Department of Health and Human Services. 45 CFR (Code of Federal Regulations), 46. Protection of Human Subjects (Common Rule). 56 Federal Register, June 18, 1991, volume 56, p. 28003 [http://ohrp.osophs.dhhs.gov/humansubjects/guidance/45cfr46.htm]
3. Final NIH statement on sharing research data. [http://grants1.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html]
4. Berman JJ. Racing to share pathology data. Am J Clin Pathol 121:169-171, 2004
5. Galperin MY. The Molecular Biology Database Collection: 2004 update. Nucleic Acids Res 32:D3-D22, 2004
6. Connolly JL, Fletcher CD. What is needed to satisfy the American College of Surgeons Commission on Cancer (COC) requirements for the pathologic reporting of cancer specimens? Hum Pathol 34:111, 2003
7. Berman JJ. Concept-Match Medical Data Scrubbing: How pathology datasets can be used in research. Arch Pathol Lab Med 127:680-686, 2003
8. Berman JJ. Threshold protocol for the exchange of confidential medical data. BMC Medical Research Methodology 2:12, 2002
9. Berman JJ. Confidentiality for Medical Data Miners. Artif Intell Med 26:25-36, 2002
10. Berman JJ. Zero-Check: A Zero-Knowledge Protocol for Reconciling Patient Identities Across Institutions. Archives of Pathology and Laboratory Medicine 128:344-346, 2004
11. Malin B, Sweeney L. How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems. J Biomed Inform 37:179-92, 2004
12. Sweeney L. Guaranteeing anonymity when sharing medical data, the Datafly system. Proc AMIA Ann Fall Symp 51-55, 1997
13. Spellman PT, Miller M, Stewart J: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 23;3(9):RESEARCH0046, 2002
14. Harris MA, Clark J, Ireland A, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004 Jan 1;32 Database issue:D258-261
15. Covitz PA, Hartel F, Schaefer C, De Coronado S, Fragoso G, Sahni H, Gustafson S, Buetow KH. caCORE: a common infrastructure for cancer informatics. Bioinformatics 19:2404-2412, 2003
16. Solbrig HR: Metadata and the reintegration of clinical information: ISO 11179. MD Comput 2000, 3:25-28
17. Karasavvas KA, Baldock R, Burger A. Bioinformatics integrations and agent technology. J of Biomedical Informatics 37:205-219, 2004.
18. Cantor MN, Lussier YA. Putting data integration into practice: using biomedical terminologies to add structure to existing data sources. Proc AMIA Symp 125-129, 2003
19. Moore GW and Berman JJ. Anatomic Pathology Data Mining. In: Cios KJ, ed, Medical Data Mining and Knowledge Discovery. Springer-Verlag, Berlin/Heidelberg, 2000, pp 72-117
20. Paterson GI, Shepherd M, Wang X, Watters C. Using the XML-based Clinical Document Architecture for exchange of Structured Discharge Summaries. Proceedings of the 35th Annual Hawaii International Conference on System Sciences. 2002
21. Baorto DM, Cimino JJ, Parvin CA, Kahn MG. Combining laboratory data sets from multiple institutions using the logical observation identifier names and codes (LOINC). Int J Med Inf. 51:29-37, 1998
22. Martín-Sanchez F, Maojo V, López-Campos G. Integrating genomics into health information systems. Methods Inf Med 41:25-30, 2002
23. Ahmed K, Ayers D, Birbeck M, Cousins J, Dodds D, Lubell J, Nic M, Rivers-Moore D, Watt A, Worden R, Wrightson A: Professional XML Meta Data. Wrox Press Ltd. Birmingham 2001.
24. W3C Architecture Domain. Extensible Markup Language (XML) [http://www.w3c.org/XML/]
25. White C, Quin L, Burman L: Mastering XML: Premium Edition. Sybex, San Francisco 2001
26. AbiWord: Word Processing for Everyone. [http://www.abisource.com/].
27. Berman JJ, Edgerton ME, Friedman B. The Tissue Microarray Data Exchange Specification: A Community-based, Open Source Tool for Sharing Tissue Microarray Data. BMC Medical Informatics and Decision Making. 3:5, 2003
28. Berman JJ, Datta MW, Kajdacsy-Balla A, Melamed J, Orenstein J, Dobbin K, Patel A, Dhir R, Becich MJ. Tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics 5:19, 2004
29. Numeric representation of Dates and Time: The ISO solution to a long-standing source of confusion [http://www.iso.ch/iso/en/prods-services/popstds/datesandtime.html]
30. XML Namespaces. [http://www.w3schools.com/xml/xml_namespaces.asp]
31. W3C XML Pointer, XML Base and XML Linking. [http://www.w3.org/XML/Linking]
32. Smith CA. Effect of XML Markup on Retrieval of Clinical Documents. Proc AMIA Symp. 614-618, 2003
33. Korman LY, Delvaux M, Bidgood D. Structured reporting in gastrointestinal endoscopy: integration with DICOM and minimal standard terminology. Int J Med Inform 48:201-206, 1998
34. American Society of Testing and Material Data Type Definition (DTD) Pathology Report version 1.0. [http://www.openhealth.org/ASTM/pathology.report.dtd]