
Biomedical Informatics books
by Jules J. Berman

  • Jones & Bartlett sales and informational website for Biomedical Informatics
  • U.S. book site for Biomedical Informatics
  • Full Table of Contents from Library of Congress for Biomedical Informatics
  • List of book-related resources
  • Brief author biography on Association for Pathology Informatics Website
  • Quick link to PubMed listing for Jules J. Berman
  • Full list of Publications for Jules J. Berman
  • Dr. Bruce Friedman's review of Biomedical Informatics
  • Perl Programming for Medicine and Biology companion site.
  • Author's blog on data specifications
  • Contact author

  • Download 2003 Tissue Microarray Specification:
    The tissue microarray data exchange specification: A community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Making. Accepted 23 May 2003
    View or Download Article

    *This letter summarizes the Standards Session of the May 30, 2001 AIMCL
    Tissue Microarray Infostructure Workshop*
    The AIMCL ("Automated Information Management in the Clinical Laboratory") is
    hosted each year by the University of Michigan and is organized by Dr. Bruce
    Friedman.  This year, Bruce hosted a special workshop on Tissue Microarray
    (TMA) Infostructure on May 30.  I didn't do a head-count, but there seemed
    to be at least 50 (maybe as many as 70) people in attendance.  Many of the
    attendees represented commercial interests.
    During the morning session, several TMA investigators described the ongoing
    research projects in their laboratories (Mark Rubin, U of Michigan; Steve
    Bova, Johns Hopkins; David Rimm, Yale; Matt van de Rijn, Stanford).
    The afternoon shifted discussion to TMA data exchange standards.  I gave a
    short lecture on the importance of standards for data exchange and made the
    argument that, at present, all tissue microarray data is handled
    idiosyncratically by different laboratories, resulting in the inability of
    researchers to exchange TMA datasets.
    If there were a standard for the exchange of TMA data, researchers could:
    1. Share their TMA data with collaborators in other laboratories
    2. Submit their TMA data to journals or to data repositories in a standard
    format.
    3. Merge their TMA data with other TMA datasets.
    4. Update their TMA datasets with accrued data from related datasets (e.g.
    specimen repositories that have collected patient follow-up data related to
    tissue cores included in the TMA)
    5. Validate TMA experimental data (by comparing experimental data elements
    with the corresponding data elements produced by other researchers using the
    same TMA block or the same TMA tissue cores).
    6. Support distributed dataset queries (by searching over common data
    elements found in different available datasets, including tissue repository
    datasets and gene array datasets)
    7. Extend the value of any dataset by allowing a community of researchers to
    perform data analyses that may not have been anticipated at the time that
    the TMA was designed.
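    Items 3 and 4 above (merging datasets and updating them with accrued
    data) can be sketched as follows.  This is only an illustration under
    stated assumptions: the field names (specimen_id, follow_up_months, and
    so on) and the function merge_tma_datasets are hypothetical, and the
    sketch assumes one record per de-identified specimen identifier shared by
    both datasets.

```python
from typing import Dict, List


def merge_tma_datasets(primary: List[dict], other: List[dict],
                       key: str = "specimen_id") -> List[dict]:
    """Merge records from `other` into `primary`, matching on `key`.

    Fields already present in a primary record are kept; fields found only
    in the other dataset are added. Records unique to `other` are appended.
    Assumes (hypothetically) one record per key value in each dataset.
    """
    merged: Dict[str, dict] = {rec[key]: dict(rec) for rec in primary}
    for rec in other:
        target = merged.setdefault(rec[key], {})
        for field, value in rec.items():
            # Never overwrite a value the primary dataset already holds.
            target.setdefault(field, value)
    return list(merged.values())


# Hypothetical example data: a laboratory's stain results and a specimen
# repository's follow-up data, linked by a common de-identified identifier.
lab_a = [{"specimen_id": "S-001", "stain": "H&E", "result": "positive"}]
repository = [
    {"specimen_id": "S-001", "follow_up_months": 36},
    {"specimen_id": "S-002", "follow_up_months": 12},
]
combined = merge_tma_datasets(lab_a, repository)
```

    The merge only works because both datasets use the same name and meaning
    for the linking element, which is the practical argument for common data
    elements.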
    The key to all of these efforts is the concept of DATA SHARING.  Sharing may
    be altruistic (placing your TMA data into the public domain), commercial (I
    will share my data with you if you pay me the required fee), or professional
    (you can have my data if I'm included as an author on the paper).
    During my presentation, and in the group discussion afterward, the following
    general qualities of a TMA standard were discussed:
    1. The standard should be free and non-proprietary.
    2. The standard should be self-descriptive.  Anyone reviewing a TMA file
    should be able to precisely determine how the data is organized by reading
    the data tags included in the file.
    3. The standard should, when feasible, use publicly available common data
    elements linked to a web site that fully defines each common data element
    included in the standard (needed to support dataset-independent distributed
    network queries).  This means that the committee that creates the TMA
    standard must work with other standards committees to ensure cross-database
    compatibility of common data elements
    4. The standard should be generic (able to describe any laboratory's TMA
    data structure)
    5. The standard should be extensible.  This means that there will need to be
    a standards committee that can make changes in the standard over time and
    that can keep a documented history of modifications in the standard.
    6. The standard should be easy to implement.  It should be relatively easy
    for a programmer to translate any commercial TMA dataset into the TMA
    standard (and to reverse the process)
    7. The standard should not be a requirement.  The committee that creates the
    standard should take no measure to require laboratories to implement the
    standard.  Those using the standard would be able to choose the data that
    is included in their shared datasets (e.g. they may choose to withhold or
    encrypt patient
    8. The standard should have community buy-in.  Laboratories, commercial
    vendors, pathology organizations, government agencies, and other standards
    committees should all have the opportunity to comment on the standards.
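    Qualities 2 and 6 (self-descriptive, and easy to translate in both
    directions) can be illustrated with a minimal round trip.  The sketch
    assumes a laboratory's data is available as simple field/value records;
    the tag names tma and core, the field names, and both function names are
    hypothetical and are not part of any agreed standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def records_to_xml(records: List[Dict[str, str]]) -> str:
    """Write each record as a <core> element whose child tags name the fields."""
    root = ET.Element("tma")
    for rec in records:
        core = ET.SubElement(root, "core")
        for name, value in rec.items():
            # Field names must be valid XML tag names in this simple sketch.
            ET.SubElement(core, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text: str) -> List[Dict[str, str]]:
    """Reverse the translation: read each <core> back into a field/value record."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text or "" for child in core}
        for core in root.findall("core")
    ]


# Hypothetical vendor-style rows, translated to tagged text and back.
vendor_rows = [{"core_code": "A1", "stain": "ER", "result": "2+"}]
xml_text = records_to_xml(vendor_rows)
round_trip = xml_to_records(xml_text)
```

    Because every value is wrapped in a tag that names it, a reader can tell
    how the data is organized from the file alone.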
    Some of the data elements included in the TMA file standard might be:
    Tissue Microarray Header Data
    1. File-type
    2. Creator
    3. Lab of origin
    4. Creation date
    5. Modification dates
    6. Unique identifier
    7. Usage
    Specific core data:
    8. Numbers of cores in array
    9. Size of cores in array
    10. Core coordinate system
    11. Coordinates of cores
    14. Core data
    Data element (1 of 1000)
       a. Surgical pathology specimen (de-identified)
          This may link to patient identifier in another dataset
          This may link to another core of the same specimen in another
          tissue microarray file
          Patient identifier may link to clinical/demographic data in one or
          more datasets
       b. Code for particular core
       c. Stain
       d. Results of stain (method identifier)
       e. Image of data element
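    As a rough picture of how the elements listed above might be laid out in
    a tagged file, here is a small sketch.  Every tag name, attribute name,
    and value in it is a hypothetical placeholder chosen for illustration; it
    is not a draft of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: header data first, then one element per core.
SAMPLE_TMA = """\
<tma_file>
  <header>
    <file_type>tissue_microarray</file_type>
    <creator>example creator</creator>
    <lab_of_origin>example laboratory</lab_of_origin>
    <creation_date>2001-05-30</creation_date>
    <unique_identifier>example-tma-0001</unique_identifier>
  </header>
  <cores count="2" coordinate_system="row_column">
    <core row="1" column="1">
      <specimen>deidentified-specimen-17</specimen>
      <core_code>A</core_code>
      <stain>example stain</stain>
      <result method="example scoring method">positive</result>
    </core>
    <core row="1" column="2">
      <specimen>deidentified-specimen-42</specimen>
      <core_code>B</core_code>
      <stain>example stain</stain>
      <result method="example scoring method">negative</result>
    </core>
  </cores>
</tma_file>
"""


def read_tma(xml_text):
    """Return (header, cores): header fields as a dict, cores as a list of dicts."""
    root = ET.fromstring(xml_text)
    header = {el.tag: el.text for el in root.find("header")}
    cores = []
    for core in root.find("cores"):
        # Coordinates are stored as attributes, core data as child elements.
        record = {"row": int(core.get("row")), "column": int(core.get("column"))}
        record.update({el.tag: el.text for el in core})
        cores.append(record)
    return header, cores


header, cores = read_tma(SAMPLE_TMA)
```

    The specimen value stands in for a de-identified code that could link a
    core to the same specimen in another file or dataset.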
    There was enthusiasm among the group to continue work towards developing the
    standard.  There was consensus for writing the standard in XML.  There were
    no dissenters to the idea of working on drafts of the TMA standard through a
    listserver consisting of all the TMA workshop registrants.
    The next TMA standards workshop has been organized by Dr. Mary Edgerton
    (Vanderbilt) and will be held Oct. 6, 2001 in conjunction with the APIII
    meeting in Pittsburgh.  If all goes well, before the Pittsburgh meeting, we
    should have a rudimentary draft of an XML standard, including comments from
    the Ann Arbor registrants.
    Jules Berman, Ph.D., M.D.
    Program Director, Pathology Informatics
    Cancer Diagnosis Program, DCTD, NCI, NIH