Books by Jules J. Berman, covers

Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM. 
A prototype international autopsy database: 1625 consecutive 
fetal and neonatal autopsy facesheets spanning twenty years. 
Archives Pathol Lab Med 120(8):782-785, 1996 

G. William Moore, MD, PhD, Jules J. Berman, PhD, MD, Randy L. Hanzlick, MD, John J. Buchino, MD, Grover M. Hutchins, MD


Objective.- To demonstrate that cause of death statements can be generated by a computer algorithm from an autopsy database composed of diagnostic terms. Data Sources.- Over 49,000 autopsy facesheets were collected from a publicly accessible Internet autopsy database contributed by over a dozen institutions. This database is available at the web-site: Study Selection.- To test the feasibility of creating and using a publicly available autopsy database, and to identify the technical and medicolegal problems that may arise with such a novel resource, a prototype study was designed by selecting autopsy facesheets from fetal and neonatal deaths. An algorithm was developed to determine the cause of death from the listing of anatomic diagnoses. Data Extraction.- Facesheets from 1625 fetal and neonatal autopsies were selected, covering deaths up to 28 days after birth.

Data Synthesis.- The algorithm determined causes of death from autopsy facesheet data in all cases. Upon review by an experienced pediatric pathologist, these automatically generated cause of death statements required no modification or only slight modification in over 90% of cases. Conclusions.- A large multi-institutional autopsy database composed of demographic and diagnostic information has been deposited on the Internet. This information can be freely downloaded and used by any researcher without violating patient confidentiality. As a demonstration of one possible application of the database, cause of death statements were generated for the fetal and neonatal autopsies by a computer algorithm. One can anticipate that the wealth of information contained in autopsy facesheets can be assembled into a database that will serve the public interest.


In 1975, the College of American Pathologists embarked on the development of a computerized National Autopsy Databank to serve as a central repository of pathologic, biomedical, demographic, and epidemiologic information which would potentially benefit a wide range of scientific and research endeavors (1,2,3). Past efforts were hampered by the state of computer technology and the enormity of the clerical effort. At present, powerful personal computers are available to all pathologists in developed countries. Furthermore, pathologists can now share information via the Internet, at no expense above a nominal monthly charge. It is now realistic to create an accessible repository of autopsy records, so long as those records are stored in an electronic format that conforms to a few rules for patient privacy.

The importance of a publicly available autopsy database extends from the role of the individual autopsy in patient care, to quality assurance, research, and disease surveillance. Autopsy data address the most serious of all medical outcomes: death. Unfortunately, most statistical studies that examine causes of death are derived from death certificate data. Annual data for the entire U.S. population have been collected since 1935 by the Vital Statistics Program of the National Center for Health Statistics. These data are taken from death certificates, and account for more than 99% of U.S. deaths (4). Death certificate data are notoriously error-prone, and these problems seem to extend beyond national borders, as similar questions related to the validity of death certificate data have been voiced in the United States and the United Kingdom (5-9). The most common error occurs when a mode or mechanism of death is listed as the cause of death (e.g., cardiac arrest, cardiopulmonary arrest), thus nullifying the potential value of the death certificate (10). Another problem involving death certificates is their proven lack of agreement with autopsy data (7). This is hardly surprising, since death certificates are often completed before the autopsy results become available. A recent survey of 49 national and international health atlases has shown that there is virtually no consistency in the way that death data are presented (11).

The autopsy is the most reliable cause of death document we have. An autopsy facesheet database constructed from merged records of many institutions will permit epidemiologists to obtain cause of death data accompanied by a list of pathologic findings. Furthermore, large autopsy databases allow pathologists to test hypotheses based on a small number of personally encountered cases against data collected from a large number of similar cases. In fact, virtually any idea derived from observations on a single autopsy will benefit from evaluating similar cases collected from a large database. This would include the sensitive but important issue of examining the quality of patient care based on autopsy findings. In addition, review of an autopsy database may reveal the emergence of diseases that would not be apparent from the observations in a single autopsy or even from the personal experiences of a pathologist who performs a large number of autopsies.

The Internet Autopsy Database, developed by Drs. Moore, Berman, and Hutchins, consists of over 49,000 autopsy facesheets contributed by over a dozen academic medical institutions, available at URL: As evidence of the usefulness of a large autopsy database for research purposes, over 1200 research papers have been published to date utilizing autopsy reports and accompanying materials from The Johns Hopkins Hospital (12). As a prototype investigation for a database derived from autopsy facesheets, we selected autopsy facesheet data from autopsies performed on fetuses and infants who died within 28 days of birth, autopsied from 1975 through 1994. Causes of death were obtained from a database program that matches diagnoses appearing in the autopsy facesheet against an ordered thesaurus of accepted perinatal causes of death. Computer-selected causes of death were compared with the cause of death determined by a pediatric pathologist who reviewed the autopsy facesheets.


Computer-readable autopsy facesheets were examined for patients up to 28 days old in the Internet Autopsy Database in years 1975 through 1994, inclusive. To preserve confidentiality, demographic information is reduced to age, race, sex, and year of death; and autopsy reports are renumbered using a random number generator (13).
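The de-identification step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' program. The record classes (`RawFacesheet`, `PublicFacesheet`) and their field names are hypothetical. The use of Python's `secrets.SystemRandom` as the random number generator is also an assumption; the paper cites only a cryptography text (13). The sketch keeps age, race, sex, and year of death, and it replaces each autopsy number with a random identifier. It also returns a key table that only the contributing institution would retain.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RawFacesheet:
    """Hypothetical institutional record, before submission."""
    autopsy_number: str
    age: str
    race: str
    sex: str
    date_of_death: str      # ISO date, e.g. "1987-03-14"
    patient_name: str       # never leaves the institution
    diagnoses: List[str]


@dataclass
class PublicFacesheet:
    """Hypothetical de-identified record, as published."""
    public_id: int
    age: str
    race: str
    sex: str
    year_of_death: int
    diagnoses: List[str]


def deidentify(records: List[RawFacesheet],
               id_bits: int = 32) -> Tuple[List[PublicFacesheet], Dict[int, str]]:
    """Reduce demographics to age, race, sex and year of death, and replace
    each autopsy number with a unique random identifier.

    Returns the public records plus a private key table (public ID to
    original autopsy number) for the contributing institution to keep."""
    rng = secrets.SystemRandom()          # OS-backed, unpredictable generator
    public, key_table = [], {}
    for rec in records:
        new_id = rng.getrandbits(id_bits)
        while new_id in key_table:        # re-draw on the rare collision
            new_id = rng.getrandbits(id_bits)
        key_table[new_id] = rec.autopsy_number
        public.append(PublicFacesheet(
            public_id=new_id,
            age=rec.age,
            race=rec.race,
            sex=rec.sex,
            year_of_death=int(rec.date_of_death[:4]),
            diagnoses=list(rec.diagnoses),
        ))
    return public, key_table


raw = [
    RawFacesheet("A-1001", "2 days", "W", "M", "1987-03-14", "Doe, J",
                 ["Hyaline membrane disease, lungs."]),
    RawFacesheet("A-1002", "0 days", "B", "F", "1990-11-02", "Roe, R",
                 ["Chorioamnionitis."]),
]
public, key = deidentify(raw)
```

The key table is kept separate from the published records, so the public file alone cannot be linked back to an institutional autopsy number.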

Clinical history and anatomical diagnoses, contributed to the database in the form of short, declarative phrases, are transformed into a 'reduced facesheet' by converting the sentences in the clinical history and anatomical diagnoses entirely to lower case and by removing all numerals, single letters, and stop-words (i.e., prepositions, articles, conjunctions, common adjectives and verbs). Single-word or multiple-word phrases within each diagnostic sentence are matched to corresponding terms in a thesaurus of possible causes of death, assembled from the literature (14-19) and from our own experience. Thesaurus terms for the first section of the death certificate are ranked in Groups A through C, according to their likely position on the certificate, as shown in Table 1 (20). That is, a thesaurus term likely to be an 'immediate cause of death' is assigned to Group A; a term likely to be an 'intermediate cause of death' is assigned to Group B; and a term likely to be an 'underlying cause of death' is assigned to Group C. Thesaurus terms for the second section of the death certificate are ranked in Groups D and E: a term likely to be an 'other significant condition' is assigned to Group D, and a term likely to be a 'risk factor' is assigned to Group E. For each autopsy facesheet, the attending pathologist may rearrange the computer-generated causes of death in accordance with the particulars of the case.
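The reduction step can be sketched as follows. The paper does not publish its stop-word list, so the short `STOP_WORDS` set below is an illustrative assumption. A "numeral" is taken to mean any all-digit token, and punctuation is discarded during tokenization.

```python
import re
from typing import List

# Illustrative stop-word list (an assumption): a few prepositions, articles,
# conjunctions, and common adjectives and verbs.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "with", "without", "and", "or",
    "to", "by", "for", "from", "at", "per", "is", "was", "were",
    "possible", "consistent",
}


def reduce_sentence(sentence: str) -> List[str]:
    """Lower-case a diagnostic sentence and drop numerals, single letters
    and stop-words, keeping the remaining words in their original order."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words
            if not w.isdigit() and len(w) > 1 and w not in STOP_WORDS]


def reduce_facesheet(sentences: List[str]) -> List[str]:
    """Reduce each sentence and re-join its words with single spaces."""
    return [" ".join(reduce_sentence(s)) for s in sentences]


print(reduce_facesheet(["Premature rupture of the membranes.",
                        "Grade 3 hemorrhage, brain (type B)."]))
```

The reduced form drops the linking words, which is why phrases such as "Premature rupture membranes" and "Hypoplasia lung" appear in that shape in the results table.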

Each thesaurus term points to one or more 'hit terms', as they might occur in a reduced autopsy facesheet. For example, the thesaurus term 'chorioamnionitis' might point to hit terms such as 'chorionitis' or 'placentitis'. The list of hit terms is created from experience and enriched by applying the 'barrier word method' to the autopsy facesheets (21). The database program constructs a provisional cause of death statement from each autopsy facesheet by matching hit terms in the thesaurus with terms in the reduced autopsy facesheet.
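Hit-term matching can be sketched as a whole-phrase lookup over reduced sentences. The miniature `THESAURUS` below is hypothetical. Its group letters follow the examples given in the Results (fetal pneumonia in A, hyaline membrane disease in B, chorioamnionitis in C, Cesarean section in D, maternal toxemia in E), and its hit terms are illustrative. The paper's actual thesaurus and matching rules are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical thesaurus: term -> (group A-E, hit terms in reduced form).
THESAURUS: Dict[str, Tuple[str, List[str]]] = {
    "chorioamnionitis": ("C", ["chorioamnionitis", "chorionitis", "placentitis"]),
    "hyaline membrane disease": ("B", ["hyaline membrane disease", "hyaline membranes"]),
    "fetal pneumonia": ("A", ["fetal pneumonia", "pneumonia"]),
    "maternal toxemia": ("E", ["toxemia", "pre eclampsia", "preeclampsia"]),
    "cesarean section": ("D", ["cesarean section", "cesarean delivery"]),
}


def match_causes(reduced_sentences: List[str],
                 thesaurus=THESAURUS) -> Dict[str, List[str]]:
    """Return {group: [thesaurus terms]} for every term whose hit term occurs
    as a whole-word phrase in at least one reduced sentence."""
    found = defaultdict(list)
    for term, (group, hits) in thesaurus.items():
        for sentence in reduced_sentences:
            padded = f" {sentence} "            # pad so matches respect word edges
            if any(f" {hit} " in padded for hit in hits):
                found[group].append(term)
                break                           # record each term once
    return dict(found)


def provisional_statement(found: Dict[str, List[str]]):
    """Order matches as on the certificate: Part I lists A (immediate),
    then B, then C (underlying); Part II lists D, then E."""
    part1 = [t for g in "ABC" for t in found.get(g, [])]
    part2 = [t for g in "DE" for t in found.get(g, [])]
    return part1, part2


reduced = ["fetal pneumonia lungs", "acute chorionitis placenta",
           "maternal pre eclampsia"]
part1, part2 = provisional_statement(match_causes(reduced))
```

The output is only a provisional list of conditions capable of causing death. As the text notes, the pathologist may delete or rearrange entries.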


There were 1625 autopsy facesheets from patients up to 28 days old in the Internet Autopsy Database, autopsied between 1975 and 1994, inclusive. Causes of death that appeared at least 50 times are shown in Table 2. In Group A, the most common cause of death is fetal pneumonia (211/1625, 13% of cases). In Group B, the most common cause of death is hyaline membrane disease (240/1625, 15% of cases). In Group C, the most common cause of death is chorioamnionitis (360/1625, 22% of cases). In Group D, the most common other significant condition is Cesarean section (275/1625, 17% of cases). In Group E, the most common risk factor is maternal toxemia (73/1625, 4% of cases).

The database program identified 126 cases with an unknown cause of death. On review of these facesheets, a pathologist concurred with the computer assignment in 117 (93%) cases. Errors were attributable to misspelled words or idiosyncratic diagnostic statements. On review of the other computer-assigned causes of death, over 90% required at most deletions or rearrangements.
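The percentages reported above are simple proportions. The group figures use the 1625 facesheets as the denominator, and the agreement figure uses the 126 facesheets with an unknown cause of death. A short check, rounding to the nearest whole percent as the text does:

```python
TOTAL = 1625  # facesheets in the study

# Most common term per group, with counts as reported in the Results.
most_common = {
    "A": ("fetal pneumonia", 211),
    "B": ("hyaline membrane disease", 240),
    "C": ("chorioamnionitis", 360),
    "D": ("Cesarean section", 275),
    "E": ("maternal toxemia", 73),
}


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


for group, (term, n) in most_common.items():
    print(f"Group {group}: {term} {n}/{TOTAL} = {pct(n, TOTAL)}%")

# Pathologist agreement among the 126 'unknown cause' facesheets.
print(f"Agreement: 117/126 = {pct(117, 126)}%")
```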


In this study, a prototype autopsy database has been constructed from the facesheets of 1625 fetal and newborn autopsies. The records in the database consist of patient demographics, a listing of anatomic diagnoses (obtained from the autopsy facesheet), and a computer-assigned cause of death statement.

A cause of death cannot be established in all autopsies. In this study, a computer-generated cause of death was assigned in over 90% of cases. A recent study by Saller et al (19) determined a cause of death in about 76% of infant deaths studied. The high rate of cause of death ascertained in our study may be an artefact of the database program, which matches anatomic diagnoses from the facesheet against a list of causes of death, without actually analyzing the autopsy report to determine whether the match is an appropriate cause of death in the individual autopsy. The computer-assigned cause of death is best regarded as a list of those conditions present in the report that are capable of causing death, rather than an exact determination of the cause of death. In practice, the computer-assigned cause of death would be appropriately modified by the pathologist at the time of completion of the autopsy report.


Persons interested in contributing to the Internet Autopsy Database should contact the database administrator at the email address provided on the web-site. We propose the following guidelines to facilitate the electronic transfer of autopsy records into the database, based upon published guidelines (22). An example report is shown in Table 3. denotes carriage-return-line-feed (ASCII 13, 10); denotes blank space (ASCII 32).

1. Records should be submitted on 3.5" floppy disks in IBM-PC-compatible format. Collected autopsy records should include consecutive reports, with each report appearing only once. Every report must have a unique identifier: a secret number that the contributing institution can use to track the actual autopsy number. Before publication on the public database, this institutional identifying number is replaced with a new number produced by a random number generator.

2. Individual autopsy reports should be demarcated (separated) in the electronic file in an unambiguous manner and divided into unambiguous data fields that include: age, race, sex, year of death, location, occupation, clinical history, and anatomical diagnoses.

3. There should be a field consisting of a list of pathologic diagnoses verified by the autopsy. Each item in the list should be a short English sentence using common terminology, and should include an anatomic site.

4. Facesheets should consist of IBM-compatible, computer-readable files in 7-bit ASCII, i.e., only ASCII characters numbered 10, 13, and 32 through 126. Ideally, lines should be at most 60 characters long, followed by . Most word processing and electronic mail software can manage such files without difficulty. Each facesheet begins with a demographic line, starting with ### and ending with . No other part of the autopsy facesheet should contain the # character (ASCII 35). The demographic line consists of autopsy number, age, race, sex, year of autopsy, first three digits of the U.S. postal zip code, and occupation. For countries outside the USA, the telephone country code should be used in place of the zip code (e.g., 44=Great Britain, 49=Germany). The 7 segments of the demographic line are separated by the ^ character (ASCII 94).

5. The clinical history portion of the facesheet begins with the text: CLINICAL HISTORY:. The anatomical diagnosis portion of the facesheet begins with the text: ANATOMICAL DIAGNOSIS:. The cause of death portion of the facesheet begins with the text: CAUSE OF DEATH:. Each text sentence should be terse, without syntactical complexities, and should end with an unambiguous sentence delimiter. The importance of an unambiguous sentence delimiter has been discussed (22), and an institution may choose from a variety of options, including a period followed by two spaces, provided that a line break never falls between the two spaces. As an alternate, The sentence delimiter is currently in use at The Johns Hopkins Hospital, and is the sentence delimiter suggested by the CAP Autopsy Committee (23). Database records appearing in our proposed format can easily be reconstituted to a user-readable form by a database engine.
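The guidelines above are concrete enough to check mechanically. The following minimal Python reader is a sketch under stated assumptions, not a tool distributed with the database. It assumes the "period followed by two spaces" sentence delimiter, joins wrapped lines with a space before splitting sentences, and reports over-long lines and stray # characters as warnings rather than errors. The function names and the field names in `DEMOGRAPHIC_FIELDS` (such as `location` for the zip-code prefix or country code) are hypothetical.

```python
import re

ALLOWED_CODES = {10, 13} | set(range(32, 127))   # LF, CR, printable ASCII
MAX_LINE = 60
HEADERS = ("CLINICAL HISTORY:", "ANATOMICAL DIAGNOSIS:", "CAUSE OF DEATH:")
# The seven ^-separated segments of the demographic line, in guideline order.
DEMOGRAPHIC_FIELDS = ("autopsy_number", "age", "race", "sex",
                      "year", "location", "occupation")


def parse_demographic(line):
    """Split a '###'-prefixed demographic line into its seven fields."""
    if not line.startswith("###"):
        raise ValueError("demographic line must start with ###")
    parts = line[3:].split("^")
    if len(parts) != len(DEMOGRAPHIC_FIELDS):
        raise ValueError(f"expected 7 fields, found {len(parts)}")
    return dict(zip(DEMOGRAPHIC_FIELDS, (p.strip() for p in parts)))


def split_sentences(text):
    """Split on a period followed by two or more spaces (assumed delimiter)."""
    return [s.strip() for s in re.split(r"(?<=\.) {2,}", text) if s.strip()]


def parse_facesheet(text):
    """Return (record, warnings) for one facesheet in the proposed format."""
    warnings = []
    if any(ord(ch) not in ALLOWED_CODES for ch in text):
        warnings.append("characters outside 7-bit printable ASCII")
    lines = text.replace("\r\n", "\n").split("\n")
    warnings += [f"line {i} longer than {MAX_LINE} characters"
                 for i, ln in enumerate(lines, 1) if len(ln) > MAX_LINE]
    if any("#" in ln for ln in lines[1:]):
        warnings.append("'#' found outside the demographic line")

    record = parse_demographic(lines[0])
    sections, current = {}, None
    for ln in lines[1:]:
        header = next((h for h in HEADERS if ln.startswith(h)), None)
        if header:
            current = header.rstrip(":")
            sections[current] = [ln[len(header):]]
        elif current is not None:
            sections[current].append(ln)
    for name, chunks in sections.items():
        # Join wrapped lines with a space, then split into sentences.
        record[name] = split_sentences(" ".join(chunks))
    return record, warnings


sheet = ("###12345^2 days^W^F^1990^212^none\r\n"
         "CLINICAL HISTORY: Prematurity.  Apnea.\r\n"
         "ANATOMICAL DIAGNOSIS: Hyaline membrane disease, lungs.  \r\n"
         "Brain hemorrhage.\r\n"
         "CAUSE OF DEATH: Hyaline membrane disease.\r\n")
record, warnings = parse_facesheet(sheet)
```

Because every segment boundary is marked by a character reserved for that purpose (###, ^, the section headers, the sentence delimiter), the reader never has to guess where a field or sentence ends.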


An autopsy database accessible to all researchers will have great value for epidemiologic studies. Since geographic location is one of the demographic fields supplied by contributors (partial zip-code for facesheets from the USA, and telephone-country-code for facesheets contributed from outside the USA), studies can be stratified by location. Similarly, patient age and year of autopsy are provided, allowing researchers to analyze age-stratified disease and causes of death as well as trends associated with the era in which the autopsies were performed. Because review of glass slides is usually necessary for pathologic studies and access to tissue (as paraffin block sections) is required for molecular biology studies, it is important that legitimate researchers have access to original autopsy materials. Although the database maintains the confidentiality of patients, care-givers, and institutional contributors, we have a tentative plan for those who wish to have prepared tissue sections. The web-site provides an e-mail linkage to the database administrator, who forwards requests to the Institution that contributed the relevant autopsies. It is then up to the Institution to contact the person requesting access to detailed records, slides, or tissues. Collaborative research studies might be arranged by this method. In this manner, the contributing institution maintains control of the amount of information and material released to researchers. Our hope is that the database will grow to become one of the richest and most accessible sources of autopsy lesions, and that researchers worldwide will utilize the database to further our understanding of all diseases that are sampled at autopsy.
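Once facesheets are parsed, stratification by location and era reduces to a grouping operation. A sketch, assuming each record is a dict with hypothetical keys `location` (zip-code prefix or telephone country code), `year`, and `diagnoses`, and using a simple case-insensitive substring test for the diagnosis. The same pattern would extend to age bands.

```python
from collections import Counter


def stratify(records, diagnosis, period_years=5):
    """Count facesheets whose diagnoses mention `diagnosis`, stratified by
    location code and by period of autopsy (e.g. 1975-1979)."""
    target = diagnosis.lower()
    counts = Counter()
    for rec in records:
        if any(target in d.lower() for d in rec["diagnoses"]):
            start = rec["year"] - rec["year"] % period_years
            period = f"{start}-{start + period_years - 1}"
            counts[(rec["location"], period)] += 1
    return counts


# Hypothetical records.
records = [
    {"location": "212", "year": 1976, "diagnoses": ["Chorioamnionitis."]},
    {"location": "212", "year": 1978, "diagnoses": ["Acute chorioamnionitis, placenta."]},
    {"location": "44", "year": 1991, "diagnoses": ["Chorioamnionitis."]},
    {"location": "44", "year": 1992, "diagnoses": ["Hyaline membrane disease."]},
]
counts = stratify(records, "chorioamnionitis")
```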

In the 105 years of its existence, cases from the autopsy files of The Johns Hopkins Hospital have provided materials for over 1200 peer-reviewed publications appearing in academic journals. We propose that a public database containing contributed autopsies from any interested contributing institution will have even greater value for research and epidemiologic studies. It is important to recognize that the envisioned autopsy database will not be an unbiased sample of all deaths. Autopsies are only performed for a minority of deaths. Currently they are seldom performed on patients dying at home of natural causes. The autopsy rate in Muslim countries is close to zero, and a review of international autopsy rates shows that most of the sampled countries have low autopsy rates (24). Since only a small proportion of institutions that perform autopsies are likely to contribute to a voluntary autopsy database, the database population will not represent any identifiable demographic group. Regardless, there are appropriate intellectual paradigms for managing information of this nature. Since the patient's age, sex, and year of death are provided with each facesheet, the results on a large, potentially biased autopsy sample could be age-adjusted and sex-adjusted by standard epidemiologic methods. McFarlane and coworkers (25,26) have suggested the paradigm of an 'epidemiologic necropsy', in which clinical information known about the patient prior to death is used to stratify or pro-rate autopsy information from heterogeneous sources. In particular, cases with a 'necropsy surprise', i.e., autopsy diagnoses which were unsuspected or unknown clinically, may be used as cases for which no clinical bias based upon the surprise could have influenced the selection of that patient for autopsy. 
Since it has repeatedly been shown that about 15-25% of autopsies contain a significant unsuspected or unknown finding (27-29), it seems that the necropsy surprise paradigm could be used to evaluate data from an autopsy database.
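Direct standardization is one standard way to age-adjust and sex-adjust such a sample. It is shown here only as an illustration; the paper does not prescribe a particular method. The adjusted proportion is the sum over strata k of w_k x (c_k / n_k). Here c_k of the n_k autopsies in age-sex stratum k show the finding, and w_k is that stratum's share of a chosen standard population. The strata and counts below are hypothetical.

```python
def direct_standardize(cases, totals, standard):
    """Directly standardized proportion of autopsies showing a finding.

    cases[k]:    autopsies in stratum k with the finding
    totals[k]:   all autopsies in stratum k
    standard[k]: size of stratum k in the chosen standard population
    """
    weight_sum = sum(standard.values())
    adjusted = 0.0
    for stratum, size in standard.items():
        n = totals.get(stratum, 0)
        if n == 0:
            raise ValueError(f"no autopsies in stratum {stratum!r}")
        adjusted += (size / weight_sum) * cases.get(stratum, 0) / n
    return adjusted


# Hypothetical strata keyed by (sex, age band).
cases = {("F", "0-6 days"): 10, ("F", "7-28 days"): 30}
totals = {("F", "0-6 days"): 100, ("F", "7-28 days"): 100}
standard = {("F", "0-6 days"): 3, ("F", "7-28 days"): 1}

crude = sum(cases.values()) / sum(totals.values())        # 0.2
adjusted = direct_standardize(cases, totals, standard)    # about 0.15
```

The crude and adjusted figures differ because the autopsy sample over-represents the older stratum relative to the standard population. Removing that kind of selection imbalance is exactly what the adjustment is for.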

In conclusion, it is our opinion that a public autopsy database that contains information of value for epidemiologists and other researchers is technically feasible using current technology. Such a database can be designed to protect patient privacy and to provide a computer-assigned cause of death in most cases. Placing the autopsy database on the Internet maximizes its access to researchers interested in using or contributing to the database.


1. Peery TM. The autopsy data bank. A proposal for pathologists to contribute to the health care of the nation. Am J Clin Pathol. 1978;69(Suppl):258-259.

2. Carter JR, Nash NP, Cechner RL, Platt RD. Proposal for a national autopsy data bank. A potential major contribution of pathologists to the health care of the nation. Am J Clin Pathol. 1981;76(Suppl):597-617.

3. Kircher T, Carter JR, Sinton E. The national autopsy databank. Pathologist. 1985;39:22-26.

4. Frey CM, McMillen MM, Cowan CD, Horm JW, Kessler LG. Representativeness of the surveillance, epidemiology, and end results program data: recent trends in cancer mortality rate. J Natl Cancer Inst. 1992;84:872-877.

5. Ashworth TG. Inadequacy of death certification: proposal for change. J Clin Pathol. 1991;44:265-268.

6. Bjornsson J, Jonasson JG, Nielsen GP. The accuracy of death certificates. Lab Invest. 1992;66:106A.

7. Kircher T, Nelson J, Burdo H. The autopsy as a measure of accuracy of the death certificate. N Engl J Med. 1985;313:1263-1269.

8. Kircher T, Anderson RE. Cause of death: proper completion of the death certificate. JAMA. 1987;258:349-352.

9. Erlander D. Computer data processing of medical diagnoses in pathology. Am J Clin Pathol. 1975;63:538-544.

10. Slater DN. Certifying the cause of death: an audit of wording inaccuracies. J Clin Pathol. 1993;46:232-234.

11. Walter SD, Birnie SE. Mapping mortality and morbidity patterns: an international comparison. Int J Epidemiol. 1991;20:678-689.

12. Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM. Integrated pathology reporting, indexing, and retrieval system using natural language diagnoses. Mod Pathol. 1988;1:44-50.

13. Schneier B. Applied Cryptography: Protocols, Algorithms, and Source Code in C. New York: John Wiley & Sons; 1994.

14. Cole SK. Accuracy of death certificates in neonatal deaths. Community Med. 1989;11:1-8.

15. Dunn PM. The search for perinatal definitions and standards. Acta Paediatr Scand Suppl. 1985;319:7-16.

16. Lammer EJ, Brown EJ, Anderka MT, Guyer B. Classification and analysis of fetal deaths in Massachusetts. JAMA. 1989;261:1757-1762.

17. Valdes-Dapena MA, Arey JB. The causes of neonatal mortality: an analysis of 501 autopsies on newborn infants. J Pediatr. 1970;77:366-375.

18. Alberman E, Botting B, Blachley N, Twidell A. A new hierarchical classification of causes of infant deaths in England and Wales. Arch Dis Child. 1994;70:403-409.

19. Saller DN Jr, Lesser KB, Harrel U, Rogers BB, Oyer CE. The clinical utility of the perinatal autopsy. JAMA. 1995;273:663-665.

20. Hanzlick R, ed. The medical cause of death manual. Instructions for writing cause of death statements for deaths due to natural causes. Northfield, IL: College of American Pathologists; 1994.

21. Moore GW, Miller RE, Hutchins GM. Indexing by MeSH titles of natural language pathology phrases identified on first encounter using the barrier word method. In: Scherrer JR, Cote RA, Mandil SH, eds. Computerized Natural Medical Language Processing for Knowledge Representation. Amsterdam: North-Holland; 1989:29-39.

22. Berman JJ, Moore GW. SNOMED-encoded surgical pathology databases: a tool for epidemiologic investigation. Mod Pathol. In press.

23. Hutchins GM, Autopsy Committee of the College of American Pathologists. Practice guidelines for autopsy pathology: autopsy reporting. Arch Pathol Lab Med. 1995;119:123-130.

24. Svendsen E, Hill RB. Autopsy legislation and practice in various countries. Arch Pathol Lab Med. 1987;111:846-850.

25. McFarlane MJ, Feinstein AR, Wells CK, Chan CK. The 'epidemiologic necropsy'. Unexpected detections, and changing rates of lung cancer. JAMA. 1987;258:331-338.

26. McFarlane MJ. Clinical diagnosis is not a source of bias in selection for necropsy. Arch Pathol Lab Med. 1989;113:64-67.

27. Goldman L, Sayson R, Robbins S, Cohn LH, Bettmann M, Weisberg M. The value of the autopsy in three medical eras. N Engl J Med. 1983;308:1000-1005.

28. Goldman L. Diagnostic advances versus the value of the autopsy: 1912-1980. Arch Pathol Lab Med. 1984;108:501-505.

29. Cameron HM, McGoogan E. A prospective study of 1152 hospital autopsies. 1. Inaccuracies in death certification. J Pathol. 1981;133:273-283.

Table 1. Ranking of thesaurus terms by likely position on the death certificate

    Group A:  Likely as Immediate Cause of Death.
    Group B:  Likely as Intermediate Cause of Death.
    Group C:  Likely as Underlying Cause of Death.
    Group D:  Likely as Other Significant Condition.
    Group E:  Likely as Risk Factor.

Table 2. Causes of death appearing at least 50 times (1625 autopsies)

Group A
Fetal pneumonia...........................211

Group B
Hyaline membrane disease..................240
Hypoplasia lung...........................105
Meconium aspiration........................61

Group C
Chorioamnionitis..........................360
Brain hemorrhage..........................303
Abruptio placentae........................108
Ventricular septal defect.................107
Atrial septal defect.......................73
Necrotizing enterocolitis..................55

Group D
Cesarean section..........................275
Breech delivery...........................198
Premature rupture membranes...............164
Twin pregnancy............................156
Premature labor...........................148

Group E
Maternal toxemia...........................73
Maternal diabetes..........................59

Table 3. Example autopsy facesheet

CLINICAL HISTORY:
      Pregnancy in a 27 year old, Coombs negative, Rubella
      immune, white female; hypertension; hospitalization;
      medications; ultrasound demonstrating viable gestational sac;
      polyhydramnios, possible esophageal atresia per sonogram;
      pre-eclampsia; Aldomet; absence of fetal movement;
      absence of fetal heart sounds with no fetal heart motion
      confirmed by sonogram; fetal death in utero;
      admission to hospital following rupture of membranes;
      pitocin-induced vaginal delivery of stillborn female fetus;
      delivery of placenta.
ANATOMICAL DIAGNOSIS:
 Premature female fetus, anatomic age consistent with
      38 weeks gestation (weight 1900 gm, crown-heel length 43 cm,
      crown-rump length 31 cm, right foot length 6.3 cm).
 Atresia, esophagus.
 Sanguineous pleural effusions, right 15 ml, left 15 ml.
 Cephalohematoma, scalp.
CAUSE OF DEATH:
 Esophageal atresia.
 Risk Factor - toxemia.

 NOTE:   denotes carriage-return-line-feed (ASCII 13, 10).
  denotes blank space (ASCII 32).