Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype Internet autopsy database: 1625 consecutive
fetal and neonatal autopsy facesheets spanning twenty years.
Archives Pathol Lab Med 120(8):782-785, 1996
A PROTOTYPE INTERNET AUTOPSY DATABASE: 1625 CONSECUTIVE FETAL AND NEONATAL AUTOPSY FACESHEETS SPANNING TWENTY YEARS
G. William Moore, MD, PhD,
Jules J. Berman, PhD, MD, Randy L. Hanzlick, MD, John J. Buchino, MD,
Grover M. Hutchins, MD
Objective.- To demonstrate that cause of death statements can be
generated by a computer algorithm from an autopsy database composed
of diagnostic terms.
Data Sources.- Over 49,000 autopsy facesheets were collected from
a publicly accessible Internet autopsy database contributed by over
a dozen institutions. This database is available at the web-site:
Study Selection.- To test the feasibility of creating and using
a publicly available autopsy database, and to identify the technical
and medicolegal problems that may arise with such a novel resource,
a prototype study was designed by selecting autopsy facesheets
from fetal and neonatal deaths. An algorithm was developed to
determine the cause of death from the listing of anatomic diagnoses.
Data Extraction.- 1625 fetal and neonatal autopsy facesheets were
selected, for fetal and neonatal deaths occurring up to 28 days
after birth.
Data Synthesis.- The algorithm determined causes of death
from autopsy facesheet data in all cases. Upon review
by an experienced pediatric pathologist, these automatically
generated cause of death statements required no modification
or only slight modification in over 90% of cases.
Conclusions.- A large multi-institutional autopsy database composed
of demographic and diagnostic information has been deposited on the
Internet. This information can be freely downloaded and used by
any researcher without violating patient confidentiality.
As a demonstration of one possible application of the database,
cause of death statements were generated for fetal and neonatal
autopsies using a computer algorithm. One can anticipate that the wealth of
information contained in autopsy facesheets can be assembled
into a database that will serve the public interest.
In 1975, the College of American Pathologists embarked on the
development of a computerized National Autopsy Databank to serve
as a central repository of pathologic, biomedical, demographic,
and epidemiologic information which would potentially benefit a wide
range of scientific and research endeavors (1,2,3). Past efforts
were hampered by the state of computer technology and the enormity
of the clerical effort. At present, powerful personal computers are
available to all pathologists in developed countries. Furthermore,
pathologists can now share information via the Internet, at no
expense above a nominal monthly charge. It is now realistic to
create an accessible repository of autopsy records, so long as those
records are stored in an electronic format that conforms to
a few rules for patient privacy.
The importance of a publicly available autopsy database extends
from the role of the individual autopsy in patient care, to quality
assurance, research, and disease surveillance. Autopsy data address
the most serious of all medical outcomes: death. Unfortunately,
most statistical studies that examine causes of death are derived
from death certificate data. Annual data for the entire U.S.
population have been collected since 1935 by the Vital Statistics
Program of the National Center for Health Statistics. These data are
taken from death certificates, and account for more than 99% of U.S.
deaths (4). Death certificate data are notoriously error-prone, and
these problems seem to extend beyond national borders, as similar
questions related to the validity of death certificate data have been
voiced in the United States and the United Kingdom (5-9). The most
common error occurs when a mode or mechanism of death is listed as
the cause of death (e.g., cardiac arrest, cardiopulmonary arrest),
thus nullifying the potential value of the death certificate (10).
Another problem involving death certificates is their proven lack
of agreement with autopsy data (7). This is hardly surprising,
since death certificates are often completed before the autopsy
results become available. A recent survey of 49 national and
international health atlases has shown that there is virtually
no consistency in the way that death data are presented (11).
The autopsy is the most reliable cause of death document we
have. An autopsy facesheet database constructed from merged records
of many institutions will permit epidemiologists to obtain cause
of death data accompanied by a list of pathologic findings.
Furthermore, large autopsy databases allow pathologists to test
hypotheses based on a small number of personally encountered cases
against data collected from a large number of similar cases.
In fact, virtually any idea derived from observations on a single
autopsy will benefit from evaluating similar cases collected from
a large database. This would include the sensitive but important
issue of examining the quality of patient care based on autopsy
findings. In addition, review of an autopsy database may reveal
the emergence of diseases that would not be apparent from
the observations in a single autopsy or even from the personal
experiences of a pathologist who performs a large number
of autopsies.
The Internet Autopsy Database, developed by Drs. Moore, Berman,
and Hutchins, consists of over 49,000 autopsy facesheets contributed
by over a dozen academic medical institutions, available at URL:
As evidence of the usefulness of a large autopsy database for
research purposes, over 1200 research papers have been published
to date utilizing autopsy reports and accompanying materials from
The Johns Hopkins Hospital (12). As a prototype investigation
for a database derived from autopsy facesheets, we selected
autopsy facesheet data from autopsies performed on fetuses and
infants who died within 28 days of birth, autopsied from 1975 through
1994. Causes of death were obtained from a database program
that matches diagnoses appearing in the autopsy facesheet against
an ordered thesaurus of accepted perinatal causes of death.
Computer-selected causes of death were compared with the cause
of death determined by a pediatric pathologist who reviewed
the autopsy facesheets.
MATERIALS AND METHODS
Computer-readable autopsy facesheets were examined for patients
up to 28 days old in the Internet Autopsy Database in years 1975
through 1994, inclusive. To preserve confidentiality, demographic
information is reduced to age, race, sex, and year of death; and
autopsy reports are renumbered using a random number generator (13).
Clinical history and anatomical diagnoses, contributed to the
database in the form of short, declarative phrases, are converted
into a 'reduced facesheet' by converting the sentences in the clinical
history and anatomical diagnoses entirely to lower-case; and by
removing all numerals, single letters, and stop-words (i.e.,
prepositions, articles, conjunctions, common adjectives and verbs).
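The reduction step described above can be sketched as follows. This is
a minimal illustration, not the database program itself: the function
name reduce_facesheet and the short STOP_WORDS list are assumptions,
since the actual stop-word list is not given here.

```python
import re

# Assumption: a small illustrative stop-word list; the list actually used
# by the database program is not reproduced in the text.
STOP_WORDS = {"a", "an", "the", "of", "in", "with", "and", "or",
              "by", "to", "per", "no", "following"}


def reduce_facesheet(sentences):
    """Return one token list per sentence, lower-cased, with numerals,
    single letters, and stop-words removed."""
    reduced = []
    for sentence in sentences:
        # Keeping only alphabetic runs drops numerals and punctuation.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        kept = [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]
        reduced.append(kept)
    return reduced


if __name__ == "__main__":
    print(reduce_facesheet(["Pregnancy in a 27 year old"]))
```

Splitting on alphabetic runs also separates hyphenated words such as
'pitocin-induced' into two tokens; whether the original program does
the same is not stated.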
Single-word or multiple-word phrases within each diagnostic sentence
are matched to corresponding terms in a thesaurus of possible causes
of death, assembled from the literature (14-19) and from our own
experience. Thesaurus terms for the first section of the death
certificate are ranked in Groups A through C, according to their
likely position on the certificate, as shown in Table 1 (20).
That is, a thesaurus term likely to be an 'immediate cause of death'
is assigned to Group A; a term likely to be an 'intermediate cause
of death' is assigned to Group B; and a term likely to be an
'underlying cause of death' is assigned to Group C. Thesaurus terms
for the second section of the death certificate are ranked
in Groups D or E. A term likely to be an 'other significant
condition' is assigned to Group D; and a term likely to be a
'risk factor' is assigned to Group E. For each autopsy facesheet,
the computer-generated causes of death may be rearranged by the
attending pathologist, in accordance with the particulars
of the case.
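As a rough sketch of how the five groups might be turned into the two
sections of a cause of death statement, the following assumes a small
lookup table (GROUP_OF) and a function name (arrange_statement) that
are ours for illustration. The five example terms are the most frequent
ones reported for each group in this study; the full thesaurus is not
reproduced.

```python
# Group letters follow Table 1; only one example term per group is listed.
GROUP_OF = {
    "fetal pneumonia": "A",           # likely immediate cause
    "hyaline membrane disease": "B",  # likely intermediate cause
    "chorioamnionitis": "C",          # likely underlying cause
    "cesarean section": "D",          # likely other significant condition
    "maternal toxemia": "E",          # likely risk factor
}

FIRST_SECTION = ("A", "B", "C")
SECOND_SECTION = ("D", "E")


def arrange_statement(matched_terms):
    """Order matched thesaurus terms by group and split them into the
    first and second sections of the statement. Terms that are not in
    the lookup table are ignored."""
    known = [t for t in matched_terms if t in GROUP_OF]
    ordered = sorted(known, key=GROUP_OF.get)  # stable sort by group letter
    return {
        "first_section": [t for t in ordered if GROUP_OF[t] in FIRST_SECTION],
        "second_section": [t for t in ordered if GROUP_OF[t] in SECOND_SECTION],
    }


if __name__ == "__main__":
    print(arrange_statement(["maternal toxemia", "chorioamnionitis",
                             "fetal pneumonia"]))
```

The output is only a provisional ordering; as noted above, the
attending pathologist may rearrange it for the individual case.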
Each thesaurus term points to one or more 'hit terms',
as they might occur in a reduced autopsy facesheet.
For example, thesaurus term 'chorioamnionitis' might point
to hit terms such as 'chorionitis' or 'placentitis'. The list
of hit terms is created from experience, and enriched by
the 'barrier word method' applied to the autopsy facesheets (21).
The database program constructs a provisional cause of death
statement from each autopsy facesheet by matching hit terms
in the thesaurus with terms in the reduced autopsy facesheet.
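The matching step might look like the sketch below, which assumes that
each facesheet sentence has already been reduced to a list of
lower-case tokens. The HIT_TERMS table is illustrative: 'chorionitis'
and 'placentitis' come from the example above, while the entry
'hyaline membranes' and the function names are hypothetical.

```python
# Thesaurus term -> hit terms as they might occur in a reduced facesheet.
HIT_TERMS = {
    "chorioamnionitis": ["chorioamnionitis", "chorionitis", "placentitis"],
    # Hypothetical hit term, added only to show a multiple-word phrase.
    "hyaline membrane disease": ["hyaline membrane disease", "hyaline membranes"],
}


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def provisional_causes(reduced_sentences, hit_terms=HIT_TERMS):
    """Return thesaurus terms whose hit terms appear in any sentence,
    in order of first appearance and without repeats."""
    found = []
    for tokens in reduced_sentences:
        for term, hits in hit_terms.items():
            if term in found:
                continue
            if any(contains_phrase(tokens, hit.split()) for hit in hits):
                found.append(term)
    return found


if __name__ == "__main__":
    sheet = [["acute", "placentitis"], ["hyaline", "membranes", "lungs"]]
    print(provisional_causes(sheet))
```

Exact token matching of this kind would miss misspelled words, which
is consistent with the kind of error reported in the results.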
RESULTS
There were 1625 autopsy facesheets from patients up to 28 days
old in the Internet Autopsy Database, autopsied between 1975
and 1994, inclusive. Causes of death that appeared at least
50 times are shown in Table 2. In Group A, the most common cause
of death is fetal pneumonia (211/1625, 13% of cases). In Group B,
the most common cause of death is hyaline membrane disease (240/1625,
15% of cases). In Group C, the most common cause of death
is chorioamnionitis (360/1625, 22% of cases). In Group D, the most
common other significant condition is Cesarean section (275/1625,
17% of cases). In Group E, the most common risk factor is maternal
toxemia (73/1625, 4% of cases).
The database program identified 126 cases with an unknown cause
of death, and review of the facesheets by a pathologist showed that
the computer assignment concurred with the pathologist in 117 (93%)
cases. Errors were explained by incorrectly spelled words,
or idiosyncratic diagnostic statements. Review of other
computer-assigned causes of death showed them to require
only deletions or rearrangements, in over 90% of cases.
DISCUSSION
In this study, a prototype autopsy database has been
constructed from the facesheets of 1625 fetal and newborn autopsies.
The records in the database consist of patient demographics,
a listing of anatomic diagnoses (obtained from the autopsy
facesheet), and a computer-assigned cause of death statement.
A cause of death cannot be established in all autopsies.
In this study, a computer-generated cause of death was assigned
in over 90% of cases. A recent study by Saller et al (19)
determined a cause of death in about 76% of infant deaths studied.
The high rate of cause of death ascertained in our study may be an
artefact of the database program, which matches anatomic diagnoses
from the facesheet against a list of causes of death, without
actually analyzing the autopsy report to determine whether the match
is an appropriate cause of death in the individual autopsy. The
computer-assigned cause of death is best regarded as a list of those
conditions present in the report that are capable of causing death,
rather than an exact determination of the cause of death.
In practice, the computer-assigned cause of death would be
appropriately modified by the pathologist at the time of completion
of the autopsy report.
GUIDELINES FOR CONTRIBUTING AUTOPSY FACESHEETS
Persons interested in contributing to the Internet Autopsy
Database should contact the database administrator at the email
address provided on the web-site. We propose the following
guidelines to facilitate the electronic transfer of autopsy records
into the database, based upon published guidelines (22). An example
report is shown in Table 3. denotes carriage-return-line-feed
(ASCII 13, 10); denotes blank space (ASCII 32).
1. Records should be submitted on 3.5" floppy disks in IBM-PC-
compatible format. Collected autopsy records should include
consecutive reports, each report appearing once only. Every report
must have a unique identifier. The identifying number is a secret
number that the contributing institution can use to track the
actual autopsy number. The institutional identifying number
is renumbered using a random number generator, prior to publication
on the public database.
2. Individual autopsy reports should be demarcated (separated)
in the electronic file in an unambiguous manner and divided into
unambiguous data fields that include: age, race, sex,
year of death, location, occupation, clinical history, and
3. There should be a field consisting of a list of pathologic
diagnoses verified by the autopsy. Each line of the list should be
written as short sentences in English with common terminology,
and should include an anatomic site.
4. Facesheets should consist of IBM-compatible, computer-readable
files in 7-bit ASCII, i.e., only ASCII characters numbered 10, 13,
and 32 through 126. Ideally, lines should be at most 60 characters
long, followed by . Most word processing and electronic mail
software can manage such files without difficulty.
Each facesheet begins with a demographic line, starting with
### and ending with . No other part of the autopsy
facesheet should contain the # character (ASCII 35). The
demographic line consists of autopsy number, age, race, sex,
year of autopsy, first three digits of U.S. postal zip code,
and occupation. For countries outside the USA, one should use
the telephone country code (e.g., 44=Great Britain, 49=Germany,
etc.). The 7 segments of the demographic line are separated
by the ^ character (ASCII 94).
5. The clinical history portion of the facesheet begins with
the text: CLINICAL HISTORY:. The anatomical diagnosis
portion of the facesheet begins with the text: ANATOMICAL
DIAGNOSIS:. The cause of death portion of the facesheet
begins with the text: CAUSE OF DEATH:. Each text-
sentence should be terse, without syntactical complexities,
and should end with an unambiguous sentence delimiter.
The importance of an unambiguous sentence delimiter has been
discussed (22), and an institution may choose from a variety
of options, including a period followed by two spaces, provided
that line-interrupts never break the two spaces. As an alternate,
The sentence delimiter is currently in use at
The Johns Hopkins Hospital, and is the sentence delimiter
suggested by the CAP Autopsy Committee (23). Database records
appearing in our proposed format can easily be reconstituted
to a user-readable form by a database engine.
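A contributor could check a file against guideline 4 with a sketch
along the following lines. It covers only the rules stated explicitly
above: the permitted ASCII characters, the 60-character line length,
and a demographic line that starts with ### and has 7 segments
separated by ^. The field names, the function names, and the assumption
that the autopsy number follows ### directly are ours, and the example
line is invented.

```python
# Permitted characters: ASCII 10, 13, and 32 through 126.
ALLOWED = {10, 13} | set(range(32, 127))

# Assumed field names for the 7 segments, in the order given in the text.
FIELDS = ("autopsy_number", "age", "race", "sex",
          "year", "location", "occupation")


def invalid_characters(text):
    """Return the sorted set of characters outside the permitted range."""
    return sorted({ch for ch in text if ord(ch) not in ALLOWED})


def long_lines(text, limit=60):
    """Return 1-based numbers of lines longer than the suggested limit."""
    return [i for i, line in enumerate(text.splitlines(), 1)
            if len(line) > limit]


def parse_demographic_line(line):
    """Split a demographic line into its 7 named segments."""
    line = line.rstrip("\r\n")
    if not line.startswith("###"):
        raise ValueError("demographic line must start with ###")
    body = line[3:]
    if "#" in body:
        raise ValueError("# may appear only in the leading ###")
    segments = [s.strip() for s in body.split("^")]
    if len(segments) != len(FIELDS):
        raise ValueError("expected 7 segments separated by ^")
    return dict(zip(FIELDS, segments))


if __name__ == "__main__":
    # Invented example line, not taken from the database.
    print(parse_demographic_line("###12345^0^W^F^1990^212^none\r\n"))
```

Rejecting any later # character follows from the rule that no other
part of the facesheet may contain it, which lets a reader of the file
locate record boundaries by searching for ### alone.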
USES OF THE AUTOPSY DATABASE
An autopsy database accessible to all researchers will have great
value for epidemiologic studies. Since geographic location is one
of the demographic fields supplied by contributors (partial zip-code
for facesheets from the USA, and telephone-country-code for
facesheets contributed from outside the USA), studies can be
stratified by location. Similarly, patient age and year of autopsy
are provided, allowing researchers to analyze age-stratified disease
and causes of death as well as trends associated with the era
in which the autopsies were performed. Because review of glass
slides is usually necessary for pathologic studies and access to
tissue (as paraffin block sections) is required for molecular biology
studies, it is important that legitimate researchers have access to
original autopsy materials. Although the database maintains the
confidentiality of patients, care-givers, and institutional
contributors, we have a tentative plan for those who wish
to have prepared tissue sections. The web-site provides an
e-mail linkage to the database administrator, who forwards requests
to the Institution that contributed the relevant autopsies. It is
then up to the Institution to contact the person requesting access to
detailed records, slides, or tissues. Collaborative research studies
might be arranged by this method. In this manner, the contributing
institution maintains control of the amount of information and
material released to researchers. Our hope is that the database will
grow to become one of the richest and most accessible sources
of autopsy lesions, and that researchers worldwide will utilize
the database to further our understanding of all diseases that are
sampled at autopsy.
In the 105 years of its existence, cases from the autopsy files
of The Johns Hopkins Hospital have provided materials for over
1200 peer-reviewed publications appearing in academic journals.
We propose that a public database containing contributed autopsies
from any interested contributing institution will have even greater
value for research and epidemiologic studies. It is important
to recognize that the envisioned autopsy database will not be
an unbiased sample of all deaths. Autopsies are only performed
for a minority of deaths. Currently they are seldom performed
on patients dying at home of natural causes. The autopsy rate
in Muslim countries is close to zero, and a review of international
autopsy rates shows that most of the sampled countries have
low autopsy rates (24). Since only a small proportion of
institutions that perform autopsies are likely to contribute
to a voluntary autopsy database, the database population will not
represent any identifiable demographic group. Regardless, there are
appropriate intellectual paradigms for managing information
of this nature. Since the patient's age, sex, and year of death
are provided with each facesheet, the results on a large,
potentially biased autopsy sample could be age-adjusted and
sex-adjusted by standard epidemiologic methods. McFarlane
and coworkers (25,26) have suggested the paradigm of
an 'epidemiologic necropsy', in which clinical information known
about the patient prior to death is used to stratify or pro-rate
autopsy information from heterogeneous sources. In particular,
cases with a 'necropsy surprise', i.e., autopsy diagnoses which
were unsuspected or unknown clinically, may be used as cases
for which no clinical bias based upon the surprise could have
influenced the selection of that patient for autopsy. Since it has
repeatedly been shown that about 15-25% of autopsies contain
a significant unsuspected or unknown finding (27-29), it seems that
the necropsy surprise paradigm could be used to evaluate data
from an autopsy database.
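One standard method of adjustment is direct standardization, sketched
below. The text does not say which method would be used, so this is
only an example of the general idea; the function name and all numbers
are hypothetical.

```python
def directly_standardized_rate(stratum_counts, standard_weights):
    """Weight each stratum-specific rate by that stratum's share of a
    chosen standard population.

    stratum_counts:   {stratum: (cases_with_finding, autopsies_in_stratum)}
    standard_weights: {stratum: size or share in the standard population}
    """
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    rate = 0.0
    for stratum, weight in standard_weights.items():
        cases, total = stratum_counts[stratum]
        if total == 0:
            raise ValueError(f"no autopsies in stratum {stratum!r}")
        rate += (weight / total_weight) * (cases / total)
    return rate


if __name__ == "__main__":
    # Hypothetical counts: a finding seen in 30 of 100 male and
    # 10 of 100 female autopsies, standardized to equal sex shares.
    counts = {"male": (30, 100), "female": (10, 100)}
    weights = {"male": 0.5, "female": 0.5}
    print(round(directly_standardized_rate(counts, weights), 3))
```

The same function applies to age strata, or to combined age and sex
strata, provided every stratum in the standard population is
represented in the autopsy sample.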
In conclusion, it is our opinion that a public autopsy database
that contains information of value for epidemiologists and other
researchers is technically feasible using current technology.
Such a database can be designed to protect patient privacy and
to provide a computer-assigned cause of death in most cases.
Placing the autopsy database on the Internet maximizes its accessibility
to researchers interested in using or contributing to the database.
REFERENCES
1. Peery TM. The autopsy data bank. A proposal for pathologists
to contribute to the health care of the nation. Am J Clin Pathol.
1978; 69 (Suppl): 258-259.
2. Carter JR, Nash NP, Cechner RL, Platt RD. Proposal for
a national autopsy data bank. A potential major contribution
of pathologists to the health care of the nation. Am J Clin Pathol.
1981; 76 (Suppl): 597-617.
3. Kircher T, Carter JR, Sinton E. The national autopsy databank.
Pathologist. 1985; 39:22-26.
4. Frey CM, McMillen MM, Cowan CD, Horm JW, Kessler LG:
Representativeness of the surveillance, epidemiology, and end results
program data: recent trends in cancer mortality rate.
JNCI 1992; 84:872-877.
5. Ashworth TG: Inadequacy of death certification: proposal
for change. J Clin Pathol 1991; 44:265-268.
6. Bjornsson J, Jonasson JG, Nielsen GP: The accuracy of death
certificates. Lab Invest 1992; 66:106A.
7. Kircher T, Nelson J, Burdo H: The autopsy as a measure of
accuracy of the death certificate. N Engl J Med.
8. Kircher T, Anderson RE: Cause of death: proper completion
of the death certificate. JAMA 1987; 258:349-352.
9. Erlander D: Computer data processing of medical diagnoses
in pathology. Am J Clin Pathol 1975; 63:538-544.
10. Slater DN: Certifying the cause of death: an audit of wording
inaccuracies. J Clin Pathol 1993; 46:232-234.
11. Walter SD, Birnie SE: Mapping mortality and morbidity patterns:
an international comparison. Intl J Epidemiol. 1991; 20:678-689.
12. Moore GW, Boitnott JK, Miller RE, Eggleston JC,
Hutchins GM. Integrated pathology reporting, indexing, and retrieval
system using natural language diagnoses. Modern Pathol.
13. Schneier B. Applied Cryptography. Protocols, Algorithms,
and Source Code in C. New York: John Wiley & Sons, 1994.
14. Cole SK. Accuracy of death certificates in neonatal deaths.
Community Medicine 1989; 11:1-8.
15. Dunn PM. The search for perinatal definitions and standards.
Acta Paediatr Scand Suppl 1985; 319: 7-16.
16. Lammer EJ, Brown EJ, Anderka MT, Guyer B. Classification and
analysis of fetal deaths in Massachusetts. J Amer Med Assn 1989;
17. Valdes-Dapena MA, Arey JB. The causes of neonatal mortality:
An analysis of 501 autopsies on newborn infants. J Pediatr 1970;
18. Alberman E, Botting B, Blachley N, Twidell A.
A new hierarchical classification of causes of infant deaths
in England and Wales. Arch Dis Childh 1994; 70: 403-409.
19. Saller DN jr, Lesser KB, Harrel U, Rogers BB, Oyer CE.
The clinical utility of the perinatal autopsy. J Amer Med Assn 1995;
20. Hanzlick R, ed. The medical cause of death manual.
Instructions for writing cause of death statements for deaths
due to natural causes. Northfield, IL: College of American
21. Moore GW, Miller RE, Hutchins GM. Indexing by MeSH titles
of natural language pathology phrases identified on first encounter
using the barrier word method. In: Scherrer JR, Cote RA, Mandil SH,
eds. Computerized Natural Medical Language Processing for Knowledge
Representation. Amsterdam: North-Holland; 1989: 29-39.
22. Berman JJ, Moore GW. SNOMED-encoded surgical pathology
databases: a tool for epidemiologic investigation. Modern
Pathology, in press.
23. Hutchins GM and the Autopsy Committee of the College
of American Pathologists: Practice guidelines for autopsy pathology:
autopsy reporting. Arch Pathol Lab Med. 1995; 119:123-130.
24. Svendsen E, Hill RB. Autopsy legislation and practice
in various countries. Arch Pathol Lab Med. 1987; 111:846-850.
25. McFarlane MJ, Feinstein AR, Wells CK, Chan CK.
The 'epidemiologic necropsy'. Unexpected detections,
and changing rates of lung cancer. JAMA. 1987; 258:331-338.
26. McFarlane MJ. Clinical diagnosis is not a source
of bias in selection for necropsy. Arch Pathol Lab Med.
27. Goldman L, Sayson R, Robbins S, Cohn LH, Bettmann M,
Weisberg M. The value of the autopsy in three medical eras.
N Engl J Med. 1983; 308:1000-1005.
28. Goldman L. Diagnostic advances versus the value of the autopsy:
1912-1980. Arch Pathol Lab Med. 1984; 108:501-505.
29. Cameron HM, McGoogan E. A prospective study of 1152 hospital
autopsies. 1. Inaccuracies in death certification. J Pathol.
1981; 133: 273-283.
TABLE 1. CAUSE OF DEATH GROUPS.
Group A: Likely as Immediate Cause of Death.
Group B: Likely as Intermediate Cause of Death.
Group C: Likely as Underlying Cause of Death.
Group D: Likely as Other Significant Condition.
Group E: Likely as Risk Factor.
TABLE 2. CAUSE OF DEATH THESAURUS, FREQUENCY AT LEAST 50 CASES.
GROUP A: LIKELY IMMEDIATE CAUSE OF DEATH.
GROUP B: LIKELY INTERMEDIATE CAUSE OF DEATH.
Hyaline membrane disease..................240
GROUP C: LIKELY UNDERLYING CAUSE OF DEATH.
Ventricular septal defect.................107
Atrial septal defect.......................73
GROUP D: LIKELY OTHER SIGNIFICANT CONDITIONS.
Premature rupture membranes...............164
GROUP E: LIKELY RISK FACTORS.
TABLE 3. REPRESENTATIVE AUTOPSY REPORT IN SUGGESTED FORMAT.
Pregnancy in a 27 year old, Coombs negative, Rubella
immune, white female; hypertension; hospitalization;
medications; ultrasound demonstrating viable gestational sac;
polyhydramnios, possible esophageal atresia per sonogram;
pre-eclampsia; Aldomet; absence of fetal movement;
absence of fetal heart sounds with no fetal heart motion
confirmed by sonogram; fetal death in utero;
admission to hospital following rupture of membranes;
pitocin-induced vaginal delivery of stillborn female fetus
delivery of placenta.
Premature female fetus, anatomic age consistent with
38 weeks gestation (weight 1900 gm, crown-heel length 43 cm,
crown-rump length 31 cm, right foot length 6.3 cm).
Sanguineous pleural effusions, right 15 ml, left 15 ml.
CAUSE OF DEATH:
Risk Factor - toxemia.
NOTE: denotes carriage-return-line-feed (ASCII 13, 10).
denotes blank space (ASCII 32).