Moore GW, Berman JJ. Object-oriented English-to-Snomed translator
using Transoft + Hyperpad. Symposium on Computer Applications in
Medical Care 15:973-975, 1991
OBJECT-ORIENTED ENGLISH-TO-SNOMED TRANSLATOR USING TRANSOFT+HYPERPAD
G. William Moore, M.D., Ph.D.
Jules J. Berman, Ph.D., M.D.
From Laboratory Service, Veterans Affairs Medical Center, Baltimore, Maryland; Department of Pathology, University of Maryland School of Medicine, Baltimore, Maryland; and Department of Pathology, The Johns Hopkins Medical Institutions, Baltimore, Maryland.
Running Title: ENGLISH-TO-SNOMED TRANSLATOR
Address correspondence to: G. William Moore, M.D., Ph.D., Chief, Autopsy and Electron Microscopy Sections, Laboratory Service (113), Department of Veterans Affairs Medical Center, 3900 Loch Raven Boulevard, Baltimore, MD 21218, TEL: 1-301-467-9932 ext 5321, FAX: 1-301-467-0023.
SNOTRAN (SNOmed TRANslator) is a table-driven, public-domain computer translator, written for HyperPAD ((C) 1989, Brightbill-Roberts), an object-oriented scripting environment. SNOTRAN operates on surgical pathology files, extracting diagnoses and assigning codes in the Systematized Nomenclature Of MEDicine (SNOMED). SNOTRAN employs context-sensitive translation algorithms to produce syntactically correct diagnostic items, which are then matched with SNOMED codes. English-language surgical pathology reports, accessioned over one year at the Baltimore Veterans Affairs (VA) Medical Center, were translated by SNOTRAN into SNOMED. With an interface to a larger hospital information system, all natural language pathology reports are automatically rendered as SNOMED Topography and Morphology codes. This translator frees the pathologist from the time-intensive task of personally coding each report, and may be used to flag certain diagnostic categories that require specific quality assurance (QA) actions.
All software is provided by the author. Needed for the demonstration are: an IBM PC/AT or compatible (less-than-286 computers run too slowly for demonstration); MS-DOS version 3.0 or later; at least 2 Mb of available hard disk space; and a VGA monitor. A 1.2 Mb, 5.25" floppy disk drive is preferred, but other floppy drives are acceptable. SNOTRAN is in the public domain, and copies will be made available gratis to conference participants on 1.2 Mb, 5.25" floppy disks.
Unstructured natural language text is an indispensable part of the medical record, but there is no widely available software for abstracting textual information from routine patient care transactions. TRANSOFT is a table-driven, public-domain computer translation shell (1,2), which is embedded in the Veterans Affairs (VA) File Manager (FileMan), the core database management system used in 169 Veterans Affairs Medical Centers (3). The user supplies the dictionary and a grammar in the augmented-transition-network (ATN) style common to many computer translators (4-6). The SNOTRAN (SNOmed TRANslator) demonstration is an object-oriented version of the TRANSOFT computer translation shell, applied to one year of English-language surgical pathology reports collected at the Baltimore VA Medical Center, using HyperPAD as the user interface, and the Systematized Nomenclature Of MEDicine (SNOMED) as the target language (7). Words or phrases in the edited reports were pointed to SNOMED-compatible terms. TRANSOFT was used to rearrange the natural language word order used in pathology reports into a standard format for SNOMED topography and morphology codes.
The SNOTRAN demonstration operates on HyperPAD (Brightbill-Roberts, Inc.), an object-oriented scripting environment for the IBM PC or compatibles, akin to HyperCard for the Macintosh computer (8). The public-domain edition of SNOTRAN is supported by a run-time-only version of HyperPAD, called BROWSER, which gives the user access to all input texts and rules, but does not allow reprogramming. HyperPAD objects consist of buttons, fields, pages, backgrounds, and pads. The number of pages is limited only by available disk space. Operations on data contained in fields are controlled by `scripts' (programs) written in the PADtalk language. Input surgical pathology reports reside in the `input' page field; the English-to-SNOMED lexicon resides in the `lexicon' page field; and the grammatical parser (word rearrangement formulas) resides in the `grammar' page field.
TRANSOFT is a table-driven, public-domain computer translation shell for translating medical statements into formal codes. Defining features of the source language (English) and target language (SNOMED) reside in a database, not in program code. By contrast, commercially available translation programs are `turnkey' or `hands off' systems, where the user is dependent on the vendor to install modifications. Part-of-speech designators include a combination of punctuation, traditional parts-of-speech, and SNOMED axes: [=start-sentence; ]=end-sentence; ,=comma; T=TOPOGRAPHY; M=MORPHOLOGY; C=conjunction; R=preposition; U=undetermined; Y=negation. Translations are obtained by pointing English words or phrases in the lexicon to the corresponding SNOMED term, and rearranging word order according to a `parsing formula', as in the following example:
[    prostatic    adenocarcinoma    .    English (=source language)
[    T            M                 ]    Parts-of-speech
[    T77100       M81403            ]    SNOMED (=target language)
1    2            3                 4    Target positions
In this case, the English and SNOMED word orders are identical. The parsing formula for this translation is `1[2t3m4]', which means: put [ (=start-sentence) into target position 1; put t (=SNOMED T-code) into target position 2; put m (=SNOMED M-code) into target position 3; put ] (=end-sentence) into target position 4. In a more complex example:
[   adenocarcinoma   of   prostate   ,   with   stromal-hyperplasia   .
[   M                R    T          ,   C      M                     ]
[   T77100   M81403   ]   [   T77100   M72430   ]
1   2        3        4   5   6        7        8
The formula for this translation is `1+5[3m0r2+6t0c7m4+8]', which means: put [ (=start-sentence) into target positions 1 and 5; put the first m (=adenocarcinoma) into target position 3; put r (=preposition) into target position 0, i.e., delete; put t (=SNOMED T-code) into target positions 2 and 6; put c (=conjunction) into target position 0, i.e., delete; put the second m (=stromal-hyperplasia) into target position 7; put ] (=end-sentence) into target positions 4 and 8.
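The two parsing formulas above can be traced with a short Python sketch. It is an illustration only, not the TRANSOFT or PADtalk implementation: the function names `parse_formula` and `apply_formula` are hypothetical, and the sketch assumes that the k-th occurrence of a symbol in the formula refers to the k-th item with that part of speech in the sentence, and that items with no entry in the formula (such as the comma in the second example) are simply dropped.

```python
import re
from collections import defaultdict

# One formula entry = target positions ("1", "1+5", "0") followed by one symbol.
ENTRY = re.compile(r"(\d+(?:\+\d+)*)([^\d+])")

def parse_formula(formula):
    """Split a formula such as '1[2t3m4]' into (positions, symbol) entries."""
    entries = []
    for positions, symbol in ENTRY.findall(formula):
        entries.append(([int(p) for p in positions.split("+")], symbol.upper()))
    return entries

def apply_formula(formula, tagged):
    """Rearrange tagged items; tagged is a list of (part_of_speech, value)."""
    # Queue values by part of speech so the k-th 'm' in the formula
    # picks up the k-th M item of the sentence (an assumption of this sketch).
    queues = defaultdict(list)
    for tag, value in tagged:
        queues[tag.upper()].append(value)
    placed = {}
    for positions, symbol in parse_formula(formula):
        if not queues[symbol]:
            raise ValueError(f"formula expects another {symbol!r} item")
        value = queues[symbol].pop(0)
        for pos in positions:
            if pos != 0:            # target position 0 means "delete"
                placed[pos] = value
    return [placed[p] for p in sorted(placed)]

simple = [("[", "["), ("T", "T77100"), ("M", "M81403"), ("]", "]")]
print(" ".join(apply_formula("1[2t3m4]", simple)))

complex_case = [("[", "["), ("M", "M81403"), ("R", "of"), ("T", "T77100"),
                (",", ","), ("C", "with"), ("M", "M72430"), ("]", "]")]
print(" ".join(apply_formula("1+5[3m0r2+6t0c7m4+8]", complex_case)))
```

Under these assumptions the first call reproduces `[ T77100 M81403 ]` and the second reproduces `[ T77100 M81403 ] [ T77100 M72430 ]`, matching the target rows shown in the two examples.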
Despite the enormous expense of medical record keeping, the textual part of medical records is not routinely translated into controlled vocabularies for quality assurance (QA) reviews. Commercial translators perform poorly with long sentences, or do not produce standardized output codes that are portable to existing hospital information systems. TRANSOFT is a public-domain translator, portable to MS-DOS-based or UNIX-based microcomputers, as well as to a range of minicomputer and mainframe operating systems. TRANSOFT is embedded in the VA hospital information system, with the largest userbase worldwide. SNOTRAN uses TRANSOFT as the translation mechanism; HyperPAD provides an intuitive user interface. All files (source text, lexicon, grammar) may be imported into or exported from HyperPAD as plain ASCII files.
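As an illustration of the table-driven approach, the following Python sketch loads a lexicon from plain ASCII text and assigns part-of-speech designators to a sentence. The file layout (one entry per line, three tab-separated fields: phrase, part of speech, SNOMED code) is an assumption made for this example; the actual SNOTRAN export layout is not described here. The function names are hypothetical, the entries are taken from the two worked examples, and the sketch handles single words only, whereas the real lexicon also holds phrases.

```python
# Hypothetical ASCII lexicon: phrase <TAB> part of speech <TAB> SNOMED code.
# The code field is empty for words that carry no SNOMED code.
LEXICON_TEXT = """\
prostatic\tT\tT77100
prostate\tT\tT77100
adenocarcinoma\tM\tM81403
stromal-hyperplasia\tM\tM72430
of\tR\t
with\tC\t
,\t,\t
"""

def load_lexicon(text):
    """Return {phrase: (part_of_speech, code_or_phrase)} from ASCII text."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        phrase, tag, code = line.split("\t")
        lexicon[phrase.lower()] = (tag, code or phrase)
    return lexicon

def tag_sentence(sentence, lexicon):
    """Tag each word; unknown words receive U (=undetermined)."""
    # Separate commas from words and drop the final period, which is
    # represented by the ] (=end-sentence) designator instead.
    words = sentence.lower().replace(",", " , ").rstrip(" .").split()
    tagged = [("[", "[")]
    for word in words:
        tagged.append(lexicon.get(word, ("U", word)))
    tagged.append(("]", "]"))
    return tagged

lexicon = load_lexicon(LEXICON_TEXT)
for tag, value in tag_sentence(
        "Adenocarcinoma of prostate, with stromal-hyperplasia.", lexicon):
    print(tag, value)
```

Because the vocabulary lives in data rather than in program code, adding a term in this sketch means adding one line to the lexicon text, which mirrors the contrast with `turnkey' systems drawn earlier.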
The most important reason for recovering records coded in controlled medical vocabularies is QA (9). In our laboratory, QA consists of examining the sequence of events in each patient's medical history and flagging exceptional sequences, say, a patient with a suspicious biopsy and no followup. In order for QA to be cost-effective, computer translation must be fully automated and have a seamless interface from routine reports into the larger information system. SNOTRAN forms an environment in which natural language medical documents are translated into controlled medical vocabularies, and can serve in the recovery of primary medical records in hospital information systems. This technology can lead to better quality assurance in routine medical practice.
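The kind of sequence check described above (a suspicious biopsy with no followup) might look like the following Python sketch once reports have been coded. The event labels `suspicious_biopsy` and `followup`, the tuple layout, and the function name are hypothetical; in practice the labels would be derived from SNOMED codes by rules that are not specified here.

```python
from datetime import date

def patients_needing_followup(events):
    """events: iterable of (patient_id, event_date, kind) tuples.

    Returns the patients whose most recent suspicious biopsy has no
    later followup event.
    """
    open_cases = {}
    # Walk each patient's history in date order.
    for patient, when, kind in sorted(events, key=lambda e: e[1]):
        if kind == "suspicious_biopsy":
            open_cases[patient] = when
        elif kind == "followup":
            open_cases.pop(patient, None)   # a later followup closes the case
    return sorted(open_cases)

history = [
    ("A", date(1991, 1, 5), "suspicious_biopsy"),
    ("A", date(1991, 2, 1), "followup"),
    ("B", date(1991, 3, 1), "suspicious_biopsy"),
]
print(patients_needing_followup(history))
```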
1. Moore GW, Riede UN, Polacsek RA, Miller RE, and Hutchins GM: Automated translation of German to English medical text. Am J Med 81:103-111, 1986.
2. Moore GW, Wakai I, Satomura Y, and Giere W: TRANSOFT: Medical translation expert system. Artif Intell Med 1:149-157, 1989.
3. Davis RG: FileMan: A User Manual. Bethesda, MD: National Association of VA Physicians, 1987.
4. Woods W: Transition network grammars for natural language analysis. Commun Assn Comp Mach 13:591-606, 1970.
5. Vasconcellos M and Leon M: SPANAM and ENGSPAN: Machine translation at the Pan American Health Organization. Comput Linguist 11:122-136, 1985.
6. Hutchins WJ: Machine Translation: Past, Present, Future. Chichester: Ellis Horwood Ltd, 1986.
7. Wingert F, Rothwell D, and Côté R: Automated Indexing into SNOMED and ICD. In, Scherrer JR, Côté RA, and Mandil SH (eds.), Computerized Natural Medical Language Processing for Knowledge Representation. Amsterdam: Elsevier Science Publishers B.V. pp. 201-239.
8. Winkler D and Kamin S: Hypertalk 2.0: The Book. Bantam Books, New York, 1990.
9. Berman JJ: Solving Quality Assurance Problems with Object Scripting Languages. Artif Intell Med, in press, 1991.
Page last modified December 18, 2012