biomedical informatics cover Perl Programming for Medicine and Biology Cover

Biomedical Informatics books
by Jules J. Berman

  • Jones & Bartlett sales and informational website for Biomedical Informatics
  • Amazon.com U.S. book site for Biomedical Informatics
  • Full Table of Contents from Library of Congress for Biomedical Informatics
  • List of book-related resources
  • Brief author biography on Association for Pathology Informatics Website
  • Quick link to PubMed listing for Jules J. Berman
  • Full list of Publications for Jules J. Berman
  • Dr. Bruce Friedman's review of Biomedical Informatics
  • Perl Programming for Medicine and Biology companion site.
  • Author's blog on data specifications
  • Contact author



  • Submitted by Jules J. Berman, Ph.D., M.D.
    
    Session Title: Predicting tumor marker outcomes with Monte Carlo
    simulations.
    
    APIII, Pittsburgh, PA, Oct 7-10, 2003 
    
    Scientific Session Categories: Outcomes Research
    
    Author:
    
    Jules J. Berman, NIH
    
    Background:
    Genome and proteome research have promised a revolution in tumor
    diagnosis.  The revolution has not arrived.  In fact, only a handful
    of new markers have appeared in the past several years. A simple
    thought experiment demonstrates the problem.
    
    In a retrospective study, Dr. X demonstrated a "perfect" tumor marker
    that never failed to distinguish between two tumor variants (aggressive
    and indolent) with identical morphology.  In this example, an aggressive
    variant grows 10 times as fast and metastasizes at ten times the rate of
    the indolent variant with the same morphology.
    
    In a prospective trial of the same marker, 200 tumors are excised at the
    time of clinical detection (tumor size 2 cm).  Dr. X finds that 100 of
    the tumors stain as "indolent variants" and 100 tumors stain as
    "aggressive variants". The trials follows all 200 patients, determining
    survival at five years. At the end of the trial, there is no survival
    difference between patients with "indolent variants" and patients with
    "aggressive variants".  The marker is considered a total failure, with
    millions of dollars wasted on the prospective trial.
    
    How is this possible?  In the prospective study, all tumors were excised
    at 2 cm.  Survival after excision was determined entirely by the presence
    of metastases, as patients with aggressive or indolent tumors without
    metastasis [prior to excision] were cured by the procedure.  Since the
    aggressive tumors have a growth rate 10 times that of the indolent tumors,
    they reached 2 cm size in 1/10th the time required for the indolent
    tumors. The rate of metastasis in the aggressive tumor is 10 times that
    of the indolent tumor, but since aggressive tumors had 1/10th the growth
    history in which to metastasize, both the aggressive and indolent tumors
    had the same number of metastastic cases when the tumors were excised.
    Hence, there was no difference in the survival outcome between the tumor
    variants. Dr. X may have benefited from a simulation model designed to
    predict outcomes from a set of biological conditions and restraints.
    
    The purpose of this project is to provide general scripts for predicting
    tumor marker outcomes using calculation-intensive Monte Carlo algorithms
    that model tumor growth and metastasis.
    
    Design:
    Perl scripts written by the author made use of a random number generator
    to create Monte Carlo simulations of tumor growth and metastasis.
    Scripts were written with two isomorphic simulations,  probabilistic
    prediction (fast) and brute-force per/cell random number generation
    (slow).
    
    Results:
    Simulations predicted differences in growth and metastatic occurrences
    from preset potential  probabilities.  Monte Carlo algorithms using per
    cell calculations required seconds to minutes for each tumor growth
    simulation, on a 2.79 GHz desk-top computer.
    
    Conclusion:
    Computer simulations may be helpful when they model plausible outcomes
    unanticipated by human thought.  Perl is a free, open source, cross-
    platform language.  All Perl scripts, along with explanatory text, are
    placed in the public domain and are available for download from:
    http://65.222.228.150/jjb/randab.htm
    
    Contact Address Information:
    - Jules J. Berman, Ph.D., M.D.
    - 6130 Executive Blvd (EPN/6028)
    - Rockville, MD 20892
    - USA
    
    Contact Phone: 301-496-7147
    Contact Fax: 301-402-7819
    Contact Email: bermanj@mail.nih.gov
    
    Citations to this abstract/script should appear as:
    Berman JJ. Predicting tumor marker outcomes with Monte Carlo
    simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
    Pittsburgh, PA.
    
    Monte6.pl is an outcome simulator, creating a simulation similar
    to that described in the abstract above.
    
    An example outcome of the perl script is shown below:
    
    c:\ftp>perl monte6.pl
    It took 19 seconds to compute 50,000 simulations
    for both tumors (low probability of metastasis and high
    probability of metastasis.
    
    50000 49866 22666
    chance ratio is 10
    simulation ratio is 2.20003529515574
    
    
    What does monte6.pl do, and what does the output mean?
    
    Monte6.pl takes an initial per cell per generation chance of 
    metastasis for each of two tumors.  The two tumors are referred
    to as "good" and "bad", corresponding to the indolent and 
    aggressive variants described in the abstract's example case.
    
    We provide an initial metastatic probability of 
    0.00000009, for the bad tumor and
    0.000000009, for the good tumor.
    
    This meaans that each cell in either tumor has a very low likelihood
    of metastasizing but that the probability for the cells in the bad
    tumor are ten times as high as the probability for the cells in the
    good tumor.
    
    Both tumor begin their growth history with a single cell, and the
    cell number doubles with each generation.  This is actually a poor
    mathematical expression of tumor growth.  Real tumor growth is a 
    balance between the processes of cell growth and cell death.  Berman
    and Moore previously described a tumor growth model that accounted
    for per cell probabilities of cell death:
    
    Berman JJ, Moore GW. The role of cell death in the growth of 
    preneoplastic lesions: a Monte Carlo simulation model. Cell 
    Proliferation 25(6):549-557, 1992.  
    
    This article is available at: http://www.pathinfo.com/jjb/init.htm
    
    Monte6.pl is a simple growth model that assumes that there is a 
    doubling time for tumor growth determined wholly by the generation
    (cell cycle time) for the cells in the tumor.  
    
    Each time the tumor doubles in size, the probability increases that 
    at least one of the cells in the tumor has succeeded to metastasize.
    
    The formula for this probability is:
    $baddoublechance = 1 - ((1-$badprevchance)*(1-$badprevchance));
    
    In each generation, the number of tumor cells doubles.  Each 
    "half" of the resultant population has the same probability of
    metastasizing as the same set of cells from the prior generation.
    The probability that neither  half of the population has 
    metastastasized is: 
    (1-previous generation probability)*(1-previous generation probability)
    The probability that at least one cell has metastasized is
    1-((1-previous generation probability)*(1-previous generation probability))
    
    Each generation, the probability iterates, using the prior probability
    computed for the previous tumor cell generation.
    
    In this example, the increasing probabilities for metastastis
    in the tumor with the higher rate of growth and metastasis used in
    monte6.pl, is shown below.
    
    
    Tumor       Probability 
    cell        of
    number      metastasis
    
    1        9e-008
    2        1.7999999191165e-007
    4        3.59999951404788e-007
    8        7.19999773246549e-007
    16        1.43999902812997e-006
    32        2.87999598269639e-006
    64        5.75998367102759e-006
    128        1.15199341645944e-005
    256        2.30397356203449e-005
    512        4.60789404113093e-005
    1024        9.21557575538356e-005
    2048        0.000184303022424004
    4096        0.000368572077243945
    8192        0.00073700830911172
    16384        0.00147347343697579
    32768        0.00294477574998209
    65536        0.00588087979574647
    131072        0.011727174844321
    262144        0.0233168230588127
    524288        0.0460899718800694
    1048576        0.0900556582522332
    2097152        0.172001294921223
    4194304        0.314418144387869
    8388608        0.529977519255427
    16777216        0.779078867594718
    33554432        0.951193853256768
    67108864        0.997617960040078
    134217728        0.999994325885629
    268435456        0.999999999967804
    536870912        1
    
    In this simulation example, when the bad tumor achieves a cell
    number of 536,870,912, it's chance of metastasis is essentially
    1.
    
    This script performs 5000 simulations (calculated experiments)
    for the good tumor and for the bad tumor.  
    
    In each simulation, Perl checks to make sure that the tumor has
    not metastized (i.e. that $nicresult is not equal to 1).
    If this is the case, Perl selects a random number between 0 and
    1.  If the value of the random number is less than the probability
    that the tumor has metastasized, then it assigns a state of 
    metastasis to the tumor (i.e. sets $niceresult to 1).
    
    In this case, if the tumor has not yet reached a size of 50 
    million cells, then it has (in the simulation) metastasized prior
    to the time that the tumor was excised [as part of the prospective
    study].
    
        if ($niceresult != 1) #if a metastatic state has not been achieved
           {
           $randomvar = rand();
           $niceresult = $randomvar < $goodhash{$nice} ? 1: 0;
           }
        if (($niceresult == 1) && ($nice < 50000000))
           {
           $niceoutcome = "$n $nice $niceresult";
           }
     
    
    When the tumors exceed a size of 50,000,000 cells, the experiment 
    stops for the tumors, and it noted whether the tumor has succeeded in 
    metastasizing (whether $niceresult or $badresult has shifted to 1).
    
    The results are tabulated for the 5000 simulations.
    
    The results are summarized for 5000 simulations:
    
    c:\ftp>perl monte6.pl
    It took 19 seconds to compute 50,000 simulations
    for both tumors (low probability of metastasis and high
    probability of metastasis.
    
    50000 49866 22666
    chance ratio is 10
    simulation ratio is 2.20003529515574
    
    The most important line is : 50000 49866 22666 
    
    This indicates that in 50 thousand simulations, the bad tumor 
    metastasized 49,866 times by the time that the tumor reached  
    a calculated size exceeding 50 million cells.  This in turn means
    that in a prospective study where tumors are excised at a size
    of 50 million cells in a study population of 50,000 patients,
    49,866 patients would die from the tumor (because it had already
    metastasized at the time of primary tumor excision).
    
    On the other hand, in the good tumor popultion, there were 22,666
    predicted metastastic tumors.  This means that fewer than half of
    the patients with good tumors would have metastatic disease at the
    time of tumor excision.
    
    Remember that our initial conditions gave the bad tumor a 10 fold 
    higher metastastic rate than the good tumors.  Wouldn't we expect
    that the bad tumors would metastasize 10 times more frequently than
    the good tumors.  Well, it doesn't turn out that way.  That's the
    whole point of having a simulation. 
    
    
    #!/usr/bin/perl
    #monte6.pl
    #Copyright Jules J. Berman, 2003
    #This software was created August 22, 2003
    #This software is entered into the public domain and
    #can be used freely.
    #Citations to this abstract/script should appear as:
    #Berman JJ. Predicting tumor marker outcomes with Monte Carlo
    #simulations. Submitted, August 23, 2003 to APIII, Oct 8-10, 2003,
    #Pittsburgh, PA.
    #The software is provided "as is", without warranty of any kind,
    #express or implied, including but not limited to the warranties
    #of merchantability, fitness for a particular purpose and
    #noninfringement. in no event shall the authors or copyright
    #holders be liable for any claim, damages or other liability,
    #whether in an action of contract, tort or otherwise, arising
    #from, out of or in connection with the software or the use or
    #other dealings in the software.
    #
    #
    $start = time();
    my $badprevchance =  0.00000009;
    my $stillbad = $badprevchance;
    my $goodprevchance = 0.000000009;
    my $stillgood = $goodprevchance;
    my %badhash;
    my %goodhash;
    my $size = 1;
    my $i = 1;
    my %count = 0;
    my $badtotal = 0;
    my $nicetotal = 0;
    $badhash{$size} = $badprevchance;
    while ($i < 30)
      {
      $size = $size * 2;
      my $baddoublechance = 1 - ((1-$badprevchance)*(1-$badprevchance));
      $badhash{$size} = $baddoublechance;
      $badprevchance = $baddoublechance;
      $i = $i + 1;
      }
    $size = 1;
    $i = 1;
    $goodhash{$size} = $goodprevchance;
    while ($i < 30)
      {
      $size = $size * 2;
      my $gooddoublechance = 1 - ((1-$goodprevchance)*(1-$goodprevchance));
      $goodhash{$size} = $gooddoublechance;
      $goodprevchance = $gooddoublechance;
      $i = $i + 1;
      }
    while ($count < 50000)
      {
      my $n = 1;
      $count++;
      my $bad = 1;
      my $nice = 1;
      my $badoutcome = "No mets for the bad tumor";
      my $niceoutcome = "No mets for the slow tumor";
      my $niceresult = 0;
      my $badresult = 0;
      my $randomvar = 0;
      while (1)   #this will loop forever unless something in block promts exit
        {
        $nice = 2 * $nice; #tumor number doubles
        if ($niceresult != 1) #if a metastatic state has not been achieved
           {
           $randomvar = rand();
           $niceresult = $randomvar < $goodhash{$nice} ? 1: 0;
           }
        if (($niceresult == 1) && ($nice < 50000000))
           {
           $niceoutcome = "$n $nice $niceresult";
           }
        $bad = 2 * $bad;
        if ($badresult != 1)
            {
            $randomvar = rand();
            $badresult = $randomvar < $badhash{$bad} ? 1: 0;
            }
        if (($badresult == 1) && ($bad < 50000000))
            {
            $badoutcome = "$n $bad $badresult";
            }
        $n = $n +1;
        last if (($bad > 50000000) && ($nice > 50000000));
        last if (($badresult ==1) && ($nice > 50000000));
        last if (($bad > 50000000) && ($niceresult == 1));
        }
      #print "$badoutcome\n$niceoutcome\n\n";
      $badtotal++ if ($badoutcome =~ /\d/);
      $nicetotal++ if ($niceoutcome =~ /\d/);
      }
    
    $end = time();
    $totaltime = $end - $start;
    print "It took $totaltime seconds to compute 50,000 simulations\n";
    print "for both tumors (low probability of metastasis and high\n";
    print "probability of metastasis.\n\n";
    print "$count $badtotal $nicetotal\n";
    $chanceratio = ($stillbad) / ($stillgood);
    print "chance ratio is $chanceratio\n";
    $simratio = $badtotal / $nicetotal;
    print "simulation ratio is $simratio\n";
    exit;
    
    monteone.pl
    This script was written to show how fast you might expect
    a simulation to run if it's all brute force, running random
    numbers for each and every cell, at each and every generation.
    This script does not use computed probability predictions for
    populations, as did monte6.pl.  It performs a random operation
    for each and every tumor cells, at each growth geneeration.
    These studies typically take 14 seconds to 1 minute to 
    execute.  Although feasible, it would not be practical to repeat
    this study 5,000 times (as is done in monte6.pl).
    
    
    #!/usr/bin/perl
    #monteone.pl
    #Copyright Jules J. Berman, 2003
    #This software was created August 22, 2003
    #This software is entered into the public domain and
    #can be used freely.
    #Citations to this abstract/script should appear as:
    #Berman JJ. Predicting tumor marker outcomes with Monte Carlo
    #simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
    #Pittsburgh, PA.
    #The software is provided "as is", without warranty of any kind,
    #express or implied, including but not limited to the warranties
    #of merchantability, fitness for a particular purpose and
    #noninfringement. in no event shall the authors or copyright
    #holders be liable for any claim, damages or other liability,
    #whether in an action of contract, tort or otherwise, arising
    #from, out of or in connection with the software or the use or
    #other dealings in the software.
    #Assumptions:
    #We assume that metastasis is related to the number of
    #cells in a tumor.  A tumor with 100 cells has a lower
    #chance of metastasizing than a tumor with 1 million cells.
    #We endow each cell in a tumor with the same probability
    #and watch the Monte Carlo simulation of the events.
    
    my $badresult = 0;  #begin with a state of no metastasis for bad tumor
    my $bad = 1;   #begin with 1 bad tumor cell
    my $n = 1; #begin on the first generation
    my $badoutcome = "No mets for the bad tumor";  #begin with no metastases
    $start = time();
    while (1)   #this will loop forever unless something in block promts exit
      {
      $bad = 2 * $bad;
      print "$bad\n";
      srand;
      for (1...$bad)
         {
         $badchance = int(rand(5000)) +1;
         if ($badchance == 5000)
             {
             $badchance = int(rand(10000)) +1;
             if ($badchance == 10000)
                {
                $badoutcome = "$n $bad $badchance\n";
                print $badoutcome;
                $end = time();
                $totaltime = $end - $start;
                print "Time for execution is $totaltime seconds\n";
                exit;
                }
             }
         }
      $n++;
      if ($bad > 50000000)
        {
        print "$badoutcome\n";
        $end = time();
        $totaltime = $end - $start;
        print "Time for execution is $totaltime seconds\n";
        exit;
        }
      }
    exit;
    
    rand.pl script
    This script does a rough test of Perl's built-in random
    number generator.  This is necessary because the random
    number generator may not pick evenly distributed [random]
    numbers for the range desired, when the range is very large.
    
    So, if you have:
    $x = range(50000000) you may not get what you expect.
    The largest range used in the scripts provided here is
    10000 (i.e. rand(10000))
    The code line shown is what's being tested by the script.
    $x = int(rand(10000)) + 1;
    This should select a random number between 1 and 10000.
    So if I repeat the random selection 100 million times,
    each number, each number from 1 to 5000 should be picked
    about the same number of times (i.e. 10000 times).
    The script creates a hash consisting of each number from
    1 to 10000 as keys and the number of times each number is
    chosed by the random number generator as values.
    It demonstrates (at least for my system), that each
    number is selected about 10,000 times (varies +- about a
    thousand).
    
    #!/usr/bin/perl
    #rand.pl
    #Copyright Jules J. Berman, 2003
    #This software was created August 22, 2003
    #This software is entered into the public domain and
    #can be used freely.
    #Berman JJ. Predicting tumor marker outcomes with Monte Carlo
    #simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
    #Pittsburgh, PA.
    #The software is provided "as is", without warranty of any kind,
    #express or implied, including but not limited to the warranties
    #of merchantability, fitness for a particular purpose and
    #noninfringement. in no event shall the authors or copyright
    #holders be liable for any claim, damages or other liability,
    #whether in an action of contract, tort or otherwise, arising
    #from, out of or in connection with the software or the use or
    #other dealings in the software.
    #
    #
    open (HOLD,">holder.txt")||die"cannot";
    while ($n < 100000000)
      {
      $x = int(rand(10000)) + 1;  #pick a number between 1 and a hundred
      $randhash{$x}++;          #make a hash of the numbers picked and the
      $n++;                     #number of times each is picked
      }
    
    foreach $key (sort byval keys %randhash)
      {
      print HOLD "$randhash{$key} $key\n";
      }
    
    sub byval
      {
      $randhash{$a} <=> $randhash{$b};
      }
    exit;
    
    key words: tumor outcomes, prospective trial, clinical trial, 
    statistics, resampling, Efron, api, informatics, pathology
    
    biomedical informatics cover Perl Programming for Medicine and Biology Cover