Submitted by Jules J. Berman, Ph.D., M.D.
Session Title: Predicting tumor marker outcomes with Monte Carlo
simulations.
APIII, Pittsburgh, PA, Oct 7-10, 2003
Scientific Session Categories: Outcomes Research
Author:
Jules J. Berman, NIH
Background:
Genome and proteome research have promised a revolution in tumor
diagnosis. The revolution has not arrived. In fact, only a handful
of new markers have appeared in the past several years. A simple
thought experiment demonstrates the problem.
In a retrospective study, Dr. X demonstrated a "perfect" tumor marker
that never failed to distinguish between two tumor variants (aggressive
and indolent) with identical morphology. In this example, an aggressive
variant grows 10 times as fast and metastasizes at ten times the rate of
the indolent variant with the same morphology.
In a prospective trial of the same marker, 200 tumors are excised at the
time of clinical detection (tumor size 2 cm). Dr. X finds that 100 of
the tumors stain as "indolent variants" and 100 tumors stain as
"aggressive variants". The trials follows all 200 patients, determining
survival at five years. At the end of the trial, there is no survival
difference between patients with "indolent variants" and patients with
"aggressive variants". The marker is considered a total failure, with
millions of dollars wasted on the prospective trial.
How is this possible? In the prospective study, all tumors were excised
at 2 cm. Survival after excision was determined entirely by the presence
of metastases, as patients with aggressive or indolent tumors without
metastasis [prior to excision] were cured by the procedure. Since the
aggressive tumors have a growth rate 10 times that of the indolent tumors,
they reached 2 cm size in 1/10th the time required for the indolent
tumors. The rate of metastasis in the aggressive tumor is 10 times that
of the indolent tumor, but since aggressive tumors had 1/10th the growth
history in which to metastasize, both the aggressive and indolent tumors
had the same number of metastastic cases when the tumors were excised.
Hence, there was no difference in the survival outcome between the tumor
variants. Dr. X may have benefited from a simulation model designed to
predict outcomes from a set of biological conditions and restraints.
The purpose of this project is to provide general scripts for predicting
tumor marker outcomes using calculation-intensive Monte Carlo algorithms
that model tumor growth and metastasis.
Design:
Perl scripts written by the author made use of a random number generator
to create Monte Carlo simulations of tumor growth and metastasis.
Scripts were written with two isomorphic simulations, probabilistic
prediction (fast) and brute-force per/cell random number generation
(slow).
Results:
Simulations predicted differences in growth and metastatic occurrences
from preset potential probabilities. Monte Carlo algorithms using per
cell calculations required seconds to minutes for each tumor growth
simulation, on a 2.79 GHz desk-top computer.
Conclusion:
Computer simulations may be helpful when they model plausible outcomes
unanticipated by human thought. Perl is a free, open source, cross-
platform language. All Perl scripts, along with explanatory text, are
placed in the public domain and are available for download from:
http://65.222.228.150/jjb/randab.htm
Contact Address Information:
- Jules J. Berman, Ph.D., M.D.
- 6130 Executive Blvd (EPN/6028)
- Rockville, MD 20892
- USA
Contact Phone: 301-496-7147
Contact Fax: 301-402-7819
Contact Email: bermanj@mail.nih.gov
Citations to this abstract/script should appear as:
Berman JJ. Predicting tumor marker outcomes with Monte Carlo
simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
Pittsburgh, PA.
Monte6.pl is an outcome simulator, creating a simulation similar
to that described in the abstract above.
An example outcome of the perl script is shown below:
c:\ftp>perl monte6.pl
It took 19 seconds to compute 50,000 simulations
for both tumors (low probability of metastasis and high
probability of metastasis.
50000 49866 22666
chance ratio is 10
simulation ratio is 2.20003529515574
What does monte6.pl do, and what does the output mean?
Monte6.pl takes an initial per cell per generation chance of
metastasis for each of two tumors. The two tumors are referred
to as "good" and "bad", corresponding to the indolent and
aggressive variants described in the abstract's example case.
We provide an initial metastatic probability of
0.00000009, for the bad tumor and
0.000000009, for the good tumor.
This meaans that each cell in either tumor has a very low likelihood
of metastasizing but that the probability for the cells in the bad
tumor are ten times as high as the probability for the cells in the
good tumor.
Both tumor begin their growth history with a single cell, and the
cell number doubles with each generation. This is actually a poor
mathematical expression of tumor growth. Real tumor growth is a
balance between the processes of cell growth and cell death. Berman
and Moore previously described a tumor growth model that accounted
for per cell probabilities of cell death:
Berman JJ, Moore GW. The role of cell death in the growth of
preneoplastic lesions: a Monte Carlo simulation model. Cell
Proliferation 25(6):549-557, 1992.
This article is available at: http://www.pathinfo.com/jjb/init.htm
Monte6.pl is a simple growth model that assumes that there is a
doubling time for tumor growth determined wholly by the generation
(cell cycle time) for the cells in the tumor.
Each time the tumor doubles in size, the probability increases that
at least one of the cells in the tumor has succeeded to metastasize.
The formula for this probability is:
$baddoublechance = 1 - ((1-$badprevchance)*(1-$badprevchance));
In each generation, the number of tumor cells doubles. Each
"half" of the resultant population has the same probability of
metastasizing as the same set of cells from the prior generation.
The probability that neither half of the population has
metastastasized is:
(1-previous generation probability)*(1-previous generation probability)
The probability that at least one cell has metastasized is
1-((1-previous generation probability)*(1-previous generation probability))
Each generation, the probability iterates, using the prior probability
computed for the previous tumor cell generation.
In this example, the increasing probabilities for metastastis
in the tumor with the higher rate of growth and metastasis used in
monte6.pl, is shown below.
Tumor Probability
cell of
number metastasis
1 9e-008
2 1.7999999191165e-007
4 3.59999951404788e-007
8 7.19999773246549e-007
16 1.43999902812997e-006
32 2.87999598269639e-006
64 5.75998367102759e-006
128 1.15199341645944e-005
256 2.30397356203449e-005
512 4.60789404113093e-005
1024 9.21557575538356e-005
2048 0.000184303022424004
4096 0.000368572077243945
8192 0.00073700830911172
16384 0.00147347343697579
32768 0.00294477574998209
65536 0.00588087979574647
131072 0.011727174844321
262144 0.0233168230588127
524288 0.0460899718800694
1048576 0.0900556582522332
2097152 0.172001294921223
4194304 0.314418144387869
8388608 0.529977519255427
16777216 0.779078867594718
33554432 0.951193853256768
67108864 0.997617960040078
134217728 0.999994325885629
268435456 0.999999999967804
536870912 1
In this simulation example, when the bad tumor achieves a cell
number of 536,870,912, it's chance of metastasis is essentially
1.
This script performs 5000 simulations (calculated experiments)
for the good tumor and for the bad tumor.
In each simulation, Perl checks to make sure that the tumor has
not metastized (i.e. that $nicresult is not equal to 1).
If this is the case, Perl selects a random number between 0 and
1. If the value of the random number is less than the probability
that the tumor has metastasized, then it assigns a state of
metastasis to the tumor (i.e. sets $niceresult to 1).
In this case, if the tumor has not yet reached a size of 50
million cells, then it has (in the simulation) metastasized prior
to the time that the tumor was excised [as part of the prospective
study].
if ($niceresult != 1) #if a metastatic state has not been achieved
{
$randomvar = rand();
$niceresult = $randomvar < $goodhash{$nice} ? 1: 0;
}
if (($niceresult == 1) && ($nice < 50000000))
{
$niceoutcome = "$n $nice $niceresult";
}
When the tumors exceed a size of 50,000,000 cells, the experiment
stops for the tumors, and it noted whether the tumor has succeeded in
metastasizing (whether $niceresult or $badresult has shifted to 1).
The results are tabulated for the 5000 simulations.
The results are summarized for 5000 simulations:
c:\ftp>perl monte6.pl
It took 19 seconds to compute 50,000 simulations
for both tumors (low probability of metastasis and high
probability of metastasis.
50000 49866 22666
chance ratio is 10
simulation ratio is 2.20003529515574
The most important line is : 50000 49866 22666
This indicates that in 50 thousand simulations, the bad tumor
metastasized 49,866 times by the time that the tumor reached
a calculated size exceeding 50 million cells. This in turn means
that in a prospective study where tumors are excised at a size
of 50 million cells in a study population of 50,000 patients,
49,866 patients would die from the tumor (because it had already
metastasized at the time of primary tumor excision).
On the other hand, in the good tumor popultion, there were 22,666
predicted metastastic tumors. This means that fewer than half of
the patients with good tumors would have metastatic disease at the
time of tumor excision.
Remember that our initial conditions gave the bad tumor a 10 fold
higher metastastic rate than the good tumors. Wouldn't we expect
that the bad tumors would metastasize 10 times more frequently than
the good tumors. Well, it doesn't turn out that way. That's the
whole point of having a simulation.
#!/usr/bin/perl
#monte6.pl
#Copyright Jules J. Berman, 2003
#This software was created August 22, 2003
#This software is entered into the public domain and
#can be used freely.
#Citations to this abstract/script should appear as:
#Berman JJ. Predicting tumor marker outcomes with Monte Carlo
#simulations. Submitted, August 23, 2003 to APIII, Oct 8-10, 2003,
#Pittsburgh, PA.
#The software is provided "as is", without warranty of any kind,
#express or implied, including but not limited to the warranties
#of merchantability, fitness for a particular purpose and
#noninfringement. in no event shall the authors or copyright
#holders be liable for any claim, damages or other liability,
#whether in an action of contract, tort or otherwise, arising
#from, out of or in connection with the software or the use or
#other dealings in the software.
#
#
$start = time();
my $badprevchance = 0.00000009;
my $stillbad = $badprevchance;
my $goodprevchance = 0.000000009;
my $stillgood = $goodprevchance;
my %badhash;
my %goodhash;
my $size = 1;
my $i = 1;
my %count = 0;
my $badtotal = 0;
my $nicetotal = 0;
$badhash{$size} = $badprevchance;
while ($i < 30)
{
$size = $size * 2;
my $baddoublechance = 1 - ((1-$badprevchance)*(1-$badprevchance));
$badhash{$size} = $baddoublechance;
$badprevchance = $baddoublechance;
$i = $i + 1;
}
$size = 1;
$i = 1;
$goodhash{$size} = $goodprevchance;
while ($i < 30)
{
$size = $size * 2;
my $gooddoublechance = 1 - ((1-$goodprevchance)*(1-$goodprevchance));
$goodhash{$size} = $gooddoublechance;
$goodprevchance = $gooddoublechance;
$i = $i + 1;
}
while ($count < 50000)
{
my $n = 1;
$count++;
my $bad = 1;
my $nice = 1;
my $badoutcome = "No mets for the bad tumor";
my $niceoutcome = "No mets for the slow tumor";
my $niceresult = 0;
my $badresult = 0;
my $randomvar = 0;
while (1) #this will loop forever unless something in block promts exit
{
$nice = 2 * $nice; #tumor number doubles
if ($niceresult != 1) #if a metastatic state has not been achieved
{
$randomvar = rand();
$niceresult = $randomvar < $goodhash{$nice} ? 1: 0;
}
if (($niceresult == 1) && ($nice < 50000000))
{
$niceoutcome = "$n $nice $niceresult";
}
$bad = 2 * $bad;
if ($badresult != 1)
{
$randomvar = rand();
$badresult = $randomvar < $badhash{$bad} ? 1: 0;
}
if (($badresult == 1) && ($bad < 50000000))
{
$badoutcome = "$n $bad $badresult";
}
$n = $n +1;
last if (($bad > 50000000) && ($nice > 50000000));
last if (($badresult ==1) && ($nice > 50000000));
last if (($bad > 50000000) && ($niceresult == 1));
}
#print "$badoutcome\n$niceoutcome\n\n";
$badtotal++ if ($badoutcome =~ /\d/);
$nicetotal++ if ($niceoutcome =~ /\d/);
}
$end = time();
$totaltime = $end - $start;
print "It took $totaltime seconds to compute 50,000 simulations\n";
print "for both tumors (low probability of metastasis and high\n";
print "probability of metastasis.\n\n";
print "$count $badtotal $nicetotal\n";
$chanceratio = ($stillbad) / ($stillgood);
print "chance ratio is $chanceratio\n";
$simratio = $badtotal / $nicetotal;
print "simulation ratio is $simratio\n";
exit;
monteone.pl
This script was written to show how fast you might expect
a simulation to run if it's all brute force, running random
numbers for each and every cell, at each and every generation.
This script does not use computed probability predictions for
populations, as did monte6.pl. It performs a random operation
for each and every tumor cells, at each growth geneeration.
These studies typically take 14 seconds to 1 minute to
execute. Although feasible, it would not be practical to repeat
this study 5,000 times (as is done in monte6.pl).
#!/usr/bin/perl
#monteone.pl
#Copyright Jules J. Berman, 2003
#This software was created August 22, 2003
#This software is entered into the public domain and
#can be used freely.
#Citations to this abstract/script should appear as:
#Berman JJ. Predicting tumor marker outcomes with Monte Carlo
#simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
#Pittsburgh, PA.
#The software is provided "as is", without warranty of any kind,
#express or implied, including but not limited to the warranties
#of merchantability, fitness for a particular purpose and
#noninfringement. in no event shall the authors or copyright
#holders be liable for any claim, damages or other liability,
#whether in an action of contract, tort or otherwise, arising
#from, out of or in connection with the software or the use or
#other dealings in the software.
#Assumptions:
#We assume that metastasis is related to the number of
#cells in a tumor. A tumor with 100 cells has a lower
#chance of metastasizing than a tumor with 1 million cells.
#We endow each cell in a tumor with the same probability
#and watch the Monte Carlo simulation of the events.
my $badresult = 0; #begin with a state of no metastasis for bad tumor
my $bad = 1; #begin with 1 bad tumor cell
my $n = 1; #begin on the first generation
my $badoutcome = "No mets for the bad tumor"; #begin with no metastases
$start = time();
while (1) #this will loop forever unless something in block promts exit
{
$bad = 2 * $bad;
print "$bad\n";
srand;
for (1...$bad)
{
$badchance = int(rand(5000)) +1;
if ($badchance == 5000)
{
$badchance = int(rand(10000)) +1;
if ($badchance == 10000)
{
$badoutcome = "$n $bad $badchance\n";
print $badoutcome;
$end = time();
$totaltime = $end - $start;
print "Time for execution is $totaltime seconds\n";
exit;
}
}
}
$n++;
if ($bad > 50000000)
{
print "$badoutcome\n";
$end = time();
$totaltime = $end - $start;
print "Time for execution is $totaltime seconds\n";
exit;
}
}
exit;
rand.pl script
This script does a rough test of Perl's built-in random
number generator. This is necessary because the random
number generator may not pick evenly distributed [random]
numbers for the range desired, when the range is very large.
So, if you have:
$x = range(50000000) you may not get what you expect.
The largest range used in the scripts provided here is
10000 (i.e. rand(10000))
The code line shown is what's being tested by the script.
$x = int(rand(10000)) + 1;
This should select a random number between 1 and 10000.
So if I repeat the random selection 100 million times,
each number, each number from 1 to 5000 should be picked
about the same number of times (i.e. 10000 times).
The script creates a hash consisting of each number from
1 to 10000 as keys and the number of times each number is
chosed by the random number generator as values.
It demonstrates (at least for my system), that each
number is selected about 10,000 times (varies +- about a
thousand).
#!/usr/bin/perl
#rand.pl
#Copyright Jules J. Berman, 2003
#This software was created August 22, 2003
#This software is entered into the public domain and
#can be used freely.
#Berman JJ. Predicting tumor marker outcomes with Monte Carlo
#simulations. Submitted, August 23, 2003, to APIII, Oct 8-10, 2003,
#Pittsburgh, PA.
#The software is provided "as is", without warranty of any kind,
#express or implied, including but not limited to the warranties
#of merchantability, fitness for a particular purpose and
#noninfringement. in no event shall the authors or copyright
#holders be liable for any claim, damages or other liability,
#whether in an action of contract, tort or otherwise, arising
#from, out of or in connection with the software or the use or
#other dealings in the software.
#
#
open (HOLD,">holder.txt")||die"cannot";
while ($n < 100000000)
{
$x = int(rand(10000)) + 1; #pick a number between 1 and a hundred
$randhash{$x}++; #make a hash of the numbers picked and the
$n++; #number of times each is picked
}
foreach $key (sort byval keys %randhash)
{
print HOLD "$randhash{$key} $key\n";
}
sub byval
{
$randhash{$a} <=> $randhash{$b};
}
exit;
key words: tumor outcomes, prospective trial, clinical trial,
statistics, resampling, Efron, api, informatics, pathology