Using SEER public-use datasets
Data Mining the differences in occurrences of neoplasms
in the U.S. African-American and White populations

SEER is the U.S. National Cancer Institute's Surveillance, Epidemiology and End Results program. It is an amazing resource for information about the cancers that occur in the U.S. One of the products of SEER is the Public Use datasets, which contain de-identified records on over 3.5 million cancers that occurred between 1973 and 2005.

When you have 3.5 million cancer cases to study, you can draw certain types of inferences that could not possibly be made with the data accumulated at any single medical institution.

Each SEER record is a single cancer case, described by a series of 258 (mostly numeric) characters at fixed byte positions, which are defined in a data dictionary document. Once you have the byte locations of the data dictionary entries, you can easily write a short script (I like to use Perl, Ruby, or Python) to extract and compile the data any way you wish.
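As a minimal sketch of such a script, here is a Python version that slices fixed-width records by byte position. The field names and byte positions in `FIELDS` are hypothetical placeholders, not the real SEER layout; the real positions must be taken from the data dictionary for the release you download.

```python
# Hypothetical layout: (field name, first byte, last byte), 1-based and inclusive.
# These positions are placeholders, NOT the real SEER layout; copy the actual
# positions from the data dictionary that ships with your release.
FIELDS = [
    ("race", 20, 21),
    ("age", 25, 27),
    ("histology", 48, 51),
]


def parse_record(line):
    """Slice one fixed-width record into a dict of stripped string fields."""
    record = {}
    for name, first, last in FIELDS:
        # 1-based inclusive byte positions map to the Python slice [first-1:last].
        record[name] = line[first - 1:last].strip()
    return record


def read_records(path):
    """Yield a parsed dict for each non-blank line of a fixed-width file."""
    with open(path, encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line.rstrip("\n"))


if __name__ == "__main__":
    # Build a synthetic 258-character record with values at the placeholder positions.
    chars = [" "] * 258
    chars[19:21] = "01"    # race field, bytes 20-21
    chars[24:27] = "064"   # age field, bytes 25-27
    chars[47:51] = "8500"  # histology field, bytes 48-51
    print(parse_record("".join(chars)))
```

Keeping the layout in one list of (name, first byte, last byte) tuples means that switching to another release, or pulling out an extra field, only requires editing `FIELDS`.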

Here is a listing of the ratios of cancer cases, of different types, occurring in the white population and the African-American population in the U.S.

The first column is the case occurrence ratio (white/black). The second column is the number of cases of each tumor type in the SEER public use data sets; tumor types with fewer than 40 cases were excluded. The third column is the average age of patients with the neoplasm. The fourth column is the ICD-O (International Classification of Diseases for Oncology) neoplasm term, truncated for space.
White/black  No. cases  Avg age  ICD-O diagnosis
  00.113    0000040   040     pigmented dermatofibrosarcoma protuberans
  00.190    0000126   060     adult t-cell leukemia/lymphoma (htlv-1 pos.)
  00.225    0000056   068     granular cell tumor, malignant
  00.227    0000057   061     collecting duct carcinoma
  00.238    0000045   056     thymoma, type ab, malignant
  00.263    0000089   050     ameloblastoma, malignant
  00.271    0000053   053     hypereosinophilic syndrome
  00.314    0000116   051     gastrinoma, malignant
  00.314    0000042   053     odontogenic tumor, malignant
  00.372    0000103   026     alveolar soft part sarcoma
  00.394    0000120   055     atypical medullary carcinoma
  00.405    0000335   051     medullary carcinoma with lymphoid stroma
  00.415    0000053   038     craniopharyngioma
  00.416    0000481   039     hodgkin lymph., nodular lymphocyte predom.
  00.419    0000115   030     precursor t-cell lymphoblastic lymphoma
  00.428    0000200   065     plasma cell leukemia
  00.439    0000832   031     choriocarcinoma [j.b. presumably gestational type]
  00.441    0000050   002     infantile fibrosarcoma
  00.443    0001263   062     gastrointestinal stromal sarcoma
  00.444    0003634   042     dermatofibrosarcoma nos
  00.447    0000040   040     sertoli-leydig cell tumor, poorly differentiated
  00.448    0000106   038     mesenchymal chondrosarcoma
  00.461    0001225   050     pituitary adenoma, nos
  00.461    0000042   066     prolymphocytic leukemia, t-cell type
  00.464    0000959   055     thymoma, malignant
  00.484    0000091   059     polymorphous low grade adenocarcinoma
  00.485    0019957   064     hepatocellular carcinoma nos
  00.494    0000780   053     granulosa cell tumor, malignant
  00.496    0000227   028     chondroblastic osteosarcoma
  00.501    0000779   067     mesodermal mixed tumor
  00.501    0023109   063     squamous cell carcinoma, keratinizing type nos
  00.502    0002766   072     adenocarcinoma, intestinal type
  00.504    0036429   069     multiple myeloma
  00.507    0000641   001     retinoblastoma nos
  00.510    0000482   045     sq. cell carcinoma, keratinizing, nos, in situ
  00.511    0001466   005     nephroblastoma nos
  00.514    0000140   056     pleomorphic rhabdomyosarcoma
  00.515    0000303   061     metaplastic carcinoma, nos
  00.517    0000117   024     precursor t-cell lymphoblastic leukemia
  00.520    0000337   018     alveolar rhabdomyosarcoma
  00.531    0000105   065     adenocarcinoma with neuroendocrine differen.
  00.533    0000117   051     paraganglioma, malignant
  00.541    0003377   060     mycosis fungoides
  00.544    0003088   068     mullerian mixed tumor
  00.550    0000756   014     embryonal rhabdomyosarcoma
  00.551    0007534   053     medullary carcinoma nos
  00.553    0003575   070     tumor cells, malignant
  00.553    0000617   060     renal cell carcinoma, chromophobe type
  00.553    0000299   060     squamous cell carcinoma, small cell, nonkeratinizing type
  00.554    0000514   067     intracystic carcinoma, nos
  00.558    0000118   028     parosteal osteosarcoma
  00.565    0000070   055     juvenile carcinoma of the breast
  00.570    0057571   038     carcinoma in situ nos
  00.571    0000196   049     pheochromocytoma, malignant
  00.572    0000084   047     squamous cell carcinoma in situ with questionable stromal invasion
  00.573    0000156   033     synovial sarcoma, biphasic type
  00.577    0000054   058     malignant myoepithelioma
  00.584    0000114   059     atypical meningioma
  00.585    0000041   000     retinoblastoma, differentiated type
  00.586    0000486   060     epithelioid leiomyosarcoma
  00.596    0000183   065     superficial spreading adenocarcinoma
  00.597    0001267   066     carcinoma, diffuse type
  00.598    0000906   061     meningioma, malignant
  00.602    0000191   058     stromal sarcoma, nos
  00.613    0001094   065     intraductal papillary adenocarcinoma with invasion
  00.617    0000135   063     composite carcinoid
  00.626    0000043   002     retinoblastoma, undifferentiated type
  00.627    0000118   037     giant cell tumor of bone, malignant
  00.629    0001848   033     osteosarcoma nos
  00.630    0016028   060     carcinoid tumor, malignant
  00.631    0000090   024     malignant rhabdoid tumor
  00.633    0000202   058     myeloid sarcoma
  00.634    0000403   057     adenosarcoma
  00.634    0000229   058     meningotheliomatous meningioma
  00.637    0000757   064     papillary squamous cell carcinoma
  00.637    0008089   058     squamous cell carcinoma, large cell, nonkeratinizing type
  00.638    0003873   044     squamous cell carcinoma, microinvasive
  00.639    0000069   021     ganglioglioma
  00.639    0000118   062     epithelial-myoepithelial carcinoma
  00.641    0000586   069     chronic myeloproliferative disease, nos
  00.643    0001258   052     fibrosarcoma nos
  00.654    0000884   051     lymphoepithelial carcinoma
  00.658    0014107   041     kaposi's sarcoma
  00.661    0000690   047     neurofibrosarcoma
  00.662    0000514   065     pleomorphic carcinoma
  00.667    0247826   064     squamous cell carcinoma nos
  00.673    0000082   059     thymic carcinoma, nos
  00.673    0000047   047     leydig cell tumor, malignant
  00.673    0000050   066     adenocarc. in situ in mult. adenomatous polyps
  00.674    0000179   049     mesenchymoma, malignant
  00.678    0014878   068     non-small cell carcinoma
  00.681    0000890   054     anaplastic large cell lymphoma, t-cell and null cell type
  00.682    0002723   065     meningioma nos
  00.682    0000861   042     hodgkin's disease, lymphocytic predominance
  00.683    0000114   033     fibroblastic osteosarcoma
  00.685    0000282   042     epithelioid cell sarcoma
  00.685    0000507   019     primitive neuroectodermal tumor
  00.685    0000123   040     clear cell sarcoma of tendons and aponeuroses
  00.686    0000087   037     small cell sarcoma
  00.686    0000334   063     giant cell sarcoma (except of bone m9250/3)
  00.687    0008070   059     leiomyosarcoma nos
  00.688    0003285   057     inflammatory carcinoma
  00.690    0000555   037     synovial sarcoma nos
  00.692    0000058   052     hodgkin's disease, lymphocytic depletion, diffuse fibrosis
  00.693    0001762   004     neuroblastoma nos
  00.693    0000257   048     megakaryocytic leukemia
  00.697    0000939   063     plasmacytoma, extramedullary
  00.702    0003124   059     sarcoma nos
  00.703    0004558   059     infiltr. duct mixed with other types of carcinoma, in situ
  00.706    0002264   068     carcinosarcoma nos
  00.707    0001065   064     giant cell carcinoma
  00.707    0000053   026     telangiectatic osteosarcoma
  00.707    0000084   014     langerhans cell histiocytosis, disseminated
  00.712    0002593   062     infiltr. duct mixed with other types of carcinoma
  00.727    0000049   022     precursor b-cell lymphoblastic lymphoma
  00.728    0010967   065     signet ring cell carcinoma
  00.735    0000723   064     apocrine adenocarcinoma
  00.745    0000903   055     endometrial stromal sarcoma
  00.750    0001267   060     mature t-cell lymphoma, nos
  00.751    0000085   050     acute myeloid leukemia, t(8;21)(q22;q22)
  00.755    0000869   063     plasma cell tumor, malignant
  00.755    0000194   048     endometrial stromal sarcoma, low grade
  00.761    0010254   064     carcinoma, undifferentiated type nos
  00.762    0003162   062     noninfiltrating intraductal papillary adenocarcinoma
  00.763    0000934   052     cystosarcoma phyllodes, malignant
  00.768    0000295   068     pseudosarcomatous carcinoma
  00.773    0011520   064     acinar cell carcinoma
  00.774    0000584   067     squamous cell carcinoma, spindle cell type
  00.782    0000493   062     basaloid squamous cell carcinoma
  00.785    0029592   065     large cell carcinoma nos
  00.786    0000787   066     spindle cell carcinoma
  00.786    0000277   063     combined hepatocellular carcinoma and cholangiocarcinoma
  00.791    0025451   067     mucin-producing adenocarcinoma
  00.794    0001025   060     spindle cell sarcoma
  00.796    0006891   056     comedocarcinoma nos
  00.796    0000258   062     renal cell carcinoma, sarcomatoid
  00.800    0000123   065     brenner tumor, malignant
  00.805    0016555   068     adenocarcinoma in tubulovillous adenoma
  00.808    0000051   035     prolactinoma
  00.809    0005831   067     adenocarcinoma in situ in tubulovillous adenoma
  00.811    0024527   041     squamous intraepithelial neoplasia, grade iii
  00.812    0000234   058     malignant tumor, small cell type
  00.812    0000248   052     chronic myelogenous leukemia, bcr/abl positive
  00.814    0004050   050     follicular adenocarcinoma nos
  00.824    0000856   065     adenocarcinoma with mixed subtypes
  00.824    0000060   016     choroid plexus papilloma, malignant
  00.828    0000046   053     fascial fibrosarcoma
  00.829    0000843   058     intraductal micropapillary carcinoma
  00.831    0000947   064     granular cell carcinoma
  00.833    0000079   044     papillary cystadenoma, borderline malignancy (c56.9)
  00.839    0000324   061     acute myeloid leukemia, minimal differentiation
  00.841    0000584   020     endodermal sinus tumor
  00.845    0000722   050     neurilemmoma, malignant
  00.845    0000557   067     combined small cell carcinoma
  00.846    0002217   061     paget's disease and infiltrating duct carcinoma of breast
  00.851    0000505   043     rhabdomyosarcoma nos
  00.851    0011193   063     adenosquamous carcinoma
  00.852    0000265   053     fibromyxosarcoma
  00.854    0000139   022     ependymoma, anaplastic type
  00.858    0000066   057     thymoma, type b1, malignant
  00.858    0000045   037     mediastinal large b-cell lymphoma
  00.858    0000388   037     sq. cell carcinoma, lg. cell, non-ker., in situ
  00.859    0000416   038     hodgkin lymphoma, nod. scler., grade 1
  00.861    0000517   047     follicular adenocarcinoma, well differentiated type
  00.868    0000052   065     eccrine adenocarcinoma
  00.868    0063370   038     squamous cell carcinoma in situ nos
  00.870    0037543   064     renal cell carcinoma
  00.871    0001654   066     linitis plastica
  00.873    0001021   060     liposarcoma, well differentiated type
  00.874    0000328   055     adenocarcinoid tumor
  00.874    0000382   061     mixed tumor, malignant nos
  00.877    0002754   056     intraductal and lobular in situ carcinoma
  00.877    0000141   050     follicular adenocarcinoma, trabecular type
  00.878    0000100   061     papillary squamous cell carcinoma, non-invasive
  00.880    0001242   023     teratoma, malignant nos
  00.880    0000080   071     refract. anemia with excess blasts in transformation
  00.883    0005106   064     neuroendocrine carcinoma
  00.884    0182616   072     carcinoma nos
  00.884    0003757   054     mucoepidermoid carcinoma
  00.885    0000179   051     adenocarcinoma in adenomatous polyposis coli
  00.897    0000100   019     desmoplastic medulloblastoma
  00.900    0000690   028     germinoma
  00.904    1021940   068     adenocarcinoma nos
  00.907    0056558   076     neoplasm, malignant
  00.909    0000055   020     clear cell sarcoma of kidney
  00.920    0003319   058     adenoid cystic carcinoma
  00.921    0000082   032     neuroepithelioma nos
  00.923    0010280   061     chronic myeloid leukemia
  00.927    0003078   048     hodgkin's disease nos
  00.927    0000963   035     precursor cell lymphoblastic lymphoma, nos
  00.930    0000165   067     bronchiolo-alveolar carcinoma, non-mucinous
  00.933    0000884   062     acral lentiginous melanoma, malig.
  00.934    0000308   007     ganglioneuroblastoma
  00.937    0000265   055     mucocarcinoid tumor, malignant
  00.937    0000403   066     large cell neuroendocrine carcinoma
  00.946    0000089   057     myxosarcoma
  00.958    0000820   073     refractory anemia
  00.959    0013199   064     malignant lymphoma nos
  00.963    0000315   066     malignant tumor, fusiform cell type
  00.966    0000086   045     hemangioblastoma
  00.967    0000855   057     islet cell carcinoma
  00.968    0001163   013     medulloblastoma nos
  00.968    0000191   052     myxoid chondrosarcoma
  00.970    0001140   047     acute promyelocytic leukemia
  00.972    0000096   056     myxoid leiomyosarcoma
  00.972    0000093   025     pleomorphic xanthoastrocytoma
  00.972    0000087   064     acute panmyelosis with myelofibrosis
  00.974    0002700   050     bowen's disease
  00.981    0000080   063     basal cell adenocarcinoma
  00.982    0000125   065     nodular hidradenoma, malignant
  00.989    0000114   067     sezary's disease
  00.989    0000115   051     round cell liposarcoma
  00.992    0035630   059     intraductal carcinoma, noninfiltrating nos
  00.993    0000358   051     hemangiopericytoma, malignant
  00.996    0003164   064     scirrhous adenocarcinoma
  00.997    0003337   046     glioma, malignant
  01.003    0001045   060     cutaneous t-cell lymphoma, nos
  01.027    0001609   037     burkitt lymphoma, nos
  01.028    0001015   067     lymphoid leukemia nos
  01.032    0000104   048     epithelioid hemangioendothelioma, malignant
  01.036    0000303   004     hepatoblastoma
  01.036    0000998   065     essential thrombocythemia
  01.040    0000390   066     alveolar adenocarcinoma
  01.047    0000094   035     protoplasmic astrocytoma
  01.050    0016754   069     adenocarcinoma in villous adenoma
  01.050    0001041   049     serous cystadenoma, borderline malignancy (c56.9)
  01.058    0000548   038     hodgkin lymphoma, nod. scler., cellular phase
  01.059    0001545   017     pilocytic astrocytoma (c71._) 9421/1
  01.061    0008163   067     adenocarcinoma in situ in adenomatous polyp
  01.065    0000815   066     erythroleukemia
  01.066    0001669   066     small cell carcinoma, intermediate cell
  01.067    0000248   064     giant cell and spindle cell carcinoma
  01.071    0001388   034     ependymoma nos
  01.072    0000624   066     papillary carcinoma in situ
  01.075    0004414   046     hodgkin's disease, mixed cellularity
  01.089    0001323   062     liposarcoma nos
  01.092    0000142   054     mesonephroma, malignant
  01.092    0000212   067     bronchiolo-alveolar carcinoma, mucinous
  01.093    0001167   053     myxoid liposarcoma
  01.100    0050173   066     small cell carcinoma nos
  01.103    0004446   065     carcinoma, anaplastic type nos
  01.111    0000159   052     hemangioendothelioma, malignant
  01.116    0002422   074     myelodysplastic syndrome, nos
  01.117    0333623   061     infiltrating duct carcinoma
  01.120    0000405   064     pleomorphic liposarcoma
  01.132    0005812   062     fibrous histiocytoma, malignant
  01.132    0003638   065     marginal zone b-cell lymphoma, nos
  01.134    0004672   069     adenocarcinoma in situ in villous adenoma
  01.138    0000145   057     mixed type liposarcoma
  01.144    0000084   063     psammomatous meningioma
  01.149    0000166   060     carcinoid tumor, argentaffin, malignant
  01.154    0000096   039     hepatocellular carcinoma, fibrolamellar
  01.162    0000655   074     refractory anemia with sideroblasts
  01.166    0000151   064     adenocarcinoma in mult. adenomatous polyps
  01.166    0001603   048     serous papillary cystic tumor of borderline malignancy (c56.9)
  01.169    0011933   033     hodgkin lymphoma, nodular sclerosis, nos
  01.171    0018131   061     acute myeloid leukemia
  01.173    0000287   063     dedifferentiated liposarcoma
  01.173    0009679   058     comedocarcinoma, noninfiltrating
  01.175    0027965   059     papillary adenocarcinoma nos
  01.178    0000091   045     papillary carcinoma, encapsulated
  01.184    0001275   064     polycythemia vera
  01.189    0020880   068     adenocarcinoma in adenomatous polyp
  01.189    0000827   021     precursor b-cell lymphoblastic leukemia
  01.190    0000215   046     follicular carcinoma, minimally invasive
  01.191    0000461   066     basaloid carcinoma
  01.196    0000178   037     synovial sarcoma, spindle cell type
  01.196    0000362   069     myelosclerosis with myeloid metaplasia
  01.197    0001872   069     myeloid leukemia nos
  01.198    0045778   068     mucinous adenocarcinoma
  01.200    0002839   059     cribriform carcinoma in situ
  01.206    0000516   050     papillary microcarcinoma
  01.208    0000435   036     hodgkin lymphoma, nod. scler., grade 2
  01.220    0000165   061     carcinoma in pleomorphic adenoma
  01.223    0000614   060     acute myeloid leukemia with maturation
  01.234    0001092   062     mixed cell adenocarcinoma
  01.243    0009222   023     precursor cell lymphoblastic leukemia, nos
  01.249    0020557   061     clear cell adenocarcinoma nos
  01.254    0011881   066     bronchiolo-alveolar adenocarcinoma
  01.255    0000115   044     adenocarcinoma, endocervical type
  01.257    0001270   066     ml, lymphoplasmacytic
  01.262    0000144   056     transitional meningioma
  01.263    0004480   058     ml, large b-cell, diffuse, immunoblastic, nos
  01.265    0010260   046     papillary and follicular adenocarcinoma
  01.279    0000172   050     giant cell glioblastoma
  01.287    0000057   055     chromophobe carcinoma
  01.287    0000127   066     neoplasm, uncertain whether benign or malignant
  01.288    0001435   058     oxyphilic adenocarcinoma
  01.295    0000880   065     cribriform carcinoma
  01.300    0000485   056     papillary mucinous cystadenocarcinoma
  01.302    0000149   034     peripheral neuroectodermal tumor
  01.312    0004610   069     cholangiocarcinoma
  01.312    0002006   061     acute myelomonocytic leukemia
  01.313    0000626   056     hodgkin's disease, lymphocytic depletion nos
  01.320    0015063   062     papillary serous cystadenocarcinoma
  01.326    0000241   066     angioimmunoblastic t-cell lymphoma
  01.329    0000117   054     nk/t-cell lymphoma, nasal and nasal-type
  01.336    0000758   068     villous adenocarcinoma
  01.341    0000761   051     adrenal cortical carcinoma
  01.348    0001142   063     cystadenocarcinoma nos
  01.356    0000618   066     paget's disease, mammary
  01.362    0009635   067     ml, small b lymphocytic, nos
  01.363    0003731   065     acute leukemia nos
  01.364    0001353   063     hemangiosarcoma
  01.371    0008335   047     astrocytoma nos
  01.372    0001358   064     cloacogenic carcinoma
  01.376    0010414   054     lobular carcinoma in situ
  01.391    0021695   061     infiltrating duct and lobular carcinoma
  01.397    0017866   064     oat cell carcinoma
  01.407    0002708   063     serous surface papillary carcinoma
  01.414    0000054   065     hepatocellular carcinoma, clear cell type
  01.425    0000285   046     burkitt's tumor
  01.425    0003428   056     mucinous cystadenocarcinoma nos
  01.437    0004682   062     serous cystadenocarcinoma nos
  01.445    0000301   070     prolymphocytic leukemia, nos
  01.451    0000327   067     acute myeloid leuk. with multilineage dysplasia
  01.454    0000156   069     basosquamous carcinoma
  01.455    0037088   063     ml, large b-cell, diffuse
  01.481    0000050   059     solitary fibrous tumor, malignant
  01.487    0001358   063     paget disease and intraductal ca.
  01.487    0001004   050     medullary carcinoma with amyloid stroma
  01.488    0012432   062     malignant lymphoma, non hodgkin's type
  01.491    0004319   065     ml, mixed sm. and lg. cell, diffuse
  01.497    0001673   068     verrucous carcinoma nos
  01.512    0002581   072     leukemia nos
  01.522    0001770   058     acute monocytic leukemia
  01.530    0001669   060     duct carcinoma in situ, solid type
  01.535    0000086   065     adenoid squamous cell carcinoma
  01.535    0000086   062     therapy-related myelodysplastic syndrome, nos
  01.540    0000079   058     thymoma, type b3, malignant
  01.559    0030328   070     chronic lymphoid leukemia
  01.599    0000111   047     papillary mucinous cystadenoma, borderline malignancy (c56.9)
  01.605    0000179   067     splenic marginal zone b-cell lymphoma
  01.617    0001798   074     chronic myelomonocytic leukemia, nos
  01.627    0006684   058     adenocarcinoma in situ
  01.632    0000111   055     fibrous meningioma
  01.641    0000074   061     hodgkin's disease, lymphocytic depletion, reticular type
  01.650    0000763   072     refractory anemia with excess blasts
  01.651    0022382   050     papillary carcinoma nos
  01.658    0000885   061     infiltrating ductular carcinoma
  01.666    0000118   042     burkitt cell leukemia
  01.692    0001301   059     papillary cystadenocarcinoma nos
  01.721    0050243   070     transitional cell carcinoma nos
  01.733    0002320   048     astrocytoma, anaplastic type
  01.738    0000363   064     sweat gland adenocarcinoma
  01.742    0026705   061     endometrioid carcinoma
  01.777    0000500   048     oligodendroglioma, anaplastic type
  01.796    0034336   064     lobular carcinoma nos
  01.818    0000295   061     glioblastoma with sarcomatous component
  01.829    0004897   068     mesothelioma, malignant
  01.846    0000302   052     esthesioneuroblastoma
  01.863    0001932   051     chondrosarcoma nos
  01.868    0000042   061     adenocarcinoma with apocrine metaplasia
  01.905    0000472   063     infiltrating lobular mixed with other types of carc.
  01.924    0000448   059     acute myeloid leukemia without maturation
  01.931    0001012   040     mixed glioma
  01.935    0000159   049     nonencapsulated sclerosing carcinoma
  01.986    0000128   053     spermatocytic seminoma
  01.987    0001486   049     mucinous cystic tumor of borderline malignancy (c56.9)
  01.994    0002890   066     mantle cell lymphoma
  02.002    0000764   042     fibrillary astrocytoma
  02.020    0000184   067     trabecular adenocarcinoma
  02.020    0000046   069     eccrine poroma, malignant
  02.034    0003345   063     malignant lymphoma, nodular nos
  02.046    0000966   031     dysgerminoma
  02.053    0000070   040     myxopapillary ependymoma
  02.057    0006337   062     tubular adenocarcinoma
  02.088    0000978   052     neurilemmoma nos
  02.090    0000219   038     astroblastoma
  02.119    0003414   062     malignant lymphoma, follicular center cell, noncleaved, follicular
  02.171    0019907   061     glioblastoma nos
  02.222    0000239   063     solid carcinoma nos
  02.229    0001585   071     waldenstrom macroglobulinemia
  02.235    0000940   069     transitional cell carcinoma in situ
  02.242    0000122   071     transitional cell carcinoma, spindle cell type
  02.289    0000077   062     atypical carcinoid tumor
  02.333    0002297   041     oligodendroglioma nos
  02.363    0000263   071     refractory cytopenia with multilineage dysplasia
  02.491    0000089   074     myelodysplastic syndr. with 5q deletion syndrome
  02.525    0000054   068     carcinoma simplex
  02.554    0000461   049     gemistocytic astrocytoma
  02.558    0000083   064     small cell carcinoma, fusiform cell type
  02.592    0000086   056     papillary carcinoma, columnar cell
  02.657    0000408   072     klatskin tumor
  02.666    0000147   056     primary cutan. cd30+ t-cell lymphoprolif. disorder
  02.726    0096537   068     papillary transitional cell carcinoma
  02.790    0005812   062     malignant lymphoma, mixed lymphocytic-histiocytic, nodular
  02.891    0000614   055     chordoma
  02.932    0008938   060     malignant lymphoma, follicular center cell, cleaved, follicular
  02.962    0000476   069     fibrous mesothelioma, malignant
  03.094    0002388   030     mixed germ cell tumor
  03.131    0000786   072     basal cell carcinoma nos
  03.245    0002438   069     papillary trans. cell carcinoma, non-invasive
  03.257    0000135   054     malignant mastocytosis
  03.282    0000494   062     skin appendage carcinoma
  03.491    0000261   072     mesothelioma, biphasic type, malignant
  03.570    0000796   072     sebaceous adenocarcinoma
  03.587    0001263   067     epithelioid mesothelioma, malignant
  03.602    0000113   060     queyrat's erythroplasia
  03.602    0000116   065     sclerosing sweat duct carcinoma
  03.694    0002147   058     hairy cell leukemia
  03.695    0004086   059     adenocarcinoma with squamous metaplasia
  03.737    0000042   056     gliomatosis cerebri
  04.040    0000041   064     lymphangiosarcoma
  04.040    0000095   028     germ cell tumor, nonseminomatous
  04.337    0009849   037     seminoma nos
  04.747    0000049   070     osteosarcoma in paget's disease of bone
  04.955    0001717   028     teratocarcinoma
  05.050    0000057   048     ac. myelomonocytic leuk. w abn. mar. eosinophils
  05.529    0003184   030     embryonal carcinoma nos
  05.611    0001088   019     ewing's sarcoma
  05.824    0000556   028     choriocarcinoma combined with teratoma
  06.019    0000627   038     seminoma, anaplastic type
  06.565    0000401   060     epithelioid cell melanoma
  08.055    0000811   071     paget's disease, extramammary (except paget's disease of bone)
  08.206    0001716   074     merkel cell carcinoma
  10.024    0000412   056     malignant melanoma, regressing
  10.706    0000110   052     precancerous melanosis nos
  10.891    0000661   061     mixed epithel. & spindle cell melanoma
  12.623    0048315   056     malignant melanoma nos
  18.888    0000764   060     amelanotic melanoma
  19.367    0000783   066     desmoplastic melanoma, malignant
  21.009    0001281   062     spindle cell melanoma nos
  24.492    0008688   060     nodular melanoma
  29.788    0007272   069     malignant melanoma in hutchinson's melanotic freckle
  31.608    0021999   057     melanoma in situ
  38.463    0040929   052     superficial spreading melanoma
  38.730    0018652   068     hutchinson's melanotic freckle
  47.472    0000477   058     spindle cell melanoma, type b
  62.046    0004547   051     superficial spreading melanoma, in situ
As a rough guide to interpreting the data, neoplasms found in the top third of the list occur disproportionately often in the African-American population. Neoplasms found in the bottom third of the list occur disproportionately often in the white population. Neoplasms in the middle third occur in similar proportions in both populations.

The left-hand column is the ratio of occurrence of each tumor in the two populations. Each ratio was calculated as the fraction of all cancer cases in the white population accounted for by the tumor, divided by the fraction of all cancer cases in the black population accounted for by the same tumor. If the tumor accounted for the same fraction of total cancer cases in the white and black populations, it would have a ratio of 1. It is important to note that we cannot simply take the ratio of the tumor's raw case counts in the white and black populations, because there are many more whites than blacks.
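The ratio calculation can be sketched in Python as follows. This is an illustration, not the author's script: it assumes each case has already been reduced to a (race, diagnosis, age) tuple with race decoded to "white" or "black", and it assumes the case count and average age are computed over white and black cases combined, which the table does not state. The helper name `ratio_table` is hypothetical.

```python
from collections import defaultdict

MIN_CASES = 40  # tumor types with fewer cases than this are dropped


def ratio_table(cases):
    """Build (ratio, n_cases, mean_age, diagnosis) rows sorted by ratio.

    `cases` is an iterable of (race, diagnosis, age) tuples in which race has
    already been decoded to "white" or "black"; other values are ignored.
    ratio = (share of all white cases with this diagnosis)
          / (share of all black cases with this diagnosis)
    """
    counts = defaultdict(lambda: {"white": 0, "black": 0})
    age_sum = defaultdict(int)
    totals = {"white": 0, "black": 0}
    for race, diagnosis, age in cases:
        if race not in totals:
            continue  # other groups are not part of this comparison
        counts[diagnosis][race] += 1
        age_sum[diagnosis] += age
        totals[race] += 1

    if not totals["white"] or not totals["black"]:
        return []  # shares are undefined without cases in both groups

    rows = []
    for diagnosis, by_race in counts.items():
        n = by_race["white"] + by_race["black"]
        if n < MIN_CASES or by_race["black"] == 0:
            continue  # too few cases, or the ratio is undefined
        white_share = by_race["white"] / totals["white"]
        black_share = by_race["black"] / totals["black"]
        rows.append((white_share / black_share, n, age_sum[diagnosis] / n, diagnosis))
    rows.sort()
    return rows


if __name__ == "__main__":
    # Toy data, not SEER data: 1,000 white and 200 black cases in all.
    toy = ([("white", "tumor A", 60)] * 20 + [("black", "tumor A", 50)] * 20
           + [("white", "tumor B", 70)] * 980 + [("black", "tumor B", 70)] * 180)
    for ratio, n, mean_age, diagnosis in ratio_table(toy):
        # Zero-padded columns, mimicking the layout of the table above.
        print(f"{ratio:06.3f}  {n:07d}  {mean_age:03.0f}  {diagnosis}")
```

In the toy data, tumor A has the same raw count in both groups (20 and 20), yet its ratio works out to 0.02 / 0.10 = 0.2, because it makes up 2% of the white cases but 10% of the black cases. Comparing raw counts would have hidden that difference entirely.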

You might be wondering why this question (i.e., tumors in blacks vs. tumors in whites) has any clinical importance or biological relevance. It's a good question, and it doesn't have an answer that will satisfy everyone. First, the information helps us avoid diagnostic errors. For example, a pathologist should be wary of making a diagnosis of Ewing's tumor or of superficial spreading melanoma in an African-American (in whom these tumors seldom occur). Second, the list generates new hypotheses. We notice that germ cell tumors (including seminomas, teratomas and embryonal carcinomas) occur much more frequently in the white population than in the black population. Why is this? Is there some gene that contributes to these tumors and that occurs more frequently in the white population? We cannot ask such questions if we do not have the kinds of observations included in this list. Third, the list can be generated with the Public Use data sets, which contain race/ethnicity data in case records. We do what we can.

Let's take a look at the first 29 entries from the list. These are the neoplasms that occur more frequently in the U.S. African-American population than in the U.S. white population, by a ratio of at least 2:1.
  
00.113   pigmented dermatofibrosarcoma protuberans
00.190   adult t-cell leukemia/lymphoma (htlv-1 pos.)
00.225   granular cell tumor, malignant
00.227   collecting duct carcinoma
00.238   thymoma, type ab, malignant
00.263   ameloblastoma, malignant
00.271   hypereosinophilic syndrome
00.314   gastrinoma, malignant
00.314   odontogenic tumor, malignant
00.372   alveolar soft part sarcoma
00.394   atypical medullary carcinoma
00.405   medullary carcinoma with lymphoid stroma
00.415   craniopharyngioma
00.416   hodgkin lymph., nodular lymphocyte predom.
00.419   precursor t-cell lymphoblastic lymphoma
00.428   plasma cell leukemia
00.439   choriocarcinoma
00.441   infantile fibrosarcoma
00.443   gastrointestinal stromal sarcoma
00.444   dermatofibrosarcoma nos
00.447   sertoli-leydig cell tumor, poorly differentiated
00.448   mesenchymal chondrosarcoma
00.461   pituitary adenoma, nos
00.461   prolymphocytic leukemia, t-cell type
00.464   thymoma, malignant
00.484   polymorphous low grade adenocarcinoma
00.485   hepatocellular carcinoma nos
00.494   granulosa cell tumor, malignant
00.496   chondroblastic osteosarcoma
With the exception of pigmented dermatofibrosarcoma protuberans (also known as Bednar tumor), there is nothing about these lesions that would lead one to expect them to occur disproportionately in the black population. Most of these tumors have obscure etiologies (hepatocellular carcinoma and adult t-cell leukemia/lymphoma are exceptions). But there you have it.
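Picking out the entries at 2:1 or more amounts to keeping ratios at or below 0.5 (and, at the other end of the list, at or above 2). Here is a small Python sketch of that filter, run on a handful of rows copied from the table; the function name `extremes` and its `fold` parameter are illustrative, not part of any existing tool.

```python
def extremes(rows, fold=2.0):
    """Return the two tails of a list of (ratio, diagnosis) rows.

    ratio <= 1/fold: at least `fold` times more prominent among black cases.
    ratio >= fold:   at least `fold` times more prominent among white cases.
    """
    more_black = sorted(r for r in rows if r[0] <= 1 / fold)
    more_white = sorted(r for r in rows if r[0] >= fold)
    return more_black, more_white


if __name__ == "__main__":
    # A few rows copied from the table above.
    sample = [
        (0.113, "pigmented dermatofibrosarcoma protuberans"),
        (0.496, "chondroblastic osteosarcoma"),
        (0.501, "mesodermal mixed tumor"),
        (0.933, "acral lentiginous melanoma, malig."),
        (2.002, "fibrillary astrocytoma"),
        (62.046, "superficial spreading melanoma, in situ"),
    ]
    more_black, more_white = extremes(sample)
    print("more frequent in black cases:", [d for _, d in more_black])
    print("more frequent in white cases:", [d for _, d in more_white])
```

Mesodermal mixed tumor (0.501, about 1.996:1) misses the cut by a hair, which is why the 29 entries stop at chondroblastic osteosarcoma (0.496). Raising `fold` narrows either tail to the more extreme entries.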

When we look at the other extreme of the list (tumors that occur more frequently in the white population) we see some surprises and some expected results.
04.040   germ cell tumor, nonseminomatous
04.337   seminoma nos
04.747   osteosarcoma in paget's disease of bone
04.955   teratocarcinoma
05.050   ac. myelomonocytic leuk. w abn. mar. eosinophils
05.529   embryonal carcinoma nos
05.611   ewing's sarcoma
05.824   choriocarcinoma combined with teratoma
06.019   seminoma, anaplastic type
06.565   epithelioid cell melanoma
08.055   paget's disease, extramammary (except paget's disease of bone)
08.206   merkel cell carcinoma
10.024   malignant melanoma, regressing
10.706   precancerous melanosis nos
10.891   mixed epithel. & spindle cell melanoma
12.623   malignant melanoma nos
18.888   amelanotic melanoma
19.367   desmoplastic melanoma, malignant
21.009   spindle cell melanoma nos
24.492   nodular melanoma
29.788   malignant melanoma in hutchinson's melanotic freckle
31.608   melanoma in situ
38.463   superficial spreading melanoma
38.730   hutchinson's melanotic freckle
47.472   spindle cell melanoma, type b
62.046   superficial spreading melanoma, in situ
The melanomas fall at the end of the list (i.e., they have the greatest frequency in the white population). There is one important exception:
00.933   acral lentiginous melanoma, malig.
Acral lentiginous melanoma occurs under the fingernails and on the fingers and toes, sites that are unpigmented (or hypopigmented) in both populations. Because skin pigmentation differs little between the two populations at these sites, these tumors occur about equally often in the black and white populations (ratio of 0.933), just as we would expect.

There is some internal consistency in the list. Where a tumor appears, it is often closely followed by a variant of the same tumor. This suggests that, whatever selection biases may have crept into the population data, closely related tumors tend to cluster together on the list. Here are some examples.

Multiple myeloma and its coding variant, plasma cell leukemia, both occur disproportionately in the black population and fall at about the same place on the list:
00.428   plasma cell leukemia
00.504   multiple myeloma
The germ cell tumors are all much more commonly found in the white population and are all near-neighbors on the list:
01.986   spermatocytic seminoma
02.046   dysgerminoma
03.094   mixed germ cell tumor
04.955   teratocarcinoma
05.529   embryonal carcinoma nos
05.824   choriocarcinoma combined with teratoma
06.019   seminoma, anaplastic type
Transitional cell tumors are more likely to occur in the white population and are near-neighbors on the list:
01.721   transitional cell carcinoma nos
02.235   transitional cell carcinoma in situ
02.242   transitional cell carcinoma, spindle cell type
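The clustering can be checked mechanically once a listing is in machine-readable form. Here is a minimal Python sketch, assuming each line holds a ratio followed by whitespace and the ICD-O term, as in the listings on this page; it gathers every entry whose term contains a keyword and orders them by ratio. The function names are illustrative, not taken from the author's scripts, and plain substring matching is an assumption with a known pitfall: the keyword "seminoma" would also match "nonseminomatous".

```python
import re

# A listing line is a ratio, whitespace, then the (possibly truncated)
# ICD-O term, e.g. "04.955   teratocarcinoma".
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(.+?)\s*$")


def parse_listing(text):
    """Return (ratio, term) pairs for every line that fits the pattern."""
    pairs = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            pairs.append((float(match.group(1)), match.group(2)))
    return pairs


def entries_matching(pairs, keyword):
    """Return entries whose term contains keyword, lowest ratio first.

    Assumption: a plain substring test, so "seminoma" would also
    match "nonseminomatous".
    """
    return sorted((ratio, term) for ratio, term in pairs if keyword in term)


listing = """\
01.721   transitional cell carcinoma nos
02.235   transitional cell carcinoma in situ
02.242   transitional cell carcinoma, spindle cell type
04.040   germ cell tumor, nonseminomatous
04.337   seminoma nos
06.019   seminoma, anaplastic type
"""

for ratio, term in entries_matching(parse_listing(listing), "transitional cell"):
    print(f"{ratio:06.3f}   {term}")
```

Prose lines mixed into a pasted listing are skipped, because they do not begin with a number.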
The SEER site allows users to make data queries directly. If you would like to search the SEER data with the SEER search engine, the web address is:

http://seer.cancer.gov/canques/index.html

At the SEER site, users cannot make global queries (queries that compare every tumor in the database against every other tumor in the database, by every tumor type, and all at once). Global queries are what data mining is all about, and my web sites are focused on empowering people with the tools to do their own data mining. For this, you need the actual data sets used by the SEER programmers, and you need to be able to do a little bit of your own programming.
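As a rough illustration of what a global query involves, the following Python sketch (not the author's script) tallies every diagnosis code by race in a single pass over fixed-width records, then ranks all codes by their white/black ratio at once, applying a 40-case cutoff like the one used for the listing. The byte positions, race codes, and function names are hypothetical placeholders; the real field positions must come from the SEER data dictionary. It computes a raw ratio of case counts. This section does not say how the listing adjusts for population size, so the `scale` parameter is only an assumed hook for such a correction, and applying the cutoff to combined white and black cases is also an assumption.

```python
from collections import Counter

# HYPOTHETICAL byte layout: these 0-based slices are placeholders only.
# Take the real positions for the histology and race fields from the
# SEER data dictionary that accompanies the public-use files.
HISTOLOGY = slice(0, 5)  # placeholder: ICD-O histology/behavior code
RACE = slice(5, 7)       # placeholder: race code

WHITE, BLACK = "01", "02"  # placeholder race codes; check the dictionary


def tally(lines):
    """Count cases per histology code, separately for each race code."""
    white, black = Counter(), Counter()
    for line in lines:
        code, race = line[HISTOLOGY], line[RACE]
        if race == WHITE:
            white[code] += 1
        elif race == BLACK:
            black[code] += 1
    return white, black


def ratios(white, black, min_cases=40, scale=1.0):
    """Return (ratio, total, code) for every code, sorted by ratio.

    Codes with fewer than min_cases combined cases, or with no black
    cases (ratio undefined), are skipped. scale is a hypothetical hook
    for a population-size correction; 1.0 gives the raw count ratio.
    """
    rows = []
    for code in set(white) | set(black):
        total = white[code] + black[code]
        if total < min_cases or black[code] == 0:
            continue
        rows.append((scale * white[code] / black[code], total, code))
    return sorted(rows)


# Synthetic fixed-width lines (not SEER data) to exercise the functions.
demo = (["8720301"] * 30 + ["8720302"] * 15
        + ["9732301"] * 20 + ["9732302"] * 25 + ["8120302"] * 5)
white, black = tally(demo)
for ratio, total, code in ratios(white, black):
    print(f"{ratio:06.3f}   {total:07d}   {code}")
```

Because iterating over an open text file yields one record per line, the same `tally` call could be pointed at a SEER data file once the real byte positions are filled in. The output gives ratio, case count, and diagnosis code; turning codes into ICD-O terms would need a separate lookup table.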

For Perl and Ruby programmers, methods and scripts for using SEER and other publicly available biomedical databases are described in detail in my prior books:

Perl Programming for Medicine and Biology

Ruby Programming for Medicine and Biology

More information on cancer is available in my recently published book, Neoplasms.

I also maintain a blog where I regularly write about data organization, data annotation, data retrieval, and data mining. http://julesberman.blogspot.com

© 2008 Jules Berman

As specified in the Limited-Use Data Agreement, the citation for the SEER data is as follows:
   Surveillance, Epidemiology, and End Results (SEER) Program
   (www.seer.cancer.gov) Limited-Use Data (1973-2005), National Cancer
   Institute, DCCPS, Surveillance Research Program, Cancer Statistics
   Branch, released April 2008, based on the November 2007 submission.
As with all of my scripts, lists, web sites, and blog entries, the following disclaimer applies. This material is provided by its creator, Jules J. Berman, "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the material or the use or other dealings in the material.

Last modified: November 27, 2008