Our work with proteins grew out of studies of genetic and physiological adaptation of copepods to temperature stress. First we examined expression of the hsp70 family of heat shock proteins in zooplankton and shellfish (1-4). Later, especially once imaging software became available, we branched out into the molecular toxicology of a variety of species, including zooplankton, soil arthropods, zebrafish, medaka, bacteria and shellfish (5, 7-9). Currently we are using neural network models to analyze protein expression patterns (10), following some earlier success with them (6). The basic goal is to isolate the proteins whose expression defines any condition of interest. Pattern differences are learned in one set of data and tested on another to determine whether there are generally useful, definitive protein indicators.
The raw data for Artificial Neural Network (ANN) models, or for statistical analyses, may come from 1D, 2D or 3D electrophoretic gels, or from the spectral output of mass spectrometry, HPLC or capillary electrophoresis. The first step is to convert the protein data to digits for quantitative analysis. Gel images are aligned statistically or with imaging software (PDQuest, the Melanie series and Z3 are examples). Next a grid is imposed, with the mesh adjusted so that most spots occupy no more than one square (because of redundancy, not all spots are needed for accurate classification of treatments). The dots on the image are then converted to digits. Spectra are aligned and converted similarly to binary or continuous data. One-dimensional gel images are coded using a series of boxes around the bands.
The data now consist of one vector per protein sample, equal in length to the number of elements in the 1D or 2D array, or to the number of vertical segments used to code the spectral peaks.
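The grid coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is taken to be a list of rows of intensities scaled to [0, 1], each grid square is summarized by its mean intensity, and the 0.5 threshold for binary coding is arbitrary. The function name `grid_code` and its parameters are hypothetical and do not come from any of the imaging packages named above.

```python
def grid_code(image, cell, threshold=0.5, binary=True):
    """Code a 2D intensity image as a flat vector, one value per grid square.

    image     : list of equal-length rows of intensities in [0, 1] (assumed format)
    cell      : side length of each grid square, in pixels (the "mesh")
    threshold : mean intensity at or above which a square is coded 1 (assumption)
    binary    : if False, return the mean intensity of each square instead
    """
    n_rows, n_cols = len(image), len(image[0])
    vector = []
    for top in range(0, n_rows, cell):
        for left in range(0, n_cols, cell):
            # Collect the pixels falling inside this grid square.
            block = [image[r][c]
                     for r in range(top, min(top + cell, n_rows))
                     for c in range(left, min(left + cell, n_cols))]
            mean = sum(block) / len(block)
            vector.append(int(mean >= threshold) if binary else mean)
    return vector


# Toy 4x4 "gel" with one spot in the upper-left square and one in the lower-right.
gel = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.6, 0.9],
    [0.1, 0.0, 0.8, 0.7],
]
binary_vector = grid_code(gel, cell=2)                    # one digit per square
continuous_vector = grid_code(gel, cell=2, binary=False)  # mean intensity per square
```

With a mesh of 2 pixels the toy image yields a vector of length 4, one element per grid square, which is the form of input passed on to the ANN or statistical analysis.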
The success of an ANN model is measured by the proportion of correct treatment classifications made in a blind (test) set. Models can be improved by repeated sampling or by a ‘round robin’ experiment, in which each subset of samples takes a turn as the test set. The results are validated by multivariate statistics such as discriminant analysis, cluster analysis and permutation analysis.
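A round robin evaluation of this kind might look like the following sketch. To keep the example short and self-contained, a nearest-centroid classifier stands in for the ANN; it is not the model used in the work described here. The function names, the toy vectors and the way samples are divided into three sets (by index modulo 3) are all assumptions made for illustration.

```python
def nearest_centroid_fit(vectors, labels):
    """Average the training vectors of each treatment into one centroid."""
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def nearest_centroid_predict(centroids, vec):
    """Assign a sample to the treatment whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))


def round_robin_accuracy(vectors, labels, n_sets=3):
    """Hold out each of n_sets subsets in turn as the blind test set."""
    correct = 0
    for held_out in range(n_sets):
        train_idx = [i for i in range(len(vectors)) if i % n_sets != held_out]
        test_idx = [i for i in range(len(vectors)) if i % n_sets == held_out]
        centroids = nearest_centroid_fit([vectors[i] for i in train_idx],
                                         [labels[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(centroids, vectors[i]) == labels[i]
                       for i in test_idx)
    return correct, len(vectors)


# Toy coded protein profiles: three replicates of each of two treatments.
vectors = [
    [1, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 1],   # "control"
    [0, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 0],   # "stress"
]
labels = ["control"] * 3 + ["stress"] * 3

correct, total = round_robin_accuracy(vectors, labels)
print(f"{correct} of {total} blind classifications correct")
```

Each sample is classified exactly once, by a model that never saw it during training, so the reported proportion is a blind-set score in the sense used above.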
Summary of Results
- Among 22 strain/UVB treatments of medaka fish, correct assignments were made in 63 of 66 trials. Three separate sets of training and test samples were set up in a round robin arrangement. When UVB levels were treated as part of the error, strain classification was about as successful (62 of 66 correct), and when strains were ignored the UVB levels were classified correctly 62 of 66 times. The probability of obtaining this by chance is effectively zero (10).
- When 48 tomato plants were subjected to 4 levels of salinity over three time periods, the individual plants were assigned to the correct treatment in 45 of 48 cases. This was based solely on their protein profiles and, unlike in the medaka study, the error included significant genetic variance (10).
- Michael O’Neill, of this department, who had a crucial role in the analyses just described, has shown in three other experiments how useful ANN can be in isolating markers. Sixty genes of the 4026 available in a microarray assay gave near 100% accuracy in diagnosis and prognosis of lymphoma (12). In a study of protein profiles from prostate cancer patients he found 93% accuracy in prognosis and diagnosis (O’Neill, unpublished data). He and colleagues found similar results with leukemia data (11).
- Cluster analysis of the medaka data mentioned above showed perfect clustering of treatments. Pairwise permutation tests were significant in every (statistically independent) comparison.
- In collaboration with private consultants Steven Bartell and Lee Shugart we hope to build risk analysis into a dynamic version of the analysis described here (US Patent 2003). We have formed a Protein Indicators Of Risk (PRIOR) group for this purpose.
- Pathway analysis appears to fit well with ANN, and several software packages for it are on the market. From there, potential drug targets could be studied.
- In collaboration with Alynne McClean of Science with a Mission in Boston, we are investigating cheap, portable assays based on proteome diagnostics for use in developing countries. Health care would thus become more widely available as a result of early and accurate on-site screening and triage by volunteers and staff. There is a growing awareness of the acute health crises in the Third World among government agencies, political leaders, NGOs and influential private citizens.
- Here at home there is also a huge need for better care of returning veterans and accident victims, again in the accurate diagnosis of hidden injury (particularly head injury). On-site triage is notoriously error-prone, with over 50% positive and negative errors in assignments to trauma centers, hospitals and family physicians. These errors result in huge unnecessary testing costs on the one hand and tragic human costs on the other. The State of Maryland emergency system (MIEMSS) has a well-developed trauma decision-making protocol which could well incorporate simple upstream screening tests based on protein expression.
- Finally, in general, current methods of diagnosis are both primitive and far too expensive. The techniques for accurate diagnosis and prognosis are available and, if developed, would greatly reduce not only health costs but also the suffering attendant on decisions based on poor data.
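As a rough check on the chance probability quoted for the medaka result (63 of 66 correct among 22 treatments), the binomial tail can be computed directly. This is a sketch under two assumptions that are not taken from the analysis itself: that a chance classifier picks each of the 22 treatments with equal probability 1/22, and that the 66 trials are independent. The function name `chance_tail` is hypothetical.

```python
from math import comb


def chance_tail(n_trials, n_correct, n_classes):
    """P(at least n_correct of n_trials right) under uniform random guessing.

    Assumes independent trials, each correct with probability 1/n_classes.
    """
    p = 1.0 / n_classes
    return sum(comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
               for k in range(n_correct, n_trials + 1))


# Medaka figures from the summary: 22 treatments, 63 of 66 trials correct.
p_value = chance_tail(66, 63, 22)
print(f"Probability of 63 or more correct by chance: {p_value:.3g}")
```

Under these assumptions the probability is many orders of magnitude below any conventional significance level, though not literally zero.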
1. Bond, J.A., C.M. Gonzalez and B.P. Bradley, (1993) Age-dependent expression of heat shock protein in D. magna. Comp. Phy. & Bioc. 106: 93-98.
2. Bradley, B.P. (1993) Measuring phenotypic and genetic variation, demonstrating adaptation. 27th European Marine Biology Symposium, Dublin, Ireland. Sept. 7-11, 1992. JAPAGA, Dublin pp. 3-14.
3. Bradley, B.P., M.A. Lane and C.M. Gonzalez, (1992). A molecular mechanism of adaptation in an estuarine copepod. Neth. J. Sea Research, 30:1-6.
4. Brown, D.C. and B.P. Bradley, (1995). Genetic and physiological regulation of HSP70. Marine Env. Res. 39: 181-184.
5. Bradley, B.P., J.-A. Bond, C.M. Gonzalez and B.E. Tepper, (1994) Complex mixture analysis using protein expression as a qualitative and quantitative tool. Env. Tox. and Chem. 13:1043-1050.
6. Bradley, B.P., Brown, D.C., Iamonte, T.N., Boyd, S.M. and O'Neill, M.C., (1996). Protein Patterns and Toxicity Identification, in "Biomarkers and Risk Assessment" ASTM STP 1306, David A. Bengtson and Diane S. Henshel, Eds., American Society for Testing and Materials, Philadelphia.
7. Bradley, B.P., E.A. Shrader, D.G. Kimmel and J.C. Meiller, (2002). Protein Expression Signatures: an application of proteomics. Marine Environmental Res. 54: 373-377.
8. Kimmel, D.G. and B.P. Bradley, (2001). Temperature and salinity stress in Eurytemora affinis: Defining ecological limits using protein expression. J. Exp. Mar. Biol. Ecol. 266: 135-149.
9. Shrader, E.A., Henry, T.R., Greeley, M.S. and Bradley, B.P., (2003). Proteomics in zebrafish exposed to endocrine disrupting chemicals. Ecotoxicology 12: 485-488.
10. Bradley, B.P., B. Kalampanyl and M.C. O'Neill (2008) Protein Expression Profiling in Methods in Molecular Biology (Eds. D. Sheahen and R. Ryther) Humana Press (To appear fall 2008).
11. Choi, Y.L., Tsukasaki, K., O’Neill, M.C. et al., (2007). A genomic analysis of adult T cell Leukemia. Oncogene 26: 1245-1255.
12. O’Neill, M. and Song, L. (2003) Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect. BMC Bioinformatics 4:13-25.
U.S. Patents 5149634 (Bioassay for Environmental Quality) (1992), 5250413 (Sublethal Bioassay for Environmental Quality) (1993) and 6653135 (Dynamic Protein Signature Assay) (2003)