Life is information. That information is processed both during an organism's lifetime and, through natural selection, across generations. In fact, life is probably the most complex information-processing system ever created. As Robert Robbins (1992) put it, the DNA sequence (i.e., the code) of an organism is “the result of literally millions of maintenance revisions performed by the worst possible set of kludge–using, spaghetti–coding, opportunistic hackers (i.e. evolution) who delight in clever tricks like writing self–modifying code and relying upon undocumented system quirks”. As a result, deciphering molecular biology is the ultimate dream of a computer scientist.
One of the hardest problems in current genomics is the identification and modeling of the promoter regions and transcription factor binding sites that regulate gene expression. Life, it turns out, discovered RISC systems well before computer scientists did. It prefers to operate with a small but versatile instruction repertoire (i.e., genes) finely tuned by complex cross-regulatory interactions. Far from being the junk DNA they were first labeled, intergenic regions appear to be where the main differences between organisms lie: organisms differ less in their genetic repertoire than in the regions that regulate the expression of those genes.
The complexity of intergenic regions, though probably good for life, is exciting but dire news for those trying to understand them. The large alphabet of possible transcription factors, their loose binding specificity and the seemingly haphazard organization of their binding sites make modeling these regions an extremely hard task, one for which conventional methods relying heavily on statistics and rigid heuristics are ill suited. The approach I champion uses nature-inspired tools to tackle promoter modeling and identification, relying on few a priori assumptions and on a flexible framework that can accommodate the diversity of natural promoters. By applying a soft-computing approach that combines neural networks, fuzzy logic, genetic algorithms and swarm intelligence with traditional information-theory methods, I aim to make some sense of the apparent complexity of promoters and to test model predictions in vivo in order to validate the model.
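As one concrete illustration of the kind of information-theory method mentioned above, the sketch below computes the per-position information content (in bits) of a set of aligned binding sites and then scans a sequence with a log-odds weight matrix built from the same sites. It is a minimal sketch under stated assumptions: equal-length, gap-free aligned sites, a uniform 0.25 background for each base, a pseudocount of 1 and no small-sample correction. The function names and the toy sites are hypothetical, made up for illustration rather than taken from real data, and this is not presented as the author's own method.

```python
import math
from collections import Counter

BASES = "ACGT"

def information_content(sites):
    """Per-position information content R_i = 2 - H_i (bits), assuming a
    uniform background and no small-sample correction (an assumption)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        n = len(sites)
        # Shannon entropy of the observed base distribution at column i
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(2.0 - entropy)
    return profile

def log_odds_matrix(sites, pseudocount=1.0):
    """Log2-odds score of each base per column versus a 0.25 background."""
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        total = len(sites) + 4 * pseudocount
        matrix.append({b: math.log2(((counts.get(b, 0) + pseudocount) / total) / 0.25)
                       for b in BASES})
    return matrix

def scan(sequence, matrix):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(matrix)
    return [(i, sum(matrix[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]

# Hypothetical toy alignment, not real binding-site data
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
ic = information_content(sites)
best_start, best_score = max(scan("GGGTATAATGGG", log_odds_matrix(sites)),
                             key=lambda hit: hit[1])
```

Fully conserved columns reach the 2-bit maximum, while the two variable columns carry about 1.19 bits each. In this toy case the highest-scoring window is the one starting at index 3, which matches the consensus exactly. The pseudocount keeps unseen bases from scoring negative infinity, which matters for binding sites as loosely specified as the ones described above.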
Robbins, R. J., 1992. Challenges in the human genome project. IEEE Engineering in Biology and Medicine, (March 1992):25–34.