Computer Science and the Evolution of Genetic Information
Nancy A. Forbes and Laura F. Landweber
When naturalist D’Arcy Thompson described the inner world of the bacterium in his 1917 book, On Growth and Form, he wrote, "…we have come to the edge of a world of which we have no experience, and where all our preconceptions must be recast."1 The sense of wonder Thompson felt as he observed this heretofore unseen subcellular world—and his premonition that it might not fit well into existing biological paradigms—could still apply today, when molecular biologists and computer scientists come together to find a common language for exploring new questions in biocomputing. Such was the case at a recent meeting at Princeton University, where molecular evolutionary biologists, computer scientists, physicists, and others assembled to discuss a range of topics, in particular the evolution of the genome. According to the organizers of "Evolution as Computation," the meeting’s goal was "…to construct a quantitative view of the computations that take place in cells and the combinatorial processes that drive evolution at the molecular level." In other words, the aim was to use the tools and concepts of information science to understand how complex networks, such as those governing gene regulation and genome rearrangement, evolve over time, and how they can be harnessed to test evolution in the laboratory and to build new tools in areas such as cellular engineering. The event was sponsored by the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), the Alfred P. Sloan Foundation, and the Santa Fe Institute.
Biocomputing has been described as a hybrid field combining information science and biology, in which researchers attempt to build computational models of real biological systems and to use biological systems or processes as metaphors for designing new computational systems. The field itself is not new. Artificial neural networks, genetic algorithms, evolutionary programming, cellular automata, DNA computing, and computational molecular biology could all be considered biocomputing, and some, such as neural nets, date back to the 1940s. In the last few years, however, new hybrid subjects have emerged, and an important one views genome evolution through the lens of computer science.
Many in this field believe the time is ripe for these developments, in particular because discoveries over the past 20 years have given us a better understanding of fundamental biological mechanisms at the molecular level. For example, we’ve learned about the structure and function of genes, and how genes can be modified by translocation (the movement of a segment of DNA from one part of a DNA strand to another), by the insertion of other elements such as retroviruses, by splicing, or by various other mutations. DNA sequencing has told us about the historical evolution of specific proteins and other cellular components, and has suggested a plethora of possible mechanisms for how genes evolve, for example by chimerism (the fusion of unrelated parts) or by the exchange of pieces of DNA among species or individuals. What’s more, researchers can now watch bacterial and phage populations evolve over time in the laboratory. This valuable store of experience has provided insights that have been useful to computer scientists, for example in the construction of "evolutionary algorithms" or "evolutionary programs." (Generally speaking, evolutionary search algorithms, such as genetic algorithms, maintain a population of structures that evolve according to the "biological" principles of selection, recombination, and mutation, which are key elements of their design and implementation.) In addition, recent work sheds light on certain "computational" properties of cells.
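To make the parenthetical description concrete, here is a minimal genetic algorithm in Python. It is a sketch, not any particular system discussed at the meeting: the bit-string genomes, the toy "OneMax" fitness function (count the 1 bits), tournament selection, single-point crossover, and all parameter values are illustrative assumptions. The point is only to show selection, recombination, and mutation acting on a population.

```python
import random


def one_max(genome):
    """Toy fitness (hypothetical objective): the number of 1 bits."""
    return sum(genome)


def tournament(population, rng, k=3):
    """Selection: return the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=one_max)


def crossover(a, b, rng):
    """Recombination: single-point crossover of two parent bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng, rate):
    """Mutation: flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(length=32, pop_size=40, generations=60, rate=0.02, seed=1):
    """Run the GA; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=one_max)  # elitism: carry the best over unchanged
        history.append(one_max(elite))
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng, rate)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=one_max), history


if __name__ == "__main__":
    best, history = evolve()
    print(one_max(best), history[:5], history[-5:])
```

Keeping the best individual unchanged each generation (elitism) guarantees that the best fitness never decreases; the other individuals are rebuilt each generation from selected, recombined, and mutated parents.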
Evolution and genetics as information science
James Shapiro, a bacterial geneticist from the University of Chicago and a DIMACS conference speaker, believes not only that information science lends itself well to the study of genetic evolution, but also that this approach may be the logical one given what we’ve learned from modern molecular genetics. Says Shapiro, "Modern molecular genetics has revealed phenomena completely unanticipated by classical genetics, which was based largely on Mendelian principles, and has provided insights into questions of genome organization, expression and reorganization that could be described as computation-like. Before, the genome was thought of as a collection of independent units subject to individual evolution by random variation. Nowadays, due to discoveries in molecular genetics, as well as molecular cell and developmental biology, the genome is seen as an interactive, hierarchically organized system of systems—much like the software needed to run a computer—containing multiple codes for protein synthesis and other functions." The work of geneticist Barbara McClintock and her "molecular" followers revealed dynamic cellular machinery that mediates DNA rearrangements, and this helped shape Shapiro’s view that the structure of the genome evolves over time via "…concerted, nonrandom changes in the genome guided by cellular computing networks." In other words, genetic mechanisms and genomic evolution might represent a form of ultrasophisticated cellular information processing whose underlying algorithms are still not understood.
The code of codes
The genetic code is the fundamental basis for all biological codes, and its origin remains an enigma after roughly 35 years of research. While existing evidence suggests that the genetic code was influenced by physico-chemical interactions between individual amino acids and strings of nucleic acids,2,3 researchers have yet to piece together the stepwise mechanisms by which it evolved. Two groups of DIMACS presenters, Stephen Freeland of Cambridge University, and David Ardell and Guy Sella of Stanford University, have convincingly demonstrated that the code’s present structure was also shaped by natural selection. In this process, the codons (the triplets of nucleotides that map a nucleic acid sequence onto the amino acids of a protein) are arranged to minimize the negative effects of genetic error and to optimize the "readout" of genes during protein synthesis. By permuting the 20 amino acids across the possible codon sets,4,5 both groups found that the "universal" genetic code—the one found in nearly every organism on earth, whether bacterial or human—falls in the best 0.0001% (one in a million) of all possible codes, and perhaps even better, in terms of its capacity to be an error-correcting code.
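The logic of such permutation tests can be sketched on a toy code like the ring in Figure 1. In this hypothetical Python example, five abstract amino acids a–e carry made-up property values, codons mutate only to their two ring neighbors, and a code's error cost is the mean squared property change over all single-step mutations. With five amino acids, all 5! = 120 reassignments can be enumerated; the real code, with 20 amino acids and 20! (about 2.4 × 10^18) possible reassignments, has to be ranked against random samples instead. The cost function is an assumption for illustration, not the measure used by Freeland or by Ardell and Sella.

```python
from itertools import permutations

# Hypothetical property values (e.g., a hydrophobicity-like scale) for five
# abstract amino acids a-e; invented for illustration, not measured data.
PROPERTY = {"a": 1.0, "b": 1.5, "c": 4.0, "d": 6.0, "e": 6.5}


def error_cost(code):
    """Mean squared property change over single-step mutations on a ring.

    `code` is a tuple giving the amino acid assigned to codons 1..n in ring
    order; codon i can mutate only to its two ring neighbours. Each adjacent
    pair is counted once because both mutation directions give the same
    squared difference.
    """
    n = len(code)
    diffs = [(PROPERTY[code[i]] - PROPERTY[code[(i + 1) % n]]) ** 2 for i in range(n)]
    return sum(diffs) / n


def fraction_at_least_as_good(code):
    """Share of all reassigned codes whose error cost is <= that of `code`."""
    target = error_cost(code)
    costs = [error_cost(p) for p in permutations(code)]
    return sum(c <= target for c in costs) / len(costs)


if __name__ == "__main__":
    for code in [("a", "b", "c", "d", "e"), ("a", "c", "e", "d", "b")]:
        print(code, error_cost(code), fraction_at_least_as_good(code))
```

A small fraction means that few alternative assignments tolerate mutations better than the code being tested; this is the sense in which a code can sit in "the best 0.0001%" of its alternatives.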
Freeland argues that "…this evidence cannot be explained away as a mere byproduct of biosynthesis to account for how the code evolved," although historical factors, such as the order in which amino acids were incorporated into the code, may also have played a role.6 As a result of these combined influences, the natural genetic code usually assigns related amino acids to codons that can be converted into one another by single-point mutations.
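The neighborhood structure behind this statement is easy to enumerate. The Python sketch below encodes the standard genetic code (UCAG ordering, with '*' marking stop codons), lists the nine codons one point mutation away from any codon, and measures what fraction of point substitutions between sense codons at each codon position are synonymous. It does not score how chemically "related" the non-synonymous replacements are; that would need an amino acid property scale, which is deliberately left out here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UCAG at positions 1, 2, 3; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def point_mutants(codon):
    """All nine codons reachable from `codon` by a single-base substitution."""
    return [
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    ]


def synonymous_fraction(position):
    """Fraction of single-base substitutions at `position` (0, 1 or 2)
    between sense codons that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for mutant in point_mutants(codon):
            if mutant[position] == codon[position] or CODE[mutant] == "*":
                continue  # other positions, or mutations to a stop codon
            total += 1
            same += CODE[mutant] == aa
    return same / total


if __name__ == "__main__":
    for pos in range(3):
        print(f"position {pos + 1}: {synonymous_fraction(pos):.3f} synonymous")
```

In the standard code, second-position substitutions are never synonymous, whereas most third-position substitutions are; this is one familiar way in which the code buffers point mutations.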
This capacity for error correction was traditionally assumed to be the result of natural selection acting on individuals with different codes over several generations, because biologists have regarded the effect of mutation on the fitness of a genetic code as acting on the transmission of information from parent to offspring. This presents a problem, however: codes should tend to resist change, because reassigning a codon would disrupt the products of essential genes throughout the cell. Until now, research has examined the fitness of genetic codes only when mutation affects a single genetic site, namely a codon. However, codon usage in an organism is shaped by a balance of selection, mutation, and genetic drift. Ardell and Sella show how, in a population of individuals with large genomes, this mutation-selection balance can act through codon usage in genes, with the net result that individuals with modified genetic codes compete within a single generation (Figure 1). This dramatically increases the power of natural selection to optimize the genetic code in its early, formative stages, steering its evolution toward its present canonical form.
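The phrase "mutation-selection balance" can be illustrated with the textbook deterministic model for a single codon site in a haploid population, sketched here in Python. This is not Ardell and Sella's model: it assumes one preferred codon and one disfavored synonym with relative fitness 1 - s, one-way mutation at rate mu per generation, no back mutation, and an infinite population (so no genetic drift). Under these assumptions, iterating selection followed by mutation drives the disfavored codon to the equilibrium frequency q = mu/s, at which the copies removed by selection each generation are replaced by new mutations.

```python
def codon_equilibrium(mu, s, generations=5000, q0=0.0):
    """Deterministic haploid mutation-selection balance at one codon site.

    q is the frequency of a disfavoured codon (relative fitness 1 - s) that
    arises from the preferred codon at rate mu per generation. Back mutation
    and genetic drift are ignored (infinite population).
    """
    q = q0
    for _ in range(generations):
        q = q * (1 - s) / (1 - s * q)  # selection against the disfavoured codon
        q = q + mu * (1 - q)           # mutation: preferred -> disfavoured
    return q


if __name__ == "__main__":
    mu, s = 1e-4, 1e-2
    print(codon_equilibrium(mu, s), mu / s)
```

Solving q = q' + mu(1 - q') with q' = q(1 - s)/(1 - sq) gives (sq - mu)(q - 1) = 0, so mu/s is the exact nontrivial equilibrium of this particular model rather than only an approximation.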
Figure 1. A graphical representation of two abstract genetic codes each encoding five amino acids (circles a-e). Codons (boxes 1-5) that are neighbors on the ring are likely to mutate to one another. The code on the left is more error-correcting than the one on the right, because similar amino acids, indicated by similar color, are closer to each other on the ring. Assuming equal usage of these amino acids and reasonably large genomes, an individual with the code on the left has an expected 5% fitness advantage over an individual with the code on the right. (Courtesy David Ardell, Stanford University.)
Tackling wholesale genome evolution
While genetic evolution as seen from an information science paradigm was the main topic of the meeting, other areas of biological computation, equally innovative and offering new directions for future research, were also discussed. Biologist Drew Endy of the Molecular Sciences Institute (Berkeley, California), together with microbiologist Ian Molineux of the University of Texas at Austin and chemical engineers Lingchong You and John Yin of the University of Wisconsin, Madison, have carried out computer simulations and experiments on a model biological system, the bacteriophage T7, to test how various permutations in gene and segment order affect the robustness of the wild-type phage genome. Specifically, by permuting genome arrangements and measuring their effect on two characteristic properties of the organism—the rate of virus production and the distribution of its protein products—they ask, in the language of information science, "…what range of system outputs can be created via adjustments at the level of information processing?" Their results, based on both experiment and a computer model, suggest that the wild-type T7 genome ranks in the best 2%, or better, of all permutations of its 122 "genetic elements" (defined as protein-coding genes, RNA processing sites, noncoding regions, and so on). As a result, they can now explore which permutations might lead to genome improvement. By continuing to couple computer modeling carefully with experiment, the approach Drew Endy, Deyu Kong, and John Yin have taken could form the basis of "a predictive, system-level biology, grounded at the genetic level."7 Together with the earlier DIMACS presentations on the genetic code, these examples suggest that evolution may operate generally to optimize a range of objective functions that affect genome fitness by finding solutions to specif
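Ranking a genome among its rearrangements has to be done by sampling: 122 elements admit 122! orderings, a number larger than 10^200. The Python sketch below estimates what fraction of random rearrangements score strictly better than a given order. The scoring function is a hypothetical stand-in, not the T7 simulation used by Endy, You, Yin, and colleagues: each element gets a made-up importance weight, and its contribution decays with its position in the genome (the decay factor is likewise an assumption).

```python
import random


def expression_score(order, weight, decay=0.97):
    """Hypothetical fitness of a gene order: elements nearer the start of the
    genome contribute more (decay**position), scaled by their weight."""
    return sum(weight[e] * decay ** pos for pos, e in enumerate(order))


def fraction_better(order, weight, samples=2000, seed=0):
    """Monte Carlo estimate of the share of random rearrangements that score
    strictly higher than `order` (smaller = closer to optimal)."""
    rng = random.Random(seed)
    target = expression_score(order, weight)
    shuffled = list(order)
    better = 0
    for _ in range(samples):
        rng.shuffle(shuffled)
        better += expression_score(shuffled, weight) > target
    return better / samples


if __name__ == "__main__":
    rng = random.Random(42)
    weight = {i: rng.random() for i in range(122)}  # 122 hypothetical elements
    wild_type = list(range(122))                    # an arbitrary reference order
    print(fraction_better(wild_type, weight))
```

Under this toy score, placing elements in order of decreasing weight is optimal (by the rearrangement inequality), so its estimated fraction is 0; a statement such as "in the best 2%" corresponds to an estimated fraction of 0.02 or less under whatever score the real model defines.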
Figure 2. In hypotrichous ciliates such as Stylonychia or Oxytricha, gene unscrambling executes a set of computational instructions to sort jumbled protein-coding regions and to eliminate interspersed noncoding DNA during the assembly of functional genes in the somatic macronucleus. (Courtesy Laura Landweber, Princeton University.)
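As a simplified illustration of the "sorting" step described in the caption, the Python sketch below assembles a gene from scrambled macronuclear-destined segments (MDSs) by matching short repeated "pointer" sequences: the pointer at the end of one MDS reappears at the start of the next, and one copy is kept in the assembled gene. Interspersed noncoding segments are simply dropped. The sequences, the ^ and $ end markers, and the data layout are hypothetical, and the sketch ignores the recombination chemistry that actually carries out unscrambling in ciliates.

```python
from dataclasses import dataclass


@dataclass
class MDS:
    """A protein-coding segment flanked by short pointer repeats."""
    left: str   # pointer shared with the end of the preceding MDS ("^" = gene start)
    body: str
    right: str  # pointer shared with the start of the next MDS ("$" = gene end)


def unscramble(segments, start="^", end="$"):
    """Chain MDSs by matching each right pointer to the next left pointer.

    Non-MDS entries (noncoding IES strings) are discarded. Raises KeyError
    if the pointer chain is broken.
    """
    by_left = {s.left: s for s in segments if isinstance(s, MDS)}
    pointer, gene = start, ""
    while pointer != end:
        mds = by_left.pop(pointer)
        gene += mds.body
        pointer = mds.right
        if pointer != end:
            gene += pointer  # keep a single copy of each internal pointer
    return gene


# Hypothetical scrambled micronuclear locus: MDS order 3, 1, 2 with IESs between.
MICRONUCLEAR = [
    MDS("TCG", "CCCTAA", "$"),
    "ttattattgc",  # IES (noncoding), written in lower case
    MDS("^", "ATGAAA", "CTC"),
    "gattacagat",  # IES
    MDS("CTC", "GGG", "TCG"),
]

if __name__ == "__main__":
    print(unscramble(MICRONUCLEAR))
```

Because assembly follows pointers rather than the order in which segments appear on the chromosome, the same procedure works for any scrambled order, provided each pointer identifies a unique successor.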
Presenters Ron Weiss, George Homsey, and Tom Knight, all computer scientists from the MIT Artificial Intelligence Lab, are experimenting with biochemical and mechanical processes as a basis for building actual in vivo digital circuits. The group has shown how to represent the different levels of a digital signal, which in conventional computers are carried by electrical currents, as concentrations of DNA-binding proteins that, acting as activators or repressors, control the rate of production of other DNA-binding proteins. For example, they could produce an inverter (a logic gate that carries out the negation, or NOT, function; NAND and NOR gates combine it with AND and OR) whose input is a given protein A and whose output is a second protein B, by constructing a gene for protein B whose promoter is repressed by protein A.
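The inverter idea can be sketched with a common steady-state model of transcriptional repression. In the hypothetical Python example below, input protein A represses the promoter of output protein B through a Hill function, so B is high when A is low and vice versa, and a threshold turns concentrations into bits. The parameter values (maximal production beta, repression constant k, Hill coefficient n, degradation rate gamma, the threshold, and the LOW/HIGH levels) are invented for illustration, and the NOR gate rests on the simplifying assumption that two repressors act additively on one promoter.

```python
def steady_state_output(repressor, beta=10.0, k=1.0, n=2.0, gamma=1.0):
    """Steady state of dB/dt = beta / (1 + (R/k)**n) - gamma * B,
    where R is the total repressor concentration acting on B's promoter."""
    return beta / (gamma * (1.0 + (repressor / k) ** n))


def inverter(a):
    """NOT gate: input protein A represses production of output protein B."""
    return steady_state_output(a)


def nor(a, b):
    """NOR gate (assumption): A and B repress the same promoter additively."""
    return steady_state_output(a + b)


def to_bit(level, threshold=5.0):
    """Read a protein concentration as a digital 0/1 signal."""
    return int(level > threshold)


LOW, HIGH = 0.0, 10.0  # hypothetical input concentrations for logic 0 and 1

if __name__ == "__main__":
    for a in (LOW, HIGH):
        print("NOT", to_bit(a), "->", to_bit(inverter(a)))
    for a in (LOW, HIGH):
        for b in (LOW, HIGH):
            print("NOR", to_bit(a), to_bit(b), "->", to_bit(nor(a, b)))
```

Because each gate's output is itself a repressor, gates can be chained, provided one gate's "high" output is high enough to act as the next gate's "high" input; here two inverters in series restore the original bit.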
Taken as a whole, the research presented at the most recent DIMACS meeting on biocomputing and genomic evolution offers fertile ground for new theoretical insights and future directions in multiple fields, while also suggesting new ways of viewing existing problems. Not only do the biological and computational sciences stand to benefit, but continued work in this area may even provide the hands-on engineer with new tools and venues for technological innovation inspired by nature, reminding us, as Shakespeare once mused, that "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy." (Hamlet, Act I, Scene V).
1. D.W. Thompson, On Growth and Form, Cambridge Univ. Press, Cambridge, UK, 1961, p. 48.
2. G. Vogel, "Tracking the History of the Genetic Code," Science, Vol. 281, No. 5375, 17 July 1998, p. 329; http://www.sciencemag.org/cgi/content/full/281/5375/329.
6. S. Freeland and L.D. Hurst, Proc. Royal Society of London, Vol. B 265, 1998, p. 2111.
9. E. Pennisi, "How the Genome Readies Itself for Evolution," Science, Vol. 281, No. 5380, 21 Aug. 1998, p. 1131; http://www.sciencemag.org/cgi/content/full/281/5380/1131.
Nancy A. Forbes is a physicist and senior member of the technical staff at Litton TASC in Chantilly, Virginia, where she supports the research efforts of the military satellite agency, the National Reconnaissance Office. When she wrote this article, she worked for Schafer Corp. in Arlington, Virginia. Her work involves assessing emerging technologies of interest to the military, including non-semiconductor-based computing (biocomputing, quantum computing, and so forth). She has an MA in physics from Columbia University. She is a contributing editor at AIP’s The Industrial Physicist, a member of the American Physical Society, and a member of the Board of Air Force Science and Technology of the National Academy of Sciences. Contact her at Litton TASC, Chantilly, VA 20151.
Laura Landweber is a professor of biology in the Department of Ecology and Evolutionary Biology at Princeton University. Her main research interest is the evolution of biological information processing, or complex molecular systems, studied both in test-tube experiments in the laboratory and in organisms as far-ranging as ciliates and trypanosomes. She received her AB in molecular biology from Princeton and her PhD in biology from Harvard’s Department of Cellular and Developmental Biology. She is a fellow-at-large of the Santa Fe Institute and has received Burroughs-Wellcome Fund and Sigma Xi New Investigator Awards for her research, which spans the interplay between molecular biology, computer science, chemistry, and evolution. Contact her at Guyot Hall, Rm. 323, Princeton Univ., Princeton, NJ 08544-1003; http://www.princeton.edu/~lfl/.