On predicting evolution

It sucked to be a dog in the late 1970s. It wasn’t just that your owners wanted you to wear cute, doggy-tailored wide-bottomed jeans; it was the fact that your historical enemies, the cats, had unleashed a devastating wave of biological warfare on you. Somewhere in Europe, the evil cats took a virus that was rampant in their own population, feline panleukopenia virus, and mutated it to first infect wild animal populations (the sly minks are still suspect) and from there … to you. The wave of parvovirus infection spread around the world within years, quite literally decimating the worldwide dog population. Brave dog scientists helped humans create parvovirus vaccines to stem the tide of infection, and almost every puppy born today is vaccinated as a matter of course.

Of course, despite their feigned innocence, it is unlikely that cats were actually able to carry out sophisticated molecular biology techniques. No, we were observing evolution in action, with natural mutations gaining traction in new animal populations. Viruses are mutating all the time, constantly testing the defenses of new prey, a process that Nathan Wolfe has nicely called “viral chatter.” This is the same process that has led to fears of ‘bird flu’ crossing into the human population in a form that will allow facile human-to-human transmission.

There are many levels of mutation that may be required for the zoonotic transfer of viruses, but it’s easy to focus on the first and most important one: entry of a foreign virus into a cell. Viruses normally enter host cells by binding to and exploiting one or more cell surface receptors. The process of binding is mediated by lock-and-key-like interactions between the amino acids on the surface of viral proteins and the amino acids on the cell surface receptors. In the case of parvovirus, relatively few amino acids had to change on the feline panleukopenia virus VP2 protein to allow the virus to enter mink cells, and then a few more mutations led to entry into dog cells.

When viewed from this perspective, feline bioterrorism is reduced to simple lockpicking. Which current amino acids must change to which new amino acids in order to gain entry into the new cell type? And if you could predict these new interactions, you could predict the evolution of viruses … or even guide their evolution for your own ends.
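To make the lockpicking question concrete, here is a minimal sketch of the bookkeeping involved: given two aligned protein sequences, list which amino acids differ and where. The function name `substitutions` and both sequences are made up for illustration; they are not real VP2 fragments, and a real comparison would start from a proper alignment.

```python
def substitutions(source: str, target: str) -> list[str]:
    """List point substitutions between two aligned, equal-length protein sequences.

    Each change is written in the usual shorthand: old residue, 1-based
    position, new residue (for example "A4N").
    """
    if len(source) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{old}{position}{new}"
        for position, (old, new) in enumerate(zip(source, target), start=1)
        if old != new
    ]


# Hypothetical ten-residue fragments, invented for this example.
cat_like = "MKTAYIAKQR"
dog_like = "MKTNYIAKDR"

print(substitutions(cat_like, dog_like))  # ['A4N', 'Q9D']
```

Listing the differences is the easy part, of course. The hard part is knowing which of them actually matter for binding.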

After mulling it over, I think there are really two good ways to predict / engineer viral evolution. First, you can look at the historical record, and second, you can do computer modeling. The historical record is of course the vast amount of sequence data available on virtually everything at this point, including viruses and their hosts. It is reasonable to assume that the predator:prey interactions between viruses and not-viruses have been going on for millions of years, and that many of the assaults have been recorded in the genes of both sides. This makes sense to me especially for viruses, which mutate on a fantastic time scale. One of the anecdotes that really brought this home to me was learning that, in a given individual, HIV-1 can completely evolve around a single drug within just weeks, even though that drug was the result of billions of dollars of investment. This is why we now give a drug cocktail (HAART) to treat HIV-1; it increases the number of mutations required for evasion to the point where the disease is managed over a person’s lifetime, rather than over weeks.

A series of conversations with my colleague Professor Sara Sawyer has convinced me that we can also look back at viral-wrought changes in the host. Sara examines the positive selection of genes in mammals. She looks for positions in genes that are mutating much faster over time than might be expected (although keep in mind that the timescales are now on the order of millions of years, rather than a month). She has found a number of such mutations that are best accounted for by assuming that they are the result of our slogging attempts to fight off the wave of viruses that continually confront us. For example, Sara noticed that one of the common cell surface receptors was undergoing accelerated evolution on its periphery. Examination of a crystal structure of the receptor with a viral protein showed that the mutations accumulating in primates were in the same regions that were touched by the viral protein. The genomes of the slow-growing mammals had apparently retained a record of fighting off the fast-moving viruses. But it is more than a record; it is a map of which amino acid residues on the virus can be mutated to interact with which amino acid residues on the cell. It is a potential recipe for zoonosis or bioterrorism, depending on your point of view.
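The idea of hunting for fast-changing positions can be sketched very crudely in code. The example below is my own toy, not Sara's method: for each codon in a set of aligned coding sequences, it counts pairwise differences that change the encoded amino acid (nonsynonymous) versus those that do not (synonymous). Real positive-selection analyses normalize by the number of possible sites and fit likelihood models over a phylogeny; this skips all of that. The function name `codon_change_counts` and the three sequences are hypothetical.

```python
from itertools import combinations, product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_change_counts(aligned_cds: list[str]) -> list[tuple[int, int]]:
    """Return (nonsynonymous, synonymous) pairwise difference counts per codon.

    Assumes gap-free, in-frame sequences of equal length.
    """
    n_codons = len(aligned_cds[0]) // 3
    counts = []
    for k in range(n_codons):
        codons = [seq[3 * k : 3 * k + 3] for seq in aligned_cds]
        nonsyn = syn = 0
        for first, second in combinations(codons, 2):
            if first == second:
                continue  # identical codons carry no signal
            if CODON_TABLE[first] == CODON_TABLE[second]:
                syn += 1
            else:
                nonsyn += 1
        counts.append((nonsyn, syn))
    return counts


# Hypothetical aligned coding sequences from three species.
species_cds = [
    "ATGGCTAAA",
    "ATGGCCAGA",
    "ATGGCTGAA",
]

print(codon_change_counts(species_cds))  # [(0, 0), (0, 2), (3, 0)]
```

In this toy, the third codon is the suspicious one: every species encodes a different amino acid there, while the second codon changes only silently.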

Now, it is amazing that Sara has discovered this faint signal in the sea of sequence data that has washed over our community. But even without it, it probably would have been possible (although much harder) to discover the lockpicking map. The fact is that in concert with the insane amount of sequence data that is accumulating, we have an equally daunting amount of structural data. It is now rare to find a protein whose basal structure cannot be predicted based on other, structurally similar proteins. And if you really want to know the structure of a given protein, the methods for obtaining it have been streamlined to the point where there are factories that literally produce hundreds of structures a month (it used to take an entire graduate student lifetime and then some to get just one structure).

In parallel with the acquisition of structure have come tools for understanding the energetics of that structure. Why does a protein fold this way and not that way? What keeps all the squiggly lines together rather than flying apart? These energetic methods are far from perfect, but again are much better than they used to be. Extraordinary practitioners of the art of computational protein design, like David Baker, Homme Hellinga, and Steve Mayo, have advanced to the point where they can let computers sift through a huge number of protein sequence variants, assessing the energetics of each and predicting new structures.

It is now pretty much like one of those scenes in a movie where a hacker (the person with glasses) clips some small device to an ATM or door or computer interface, and a bunch of numbers run across the face of the device, and then, voilà, the electronic lock is picked. The device presumably just quickly ran through all the combinations and found the right one (and the idiot software designer supposedly did not see this coming and never installed protection against multiple incorrect inputs).
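The movie-hacker version of that search, running through every combination and keeping the best one, can be sketched in a few lines. Everything here is a stand-in: `toy_score` just counts matches against an arbitrary string and has nothing to do with a real energy function, and the names `brute_force_search`, `hidden`, and `start` are invented for the example. Real protein design software scores physical models of structure and uses much smarter search than exhaustive enumeration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(variant: str, target: str) -> int:
    """Stand-in for an energy function: number of positions matching a target."""
    return sum(a == b for a, b in zip(variant, target))


def brute_force_search(start: str, positions: list[int], score) -> tuple[str, int]:
    """Try every amino acid at the given 0-based positions; keep the top scorer."""
    best, best_score = start, score(start)
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        candidate = list(start)
        for position, amino_acid in zip(positions, combo):
            candidate[position] = amino_acid
        candidate_seq = "".join(candidate)
        candidate_score = score(candidate_seq)
        if candidate_score > best_score:
            best, best_score = candidate_seq, candidate_score
    return best, best_score


hidden = "GAWKY"  # arbitrary string playing the role of the "lock"
start = "GALKV"   # arbitrary starting sequence

best, best_score = brute_force_search(start, [2, 4], lambda v: toy_score(v, hidden))
print(best, best_score)  # GAWKY 5
```

Two positions means 400 candidates; ten positions means about ten trillion, which is why the real tools cannot simply enumerate and why the energy functions matter so much.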

So, while viral evolution is really, really fast, it’s not quite light speed, and we can now nudge it up a notch by providing a map of which mutations need to be made to fit a new cellular receptor. We’ve now been running some of our own simulations of various virus:receptor pairs, and the results are either illuminating or chilling, again depending on your predilections. And with the ability to synthesize genes and even genomes now available, a new, evolutionarily accelerated virus could potentially be ready for field-testing. Keep in mind, however, that the real threat here is not ‘synthetic biology.’ Gene synthesis is just an enabler, the same way NextGen sequencing is now an enabler or molecular biology writ large is an enabler. Heck, it may not even be that much of an enabler: you could argue that a graduate student with a couple of QuikChange mutagenesis kits could get the job done faster. No, the real key is the computation. Bioinformatics is providing a new window not just on what life is, but where it is going. The question becomes: who will get there first? Ooh, that was suitably dramatic.

 

- originally posted on Tuesday, October 5th, 2010