One million human genomes: will they make a difference? The large and growing volume of genome information, from all forms of life, presents unprecedented opportunities for computational biologists. The challenge for our scientific generation is to turn an avalanche of sequence information into meaningful discovery of biological principles, predictive methods, and strategies for molecular manipulation in therapeutic and biofuel discovery.
The Marks lab is a new interdisciplinary lab dedicated to developing rigorous computational approaches to critical challenges in biomedical research, particularly the interpretation of genetic variation and its impact on basic science and clinical medicine. To address these challenges, we develop algorithmic approaches to biological data aimed at teasing out causality from correlative observations, an approach that has been surprisingly successful to date on notoriously hard problems. In particular, we developed methods adapted from statistical physics and graphical modeling to disentangle true contacts from observed evolutionary correlations of residues in protein sequences. Remarkably, these evolutionary couplings, identified from sequence alone, supplied enough information to fold a protein sequence into 3D. The software and methods we developed are available to the biological community on a public server that is quick and easy for non-experts to use. Using this evolutionary approach, we have accurately predicted the 3D structures of hundreds of proteins, including large, pharmaceutically relevant membrane proteins. Many of these were previously of unknown structure and had no homology to known sequences; two of the large membrane proteins have now been experimentally validated. We have now applied this approach genome-wide to determine the 3D structure of all protein interactions that have sufficient sequences, and we can demonstrate the evolutionary signature of alternative conformations.
The vision for the Marks lab is to build computational methods that address three critical challenges: (i) protein conformational plasticity in health and disease, (ii) genome-wide evaluation of the effects of mutations on disease likelihood, antibiotic resistance, and personal drug response, and (iii) synthetic protein design.