Newsletter - Edition 26
Protein Folding, Misfolding and Disease
Oxford Innovation Society lecture, September 1998
Professor Chris Dobson, Director of the Oxford Centre for Molecular Sciences
Proteins are involved in virtually every biological process in a living system. They are synthesised on ribosomes as linear chains of typically several hundred amino acids in a specific order from information encoded within the cellular DNA. In order to function these chains must fold into the unique native three-dimensional structures that are characteristic of the individual proteins. In a cell, this takes place in a complex and highly crowded molecular environment, and there are several families of cellular proteins whose job is to assist in the folding process of the other proteins that are required by the living organism. These include the well-known 'molecular chaperones' and several classes of proteins which catalyse specific steps in folding, for example, formation and rearrangement of disulphide bonds.
The folding of a protein is a highly complex process. There are nearly 100,000 proteins encoded in the human genome, and there are thought to be more than a thousand fundamentally distinct structural architectures into which folded proteins can be classified. To establish how a newly formed polypeptide sequence finds its way to its correct fold rather than the countless alternatives is one of the greatest challenges in modern structural biology. Responding to this challenge requires interdisciplinary research involving the most advanced techniques available to the chemist and structural biologist. The existence of the Oxford Centre for Molecular Sciences (OCMS) has been critical for the advancement of this work in Oxford.
The process of protein folding is quite distinct from the familiar reactions of small molecules. Protein folding involves a complex molecular recognition phenomenon that depends on the cooperative action of a vast number of relatively weak non-covalent interactions involving thousands of atoms. The number of possible conformations of a polypeptide chain is astronomically large. For a chain of 100 amino acids (a small protein), for example, there are about 10^50 distinct conformations. If we assume that these conformations can interconvert as rapidly as physical principles allow, a random search of all of them would take about 10^30 years! How a protein can find the most stable of these in a typical folding time of 1 s has come to be known as the Levinthal Paradox, and a search for a solution to this problem has dominated thinking about folding for more than 30 years.
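The arithmetic behind this estimate can be checked with a short calculation. The sketch below takes the figure of about 10^50 conformations from the text and assumes, purely for illustration, that a chain can sample about 10^13 conformations per second (roughly the timescale of a bond vibration). That rate and the function name are assumptions, not figures from the lecture.

```python
def levinthal_search_years(n_conformations: float, samples_per_second: float) -> float:
    """Time in years to visit every conformation once at a fixed sampling rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / samples_per_second / seconds_per_year


# About 10^50 conformations for a 100-residue chain (figure quoted in the text).
N_CONFORMATIONS = 1e50
# Assumed upper limit on the interconversion rate (illustrative, not from the lecture).
SAMPLES_PER_SECOND = 1e13

years = levinthal_search_years(N_CONFORMATIONS, SAMPLES_PER_SECOND)
print(f"Exhaustive random search: about {years:.1e} years")
```

Under these assumptions the search time comes out at a few times 10^29 years, the same order of magnitude as the figure quoted above and vastly longer than the observed folding time of about a second.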
Experimental and Theoretical Investigations
In order to investigate the structural changes taking place during folding it has been necessary to devise novel experimental strategies. One of the most important approaches has been to combine different stopped-flow and quenched-flow methods, each capable of rapidly probing the development of a different characteristic of the native state. Thus, for example, circular dichroism in the far UV region provides a measure of the appearance of secondary structure during folding, whilst measurements in the near UV monitor the formation of a close-packed environment for aromatic residues.
Recent developments in NMR, a field of research in which Oxford has excelled for many years, have been of particular significance in these studies because this technique has the capability of characterising ensembles of conformations in solution at the level of individual amino acid residues. Of particular interest has been work within OCMS on carrying out NMR experiments in 'real time' to follow the progress of complex reactions. These approaches are just beginning to contribute substantially to our knowledge of the molecular transitions taking place during folding.
In parallel with these developments in experimental strategies to study protein folding there have been advances in theoretical approaches. Of particular importance have been simulation techniques using models of proteins that are simple enough to permit extensive calculations to be performed, but complex enough to include characteristics of proteins such as the existence of a Levinthal paradox. These approaches have begun to provide the framework for a conceptual understanding of the folding reaction.
Protein folding involves a biased downhill search on an effective energy surface in which native-like interactions between residues are on average more stabilising than non-native ones. Folding of different molecules involves different trajectories on this surface, as the myriad of weak but stabilising interactions between atoms that are characteristic of the native state can be formed in many different orders. The energy surface for folding is said to resemble a 'funnel' in that the range of conformational states accessible to a given sequence becomes progressively narrower as the number of native-like contacts increases, i.e. as the protein folds. A protein molecule can therefore get to the lowest point on the surface far more rapidly than a calculation based on the total number of possible locations on the surface would suggest.
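The effect of such a bias can be illustrated with a deliberately crude toy model. It is a sketch only, and not one of the simulations referred to in the lecture. Each residue is either in its native or a non-native state, one randomly chosen residue is examined per step, and a move that destroys a native state is accepted only with probability `penalty_accept`. The chain length, the acceptance probabilities and all function names are illustrative assumptions.

```python
import random


def steps_to_fold(n_residues, penalty_accept, rng, max_steps=1_000_000):
    """Count steps until every residue is native, starting from none native.

    penalty_accept is the probability of accepting a move that turns a
    native residue non-native; 1.0 corresponds to an unbiased random search.
    """
    native = [False] * n_residues
    n_native = 0
    for step in range(1, max_steps + 1):
        i = rng.randrange(n_residues)
        if native[i]:
            # Uphill move (loses a native state): accepted only sometimes.
            if rng.random() < penalty_accept:
                native[i] = False
                n_native -= 1
        else:
            # Downhill move (gains a native state): always accepted.
            native[i] = True
            n_native += 1
            if n_native == n_residues:
                return step
    return max_steps  # safety cap; not expected to be reached for small chains


def mean_steps(n_residues, penalty_accept, trials=20, seed=0):
    """Average folding time over several independent runs with a fixed seed."""
    rng = random.Random(seed)
    total = sum(steps_to_fold(n_residues, penalty_accept, rng) for _ in range(trials))
    return total / trials


unbiased = mean_steps(12, penalty_accept=1.0)  # flat surface: random search
biased = mean_steps(12, penalty_accept=0.1)    # funnel-like surface
print(f"unbiased: {unbiased:.0f} steps, biased: {biased:.0f} steps")
```

Even for a chain of only 12 residues, the unbiased search needs on the order of 2^12 steps, whereas a modest bias towards native-like states brings the time down by well over an order of magnitude. The gap widens rapidly as the chain grows.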
The Link between the Sequence and the Fold
These conceptual advances have resulted in the establishment of a general mechanism for protein folding. The crucial question remains, however, as to how the sequence of a specific protein determines the shape of its energy surface and hence the conformation of its native state and its ability to achieve it. This is not yet fully understood and despite many attempts no one has yet been able to fold a protein correctly in a computer using a series of potentials that describe the interactions between the different amino acid residues.
Nevertheless, the outline of the way in which the sequence defines the fold is beginning to become clear. The distribution of polar (hydrophilic) and non-polar (hydrophobic) residues along the polypeptide chain is now known to be a crucial factor in determining the overall fold of the protein, at least in rudimentary form. Once this rudimentary fold has been achieved, the remainder of the folding process involves a search for the unique close-packed structure characteristic of all native states. This search appears to be initiated by the formation of a core or nucleus of critical residues brought together by random fluctuations within the incompletely folded structures.
As we learn more about the folding process it becomes possible to utilise this knowledge for practical purposes. An understanding of how proteins fold is the key to being able to predict structure from sequence. This is a critical part of interpreting and utilising information from the various genomic sequencing projects. In addition, we should be better able to design new proteins with potentially novel and exciting functions. Less obvious, and illustrating the unpredictability of science, has been the fascinating link between folding mechanisms and disease.
Misfolding and Disease
In a cellular environment molecular chaperones help to protect the incompletely folded polypeptide chains from aggregating. Even after the folding process is complete, however, a protein can subsequently experience conditions under which it unfolds, at least partially, and then it is again prone to aggregation. It is becoming clear that the failure of proteins to fold correctly or to remain folded under all appropriate physiological conditions can give rise to a wide range of pathological conditions. Diseases associated with misfolding now include genetic, sporadic and even infectious ailments. Examples include cystic fibrosis, some forms of emphysema, and a variety of senile dementias including Alzheimer's disease. Alzheimer's disease belongs to the family of diseases known as amyloidoses, which are associated with the aggregation of normally soluble proteins into insoluble fibrils. This family also includes prion diseases such as BSE and CJD.
Although relatively few proteins are known to be associated with clinical amyloidoses, it seems likely that amyloid fibrils can form with a wide range of proteins under appropriate conditions. These fibrils are ordered aggregates with extensive β-sheet structure resulting from intermolecular hydrogen bonds. Their appearance is closely similar regardless of the structures of the soluble proteins from which they are derived. Recent studies using the interdisciplinary approach pioneered within OCMS are beginning to reveal some of the details of the molecular arrangement of the polypeptide chains within the fibrils, and the mechanism of their formation. This may lead to the development of novel therapeutic strategies for treating or preventing many of the most debilitating of human diseases. Moreover, it is possible that these new fibrillar forms of protein might have properties of considerable interest and value in their own right. Within OCMS we are presently exploring both of these issues as vigorously as we can!

