The tertiary structure of a protein describes the folding of its secondary structural elements and specifies the positions of each atom in the protein, including those of its side chains. This information is deposited in a public database, the Protein Data Bank (PDB), and is readily available via the Internet, which allows the tertiary structures of a variety of proteins to be analyzed and compared. The common features of protein tertiary structures reveal much about the biological functions of proteins and their evolutionary origins.
Protein Structures Are Determined by X-Ray Crystallography, Nuclear Magnetic Resonance, and Cryo-Electron Microscopy
X-Ray crystallography is a technique that directly images molecules. X-Rays must be used to do so because, according to optical principles, the uncertainty in locating an object is approximately equal to the wavelength of the radiation used to observe it (covalent bond distances and the wavelengths of the X-rays used in structural studies are both ∼1.5 Å; individual molecules cannot be seen in a light microscope because visible light has a minimum wavelength of 4000 Å).
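The wavelength comparison above follows from the photon relation λ = hc/E. The short sketch below converts between photon energy and wavelength. It assumes, for illustration, an X-ray energy of about 8.04 keV (roughly that of the copper Kα radiation produced by many laboratory generators) and compares it with light at the 4000 Å limit quoted in the text.

```python
# Photon wavelength from energy: lambda = h * c / E (SI constants, exact values).
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
C_LIGHT = 299_792_458.0     # speed of light, m/s
EV = 1.602176634e-19        # joules per electron volt

def photon_wavelength_angstrom(energy_ev: float) -> float:
    """Wavelength (in angstroms) of a photon with the given energy (in eV)."""
    wavelength_m = H_PLANCK * C_LIGHT / (energy_ev * EV)
    return wavelength_m * 1e10          # 1 m = 1e10 angstroms

def photon_energy_ev(wavelength_angstrom: float) -> float:
    """Inverse relation: photon energy (in eV) for a wavelength in angstroms."""
    return H_PLANCK * C_LIGHT / (wavelength_angstrom * 1e-10) / EV

if __name__ == "__main__":
    # Assumed example energy: ~8.04 keV, close to Cu K-alpha X-rays.
    print(f"8.04 keV X-ray: {photon_wavelength_angstrom(8.04e3):.2f} A")
    # Violet light at the 4000 A limit mentioned in the text.
    print(f"4000 A light:   {photon_energy_ev(4000.0):.2f} eV")
```

An X-ray photon of this energy has a wavelength comparable to a covalent bond length, whereas visible photons carry only a few electron volts and have wavelengths thousands of times longer.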
There is, however, no such thing as an X-ray microscope because there are no X-ray lenses. Rather, a crystal of the molecule to be imaged is exposed to a collimated beam of X-rays and the resulting diffraction pattern, which arises from the regularly repeating positions of atoms in the crystal, is recorded by a radiation detector or, now infrequently, on photographic film. The X-rays used in structural studies are produced by laboratory X-ray generators or, now commonly, by synchrotrons, particle accelerators that produce X-rays of far greater intensity. The intensities of the diffraction peaks (darkness of the spots on a film) are then used to construct mathematically the three-dimensional image of the crystal structure through methods that are beyond the scope of this text. In what follows, we discuss some of the special problems associated with interpreting the X-ray crystal structures of proteins.
X-Rays interact almost exclusively with the electrons in matter, not the nuclei. An X-ray structure is therefore an image of the electron density of the object under study. Such electron density maps are usually presented with the aid of computer graphics as one or more sets of contours, in which a contour represents a specific level of electron density in the same way that a contour on a topographic map indicates locations that have a particular altitude.
Most Protein Crystal Structures Exhibit Less than Atomic Resolution
The molecules in protein crystals, as in other crystalline substances, are arranged in regularly repeating three-dimensional lattices. Protein crystals, however, differ from those of most small organic and inorganic molecules in being highly hydrated; they are typically 40 to 60% water by volume. The aqueous solvent of crystallization is necessary for the structural integrity of the protein crystals, because water is required for the structural integrity of native proteins themselves.
The large solvent content of protein crystals gives them a soft, jellylike consistency so that their molecules usually lack the rigid order characteristic of crystals of small molecules such as NaCl or glycine. The molecules in a protein crystal are typically disordered by more than an angstrom, so the corresponding electron density map lacks information concerning structural details of smaller size. The crystal is therefore said to have a resolution limit of that size. Protein crystals typically have resolution limits in the range 1.5 to 3.0 Å, although some are better ordered (have higher resolution; that is, a lesser resolution limit) and many are less ordered (have lower resolution).
Because an electron density map of a protein must be interpreted in terms of its atomic positions, the accuracy, and even the feasibility, of a crystal structure analysis depends on the crystal’s resolution limit. Indeed, the inability to obtain crystals of sufficiently high resolution is a major limiting factor in determining the X-ray crystal structure of a protein or other macromolecule.
At 6-Å resolution, the presence of a molecule the size of diketopiperazine is difficult to discern. At 2.0-Å resolution, its individual atoms cannot yet be distinguished, although its molecular shape has become reasonably evident. At 1.5-Å resolution, which roughly corresponds to a bond distance, individual atoms become partially resolved. At 1.1-Å resolution, atoms are clearly visible.
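The effect of a resolution limit can be illustrated with a toy calculation that synthesizes a one-dimensional "electron density" from a truncated Fourier series, keeping only the terms whose spacing is no finer than a chosen limit d_min. Everything in the model is a hypothetical assumption for illustration: a 20-Å repeat, two identical Gaussian atoms 1.5 Å apart (about one bond length), and an atom width σ of 0.5 Å. The sketch reports the density midway between the atoms divided by the density at an atom center; a ratio below 1 means a dip separates the two atoms. Real electron density maps are three-dimensional and are computed from measured intensities plus phases, so the exact resolution at which the dip appears here is qualitative only.

```python
import math

# Hypothetical toy parameters (not taken from any real crystal).
CELL = 20.0              # 1-D repeat length a, in angstroms
ATOMS = (9.25, 10.75)    # two atom centers, 1.5 A apart
SIGMA = 0.5              # Gaussian width of each atom, in angstroms

def density(x: float, d_min: float) -> float:
    """Density at x synthesized from Fourier terms with |h|/a <= 1/d_min."""
    h_max = int(CELL / d_min)            # highest index allowed by the resolution limit
    total = 0.0
    for xj in ATOMS:
        total += 1.0                     # h = 0 term of this atom
        for h in range(1, h_max + 1):
            # A Gaussian atom has a Gaussian Fourier coefficient that falls off
            # with h (analogous to a temperature-factor damping).
            weight = math.exp(-2.0 * math.pi ** 2 * SIGMA ** 2 * (h / CELL) ** 2)
            # The +h and -h terms combine into one cosine term.
            total += 2.0 * weight * math.cos(2.0 * math.pi * h * (x - xj) / CELL)
    return total

def midpoint_to_atom_ratio(d_min: float) -> float:
    """Density between the atoms relative to density at an atom center (< 1: resolved)."""
    midpoint = sum(ATOMS) / 2.0
    return density(midpoint, d_min) / density(ATOMS[0], d_min)

if __name__ == "__main__":
    for d in (6.0, 2.0, 1.5, 1.1):
        print(f"d_min = {d:>3} A   midpoint/atom ratio = {midpoint_to_atom_ratio(d):.2f}")
```

At 6 Å the two atoms merge into a single blob (ratio above 1), while at 1.1 Å a clear dip separates them, mirroring the trend described above.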
Most protein crystal structures are too poorly resolved for their electron density maps to reveal clearly the positions of individual atoms. Nevertheless, the distinctive shape of the polypeptide backbone usually permits it to be traced, which, in turn, allows the positions and orientations of its side chains to be deduced. Yet side chains of comparable size and shape, such as those of Leu, Ile, Thr, and Val, cannot always be differentiated (hydrogen atoms, having only one electron, are visible only if the resolution limit is less than 1.2 Å). Consequently, a protein structure cannot be elucidated from its electron density map alone, but knowing the primary structure of the protein permits the sequence of amino acid residues to be fitted to the electron density map. Mathematical refinement can then reduce the uncertainty in the crystal structure’s atomic positions to as little as 0.1 Å.
Most Crystalline Proteins Maintain Their Native Conformations
Does the structure of a protein in a crystal accurately reflect the structure of the protein in solution, where globular proteins normally function? Several lines of evidence indicate that crystalline proteins assume very nearly the same structures that they have in solution:
1. A protein molecule in a crystal is essentially in solution because it is bathed by solvent of crystallization over all of its surface except for the few, generally small, patches that contact neighboring protein molecules.
2. In cases when different crystal forms of a protein have been analyzed, or when a crystal structure has been compared to a solution structure (determined by NMR), the molecules have virtually identical conformations. Evidently, crystal packing forces do not greatly perturb the structures of protein molecules.
3. Many enzymes are catalytically active in the crystalline state. Because the activity of an enzyme is very sensitive to the positions of the groups involved in binding and catalysis, the crystalline enzymes must have conformations that closely resemble their solution conformations.
Protein Structures Can Be Determined by NMR
The basis of nuclear magnetic resonance (NMR) is that certain atomic nuclei, including 1H, 2H, 13C, 15N, and 31P, when placed in a magnetic field, absorb radio-frequency radiation at frequencies that vary with each type of nucleus, its electronic environment, and its interactions with nearby nuclei. The development of NMR techniques since the mid-1980s, in large part by Richard Ernst and Kurt Wüthrich, has made it possible to determine the three-dimensional structures of globular proteins in aqueous solution.
A protein’s conventional (one-dimensional) proton (1H) NMR spectrum is crowded with overlapping peaks, since even a small protein has hundreds of protons. This problem is addressed by two-dimensional (2D) NMR spectroscopy, which yields additional peaks arising from the interactions of protons that are less than 5 Å apart. Correlation spectroscopy (COSY) provides interatomic distances between protons that are covalently connected through one or two other atoms, such as the H atoms attached to the N and Cα of the same amino acid (corresponding to the ϕ torsion angle). Nuclear Overhauser spectroscopy (NOESY) provides interatomic distances for protons that are close in space, although they may be far apart in the protein sequence.
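In practice, NOESY cross-peak intensities are commonly turned into distance estimates using the approximation that the nuclear Overhauser effect falls off as the inverse sixth power of the interproton distance (r⁻⁶). The sketch below assumes this isolated spin-pair approximation and calibrates against a reference proton pair of known separation; the reference distance and intensities are hypothetical values chosen for illustration.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an interproton distance (in A) from a NOESY cross-peak intensity.

    Assumes intensity is proportional to r**-6 and calibrates against a
    reference proton pair whose distance is known.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

if __name__ == "__main__":
    REF_DISTANCE = 2.5      # hypothetical calibration distance, in A
    REF_INTENSITY = 100.0   # hypothetical calibration peak volume, arbitrary units
    for peak in (100.0, 12.5, 1.5625):
        r = noe_distance(peak, REF_INTENSITY, REF_DISTANCE)
        print(f"intensity {peak:>8}: r = {r:.2f} A")
```

Because of the sixth-root dependence, a 64-fold weaker peak corresponds to only a doubling of the distance, and a twofold error in intensity shifts the estimate by only about 12% (2^(1/6) ≈ 1.12). NOE-derived distances are therefore usually applied as loose upper-bound restraints rather than exact values.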
Interatomic distance measurements between identified pairs of atoms, together with known geometric constraints such as covalent bond distances and angles, group planarity, chirality, and van der Waals radii, are used to compute the protein’s three-dimensional structure. However, because interproton distance measurements are imprecise, they do not define a unique structure but rather are consistent with an ensemble of closely related structures. Consequently, an NMR structure of a protein (or another macromolecule) is often presented as a sample of structures that are consistent with the data. The “tightness” of a bundle of such structures reflects both the accuracy with which the structure is known, which in the most favorable cases is roughly comparable to that of an X-ray crystal structure with a resolution of 2 to 2.5 Å, and the conformational fluctuations that the protein undergoes.
The proton 2D-NMR spectra of proteins larger than ∼30 kD are so crowded with cross peaks that it is all but impossible to interpret them. This problem has been alleviated by the development of multidimensional NMR techniques, which spread these peaks into three or four dimensions, applied to proteins in which specific residues have been enriched with 15N and/or 13C by genetic engineering techniques (replacing the far more naturally abundant 12C, which is NMR-inactive, and 14N, which gives broad, poorly resolved signals). Present NMR methods are limited to determining the structures of macromolecules with molecular masses no greater than ∼100 kD, but recent advances in NMR technology suggest that this limit may eventually increase to ∼1000 kD or more.
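One common way to quantify the "tightness" of an NMR bundle is the root-mean-square deviation (RMSD) of each model from the ensemble's average structure. The sketch below assumes the models have already been superimposed onto one another (real software performs a least-squares superposition first) and uses hypothetical two-atom "models" rather than real coordinates.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]
Model = List[Coord]

def mean_structure(models: List[Model]) -> Model:
    """Average each atom's position over all models (models assumed pre-superimposed)."""
    n_models = len(models)
    return [
        tuple(sum(model[i][k] for model in models) / n_models for k in range(3))
        for i in range(len(models[0]))
    ]

def rmsd(a: Model, b: Model) -> float:
    """Root-mean-square deviation between two coordinate sets with the same atoms."""
    if len(a) != len(b):
        raise ValueError("models must contain the same atoms")
    squared = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(squared / len(a))

def ensemble_spread(models: List[Model]) -> float:
    """Mean RMSD of the models from the ensemble average: smaller means a tighter bundle."""
    mean = mean_structure(models)
    return sum(rmsd(m, mean) for m in models) / len(models)

if __name__ == "__main__":
    # Three hypothetical two-atom models that differ slightly along y.
    bundle = [
        [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
        [(0.0, 0.3, 0.0), (3.8, 0.3, 0.0)],
        [(0.0, -0.3, 0.0), (3.8, -0.3, 0.0)],
    ]
    print(f"ensemble spread = {ensemble_spread(bundle):.2f} A")
```

A small spread can reflect either well-determined coordinates or a rigid region of the protein, which is why the text notes that bundle tightness mixes measurement accuracy with genuine conformational fluctuation.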
Around 12,000 NMR structures of proteins and nucleic acids have been determined. In cases in which both the X-ray and NMR structures of a protein are known, they exhibit very few significant differences, thus indicating that crystallization does not perturb the structure of a protein. Moreover, because NMR can probe motions over time scales spanning 10 orders of magnitude, it can also be used to study protein folding and dynamics.
Cryo-Electron Microscopy Directly Images Macromolecular Structures
As explained by the wave-particle duality, electrons, like all particles, have wavelike properties, with a wavelength λ = h/mv, where h is Planck’s constant, m is the particle’s mass, and v is its velocity. Thus, a beam of electrons of sufficiently high energy (E = ½mv²) will have a wavelength small enough to image molecules at atomic resolution (recall that the uncertainty in locating an object is approximately equal to the wavelength of the radiation used to observe it).
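Combining λ = h/mv with E = ½mv² gives λ = h/√(2mE), and for an electron accelerated through a potential V, E = eV. At the 200 to 300 kV commonly used for cryo-EM, however, electrons travel at a large fraction of the speed of light, so the sketch below optionally applies the standard relativistic correction, p² = 2mE(1 + E/2mc²). The accelerating voltages in the example are illustrative choices.

```python
import math

H = 6.62607015e-34          # Planck constant, J*s
M_E = 9.1093837015e-31      # electron rest mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
C = 299_792_458.0           # speed of light, m/s

def electron_wavelength_angstrom(volts: float, relativistic: bool = True) -> float:
    """de Broglie wavelength (in A) of an electron accelerated through `volts`."""
    kinetic = E_CHARGE * volts               # E = eV, in joules
    momentum_sq = 2.0 * M_E * kinetic        # p^2 = 2mE (nonrelativistic)
    if relativistic:
        # Relativistic correction: p^2 = 2mE * (1 + E / (2 m c^2))
        momentum_sq *= 1.0 + kinetic / (2.0 * M_E * C ** 2)
    return H / math.sqrt(momentum_sq) * 1e10  # lambda = h / p, converted to A

if __name__ == "__main__":
    for kv in (100, 200, 300):
        v = kv * 1e3
        print(f"{kv} kV: {electron_wavelength_angstrom(v):.4f} A "
              f"(nonrelativistic {electron_wavelength_angstrom(v, relativistic=False):.4f} A)")
```

These wavelengths are around a hundred times shorter than a bond length, so in cryo-EM the achievable resolution is set by factors other than the electron wavelength itself.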
However, in conventional electron microscopy, samples must be thoroughly dried (to maintain the high vacuum through which the electron beam travels), which greatly distorts biological molecules. This led to the development of cryo-electron microscopy (cryo-EM; Greek: kryos, icy cold), in which a hydrated sample is cooled to near liquid nitrogen temperatures (–196°C) so rapidly (in a few milliseconds) that the water in the sample does not have time to crystallize (which would destroy the sample), but rather assumes a vitreous (glasslike) state. Consequently, the sample remains hydrated and retains its native structure. This, together with ongoing technological and theoretical advances in electron microscopy, has permitted, in recent years, the direct visualization of large molecular complexes, such as ribosomes, at near-atomic resolution (as little as 3.0 Å). Cryo-EM therefore holds great promise for determining the structures of fragile or flexible complexes that are difficult to crystallize and too large to visualize by NMR methods.
Proteins Can Be Depicted in Different Ways. The huge number of atoms in proteins makes it difficult to visualize them using the same sorts of models employed for small organic molecules. Ball-and-stick representations showing all or most atoms in a protein are exceedingly cluttered, and space-filling models obscure the internal details of the protein. Accordingly, computer-generated or artistic renditions are often more useful for representing protein structures. The course of the polypeptide chain can be followed by tracing the positions of its Cα atoms or by representing helices as helical ribbons or cylinders and the strands of β sheets as flat arrows, each pointing from the strand’s N-terminal end toward its C-terminal end.
Side Chain Location Varies with Polarity
In the years since Kendrew solved the structure of myoglobin, around 110,000 protein structures have been reported. No two are exactly alike, but they exhibit remarkable consistencies. The primary structures of globular proteins generally lack the repeating sequences that support the regular conformations seen in fibrous proteins. However, the amino acid side chains in globular proteins are spatially distributed according to their polarities:
1. The nonpolar residues Val, Leu, Ile, Met, and Phe occur mostly in the interior of a protein, out of contact with the aqueous solvent. The hydrophobic effects that promote this distribution are largely responsible for the three-dimensional structure of native proteins.
2. The charged polar residues Arg, His, Lys, Asp, and Glu are usually located on the surface of a protein in contact with the aqueous solvent. This is because immersing an ion in the virtually anhydrous interior of a protein is energetically unfavorable.
3. The uncharged polar residues Ser, Thr, Asn, Gln, and Tyr are usually on the protein surface but also occur in the interior of the molecule. When buried in the protein, these residues are almost always hydrogen bonded to other groups; in a sense, the formation of a hydrogen bond “neutralizes” their polarity. This is also the case with the polypeptide backbone.
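The three groupings in the list above can be encoded directly as lookup sets, which makes it easy to tally a sequence's composition by polarity class. The sketch below uses exactly the residues named in the text (as one-letter codes); residues the text does not list, such as Gly, Ala, Pro, Trp, and Cys, are reported as "not listed". It only counts residues: predicting which ones are actually buried requires the three-dimensional structure.

```python
from collections import Counter

# Groupings as listed in the text, using one-letter residue codes.
GROUPS = {
    "nonpolar": set("VLIMF"),         # mostly in the interior
    "charged polar": set("RHKDE"),    # usually on the surface
    "uncharged polar": set("STNQY"),  # usually on the surface; H-bonded when buried
}

def polarity_class(residue: str) -> str:
    """Return the text's polarity group for a one-letter residue code."""
    code = residue.upper()
    for name, members in GROUPS.items():
        if code in members:
            return name
    return "not listed"

def composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each polarity group."""
    return Counter(polarity_class(r) for r in sequence)

if __name__ == "__main__":
    seq = "MKVLAEDFSTIRG"   # hypothetical sequence fragment
    for group, count in composition(seq).items():
        print(f"{group:>16}: {count}")
```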
These general principles of side chain distribution are evident in individual elements of secondary structure as well as in whole proteins. Polar side chains tend to extend toward—and thereby help form—the protein’s surface, whereas nonpolar side chains largely extend toward—and thereby occupy—its interior. Turns and loops joining secondary structural elements usually occur at the protein surface. Most proteins are quite compact, with their interior atoms packed together even more efficiently than the atoms in a crystal of small organic molecules. Nevertheless, the atoms of protein side chains almost invariably have low-energy arrangements. Evidently, interior side chains adopt relaxed conformations despite the profusion of intramolecular interactions. Closely packed protein interiors generally exclude water. When water molecules are present, they often occupy specific positions where they can form hydrogen bonds, sometimes acting as a bridge between two hydrogen-bonding protein groups.
Tertiary Structures Contain Combinations of Secondary Structure
Globular proteins—each with a unique tertiary structure—are built from combinations of secondary structural elements. The proportions of α helices and β sheets and the order in which they are connected provide an informative way of classifying and analyzing protein structure.
Certain Combinations of Secondary Structure Form Motifs. Groupings of secondary structural elements, called supersecondary structures or motifs, occur in many unrelated globular proteins:
1. The most common form of supersecondary structure is the 𝛃𝛂𝛃 motif, in which an α helix connects two parallel strands of a β sheet.
2. Another common supersecondary structure, the 𝛃 hairpin motif, consists of antiparallel strands connected by relatively tight reverse turns.
3. In an 𝛂𝛂 motif, two successive antiparallel α helices pack against each other with their axes inclined. This permits energetically favorable intermeshing of their contacting side chains. Similar associations stabilize the coiled coil conformation of α keratin and tropomyosin, although their helices are parallel rather than antiparallel.
4. In the Greek key motif, a β hairpin is folded over to form a 4-stranded antiparallel β sheet.
Most Proteins Can Be Classified as 𝛂, 𝛃, or 𝛂/𝛃
The major types of secondary structural elements occur in globular proteins in varying proportions and combinations. Some proteins, such as E. coli cytochrome b562, consist only of α helices joined by short connecting links and are therefore classified as 𝛂 proteins. Others, such as immunoglobulins, which contain the immunoglobulin fold, are called 𝛃 proteins because they have a large proportion of β sheets and are devoid of α helices. Most proteins, however, including lactate dehydrogenase and carboxypeptidase A, are known as 𝛂/𝛃 proteins because they largely consist of mixtures of both types of secondary structure (proteins, on average, contain ∼31% α helix and ∼28% β sheet).
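The class assignment described above can be sketched from a per-residue secondary-structure string such as the one produced by the DSSP program, in which "H" marks α-helix residues and "E" marks β-strand residues. The 10% cutoff used here to decide whether a protein "has" helices or strands is a hypothetical choice for illustration; established classification schemes such as SCOP and CATH use more detailed criteria.

```python
MIN_FRACTION = 0.10   # hypothetical cutoff for counting a structure type as present

def ss_fractions(ss: str) -> tuple:
    """Fractions of residues in alpha helix ('H') and beta strand ('E')."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    n = len(ss)
    return ss.count("H") / n, ss.count("E") / n

def structural_class(ss: str) -> str:
    """Assign 'alpha', 'beta', 'alpha/beta', or 'unclassified' using MIN_FRACTION."""
    helix, strand = ss_fractions(ss)
    has_alpha = helix >= MIN_FRACTION
    has_beta = strand >= MIN_FRACTION
    if has_alpha and has_beta:
        return "alpha/beta"
    if has_alpha:
        return "alpha"
    if has_beta:
        return "beta"
    return "unclassified"

if __name__ == "__main__":
    # Hypothetical DSSP-style strings ('-' = neither helix nor strand).
    for ss in ("HHHHHHH--HHHHHHH", "-EEEEE--EEEEE---", "-EEEE-HHHHHH-EEEE-"):
        print(f"{ss:<20} {structural_class(ss)}")
```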
The α, β, and α/β classes of proteins can be further categorized according to their topology; that is, according to how their secondary structural elements are connected. For example, extended β sheets often roll up to form 𝛃 barrels. Three types of 8-stranded β barrels illustrate how the same number of strands can be connected with different topologies. Two of these are all-β structures containing multiple β hairpin motifs. The third, known as an 𝛂/𝛃 barrel (Fig. 6-30c), can be considered as a set of overlapping βαβ motifs (and is a member of the α/β class of proteins).
Large Polypeptides Form Domains. Polypeptide chains containing more than ∼200 residues usually fold into two or more globular clusters known as domains, which give these proteins a bi- or multilobal appearance. Each subunit of the enzyme glyceraldehyde-3-phosphate dehydrogenase, for example, has two distinct domains.
Most domains consist of 40 to 200 amino acid residues and have an average diameter of ∼25 Å. An inspection of the various protein structures diagrammed in this chapter reveals that domains each consist of two or more layers of secondary structural elements. The reason for this is clear: At least two such layers are required to seal off a domain’s hydrophobic core from its aqueous environment. Thus, polypeptides shorter than 40 residues are unlikely to form stable structures in solution. On the other hand, domains much longer than 200 residues will fold too slowly.
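A rough consistency check on the sizes quoted above treats a domain as a compact sphere. The sketch assumes typical protein values of about 110 Da per residue and a partial specific volume of about 0.73 cm³/g; real domains are not spherical, so the result is only an order-of-magnitude estimate.

```python
import math

AVOGADRO = 6.02214076e23          # per mol
MEAN_RESIDUE_MASS = 110.0         # g/mol per residue (typical value, assumed)
PARTIAL_SPECIFIC_VOLUME = 0.73    # cm^3/g (typical value for proteins, assumed)
CM3_TO_A3 = 1e24                  # 1 cm^3 = 1e24 cubic angstroms

def volume_per_residue_a3() -> float:
    """Approximate volume occupied by one residue, in cubic angstroms."""
    return MEAN_RESIDUE_MASS * PARTIAL_SPECIFIC_VOLUME / AVOGADRO * CM3_TO_A3

def sphere_diameter_a(n_residues: int) -> float:
    """Diameter (in A) of a sphere with the volume of n residues: V = pi d^3 / 6."""
    volume = n_residues * volume_per_residue_a3()
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)

if __name__ == "__main__":
    for n in (40, 100, 200):
        print(f"{n:>3} residues: ~{sphere_diameter_a(n):.0f} A across")
```

Under these assumptions, domains of 40 to 200 residues come out at roughly 22 to 37 Å in diameter, in line with the ∼25-Å average mentioned above.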
A polypeptide chain wanders back and forth within a domain, but neighboring domains are usually connected by only one or two polypeptide segments. Consequently, many domains are structurally independent units that have the characteristics of small globular proteins. Nevertheless, the domain structure of a protein is not necessarily obvious, as its domains may make such extensive contacts with each other that the protein appears to be a single globular entity.
Domains often have a specific function such as the binding of a small molecule. The dinucleotide NAD+ binds to the N-terminal domain of glyceraldehyde-3-phosphate dehydrogenase. Michael Rossmann has shown that a βαβαβ unit, in which the β strands form a parallel sheet with α helical connections, often acts as a nucleotide-binding site. Two of these βαβαβ units combine to form a domain known as a dinucleotide-binding fold, or Rossmann fold. Glyceraldehyde-3-phosphate dehydrogenase’s N-terminal domain contains such a fold, as does lactate dehydrogenase. In some multidomain proteins, binding sites occupy the clefts between domains; that is, small molecules are bound by groups from two domains. In such cases, the relatively pliant covalent connection between the domains allows flexible interactions between the protein and the small molecule.
Structure Is Conserved More Than Sequence
The many thousands of known protein structures, comprising an even greater number of separate domains, can be grouped into families by examining the overall paths followed by their polypeptide chains. Although it is estimated that there are as many as 1400 different protein domain families, approximately 200 different folding patterns account for about half of all known protein structures. The domain is the fundamental unit of protein evolution. Apparently, the most common protein domains are evolutionary sinks: domains that arose and persisted because of their ability (1) to form stable folding patterns; (2) to tolerate amino acid deletions, substitutions, and insertions, thereby making them more likely to survive evolutionary changes; and/or (3) to support essential biological functions.
Polypeptides with similar sequences tend to adopt similar backbone conformations. This is certainly true for evolutionarily related proteins that carry out similar functions. For example, the cytochromes c of different species are highly conserved proteins with closely similar sequences and three-dimensional structures.
Cytochrome c occurs only in eukaryotes, but prokaryotes contain proteins, known as c-type cytochromes, which perform the same general function (that of an electron carrier). The c-type cytochromes from different species exhibit only low degrees of sequence similarity to each other and to eukaryotic cytochromes c. However, their X-ray structures are clearly similar, particularly in polypeptide chain folding and side chain packing in the protein interior. The major structural differences among c-type cytochromes lie in the various polypeptide loops on their surfaces. The sequences of the c-type cytochromes have diverged so far from one another that, in the absence of their X-ray structures, they can be properly aligned only through the use of mathematically sophisticated computer programs. Thus, it appears that the essential structural and functional elements of proteins, rather than their amino acid residues, are conserved during evolution.
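Sequence alignment underlies comparisons such as the one above. The simplest global method is Needleman-Wunsch dynamic programming, sketched below with deliberately simple, hypothetical scores (+1 for a match, -1 for a mismatch or gap). Practical alignment programs instead use amino acid substitution matrices (such as BLOSUM or PAM), separate gap-opening and gap-extension penalties, and, for very distantly related proteins like the c-type cytochromes, more sensitive profile-based methods.

```python
from typing import Tuple

MATCH, MISMATCH, GAP = 1, -1, -1   # hypothetical, deliberately simple scores

def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def pair(i: int, j: int) -> int:
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + GAP,             # gap in b
                              score[i][j - 1] + GAP)             # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical aligned positions as a percentage of alignment length (one common convention)."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)

if __name__ == "__main__":
    s, x, y = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")   # hypothetical short sequences
    print(x, y, f"score={s}", f"identity={percent_identity(x, y):.0f}%", sep="\n")
```

With scores this crude, alignments of weakly similar sequences are easily wrong, which is why structural information or more sophisticated programs are needed for proteins as divergent as the c-type cytochromes.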