What is gene at molecular level?

Although all cells must be able to switch genes on and off in response to changes in their environments, the cells of multicellular organisms have evolved this capacity to an extreme degree and in highly specialized ways to form an organized array of differentiated cell types. In particular, once a cell in a multicellular organism becomes committed to differentiate into a specific cell type, the choice of fate is generally maintained through many subsequent cell generations, which means that the changes in gene expression involved in the choice must be remembered. This phenomenon of cell memory is a prerequisite for the creation of organized tissues and for the maintenance of stably differentiated cell types. In contrast, the simplest changes in gene expression in both eucaryotes and bacteria are only transient; the tryptophan repressor, for example, switches off the tryptophan genes in bacteria only in the presence of tryptophan; as soon as tryptophan is removed from the medium, the genes are switched back on, and the descendants of the cell will have no memory that their ancestors had been exposed to tryptophan. Even in bacteria, however, a few types of changes in gene expression can be inherited stably.

In this section we examine how gene regulatory devices can be combined to create “logic” circuits through which cells differentiate, keep time, remember events in their past, and adjust the levels of gene expression over whole chromosomes. We begin by considering some of the best-understood genetic mechanisms of cell differentiation, which operate in bacterial and yeast cells.

DNA Rearrangements Mediate Phase Variation in Bacteria

We have seen that cell differentiation in higher eucaryotes usually occurs without detectable changes in DNA sequence. In some procaryotes, in contrast, a stably inherited pattern of gene regulation is achieved by DNA rearrangements that activate or inactivate specific genes. Since changes in DNA sequence are copied faithfully during subsequent DNA replications, an altered state of gene activity will be inherited by all the progeny of the cell in which the rearrangement occurred. Some of these DNA rearrangements are, however, reversible so that occasional individuals switch back to original DNA configurations. The result is an alternating pattern of gene activity that can be detected by observations over long time periods and many generations.

A well-studied example of this differentiation mechanism occurs in Salmonella bacteria and is known as phase variation. Although this mode of differentiation has no known counterpart in higher eucaryotes, it can nevertheless have considerable impact on them because disease-causing bacteria use it to evade detection by the immune system. The switch in Salmonella gene expression is brought about by the occasional inversion of a specific 1000-nucleotide-pair piece of DNA. This change alters the expression of the cell-surface protein flagellin, for which the bacterium has two different genes (Figure 7-64). The inversion is catalyzed by a site-specific recombination enzyme and changes the orientation of a promoter that is within the 1000 nucleotide pairs. With the promoter in one orientation, the bacteria synthesize one type of flagellin; with the promoter in the other orientation, they synthesize the other type. Because inversions occur only rarely, whole clones of bacteria will grow up with one type of flagellin or the other.

Figure 7-64

Switching gene expression by DNA inversion in bacteria. Alternating transcription of two flagellin genes in a Salmonella bacterium is caused by a simple site-specific recombination event that inverts a small DNA segment containing a promoter. (A) In one (more...)

Phase variation almost certainly evolved because it protects the bacterial population against the immune response of its vertebrate host. If the host makes antibodies against one type of flagellin, a few bacteria whose flagellin has been altered by gene inversion will still be able to survive and multiply.
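
The geometry of such an inversion switch can be sketched in a few lines of Python. This is only a toy illustration: the 14-letter "genome", the promoter motif, and the segment coordinates are invented for the example and are not the real Salmonella sequence. It shows that inverting a segment (replacing it with its reverse complement) reorients the promoter, and that a second inversion restores the original configuration, which is why the switch is reversible.

```python
# Toy model of a promoter-carrying invertible DNA segment.
# The sequence and coordinates below are hypothetical, chosen only for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(dna: str, start: int, end: int) -> str:
    """Return dna with dna[start:end] replaced by its reverse complement."""
    segment = dna[start:end]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return dna[:start] + inverted + dna[end:]


def promoter_reads_forward(dna: str, promoter: str) -> bool:
    """True if the promoter motif reads left to right on the top strand."""
    return promoter in dna


PROMOTER = "TTGACA"                    # stand-in motif, assumed for the sketch
genome = "GGGG" + PROMOTER + "CCCC"    # invertible segment occupies indices 4..9

flipped = invert_segment(genome, 4, 10)
restored = invert_segment(flipped, 4, 10)

print(promoter_reads_forward(genome, PROMOTER))   # promoter in one orientation
print(promoter_reads_forward(flipped, PROMOTER))  # promoter in the other orientation
print(restored == genome)                         # a second inversion undoes the first
```

In the real system the inversion is rare and enzyme-catalyzed; the sketch captures only the fact that the two orientations are interconvertible states of the same DNA.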

Bacteria isolated from the wild very often exhibit phase variation for one or more phenotypic traits. These “instabilities” are usually lost with time from standard laboratory strains of bacteria, and underlying mechanisms have been studied in only a few cases. Not all involve DNA inversion. A bacterium that causes a common sexually transmitted human disease (Neisseria gonorrhoeae), for example, avoids immune attack by means of a heritable change in its surface properties that is generated by gene conversion (discussed in Chapter 5) rather than by inversion. This mechanism transfers DNA sequences from a library of silent “gene cassettes” to a site in the genome where the genes are expressed; it has the advantage of creating many variants of the major bacterial surface protein.

A Set of Gene Regulatory Proteins Determines Cell Type in a Budding Yeast

Because they are so easy to grow and to manipulate genetically, yeasts have served as model organisms for studying the mechanisms of gene control in eucaryotic cells. The common baker's yeast, Saccharomyces cerevisiae, has attracted special interest because of its ability to differentiate into three distinct cell types. S. cerevisiae is a single-celled eucaryote that exists in either a haploid or a diploid state. Diploid cells form by a process known as mating, in which two haploid cells fuse. In order for two haploid cells to mate, they must differ in mating type (sex). In S. cerevisiae there are two mating types, α and a, which are specialized for mating with each other. Each produces a specific diffusible signaling molecule (mating factor) and a specific cell-surface receptor protein. These jointly enable a cell to recognize and be recognized by its opposite cell type, with which it then fuses. The resulting diploid cells, called a/α, are distinct from either parent: they are unable to mate but can form spores (sporulate) when they run out of food, giving rise to haploid cells by the process of meiosis (discussed in Chapter 20).

The mechanisms by which these three cell types are established and maintained illustrate several of the strategies we have discussed for changing the pattern of gene expression. The mating type of the haploid cell is determined by a single locus, the mating-type (MAT) locus, which in an a-type cell encodes a single gene regulatory protein, a1, and in an α cell encodes two gene regulatory proteins, α1 and α2. The a1 protein has no effect in the a-type haploid cell that produces it but becomes important later in the diploid cell that results from mating; meanwhile, the a-type haploid cell produces the proteins specific to its mating type by default. In contrast, the α2 protein acts in the α cell as a transcriptional repressor that turns off the a-specific genes, while the α1 protein acts as a transcriptional activator that turns on the α-specific genes. Once cells of the two mating types have fused, the combination of the a1 and α2 regulatory proteins generates a completely new pattern of gene expression, unlike that of either parent cell. Figure 7-65 illustrates the mechanism by which the mating-type-specific genes are expressed in different patterns in the three cell types. This was among the first examples of combinatorial gene control to be identified, and it remains one of the best understood at the molecular level.

Figure 7-65

Control of cell type in yeasts. Yeast cell type is determined by three gene regulatory proteins (α1, α2, and a1) produced by the MAT locus. Different sets of genes are transcribed in haploid cells of type a, in haploid cells of type α, (more...)
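
The regulatory logic described above can be written as a small truth table. The sketch below is a simplification under stated assumptions: each regulatory protein is treated as simply present or absent, and the gene-class names (`a_specific`, `alpha_specific`, `haploid_specific`) are labels introduced for the example. The rule that the a1 and α2 proteins together switch off the haploid-specific genes and the α-specific program in the diploid follows the standard picture of this system rather than anything stated explicitly in the text above.

```python
def mating_type_genes(has_a1: bool, has_alpha1: bool, has_alpha2: bool) -> dict:
    """Boolean sketch of which gene classes are on for a given set of MAT proteins."""
    # Assumption: a1 and alpha2 act together as a repressor in the diploid.
    a1_alpha2 = has_a1 and has_alpha2
    return {
        # a-specific genes are on by default and turned off by alpha2.
        "a_specific": not has_alpha2,
        # alpha-specific genes need the alpha1 activator.
        "alpha_specific": has_alpha1 and not a1_alpha2,
        # haploid-specific genes are on unless a1 and alpha2 are both present.
        "haploid_specific": not a1_alpha2,
    }


a_cell = mating_type_genes(has_a1=True, has_alpha1=False, has_alpha2=False)
alpha_cell = mating_type_genes(has_a1=False, has_alpha1=True, has_alpha2=True)
diploid = mating_type_genes(has_a1=True, has_alpha1=True, has_alpha2=True)

for name, pattern in [("a", a_cell), ("alpha", alpha_cell), ("a/alpha", diploid)]:
    print(name, pattern)
```

Three regulatory proteins thus yield three distinct expression patterns, and the diploid pattern is not a simple sum of the two haploid patterns.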

Although in most laboratory strains of S. cerevisiae, the a and α cell types are stably maintained through many cell divisions, some strains isolated from the wild can switch repeatedly between the a and α cell types by a mechanism of gene rearrangement whose effects are reminiscent of the DNA rearrangements in N. gonorrhoeae, although the exact mechanism seems to be peculiar to yeast. On either side of the MAT locus in the yeast chromosome, there is a silent locus encoding the mating-type gene regulatory proteins: the silent locus on one side encodes α1 and α2; the silent locus on the other side encodes a1. Approximately every other cell division, the active gene in the MAT locus is excised and replaced by a newly synthesized copy of the silent locus determining the opposite mating type. Because the change involves the removal of one gene from the active “slot” and its replacement by another, this mechanism is called the cassette mechanism. The change is reversible because, although the original gene at the MAT locus is discarded, a silent copy remains in the genome. New DNA copies made from the silent genes function as disposable cassettes that will be inserted in alternation into the MAT locus, which serves as the “playing head” (Figure 7-66).

Figure 7-66

Cassette model of yeast mating-type switching. Cassette switching occurs by a gene-conversion process that involves a specialized enzyme (the HO endonuclease) that makes a double-stranded cut at a specific DNA sequence in the MAT locus. The DNA near the (more...)
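
As a rough sketch of the bookkeeping involved, the cassette mechanism can be modeled as copying information from a silent locus into the active one. The locus names `HML` and `HMR` used here for the two silent loci are labels not found in the text above, and the model ignores all enzymology: it records only which information sits at each locus.

```python
def switch_mating_type(loci: dict) -> dict:
    """Replace the MAT cassette with a copy of the opposite silent cassette."""
    current = loci["MAT"]
    # The donor is whichever silent locus holds the opposite mating-type information.
    donor = next(name for name in ("HML", "HMR") if loci[name] != current)
    updated = dict(loci)          # silent loci are copied from, never consumed
    updated["MAT"] = loci[donor]
    return updated


loci = {"HML": "alpha", "MAT": "a", "HMR": "a"}
once = switch_mating_type(loci)
twice = switch_mating_type(once)

print(once["MAT"], twice["MAT"])
```

Because the silent copies are left untouched, the switch can be repeated indefinitely, which is the sense in which the change is reversible.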

The silent cassettes are maintained in a transcriptionally inactive form by the same mechanism that is responsible for silencing genes located at the ends of the yeast chromosomes (see Figure 4-47); that is, the DNA at a silent locus is packaged into a highly organized form of chromatin that is resistant to transcription.

Two Proteins That Repress Each Other's Synthesis Determine the Heritable State of Bacteriophage Lambda

The observation that a whole vertebrate or plant can be specified by the genetic information present in a single somatic cell nucleus (see Figure 7-2) eliminates the possibility that an irreversible change in DNA sequence is a major mechanism in the differentiation of higher eucaryotic cells (although such changes are a crucial part of lymphocyte differentiation—discussed in Chapter 24). Reversible DNA sequence changes, resembling those just described for Salmonella and yeasts, in principle could still be responsible for some of the inherited changes in gene expression observed in higher organisms, but there is currently no evidence that such mechanisms are widely used.

Other mechanisms that we have touched upon in this chapter, however, are also capable of producing patterns of gene regulation that can be inherited by subsequent cell generations. Perhaps the simplest example is found in the bacterial virus (bacteriophage) lambda where a switch causes the virus to flip-flop between two stable self-maintaining states. This type of switch can be viewed as a prototype for similar, but more complex, switches that operate in the development of higher eucaryotes.

We mentioned earlier that bacteriophage lambda can in favorable conditions become integrated into the E. coli cell DNA, to be replicated automatically each time the bacterium divides. Alternatively, the virus can multiply in the cytoplasm, killing its host (see Figure 5-81). The switch between these two states is mediated by proteins encoded by the bacteriophage genome. The genome contains a total of about 50 genes, which are transcribed in very different patterns in the two states. A virus destined to integrate, for example, must produce the lambda integrase protein, which is needed to insert the lambda DNA into the bacterial chromosome, but must repress production of the viral proteins responsible for virus multiplication. Once one transcriptional pattern or the other has been established, it is stably maintained.

We cannot discuss the details of this complex gene regulatory system here and instead outline a few of its general features. At the heart of the switch are two gene regulatory proteins synthesized by the virus: the lambda repressor protein (cI protein), which we have already encountered, and the Cro protein. These proteins repress each other's synthesis, an arrangement giving rise to just two stable states (Figure 7-67). In state 1 (the prophage state) the lambda repressor occupies the operator, blocking the synthesis of Cro and also activating its own synthesis. In state 2 (the lytic state) the Cro protein occupies a different site in the operator, blocking the synthesis of repressor but allowing its own synthesis. In the prophage state most of the DNA of the stably integrated bacteriophage is not transcribed; in the lytic state, this DNA is extensively transcribed, replicated, packaged into new bacteriophage, and released by host cell lysis.

Figure 7-67

A simplified version of the regulatory system that determines the mode of growth of bacteriophage lambda in the E. coli host cell. In stable state 1 (the prophage state) the bacteriophage synthesizes a repressor protein, which activates its own synthesis (more...)

When the host bacteria are growing well, an infecting virus tends to adopt state 1, allowing the DNA of the virus to multiply along with the host chromosome. When the host cell is damaged, an integrated virus converts from state 1 to state 2 in order to multiply in the cell cytoplasm and make a quick exit. This conversion is triggered by the host response to DNA damage, which inactivates the repressor protein. In the absence of such interference, however, the lambda repressor both turns off production of the Cro protein and turns on its own synthesis, and this positive feedback loop helps to maintain the prophage state.
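
A minimal numerical sketch shows why mutual repression gives just two stable states. The equations, the parameter values (`alpha`, `n`), and the simple Euler integration are assumptions made for illustration; they are not measured properties of lambda, and the sketch leaves out the repressor's activation of its own synthesis. Each protein is made at a rate that falls as the other accumulates, and each decays at a rate proportional to its own concentration.

```python
def simulate_switch(repressor, cro, alpha=10.0, n=2, dt=0.01, steps=5000):
    """Integrate a two-protein mutual-repression model with Euler steps.

    alpha is the maximal synthesis rate and n the steepness of repression;
    both are illustrative values in arbitrary units.
    """
    for _ in range(steps):
        d_repressor = alpha / (1.0 + cro ** n) - repressor
        d_cro = alpha / (1.0 + repressor ** n) - cro
        repressor += dt * d_repressor
        cro += dt * d_cro
    return repressor, cro


# Two starting points that differ only in which protein has a head start.
prophage_like = simulate_switch(repressor=2.0, cro=0.5)
lytic_like = simulate_switch(repressor=0.5, cro=2.0)

print(prophage_like)  # repressor high, Cro low
print(lytic_like)     # Cro high, repressor low
```

Whichever protein starts ahead ends up dominant and holds the other down, so the system settles into one of two self-maintaining states and stays there until something, such as inactivation of the repressor, pushes it across.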

Gene Regulatory Circuits Can Be Used to Make Memory Devices As Well As Oscillators

Positive feedback loops provide a simple general strategy for cell memory—that is, for the establishment and maintenance of heritable patterns of gene transcription. Figure 7-68 shows the basic principle, stripped to its barest essentials. Variations of this simple strategy are widely used by eucaryotic cells. Several gene regulatory proteins that are involved in establishing the Drosophila body plan (discussed in Chapter 21), for example, stimulate their own transcription, thereby creating a positive feedback loop that promotes their continued synthesis; at the same time many of these proteins repress the transcription of genes encoding other important gene regulatory proteins. In this way, a sophisticated pattern of inherited behavior can be achieved with only a few gene regulatory proteins that reciprocally affect one another's synthesis and activities.

Figure 7-68

Schematic diagram showing how a positive feedback loop can create cell memory. Protein A is a gene regulatory protein that activates its own transcription. All of the descendants of the original cell will therefore “remember” that the (more...)
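
The memory effect of a positive feedback loop can be illustrated with a one-variable model. The rate law and the values of `beta` and the signal strength are assumptions chosen for the sketch, not properties of any particular protein. Protein A is produced in response to a transient signal and also in response to its own presence, and it is degraded at a rate proportional to its concentration.

```python
def simulate_protein_a(signal_steps, beta=3.0, dt=0.01, steps=3000):
    """Return the final level of protein A after an optional transient signal.

    The signal is on for the first `signal_steps` time steps and off afterward.
    beta is the maximal rate of self-activated synthesis (illustrative value).
    """
    a = 0.0
    for step in range(steps):
        signal = 1.0 if step < signal_steps else 0.0
        production = signal + beta * a * a / (1.0 + a * a)
        a += dt * (production - a)   # synthesis minus first-order decay
    return a


never_signalled = simulate_protein_a(signal_steps=0)
signalled_once = simulate_protein_a(signal_steps=500)

print(never_signalled)  # stays off
print(signalled_once)   # stays on long after the signal has gone
```

The two runs end under identical conditions (no signal), yet they settle at different levels of A. The only difference is their history, which is the essence of cell memory in this scheme.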

Simple gene regulatory circuits can be combined to create all sorts of control devices, just as simple electronic switching elements in a computer are combined to perform all sorts of complex logical operations. Bacteriophage lambda, as we have seen, provides an example of a circuit that can flip-flop between two stable states. More complex types of regulatory networks are not only found in nature, but can also be designed and constructed in the laboratory. Figure 7-69 shows, for example, how an engineered bacterial cell can switch between three states in a prescribed order, thus functioning as an oscillator or “clock.”

Figure 7-69

A simple gene clock designed in the laboratory. (A) Recombinant DNA techniques were used to place the genes for each of three different bacterial repressor proteins under the control of a different repressor. These repressors (denoted A, B, and C in the (more...)
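
A Boolean caricature of such a three-repressor ring shows how a fixed order of states can arise. The wiring assumed here (A represses B, B represses C, C represses A) and the all-or-none, synchronous updating are simplifications introduced for the sketch; the real circuit operates with continuously varying protein concentrations.

```python
def step(state):
    """Advance the ring by one time step; each gene is ON only if its repressor was OFF."""
    a, b, c = state
    return (not c, not a, not b)   # C represses A, A represses B, B represses C


def run(state, n_steps):
    """Return the list of states visited, starting with the initial state."""
    history = [state]
    for _ in range(n_steps):
        state = step(state)
        history.append(state)
    return history


history = run((True, False, False), 6)
for state in history:
    print("".join("1" if on else "0" for on in state))
```

Starting with only A on, the circuit passes through six distinct states and returns to its starting point, with A, then C, then B taking turns as the sole active repressor. The order is fixed by the wiring, not by any external cue.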

Circadian Clocks Are Based on Feedback Loops in Gene Regulation

Life on Earth evolved in the presence of a daily cycle of day and night, and many present-day organisms (ranging from archaea to plants to humans) have come to possess an internal rhythm that dictates different behaviors at different times of day. These behaviors range from the cyclical change in metabolic enzyme activities of a fungus to the elaborate sleep-wake cycles of humans. The internal oscillators that control such diurnal rhythms are called circadian clocks.

By carrying its own circadian clock, an organism can anticipate the regular daily changes in its environment and take appropriate action in advance. Of course, the internal clock cannot be perfectly accurate, and so it must be capable of being reset by external cues such as the light of day. Thus circadian clocks keep running even when the environmental cues (changes in light and dark) are removed, but the period of this free-running rhythm is generally a little less or a little more than 24 hours. External signals indicating the time of day cause small adjustments in the running of the clock, so as to keep the organism in synchrony with its environment. Following more drastic shifts, circadian cycles become gradually reset (entrained) by the new cycle of light and dark, as anyone who has experienced jet lag can attest.

One might expect that the circadian clock in a complex multicellular creature such as a human would itself be a complex multicellular device, with different groups of cells responsible for different parts of the oscillation mechanism. Remarkably, however, it turns out that in almost all organisms, including humans, the timekeepers are individual cells. Thus, our diurnal cycles of sleeping and waking, body temperature, and hormone release are controlled by a clock that operates in each member of a specialized group of cells (the SCN cells) in the hypothalamus (a part of the brain). Even if these cells are removed from the brain and dispersed in a culture dish, they will continue to oscillate individually, showing a cyclic pattern of gene expression with a period of approximately 24 hours. In the intact body, the SCN cells receive neural cues from the retina, entraining them to the daily cycle of light and dark, and they send information about the time of day to other tissues such as the pineal gland, which relays the time signal to the rest of the body by releasing the hormone melatonin in time with the clock.

Although the SCN cells have a central role as timekeepers in mammals, it has been shown that they are not the only cells in the mammalian body that have an internal circadian rhythm or an ability to reset it in response to light. Similarly, in Drosophila, many different types of cells, including those of the thorax, abdomen, antenna, leg, wing, and testis all continue a circadian cycle when they have been dissected away from the rest of the fly. The clocks in these isolated tissues, like those in the SCN cells, can be reset by externally imposed light and dark cycles.

The working of circadian clocks, therefore, is a fundamental problem in cell biology. Although we do not yet understand all the details, studies in a wide variety of organisms have revealed many of the basic principles and molecular components. For animals, much of what we know has come from searches in Drosophila for mutations that make the fly's circadian clock run fast, or slow, or not at all; and this work has led to the discovery that many of the same components are involved in the circadian clock of mammals. The mechanism of the clock in Drosophila is outlined in Figure 7-70. At the heart of the oscillator is a transcriptional feedback loop that has a time delay built into it: accumulation of certain key gene products switches off their transcription, but with a delay, so that, crudely speaking, the cell oscillates between a state where the products are present and transcription is switched off, and one where the products are absent and transcription is switched on.

Figure 7-70

Outline of the mechanism of the circadian clock in Drosophila cells. The central feature of the clock is the periodic accumulation and decay of two gene regulatory proteins, Tim (short for timeless, based on the phenotype of a gene mutation) and Per (short (more...)
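
The principle of delayed negative feedback can be reduced to a very crude sketch. Here the gene products are treated as simply present or absent, and their state at any hour is taken to be the opposite of their state a fixed delay earlier. The 12-step delay is an assumption picked only so that the toy period comes out at 24 steps; it is not a measured quantity, and the real clock involves several interacting proteins.

```python
def simulate_clock(delay_hours=12, total_hours=72):
    """Return an hour-by-hour list: True when the gene products are present.

    Rule: products are present now only if they were absent `delay_hours` ago,
    because their presence shuts off transcription with that delay.
    """
    present = [False] * delay_hours            # assumed starting history: no product
    for t in range(delay_hours, total_hours):
        present.append(not present[t - delay_hours])
    return present


trace = simulate_clock()
print("".join("#" if p else "." for p in trace))
```

The output alternates between blocks of absence and presence with a period of twice the delay. Without the delay, the same negative feedback would settle at a steady level instead of cycling.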

Despite the relative simplicity of the basic principle behind circadian clocks, the details are complex. One reason for this complexity is that clocks must be buffered against changes in temperature, which typically speed up or slow down macromolecular association. They must also run accurately but be capable of being reset. Although it is not yet understood how biological clocks run at a constant speed despite changes in temperature, the mechanism for resetting the Drosophila clock is the light-induced destruction of one of the key gene regulatory proteins, as indicated in Figure 7-70.

The Expression of a Set of Genes Can Be Coordinated by a Single Protein

Cells need to be able to switch genes on and off individually but they also need to coordinate the expression of large groups of different genes. For example, when a quiescent eucaryotic cell receives a signal to divide, many hitherto unexpressed genes are turned on together to set in motion the events that lead eventually to cell division (discussed in Chapter 18). One way bacteria coordinate the expression of a set of genes is by having them clustered together in an operon under control of a single promoter (see Figure 7-33). In eucaryotes, however, each gene is transcribed from a separate promoter.

How do eucaryotes coordinate gene expression? This is an especially important question because, as we have seen, most eucaryotic gene regulatory proteins act as part of a “committee” of regulatory proteins, all of which are necessary to express the gene in the right cell, at the right time, in response to the proper signals, and to the proper level. How then can a eucaryotic cell rapidly and decisively switch whole groups of genes on or off? The answer is that even though control of gene expression is combinatorial, the effect of a single gene regulatory protein can still be decisive in switching any particular gene on or off, simply by completing the combination needed to maximally activate or repress that gene. This situation is analogous to dialing in the final number of a combination lock: the lock will spring open if the other numbers have been previously entered. Just as the same number can complete the combination for different locks, the same protein can complete the combination for several different genes. If a number of different genes contain the regulatory site for the same gene regulatory protein, it can be used to regulate the expression of all of them.

An example of this in humans is the control of gene expression by the glucocorticoid receptor protein. To bind to regulatory sites in DNA, this gene regulatory protein must first form a complex with a molecule of a glucocorticoid steroid hormone, such as cortisol (see Figures 15-12 and 15-13). This hormone is released in the body during times of starvation and intense physical activity, and among its other activities, it stimulates cells in the liver to increase the production of glucose from amino acids and other small molecules. To make this response, liver cells increase the expression of many different genes, coding for metabolic enzymes and other products. Although these genes all have different and complex control regions, their maximal expression depends on the binding of the hormone-glucocorticoid receptor complex to a regulatory site in the DNA of each gene. When the body has recovered and the hormone is no longer present, the expression of each of these genes drops to its normal level in the liver. In this way a single gene regulatory protein can control the expression of many different genes (Figure 7-71).

Figure 7-71

A single gene regulatory protein can coordinate the expression of several different genes. The action of the glucocorticoid receptor is illustrated schematically. On the left is a series of genes, each of which has various gene activator proteins bound (more...)

The effects of the glucocorticoid receptor are not confined to cells of the liver. In other cell types, activation of this gene regulatory protein by hormone also causes changes in the expression levels of many genes; the genes affected, however, are often different from those affected in liver cells. As we have seen, each cell type has an individualized set of gene regulatory proteins, and because of combinatorial control, these critically affect the action of the glucocorticoid receptor. Because the receptor is able to assemble with many different sets of cell-type specific gene regulatory proteins, it can produce a distinct spectrum of effects in different cell types.
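
The combination-lock idea can be sketched with sets. All gene and factor names below are hypothetical placeholders, and the rule that a gene is maximally expressed only when every one of its required regulators is present is a deliberate oversimplification of combinatorial control.

```python
# Each hypothetical gene lists the regulators it needs for maximal expression.
REQUIREMENTS = {
    "gene_1": {"factor_X", "factor_Y", "receptor_hormone_complex"},
    "gene_2": {"factor_Z", "receptor_hormone_complex"},
    "gene_3": {"factor_W", "receptor_hormone_complex"},
}


def fully_active(present: set) -> list:
    """Return the genes whose whole required combination is present."""
    return sorted(gene for gene, needed in REQUIREMENTS.items() if needed <= present)


liver_like = {"factor_X", "factor_Y", "factor_Z"}   # assumed cell-type-specific factors
other_cell = {"factor_W"}
hormone = {"receptor_hormone_complex"}

print(fully_active(liver_like))            # nothing fully on without hormone
print(fully_active(liver_like | hormone))  # hormone completes two combinations
print(fully_active(other_cell | hormone))  # same signal, different genes
```

The same added protein switches on different genes in the two cell types because each cell has already assembled a different partial combination.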

Expression of a Critical Gene Regulatory Protein Can Trigger Expression of a Whole Battery of Downstream Genes

The ability to switch many genes on or off coordinately is important not only in the day-to-day regulation of cell function. It is also the means by which eucaryotic cells differentiate into specialized cell types during embryonic development. The development of muscle cells provides a striking example.

A mammalian skeletal muscle cell is a highly distinctive giant cell, formed by the fusion of many muscle precursor cells called myoblasts, and therefore containing many nuclei. The mature muscle cell is distinguished from other cells by a large number of characteristic proteins, including specific types of actin, myosin, tropomyosin, and troponin (all part of the contractile apparatus), creatine phosphokinase (for the specialized metabolism of muscle cells), and acetylcholine receptors (to make the membrane sensitive to nerve stimulation). In proliferating myoblasts these muscle-specific proteins and their mRNAs are absent or are present in very low concentrations. As myoblasts begin to fuse with one another, the corresponding genes are all switched on coordinately as part of a general transformation of the pattern of gene expression.

This entire program of muscle differentiation can be triggered in cultured skin fibroblasts and certain other cell types by introducing any one of a family of helix-loop-helix proteins—the so-called myogenic proteins (MyoD, Myf5, myogenin, and Mrf4)—normally expressed only in muscle cells (Figure 7-72A). Binding sites for these regulatory proteins are present in the regulatory DNA sequences adjacent to many muscle-specific genes, and the myogenic proteins thereby directly activate transcription of many muscle-specific structural genes. In addition, the myogenic proteins stimulate their own transcription as well as that of various other gene regulatory proteins involved in muscle development, creating an elaborate series of positive feedback loops that amplify and maintain the muscle developmental program, even after the initiating signal has dissipated (Figure 7-72B; see also Chapter 22).

Figure 7-72

Role of the myogenic regulatory proteins in muscle development. (A) The effect of expressing the MyoD protein in fibroblasts. As shown in this immunofluorescence micrograph, fibroblasts from the skin of a chick embryo have been converted to muscle cells (more...)

It is probable that the fibroblasts and other cell types that are converted to muscle cells by the addition of myogenic proteins have already accumulated a number of gene regulatory proteins that can cooperate with the myogenic proteins to switch on muscle-specific genes. In this view it is a specific combination of gene regulatory proteins, rather than a single protein, that determines muscle differentiation. This idea is consistent with the finding that some cell types fail to be converted to muscle by myogenin or its relatives; these cells presumably have not accumulated the other gene regulatory proteins required.

The conversion of one cell type (fibroblast) to another (skeletal muscle) by a single gene regulatory protein reemphasizes one of the most important principles discussed in this chapter: dramatic differences between cell types—in size, shape, chemistry, and function—can be produced by differences in gene expression.

Combinatorial Gene Control Creates Many Different Cell Types in Eucaryotes

We have already discussed how multiple gene regulatory proteins can act in combination to regulate the expression of an individual gene. But, as the example of the myogenic proteins shows, combinatorial gene control means more than this: not only does each gene have many gene regulatory proteins to control it, but each regulatory protein contributes to the control of many genes. Moreover, although some gene regulatory proteins are specific to a single cell type, most are switched on in a variety of cell types, at several sites in the body, and at several times in development. This point is illustrated schematically in Figure 7-73, which shows how combinatorial gene control makes it possible to generate a great deal of biological complexity with relatively few gene regulatory proteins.

Figure 7-73

The importance of combinatorial gene control for development. Combinations of a few gene regulatory proteins can generate many cell types during development. In this simple, idealized scheme a “decision” to make one of a pair of different (more...)
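
The arithmetic behind this point is easy to check. In the idealized sketch below, each regulatory protein is assumed to be either present or absent independently of the others, so n proteins define up to 2^n distinct combinations. The protein names are placeholders, and real cells use only a subset of the possible combinations.

```python
from itertools import product


def cell_types(regulators):
    """Enumerate every on/off combination of the given regulatory proteins."""
    return [
        frozenset(name for name, on in zip(regulators, states) if on)
        for states in product([False, True], repeat=len(regulators))
    ]


three = cell_types(["R1", "R2", "R3"])
five = cell_types(["R1", "R2", "R3", "R4", "R5"])

print(len(three), len(five))
```

Each additional regulatory protein doubles the number of combinations available, which is how a modest set of proteins can, in principle, specify a large number of cell types.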

With combinatorial control, a given gene regulatory protein does not necessarily have a single, simply definable function as commander of a particular battery of genes or specifier of a particular cell type. Rather, gene regulatory proteins can be likened to the words of a language: they are used with different meanings in a variety of contexts and rarely alone; it is the well-chosen combination that conveys the information that specifies a gene regulatory event. One requirement of combinatorial control is that many gene regulatory proteins must be able to work together to influence the final rate of transcription. To a remarkable extent, this principle is true: even unrelated gene regulatory proteins from widely different eucaryotic species can cooperate when experimentally introduced into the same cell. This situation reflects both the high degree of conservation of the transcription machinery and the nature of transcriptional activation itself. As we have seen, transcriptional synergy, in which multiple activator proteins can show more than additive effects on the final state of transcription, results from the ability of the transcription machinery to respond to multiple inputs (see Figure 7-47). It seems that the multifunctional, combinatorial mode of action of gene regulatory proteins has put a tight constraint on their evolution: they must interact with other gene regulatory proteins, the general transcription factors, the RNA polymerase holoenzyme, and the chromatin-modifying enzymes.

An important consequence of combinatorial gene control is that the effect of adding a new gene regulatory protein to a cell will depend on the cell's past history, since this history will determine which gene regulatory proteins are already present. Thus during development a cell can accumulate a series of gene regulatory proteins that need not initially alter gene expression. When the final member of the requisite combination of gene regulatory proteins is added, however, the regulatory message is completed, leading to large changes in gene expression. Such a scheme, as we have seen, helps to explain how the addition of a single regulatory protein to a fibroblast can produce the dramatic transformation of the fibroblast into a muscle cell. It also can account for the important difference, discussed in Chapter 21, between the process of cell determination—where a cell becomes committed to a particular developmental fate—and the process of cell differentiation, where a committed cell expresses its specialized character.

The Formation of an Entire Organ Can Be Triggered by a Single Gene Regulatory Protein

We have seen that even though combinatorial control is the norm for eucaryotic genes, a single gene regulatory protein, if it completes the appropriate combination, can be decisive in switching a whole set of genes on or off, and we have seen how this can convert one cell type into another. A dramatic extension of the principle comes from studies of eye development in Drosophila, mice, and humans. Here, a gene regulatory protein (called Ey in flies and Pax-6 in vertebrates) is crucial. When expressed in the proper context, Ey can trigger the formation of not just a single cell type but a whole organ (an eye), composed of different types of cells, all properly organized in three-dimensional space.

The most striking evidence for the role of Ey comes from experiments in fruit flies in which the ey gene is artificially expressed early in development in groups of cells that normally will go on to form leg parts. This abnormal gene expression causes eyes to develop in the middle of the legs (Figure 7-74). The Drosophila eye is composed of thousands of cells, and the question of how a regulatory protein coordinates the specification of a whole array in a tissue is a central topic in developmental biology. As discussed in Chapter 21, it involves cell-cell interactions as well as intracellular gene regulatory proteins. Here, we note that Ey directly controls the expression of many other genes by binding to their regulatory regions. Some of the genes controlled by Ey are themselves gene regulatory proteins that, in turn, control the expression of other genes. Moreover, some of these regulatory genes act back on ey itself to create a positive feedback loop that ensures the continued synthesis of the Ey protein (Figure 7-75). In this way, the action of just one regulatory protein can turn on a cascade of gene regulatory proteins and cell-cell interaction mechanisms whose actions result in an organized group of many different types of cells. One can begin to imagine how, by repeated applications of this principle, a complex organism is built up piece by piece.

Figure 7-74

Expression of the Drosophila ey gene in precursor cells of the leg triggers the development of an eye on the leg. (A) Simplified diagrams showing the result when a fruit fly larva contains either the normally expressed ey gene (left) or an ey gene that (more...)

Figure 7-75

Gene regulatory proteins that specify eye development in Drosophila. toy (twin of eyeless) and ey (eyeless) encode similar gene regulatory proteins, Toy and Ey, either of which, when ectopically expressed, can trigger eye development. In normal eye development, (more...)

Stable Patterns of Gene Expression Can Be Transmitted to Daughter Cells

Once a cell in an organism has become differentiated into a particular cell type, it generally remains specialized in that way, and if it divides, its daughters inherit the same specialized character. For example, liver cells, pigment cells, and endothelial cells (discussed in Chapter 22) divide many times in the life of an individual. This means that the pattern of gene expression specific to a differentiated cell must be remembered and passed on to its progeny through all subsequent cell divisions.

We have already described several ways of ensuring that daughter cells can “remember” what kind of cells they are supposed to be. One of the simplest is through a positive feedback loop (see Figures 7-68, 7-72B and 7-75) where a key gene regulatory protein activates transcription of its own gene (either directly or indirectly) in addition to that of other cell-type specific genes. The simple flip-flop switch shown in Figure 7-67 is a variation on this theme: by inhibiting expression of its own inhibitor, a gene product indirectly activates and maintains its own expression. Another very different way of maintaining cell type in eucaryotes is through the faithful propagation of chromatin structures from parent to daughter cells, as discussed in Chapter 4. Once a differentiated cell type has been specified by gene regulatory proteins, developmental decisions can be reinforced by packaging unexpressed genes into more compacted forms of chromatin and “marking” that chromatin as silent (see Figure 4-35). The chromatin of actively transcribed genes can also be marked and propagated by the same type of mechanism. The packing of selected regions of the genome into condensed chromatin is a genetic regulatory mechanism that is not available to bacteria, and it is thought to allow eucaryotes to maintain extraordinarily stable patterns of gene expression over many generations. This stability is particularly crucial in multicellular organisms, where abnormal gene expression in a single cell can have profound developmental consequences for the entire organism.
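The memory provided by a positive feedback loop can be illustrated with a minimal on/off model. This is a sketch under strong assumptions: expression is treated as a boolean per cell generation, the protein is assumed to be passed to daughter cells, and the function names are invented for illustration.

```python
def next_state(protein_present, signal_present, feedback=True):
    """Whether the regulatory protein is made in the next cell generation."""
    if feedback:
        # The protein activates its own gene, so it persists once made.
        return signal_present or protein_present
    # Without feedback, expression simply follows the transient signal.
    return signal_present


def simulate(signals, feedback=True):
    """Follow one cell lineage through successive generations."""
    state = False
    history = []
    for signal in signals:
        state = next_state(state, signal, feedback)
        history.append(state)
    return history


# A signal that is present in only one generation.
transient_signal = [False, False, True, False, False, False]

print(simulate(transient_signal, feedback=True))
# [False, False, True, True, True, True]
print(simulate(transient_signal, feedback=False))
# [False, False, True, False, False, False]
```

With feedback the lineage stays switched on long after the signal has gone, whereas without it the response is transient, like the tryptophan repressor example.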

If maintenance of the pattern of gene expression depends on the pattern of chromatin packing, how is this chromatin configuration passed on faithfully from one cell to its daughters? Some possibilities have already been discussed in Chapter 4 (see Figure 4-48). One general mechanism depends on the cooperative binding of proteins to DNA (Figure 7-76). When the cell replicates its DNA, each DNA strand can inherit a share of the protein molecules bound to a given segment of the original double helix, and these inherited molecules can then recruit freshly made molecules to reconstruct a complete copy of the original chromatin complex in each daughter cell. This mechanism of cell memory can be based on cooperative binding of specific gene regulatory proteins, or of general chromatin structural components, or of both classes of molecules acting together. Thus, an initial pattern of binding of specific gene regulatory proteins can initiate a pattern of chromatin condensation that is subsequently maintained.

Figure 7-76

A general scheme that permits the direct inheritance of states of gene expression during DNA replication. In this hypothetical model, portions of a cooperatively bound cluster of chromosomal proteins are transferred directly from the parental DNA helix (more...)

Yet another strategy of cell memory is based on self-propagating patterns of enzymatic modification of the chromatin proteins (as we saw in Chapter 4) or even of the DNA itself, as we explain later. But first we look more closely at a specific example in which cell memory clearly involves changes of chromatin structure.

Chromosome-Wide Alterations in Chromatin Structure Can Be Inherited

We saw in Chapter 4 that chromatin states can be heritable, and that they can be used to establish and preserve patterns of gene expression over great distances along DNA and for many cell generations. A striking example of such long-range effects of chromatin organization occurs in mammals, where an alteration in the chromatin structure of an entire chromosome is used to modulate levels of expression of all genes on that chromosome.

Males and females differ in their sex chromosomes. Females have two X chromosomes, whereas males have one X and one Y chromosome. As a result, female cells contain twice as many copies of X-chromosome genes as do male cells. In mammals, the X and Y sex chromosomes differ radically in gene content: the X chromosome is large and contains more than a thousand genes, whereas the Y chromosome is smaller and contains fewer than 100 genes. Mammals have evolved a dosage compensation mechanism to equalize the dosage of X chromosome gene products between males and females. Mutations that interfere with dosage compensation are lethal, demonstrating the necessity of maintaining the correct ratio of X chromosome to autosome (non-sex chromosome) gene products.

In mammals dosage compensation is achieved by the transcriptional inactivation of one of the two X chromosomes in female somatic cells, a process known as X-inactivation. Early in the development of a female embryo, when it consists of a few thousand cells, one of the two X chromosomes in each cell becomes highly condensed into a type of heterochromatin. The condensed X chromosome can be easily seen under the light microscope in interphase cells; it was originally called a Barr body and is located near the nuclear membrane. As a result of X-inactivation, two X chromosomes can coexist within the same nucleus exposed to the same transcriptional regulatory proteins, yet differ entirely in their expression.

The initial choice of which X chromosome to inactivate, the maternally inherited one (Xm) or the paternally inherited one (Xp), is random. Once either Xp or Xm has been inactivated, it remains silent throughout all subsequent cell divisions of that cell and its progeny, indicating that the inactive state is faithfully maintained through many cycles of DNA replication and mitosis. Because X-inactivation is random and takes place after several thousand cells have already formed in the embryo, every female is a mosaic of clonal groups of cells in which either Xp or Xm is silenced (Figure 7-77). These clonal groups are distributed in small clusters in the adult animal because sister cells tend to remain close together during later stages of development. For example, X-chromosome inactivation causes the red and black “tortoise-shell” coat coloration of some female cats. In these cats, one X chromosome carries a gene that produces red hair color, and the other X chromosome carries an allele of the same gene that results in black hair color; it is the random X-inactivation that produces patches of cells of two distinctive colors. In contrast to the females, male cats of this genetic stock are either solid red or solid black, depending on which X chromosome they inherit from their mothers. Although X-chromosome inactivation is maintained over thousands of cell divisions, it is not always permanent. In particular, it is reversed during germ cell formation, so that all haploid oocytes contain an active X chromosome and can express X-linked gene products.
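The two steps that produce a mosaic, a random choice in each early cell followed by faithful clonal inheritance, can be sketched as follows. The founder count and number of divisions are arbitrary illustrative values (far smaller than in a real embryo), and the function names are hypothetical.

```python
import random


def inactivate_founders(n_founders, rng):
    """Each founder cell independently silences Xm or Xp at random."""
    return [rng.choice(["Xm", "Xp"]) for _ in range(n_founders)]


def expand_clone(inactive_x, n_divisions):
    """Every descendant inherits the founder's choice unchanged."""
    return [inactive_x] * (2 ** n_divisions)


rng = random.Random(0)  # fixed seed so that a run is repeatable
founders = inactivate_founders(8, rng)
clones = [expand_clone(choice, n_divisions=3) for choice in founders]

# Each clone is uniform, but different clones can differ: a mosaic.
for choice, clone in zip(founders, clones):
    print(choice, len(clone), set(clone))
```

Within a clone every cell silences the same X chromosome, which corresponds to a single-colored patch in the tortoise-shell coat example.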

Figure 7-77

X-inactivation. The clonal inheritance of a condensed inactive X chromosome that occurs in female mammals.

How is an entire chromosome transcriptionally inactivated? X-chromosome inactivation is initiated and spreads from a single site in the middle of the X chromosome, the X-inactivation center (XIC). Portions of the X chromosome that are removed from the XIC and fused to an autosome escape inactivation. In contrast, autosomes that are fused to the XIC of an inactive X chromosome are transcriptionally silenced. The XIC (a DNA sequence of approximately 10^6 nucleotide pairs) can therefore be considered as a large regulatory element that seeds the formation of heterochromatin and facilitates its bi-directional spread along the entire chromosome. Encoded within the XIC is an unusual RNA molecule, XIST RNA, which is expressed solely from the inactive X chromosome and whose expression is necessary for X-inactivation. XIST RNA is not translated into protein; rather, it remains in the nucleus, where it eventually coats the inactive X chromosome. The spread of XIST RNA from the XIC over the entire chromosome correlates with the spread of gene silencing, indicating that XIST RNA participates in the formation and spread of heterochromatin (Figure 7-78). In addition to containing XIST RNA, the X-chromosome heterochromatin is characterized by a specific variant of histone H2A, by hypoacetylation of histones H3 and H4, by methylation of a specific position on histone H3, and by methylation of the underlying DNA, a topic we will discuss below. Presumably all these features make the inactive X chromosome unusually resistant to transcription.

Figure 7-78

Mammalian X-chromosome inactivation. X-chromosome inactivation begins with the synthesis of XIST (X-inactivation specific transcript) RNA from the XIC (X-inactivation center) locus. The association of XIST RNA with the X chromosome is correlated with (more...)

Many features of mammalian X-chromosome inactivation remain to be discovered. How is the initial decision made as to which X chromosome to inactivate? What mechanism prevents the other X chromosome from also being inactivated? How does XIST RNA coordinate the formation of heterochromatin? How is the inactive chromosome maintained through many cell divisions? We are just beginning to understand this mechanism of gene regulation that is crucial for the survival of our own species.

X-chromosome inactivation in females is only one way that sexually reproducing organisms solve the problem of dosage compensation. In Drosophila, all the genes on the single X chromosome present in male cells are transcribed at two-fold higher levels than their counterparts in female cells. This male-specific “up-regulation” of transcription results from an alteration in chromatin structure over the entire male X chromosome. As in mammals, this alteration involves the association of a specific RNA molecule with the X chromosome; however, in Drosophila, the X-chromosome-associated RNA increases gene activity rather than blocking it. The male X chromosome also contains a specific pattern of histone acetylation which may help to attract the transcription machinery to this chromosome (see Figures 4-35 and 7-46).

Dosage compensation in the nematode worm occurs by a third strategy. Here, the two sexes are male (with one X chromosome) and hermaphrodite (with two X chromosomes), and dosage compensation occurs by a two-fold “down-regulation” of transcription from each of the two X chromosomes in cells of the hermaphrodite. This is brought about through chromosome-wide structural changes in the X chromosomes of hermaphrodites (Figure 7-79). These changes involve the X-specific assembly of proteins, some of which are shared with the condensins that help condense chromosomes during mitosis (see Figures 4-56 and 18-3).

Figure 7-79

Localization of dosage compensation proteins to the X chromosomes of C. elegans hermaphrodite (XX) nuclei. Many nuclei from a developing embryo are visible in this image. Total DNA is stained blue with the DNA-intercalating dye DAPI, and the Sdc-2 protein (more...)

Although the strategies for dosage compensation differ between mammals, flies, and worms, they all involve structural alterations over the entire X chromosome. It is likely that features of chromosome structure that are quite general were adapted and harnessed during evolution to overcome a highly specific problem in gene regulation encountered by sexually reproducing animals.
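The arithmetic behind the three strategies can be checked in a few lines. Transcription rates are in arbitrary units, with 1.0 standing for the uncompensated rate of a single X chromosome; the dictionary layout and function name are illustrative assumptions.

```python
def x_output(active_copies, rate_per_copy):
    """Total X-linked transcription: active X copies times rate per copy."""
    return active_copies * rate_per_copy


# Rates are in arbitrary units; 1.0 is the uncompensated rate of one X.
strategies = {
    # Mammals: one of the two X chromosomes is silenced in females.
    "mammals": {"female": x_output(1, 1.0), "male": x_output(1, 1.0)},
    # Drosophila: the single male X is transcribed at a two-fold higher rate.
    "Drosophila": {"female": x_output(2, 1.0), "male": x_output(1, 2.0)},
    # Nematode: each hermaphrodite X is transcribed at half the rate.
    "nematode": {"hermaphrodite": x_output(2, 0.5), "male": x_output(1, 1.0)},
}

for organism, outputs in strategies.items():
    print(organism, outputs)
```

In each case the two sexes end up with the same total X-linked output, although the three organisms reach that equality by different routes.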

The Pattern of DNA Methylation Can Be Inherited When Vertebrate Cells Divide

Thus far, we have emphasized the regulation of gene transcription by proteins that associate with specific DNA sequences. However, DNA itself can be covalently modified, and in the following sections we shall see that this, too, provides opportunities for the regulation of gene expression. In vertebrate cells the methylation of cytosine seems to provide an important mechanism for distinguishing genes that are active from those that are not. The methylated form of cytosine, 5-methylcytosine (5-methyl C), has the same relation to cytosine that thymine has to uracil, and the modification likewise has no effect on base-pairing (Figure 7-80). The methylation in vertebrate DNA is restricted to cytosine (C) nucleotides in the sequence CG, which is base-paired to exactly the same sequence (in opposite orientation) on the other strand of the DNA helix. Consequently, a simple mechanism permits the existing pattern of DNA methylation to be inherited directly by the daughter DNA strands. An enzyme called maintenance methyltransferase acts preferentially on those CG sequences that are base-paired with a CG sequence that is already methylated. As a result, the pattern of DNA methylation on the parental DNA strand serves as a template for the methylation of the daughter DNA strand, causing this pattern to be inherited directly following DNA replication (Figure 7-81).
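The templating logic of maintenance methylation can be sketched as follows. The sequence, the initial pattern, and the function names are made up for illustration; because CG pairs with CG on the opposite strand, a single index is used here to stand for the dinucleotide on both strands.

```python
def find_cg_sites(seq):
    """Positions of the C in every CG dinucleotide on the given strand."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def replicate(parental_marks):
    """Semiconservative replication of one daughter duplex.

    Returns (parental strand marks, new strand marks); the newly
    synthesized strand starts with no methyl groups.
    """
    return set(parental_marks), set()


def maintenance_methylate(parental_marks, new_marks, cg_sites):
    """Methylate the new strand only where the paired parental CG is methylated."""
    return set(new_marks) | {site for site in cg_sites if site in parental_marks}


seq = "ACGTTCGGACGA"
cg_sites = find_cg_sites(seq)
print(cg_sites)  # [1, 5, 9]

pattern = {1, 9}  # hypothetical starting pattern: site 5 is unmethylated
for generation in range(5):
    parental, new = replicate(pattern)
    # Follow the newly made strand into the next generation.
    pattern = maintenance_methylate(parental, new, cg_sites)

print(sorted(pattern))  # [1, 9]
```

After five rounds of replication the pattern is unchanged: methylated sites stay methylated and the unmethylated site stays unmethylated, because the enzyme in this sketch never acts on a CG whose partner lacks a methyl group.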

Figure 7-80

Formation of 5-methylcytosine occurs by methylation of a cytosine base in the DNA double helix. In vertebrates this event is confined to selected cytosine (C) nucleotides located in the sequence CG.

Figure 7-81

How DNA methylation patterns are faithfully inherited. In vertebrate DNAs a large fraction of the cytosine nucleotides in the sequence CG are methylated (see Figure 7-80). Because of the existence of a methyl-directed methylating enzyme (the maintenance (more...)

The stable inheritance of DNA methylation patterns can be explained by maintenance DNA methyltransferases. DNA methylation patterns, however, are dynamic during vertebrate development. Shortly after fertilization there is a genome-wide wave of demethylation, when the vast majority of methyl groups are lost from the DNA. This demethylation may occur either by suppression of maintenance DNA methyltransferase activity, resulting in the passive loss of methyl groups during each round of DNA replication, or by a specific demethylating enzyme. Later in development, at the time that the embryo implants in the wall of the uterus, new methylation patterns are established by several de novo DNA methyltransferases that modify specific unmethylated CG dinucleotides. Once the new patterns of methylation are established, they can be propagated through rounds of DNA replication by the maintenance methyltransferases. Mutations in either the maintenance or the de novo methyltransferases result in early embryonic death in mice, indicating that establishing and maintaining correct methylation patterns is crucial for normal development.
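The passive route to demethylation amounts to a dilution by half at each replication. The sketch below assumes a site that starts methylated on both strands, no active demethylating enzyme, and no de novo methylation; the function name is hypothetical.

```python
from fractions import Fraction


def methylated_strand_fraction(rounds, maintenance_active):
    """Fraction of DNA strands still carrying the original methyl mark.

    Assumes the site starts methylated on both strands and that no active
    demethylation or de novo methylation takes place.
    """
    if maintenance_active:
        # Every new strand is methylated using its parental strand as template.
        return Fraction(1)
    # Without maintenance, only the original strands keep their marks,
    # and each replication doubles the total number of strands.
    return Fraction(1, 2 ** rounds)


for n in range(5):
    print(n, methylated_strand_fraction(n, maintenance_active=False))
```

Under these assumptions only 1/16 of the strands remain methylated after four rounds of replication, which shows how suppressing the maintenance enzyme alone could erase most marks within a few cell divisions.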

Vertebrates Use DNA Methylation to Lock Genes in a Silent State

In vertebrates DNA methylation is found primarily on transcriptionally silent regions of the genome, such as the inactive X chromosome or genes that are inactivated in certain tissues, suggesting that it plays a role in gene silencing. Vertebrate cells contain a family of proteins that bind methylated DNA. These DNA-binding proteins, in turn, interact with chromatin remodeling complexes and histone deacetylases that condense chromatin so it becomes transcriptionally inactive. In spite of this, DNA methylation is not sufficient to signal the inactivation of a gene, as the following examples demonstrate. Plasmid DNA encoding a muscle-specific actin gene can be prepared in vitro in both fully methylated and fully unmethylated forms, using bacterial proteins that methylate or demethylate DNA. When these two versions of the plasmid are introduced into cultured muscle cells, the methylated plasmid is transcribed at the same high rate as the unmethylated copy. Moreover, when a silent, methylated gene is turned on during the normal course of development, methylation is lost only after the gene has been transcribed for some time. Finally, during X chromosome inactivation, condensation and silencing occur before an increase in levels of DNA methylation can be detected. These results all suggest that methylation reinforces transcriptional repression that is initially established by other mechanisms. DNA methylation seems to be used in vertebrates mainly to ensure that once a gene is turned off, it stays off completely (Figure 7-82).

Figure 7-82

How DNA methylation may help turn off genes. The binding of gene regulatory proteins and the general transcription machinery near an active promoter may prevent DNA methylation by excluding de novo methylases. If most of these proteins dissociate from (more...)

Experiments designed to test whether a DNA sequence that is transcribed at high levels in one vertebrate cell type is transcribed at all in another have demonstrated that rates of gene transcription can differ between two cell types by a factor of more than 10^6. Thus unexpressed vertebrate genes are much less “leaky” in terms of transcription than are unexpressed genes in bacteria, in which the largest known differences in transcription rates between expressed and unexpressed gene states are about 1000-fold. DNA methylation of unexpressed vertebrate genes, with the consequent changes in their chromatin structures, accounts for at least part of this difference. Leaky transcription of the many thousands of genes that are normally turned off completely in each vertebrate cell may be the cause of early embryonic death in mice that lack the maintenance DNA methyltransferase.

Transcriptional silencing in vertebrate genomes is also particularly important to repress the proliferation of transposable elements (see Figure 4-17). While coding sequences make up only a few percent of a typical vertebrate genome, transposable elements can comprise nearly half of these genomes. As we saw in Chapter 5, transposable elements can make copies of themselves and insert these copies elsewhere in the genome, potentially disrupting genes or important regulatory sequences. By suppressing the transcription of transposable elements, DNA methylation limits their spread and thereby maintains the integrity of the genome. In addition to these varied uses, DNA methylation is also required for at least one special type of cellular memory, as we discuss next.

Genomic Imprinting Requires DNA Methylation

Mammalian cells are diploid, containing one set of genes inherited from the father and one set from the mother. In a few cases the expression of a gene has been found to depend on whether it is inherited from the mother or the father, a phenomenon called genomic imprinting. The gene for insulin-like growth factor-2 (Igf2) is one example of an imprinted gene. Igf2 is required for prenatal growth, and mice that do not express Igf2 are born half the size of normal mice. Only the paternal copy of Igf2 is transcribed. As a result, mice with a mutated paternally derived Igf2 gene are stunted, while mice with a defective maternally derived Igf2 gene are normal.

During the formation of germ cells, genes subject to imprinting are marked by methylation according to whether they are present in a sperm or an egg. In this way, the parental origin of the gene can be subsequently detected in the embryo; DNA methylation is thus used as a mark to distinguish two copies of a gene that may be otherwise identical (Figure 7-83). Because imprinted genes are not affected by the wave of demethylation that takes place shortly after fertilization, this mark enables somatic cells to “remember” the parental origin of each of the two copies of the gene and to regulate their expression accordingly. In most cases, the methyl imprint silences nearby gene expression using the mechanisms shown in Figure 7-82. In some cases, however, the methyl imprint can activate expression of a gene. In the case of Igf2, for example, methylation of an insulator element (see Figure 7-61) on the paternally derived chromosome blocks its function and allows a distant enhancer to activate transcription of the Igf2 gene. On the maternally derived chromosome, the insulator is not methylated and the Igf2 gene is therefore not transcribed (Figure 7-84).
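The Igf2 case reduces to a short chain of logical steps, sketched below. The sketch treats each step as all-or-none and assumes that the insulator-binding protein (CTCF, named in the legend to Figure 7-84) binds only the unmethylated insulator; the function names are hypothetical.

```python
def igf2_transcribed(insulator_methylated):
    """Whether one chromosomal copy of Igf2 is transcribed."""
    # Assumption: CTCF binds the insulator only when it is unmethylated.
    ctcf_bound = not insulator_methylated
    # A CTCF-bound insulator blocks the distant enhancer.
    enhancer_blocked = ctcf_bound
    return not enhancer_blocked


def mouse_makes_igf2(paternal_gene_ok, maternal_gene_ok):
    """Whether a mouse expresses Igf2, given which inherited copies are intact."""
    # The paternal insulator carries the methyl imprint; the maternal one does not.
    paternal = paternal_gene_ok and igf2_transcribed(insulator_methylated=True)
    maternal = maternal_gene_ok and igf2_transcribed(insulator_methylated=False)
    return paternal or maternal


print(mouse_makes_igf2(paternal_gene_ok=True, maternal_gene_ok=True))   # True
print(mouse_makes_igf2(paternal_gene_ok=True, maternal_gene_ok=False))  # True
print(mouse_makes_igf2(paternal_gene_ok=False, maternal_gene_ok=True))  # False
```

The last two cases reproduce the observation that a defective maternal copy has no effect, whereas a defective paternal copy leaves the mouse without Igf2 even though an intact maternal gene is present.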

Figure 7-83

Imprinting in the mouse. The top portion of the figure shows a pair of homologous chromosomes in the somatic cells of two adult mice, one male and one female. In this example, both mice have inherited the top homolog from their father and the bottom homolog (more...)

Figure 7-84

Mechanism of imprinting of the mouse Igf2 gene. On chromosomes inherited from the female, a protein called CTCF binds to an insulator (see Figure 7-61), blocking communication between the enhancer (green) and the Igf2 gene (orange). Igf2 is therefore (more...)

Imprinting is an example of an epigenetic change, that is, a heritable change in phenotype that does not result from a change in DNA nucleotide sequence. Why imprinting should exist at all is a mystery. In vertebrates, it is restricted to placental mammals, and all the imprinted genes are involved in fetal development. One idea is that imprinting reflects a middle ground in the evolutionary struggle between males to produce larger offspring and females to limit offspring size. Whatever its purpose might be, imprinting provides startling evidence that features of DNA other than its sequence of nucleotides can be inherited.

CG-rich Islands Are Associated with About 20,000 Genes in Mammals

Because of the way DNA repair enzymes work, methylated C nucleotides in the genome tend to be eliminated in the course of evolution. Accidental deamination of an unmethylated C gives rise to U, which is not normally present in DNA and thus is recognized easily by the DNA repair enzyme uracil DNA glycosylase, excised, and then replaced with a C (as discussed in Chapter 5). But accidental deamination of a 5-methyl C cannot be repaired in this way, for the deamination product is a T and so indistinguishable from the other, nonmutant T nucleotides in the DNA. Although a special repair system exists to remove these mutant T nucleotides, many of the deaminations escape detection, so that those C nucleotides in the genome that are methylated tend to mutate to T over evolutionary time.

During the course of evolution, more than three out of every four CGs have been lost in this way, leaving vertebrates with a remarkable deficiency of this dinucleotide. The CG sequences that remain are very unevenly distributed in the genome; they are present at 10 to 20 times their average density in selected regions, called CG islands, that are 1000 to 2000 nucleotide pairs long. These islands, with some important exceptions, seem to remain unmethylated in all cell types. They often surround the promoters of the so-called housekeeping genes—those genes that code for the many proteins that are essential for cell viability and are therefore expressed in most cells (Figure 7-85). In addition, some tissue-specific genes, which code for proteins needed only in selected types of cells, are also associated with CG islands.
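The idea of a region with many times the average CG density can be made concrete with a simple window scan. This is a toy sketch, not an established CG-island detection method: the sequence is artificial, the window is far shorter than the 1000 to 2000 nucleotide pairs of a real island, and the function names and fold threshold are illustrative choices.

```python
def cg_count(seq):
    """Number of CG dinucleotides on the given strand."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cg_density(seq):
    """CG dinucleotides per dinucleotide position."""
    positions = len(seq) - 1
    return cg_count(seq) / positions if positions > 0 else 0.0


def enriched_windows(seq, window, fold):
    """Start positions of non-overlapping windows whose CG density is at
    least `fold` times the average for the whole sequence."""
    average = cg_density(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        if average > 0 and cg_density(seq[start:start + window]) >= fold * average:
            hits.append(start)
    return hits


# Artificial sequence: a CG-rich stretch flanked by CG-free DNA.
toy = "AT" * 50 + "CG" * 10 + "AT" * 50
print(enriched_windows(toy, window=20, fold=5))  # [100]
```

Only the window that starts at position 100, the CG-rich stretch, exceeds five times the sequence-wide average, which is the kind of contrast that makes CG islands useful landmarks in genome sequences.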

Figure 7-85

The CG islands surrounding the promoter in three mammalian housekeeping genes. The yellow boxes show the extent of each island. As for most genes in mammals (see Figure 6-25), the exons (dark red) are very short relative to the introns (light red). (Adapted (more...)

The distribution of CG islands (also called CpG islands to distinguish the CG dinucleotides from CG nucleotide pairs) can be explained if we assume that CG methylation was adopted in vertebrates primarily as a way of maintaining DNA in a transcriptionally inactive state (Figure 7-82 and Figure 7-86). In vertebrates, new methyl-C to T mutations can be transmitted to the next generation only if they occur in the germ line, the cell lineage that gives rise to sperm or eggs. Most of the DNA in vertebrate germ cells is inactive and highly methylated. Over long periods of evolutionary time, the methylated CG sequences in these inactive regions have presumably been lost through spontaneous deamination events that were not properly repaired. However, promoters of genes that remain active in the germ cell lineages (including most housekeeping genes) are kept unmethylated, and therefore spontaneous deaminations of Cs that occur within them can be accurately repaired. Such regions are preserved in modern-day vertebrate cells as CG islands. In addition, any mutation of a CG sequence in the genome that destroyed the function or regulation of a gene in the adult would be selected against, and some CG islands are simply the result of a higher than normal density of critical CG sequences.

Figure 7-86

A mechanism to explain both the marked overall deficiency of CG sequences and their clustering into CG islands in vertebrate genomes. A black line marks the location of a CG dinucleotide in the DNA sequence, while a red “lollipop” indicates (more...)

The mammalian genome contains an estimated 20,000 CG islands. Most of the islands mark the 5′ ends of transcription units and thus, presumably, of genes. The presence of CG islands often provides a convenient way of identifying genes in the DNA sequences of vertebrate genomes.

Summary

The many types of cells in animals and plants are created largely through mechanisms that cause different genes to be transcribed in different cells. Since many specialized animal cells can maintain their unique character through many cell division cycles and even when grown in culture, the gene regulatory mechanisms involved in creating them must be stable once established and heritable when the cell divides. These features endow the cell with a memory of its developmental history. Bacteria and yeasts provide unusually accessible model systems in which to study gene regulatory mechanisms. One such mechanism involves a competitive interaction between two gene regulatory proteins, each of which inhibits the synthesis of the other; this can create a flip-flop switch that switches a cell between two alternative patterns of gene expression. Direct or indirect positive feedback loops, which enable gene regulatory proteins to perpetuate their own synthesis, provide a general mechanism for cell memory. Negative feedback loops with programmed delays form the basis for cellular clocks.

In eucaryotes the transcription of a gene is generally controlled by combinations of gene regulatory proteins. It is thought that each type of cell in a higher eucaryotic organism contains a specific combination of gene regulatory proteins that ensures the expression of only those genes appropriate to that type of cell. A given gene regulatory protein may be active in a variety of circumstances and typically is involved in the regulation of many genes.

In addition to diffusible gene regulatory proteins, inherited states of chromatin condensation are also used by eucaryotic cells to regulate gene expression. An especially dramatic case is the inactivation of an entire X chromosome in female mammals. In vertebrates DNA methylation also functions in gene regulation, being used mainly as a device to reinforce decisions about gene expression that are made initially by other mechanisms. DNA methylation also underlies the phenomenon of genomic imprinting in mammals, in which the expression of a gene depends on whether it was inherited from the mother or the father.

What is a molecular level?

When a scientist studies things on a molecular level, she's looking at them up close, examining the molecules they are made of. The adjective molecular comes up most often in biology and chemistry, and it describes the very small units, molecules, that make up organisms and chemical substances.

What is evolution at the molecular level?

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes.

Where are genes located at a molecular level?

Genes are contained in chromosomes, which in eucaryotic cells are located in the cell nucleus. A chromosome contains hundreds to thousands of genes.

What is meant by molecular genetics?

Molecular genetics (MG) is a scientific discipline concerned with the structure and function of genes at the molecular level and includes the technique of genetic engineering, which can be defined as the direct manipulation of an organism's genome.