344 Chapter 10 Genome Annotation
Figure 10.3 Homology map for a 100 kb region of the human genome. Regions in black are homologous between the human genome and the genome of the indicated species. Most DNA sequences conserved between humans and zebrafish are found in protein-coding exons. Some sequences outside of the exons are also constrained evolutionarily, suggesting that they may play functional roles that are currently unknown. (UTR: untranslated region; CDS: protein-coding sequence).

With a computerized genome visualization tool, it becomes possible to explore DNA sequence conservation directly along the genome, as well as across evolutionary time. An example of cross-species homology analysis is shown in Fig. 10.3 for a 100 kb region containing four genes. The bottom row of the figure displays the locations and exon/intron structures of the four genes in the human genome. Above this row are homology maps for three representative vertebrate species; highly conserved DNA sequences are indicated with dark lines or blocks.

As anticipated from the close relationship between human and chimpanzee species, nearly complete conservation of human sequences exists across the entire region in a chimp genome. In other mammals, represented here by the mouse, conservation is also apparent across the entire region, but the pattern is choppy, indicating small regions of conservation interspersed with small, nonconserved regions.

As we move farther across the phylogenetic landscape to fish, we can distinguish sequences subject to evolutionary constraints more clearly from those that are not. Note in particular that large parts of the coding regions of three of the four genes are highly conserved in all the species examined (Fig. 10.3). This conservation suggests that the protein products of the three genes are crucial to the survival of all vertebrates. However, a homolog of the fourth gene is not found in zebrafish, indicating that its function is dispensable to fish. Regions of homology between the human and mouse or zebrafish genomes are much less frequent in introns, in the noncoding parts of exons (corresponding to the 5′ and 3′ UTRs of the genes), and in the spaces between genes.

Sequence conservation over long evolutionary periods, such as the time since humans last shared a common ancestor with mice or fish, therefore usually predicts the location of genes. However, exceptions do exist: Conserved DNA sequences can be observed rarely at locations outside of the coding regions. The fact that these features are so well conserved suggests strongly that they have a function that is subject to evolutionary constraints, even if in most cases we do not yet know what these functions may be. Scientists are actively exploring the potential roles of these conserved noncoding sequences; for example, some might represent enhancer elements (see Fig. 8.11) that help determine when and where nearby genes are transcribed into mRNA.

The Most Direct Method to Find Genes Is to Locate Transcribed Regions

Many genes encode proteins while some others, such as the genes for rRNAs and tRNAs, do not. However, all genes are transcribed into RNAs, even if some RNAs are not translated. If you knew the sequence of the RNA produced from a gene, it would be easy to find that gene in genomic DNA simply by looking for the DNA sequence complementary to the RNA. This approach in fact works well for RNAs that can be purified in large amounts like rRNAs (which can be isolated from other RNAs because they form part of the ribosome).

In contrast, most mRNAs are so relatively rare in cells that they cannot be purified readily. Moreover, although technologies for determining the nucleotide sequence of RNAs do exist, they are less widely available and much more difficult to perform than the methods available for sequencing DNA. As a result, the easiest way to study mRNAs is to copy them into DNA, to clone the resultant DNA molecules, and then to sequence these clones by the same methods already described for genomic DNA.

Making cDNA libraries

To produce DNA clones from mRNA sequences, researchers rely on a series of in vitro reactions that mimics part of the life cycle of viruses known as retroviruses. Retroviruses, which include among their ranks the HIV virus that causes AIDS, carry their genetic information in molecules of RNA. As part of their gene-transmission kit, retroviruses also contain the unusual enzyme known as RNA-dependent DNA polymerase, or simply reverse transcriptase (review the Genetics and Society Box in Chapter 8 entitled HIV and Reverse Transcription). After infecting a cell, a retrovirus uses reverse transcriptase to copy its single strand of RNA into a strand of complementary DNA, often abbreviated as cDNA. The reverse transcriptase, which can also function as a DNA-dependent DNA polymerase, then makes a second strand of DNA complementary to this first cDNA strand (and equivalent in sequence to the original RNA template). Finally, this double-stranded DNA copy of the retroviral RNA chromosome integrates into the host cell’s genome. Although the designation cDNA originally meant a single strand of DNA complementary to an RNA molecule, it now refers to any DNA (single- or double-stranded) derived from an RNA template.

Let’s see how you could use reverse transcriptase to make cDNA copies of all the mRNAs that are transcribed in a particular cell type such as red blood cell precursors. You would first isolate by simple chemical means the total population of RNA molecules in these cells (Fig. 10.4a).
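
The base-pairing logic behind reverse transcription can be illustrated with a short Python sketch. It models only the sequence relationships described above: a first cDNA strand complementary to the RNA, a second strand equivalent in sequence to the original RNA, and a search for that sequence in genomic DNA. The function names and the example sequences are hypothetical and chosen purely for illustration. The sketch also assumes that the transcript matches one uninterrupted stretch of the genome, which would not hold for a gene whose introns have been spliced out of the mRNA.

```python
# Illustrative sketch only: the sequences are invented, not real genes.
# All strands are written 5'->3'.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> pairing DNA base
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}  # DNA base -> pairing DNA base


def first_strand_cdna(mrna):
    """First cDNA strand: the DNA complementary to the RNA template."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand):
    """Second strand: complementary to the first strand, so it matches
    the original RNA with T in place of U."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


def locate_in_genome(genomic_strand, mrna):
    """Return the 0-based position of the RNA-equivalent DNA sequence on the
    given genomic strand, or -1 if absent. Assumes no introns (hypothetical)."""
    rna_equivalent = second_strand_cdna(first_strand_cdna(mrna))
    return genomic_strand.upper().find(rna_equivalent)


mrna = "AUGGCCUAA"                      # made-up transcript
first = first_strand_cdna(mrna)         # "TTAGGCCAT"
second = second_strand_cdna(first)      # "ATGGCCTAA"
position = locate_in_genome("GGCTATGGCCTAACGT", mrna)  # 4
```

The second strand reads the same as the mRNA with T substituted for U, which is why a sequenced cDNA clone can be compared directly with genomic DNA.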