Page 48 - Genetics_From_Genes_to_Genomes_6th_FULL_Part3
342 Chapter 10 Genome Annotation
10.1 Finding the Genes in Genomes

Learning objectives
1. Explain why a long open reading frame suggests the existence of a protein-coding exon.
2. Describe how scientists predict the location of genes by identifying sequences conserved in the genomes of widely divergent species.
3. Discuss the use of reverse transcriptase in the construction of a cDNA library.
4. Compare the information that can be obtained from genomic and cDNA libraries.

Genes are the key functional elements of genomes. In this section, we focus on methods to locate genes within genomic DNA sequences. You will see that information useful for annotating the genes of the human genome can be found in the sequence of the genome itself, in the genome sequences of species other than humans, and in the characterization of RNA molecules in human cells. These methods have located and characterized more than 27,000 genes in the human genome, but in spite of all these efforts, the task is still incomplete; some genes undoubtedly remain to be found.

Open Reading Frames (ORFs) Help Locate Protein-Coding Genes

One way to look specifically for regions that might correspond to the exons of protein-coding genes is to scan genomic DNA sequences for long open reading frames (ORFs); that is, stretches of nucleotides that have a reading frame of triplets uninterrupted by a stop codon. As you remember from Chapter 8's discussion of the genetic code, the four nucleotides can be arranged into 4³ = 64 possible triplets, of which three (TAA, TAG, and TGA, written as DNA) signify stop. Each triplet in a random sequence therefore has roughly a 3/64 chance of being a stop codon. Thus, as a very rough estimate, if you looked at any random sequence of DNA starting at any one nucleotide, you would on average run into a stop codon after about 64/3 ≈ 21 triplets. If that nucleotide begins a reading frame that continues without a stop for significantly more than 21 triplets, there is a good chance that the DNA in this region is not a random set of nucleotides but instead actually encodes amino acids within a protein (Fig. 10.1).

This method is useful but far from foolproof. Genomes are so large that, even though long ORFs rarely arise by chance, some regions that do not correspond to genes will contain them. On the other hand, because many genes in higher eukaryotes are interrupted by introns, some protein-coding exons are so small that they would not be identified as ORFs unless additional information were available.

One type of additional information that could potentially aid computer programs in identifying genes is the fact that the splice acceptor and splice donor sites at intron/exon boundaries are composed of characteristic consensus sequences (review Fig. 8.15). Genome analysis programs can thus connect potential exons together and check whether a long ORF suggestive of a gene would result.

Whole-Genome Comparisons Distinguish Genomic Elements Conserved by Natural Selection

The whole-genome shotgun approach to the sequencing of genomes described in Chapter 9 has been so successful that scientists have already deciphered the genomes of thousands of different species. Researchers can exploit this tremendous amount of information to look for regions of DNA that are similar in diverse organisms. Such regions usually, though not always, correspond to genes.

The justification for comparing genomes goes all the way back to Charles Darwin. Nearly a century before the DNA double helix was discovered, he proposed the evolution of species from now-extinct ancestors by a process of descent with modification. We now know that the actual entity undergoing descent with modification is the DNA sequence that defines an organism's genome. The modifications are random mutations that occur in DNA. Natural selection is the process whereby mutations that confer an advantage on the individuals carrying them tend to spread throughout a population, while deleterious mutations tend to disappear. The challenge is to trace such molecular evolution at the DNA level.

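The idea of looking for regions of DNA that are similar in diverse organisms can be illustrated with a toy two-sequence comparison. This is a minimal sketch under stated assumptions: the two sequences are assumed to be already aligned to the same length (with "-" marking gaps), which is itself a hard problem not covered here, and the function names (window_identity, conserved_windows), the default window of 50 positions, and the 0.9 identity threshold are hypothetical choices, not values from the text. Real comparative-genomics studies typically compare many species at once and use statistical models of evolution rather than a fixed identity cutoff.

```python
# Toy sliding-window conservation scan over two pre-aligned sequences
# (illustrative sketch; names, window size, and threshold are hypothetical).


def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Fraction of identical, non-gap positions in each window of length `window`."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not 1 <= window <= len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    # A position counts as a match only if both bases agree and neither is a gap.
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    total = sum(matches[:window])
    scores = [total / window]
    # Slide the window one position at a time, updating the match count in O(1).
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        scores.append(total / window)
    return scores


def conserved_windows(seq_a: str, seq_b: str, window: int = 50,
                      min_identity: float = 0.9) -> list[int]:
    """Start positions of windows whose identity meets the (hypothetical) threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [i for i, s in enumerate(scores) if s >= min_identity]


if __name__ == "__main__":
    species_a = "ACGTACGTAC"   # made-up aligned sequences, not real genome data
    species_b = "ACGTTCGTAC"
    print(window_identity(species_a, species_b, window=5))
    # [0.8, 0.8, 0.8, 0.8, 0.8, 1.0]
    print(conserved_windows(species_a, species_b, window=5))  # [5]
```

The running sum keeps the scan linear in the alignment length, which matters once the inputs are whole chromosomes rather than ten-letter toys.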
Figure 10.1 Open reading frames (ORFs). Any sequence of DNA can be read in any of six different reading frames (three from one
strand, three from the other strand). Reading frames uninterrupted by stop codons (red) are ORFs. A long ORF suggests that the region may
be part of a protein-coding exon. In this example, only one reading frame (Frame 5) is open.
Frame 1 5' . . .CCG ATG CTG AAT AGC GTA GAG GTT AGG TAA TCA TCA. . . 3'
Frame 2 5' . . . CGA TGC TGA ATA GCG TAG AGG TTA GGT AAT CAT CA. . . 3'
Frame 3 5' . . . GAT GCT GAA TAG CGT AGA GGT TAG GTA ATC ATC A. . . 3'
3' . . .GGC TAC GAC TTA TCG CAT CTC CAA TCC ATT AGT AGT . . . 5' Frame 4
3' . . .GG CTA CGA CTT ATC GCA TCT CCA ATC CAT TAG TAG . . . 5' Frame 5
3' . . .G GCT ACG ACT TAT CGC ATC TCC AAT CCA TTA GTA . . . 5' Frame 6
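The six-frame scan shown in Figure 10.1 can be sketched in a few lines of Python. This is a minimal illustration, not how annotation pipelines actually work: it assumes an uppercase A/C/G/T string, treats an ORF simply as a maximal run of codons containing no TAA, TAG, or TGA (no start codon required, matching the definition in the text), and the function names (reverse_complement, six_frames, open_frames, find_orfs) and the default min_codons=100 cutoff are hypothetical; the text says only "significantly more than 21 triplets." Frames are labeled +1 to +3 on the given strand and -1 to -3 on its reverse complement.

```python
# Minimal six-frame ORF scan (illustrative sketch; names and cutoff are hypothetical).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the other strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete triplets starting at offset (a partial last triplet is dropped)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Three frames on the given strand (+1..+3) and three on its reverse complement (-1..-3)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames


def open_frames(seq: str) -> list[str]:
    """Frames containing no stop codon anywhere in seq."""
    return [name for name, cs in six_frames(seq).items()
            if not any(c in STOP_CODONS for c in cs)]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (frame, first codon index, length in codons) for each maximal
    stop-free run of at least min_codons triplets.
    min_codons=100 is a hypothetical cutoff, not a value from the text."""
    orfs = []
    for name, cs in six_frames(seq).items():
        run_start = 0
        # Append a sentinel stop so the run at the end of the frame is also closed.
        for i, c in enumerate(cs + ["TAA"]):
            if c in STOP_CODONS:
                if i - run_start >= min_codons:
                    orfs.append((name, run_start, i - run_start))
                run_start = i + 1
    return orfs


if __name__ == "__main__":
    fragment = "CCGATGCTGAATAGCGTAGAGGTTAGGTAATCATCA"  # top strand of Fig. 10.1
    print(open_frames(fragment))               # ['-2']
    print(find_orfs(fragment, min_codons=10))  # [('-1', 2, 10), ('-2', 0, 11)]
```

For the 36-nucleotide fragment in the figure, open_frames reports only "-2", which corresponds to the figure's Frame 5 (numbering conventions for the reverse-strand frames vary; here -1, -2, and -3 line up with Frames 4, 5, and 6 because the fragment length is a multiple of 3). The find_orfs call shows that a frame can contain a fairly long stop-free stretch (10 codons in "-1", the figure's Frame 4) without being open end to end, and with the default cutoff it returns an empty list, since the fragment is far too short to judge.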