
346    Chapter 10   Genome Annotation


should also understand that the cDNA library made from red blood cell precursors would contain many clones corresponding to mRNAs that are highly expressed in this tissue, but only a few clones that reflect genes that are expressed rarely.

Genomic versus cDNA libraries

Figure 10.5 compares genomic and cDNA libraries. The clones within genomic libraries represent all regions of DNA equally and show what the intact genome looks like in the region of each clone. The clones in cDNA libraries reveal which parts of the genome contain the information used in making proteins in specific tissues. The prevalence of the mRNAs for specific genes also gives some indication, though imperfect, of the relative amounts of the various proteins made in those cells.

As described previously, one of the main purposes of making cDNA libraries is to annotate genomes by finding transcribed regions (genes). The idea is very simple: You would determine the sequences of many cDNA clones, and then compare these cDNA sequences with that of the genome. Regions of identity between cDNA and genome represent the exons of genes, and the sequence of a complete cDNA (copied from a full-length mRNA) allows you to determine the exon/intron structure of the corresponding gene.

Although the basic idea of annotating genomes by comparing cDNA and genomic sequences is straightforward, putting it into practice on a large scale is not trivial. Because certain genes are expressed only rarely or only in certain tissue types, genomic scientists need to sequence millions of cDNA clones in multiple cDNA libraries made from mRNA derived from many diverse types of tissue. For this reason, it is likely that some infrequently expressed genes may not yet be recognized as genes in genome databases.

Figure 10.5  A comparison of genomic and cDNA libraries. Every tissue in a multicellular organism can generate the same genomic library, and the DNA fragments in that library collectively carry all the DNA of the genome. On average, the clones of a genomic library represent every locus an equal number of times. By contrast, different tissues in a multicellular organism generate different cDNA libraries. Clones of a cDNA library represent only the fraction of the genome that is transcribed in that tissue. The frequency with which particular fragments appear in a cDNA library is proportional to the level of the corresponding mRNA in that tissue.

[Figure 10.5 diagram. Top: a random 100 kb genomic region (scale in kb, 0 to 96) showing introns and exons of three genes: Gene A, expressed only in brain; Gene B, expressed in all tissues; Gene C, expressed only in liver. Middle: clones from a genomic library with 20 kb inserts that come from this region; the labeled clones contain part of gene A, parts of genes B and C, all of gene C, and only the last exon of gene A. Bottom: clones from cDNA libraries. Brain cDNA library, A : B = 1 : 9. Liver cDNA library, B : C = 4 : 7.]

cDNAs and alternative splicing

Alternative splicing presents an additional challenge for genome annotation, one that is particularly important for predicting the amino acid sequences of the proteome (that is, all the proteins made in an organism). The problem is that a single primary transcript can be spliced in a variety of different ways, some of which can result in different proteins being made by a single gene (review Fig. 8.17).

The sequencing of many individual cDNA clones provides a solution to this issue caused by alternative splicing because each cDNA clone represents an individual mature mRNA. Analysis of these cDNAs is aided by the fact that alternative splicing of a primary transcript often occurs in a cell type–specific manner, allowing different kinds of cells to generate different (though related) proteins. This fact provides another reason why geneticists need to sequence cDNAs from libraries made using mRNAs from a variety of different tissues. The cDNA sequences will reveal which exons appear in the processed mRNAs in particular cell types, and thus will predict the amino acid sequences of the proteins present in those tissues (Fig. 10.6).

Figure 10.6  Alternative splicing complicates human genome annotation. Exons (orange) and introns (red) in the primary transcript can be alternatively spliced, often in cell type–specific ways; as a result, the same gene can express different proteins. Researchers analyze alternative splicing by sequencing multiple cDNA clones from libraries made from each of many different tissues.

[Figure 10.6 diagram. A gene with exons 1, 2, 3, and 4 separated by introns, with ATG in the first exon and TAG in the last, and its primary transcript with the same exon/intron layout. Four alternatively spliced mRNAs, each beginning with AUG and ending with UAG, and the proteins they encode: exons 1-2-3-4 (skin), exons 1-3-4 (eyes), exons 1-2-4 (heart), and exons 1-4 (kidney).]
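The annotation idea described in this section, that regions of identity between a cDNA and the genome mark the exons of a gene, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the toy sequences, and above all the simplification that the cDNA matches the genome exactly and on the same strand. It is not how genome databases perform the comparison; real alignment programs tolerate sequencing errors and use splice-site signals to place exon boundaries.

```python
# Toy sketch: infer exon/intron structure by finding, in order, the stretches
# of a genomic sequence that are identical to consecutive pieces of a cDNA.
# Assumptions (not from the text): exact matches only, same strand, and a
# greedy "longest match first" rule. All sequences below are invented.

def map_cdna_to_genome(genome, cdna, min_exon=4):
    """Return a list of (start, end) genome intervals (end exclusive),
    one per inferred exon, in 5'-to-3' order."""
    exons = []
    c = 0  # current position in the cDNA
    g = 0  # earliest genome position where the next exon may start
    while c < len(cdna):
        found = None
        # Try the longest remaining piece of cDNA first, then shorter ones.
        for length in range(len(cdna) - c, min_exon - 1, -1):
            pos = genome.find(cdna[c:c + length], g)
            if pos != -1:
                found = (pos, length)
                break
        if found is None:
            raise ValueError("cDNA position %d has no match in the genome" % c)
        pos, length = found
        exons.append((pos, pos + length))
        c += length
        g = pos + length
    return exons


def introns_between(exons):
    """Introns are the genomic gaps between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


# Hypothetical three-exon gene embedded in flanking DNA.
exon1, exon2, exon3 = "ATGGCC", "CATTACA", "TGGTAA"
intron1, intron2 = "GTAAGTTTTCAG", "GTCCCCCCAG"
genome = "TTTT" + exon1 + intron1 + exon2 + intron2 + exon3 + "CCCC"
cdna = exon1 + exon2 + exon3  # copy of the fully spliced mRNA

exons = map_cdna_to_genome(genome, cdna)
introns = introns_between(exons)
print("exons:", exons)
print("introns:", introns)
```

With these toy sequences the sketch recovers three exons and the two introns between them. The greedy rule can misplace a boundary by a base or more whenever the first bases of an intron happen to match the start of the next exon, which is one reason the exact-match assumption is too simple for real data.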
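The point that one gene can yield different proteins through alternative splicing (Fig. 10.6) can also be sketched in Python. The exon combinations and tissue labels below follow the figure as far as it can be read (exons 1-2-3-4, 1-3-4, 1-2-4, and 1-4), but the exon sequences themselves are invented, and each exon is assumed to be a multiple of three bases long so that skipping an exon keeps the reading frame. The names `EXONS`, `ISOFORMS`, `splice`, and `translate` are hypothetical.

```python
# Toy sketch: predict the protein encoded by each alternatively spliced cDNA.
# Assumptions (not from the text): invented exon sequences whose lengths are
# multiples of 3, and translation from the first ATG to the first stop codon.
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical exons 1-4 of a single gene (cDNA sequences use T, not U).
EXONS = {1: "ATGGCT", 2: "GATTGT", 3: "AAACCC", 4: "TGGTAG"}

# Exon combinations per tissue, as read from Figure 10.6.
ISOFORMS = {
    "skin": (1, 2, 3, 4),
    "eyes": (1, 3, 4),
    "heart": (1, 2, 4),
    "kidney": (1, 4),
}


def splice(exon_numbers):
    """Join the chosen exons in order to form the cDNA of a mature mRNA."""
    return "".join(EXONS[n] for n in exon_numbers)


def translate(cdna):
    """Translate from the first ATG up to (not including) the first stop."""
    start = cdna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


proteins = {tissue: translate(splice(combo))
            for tissue, combo in ISOFORMS.items()}
for tissue, protein in proteins.items():
    print(tissue, ISOFORMS[tissue], protein)
```

In this sketch all four proteins share the amino acids encoded by exons 1 and 4 and differ only in the internal exons they include, which mirrors the "different (though related) proteins" described above. If an exon's length were not a multiple of three, skipping it would shift the reading frame downstream, a case the sketch deliberately leaves out.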