Page 73 - Genetics_From_Genes_to_Genomes_6th_FULL_Part3

11.1 Variation Among Genomes
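The SNP totals quoted in this section come from adding up the regions of the three-way Venn diagram in Figure 11.4. The short Python sketch here redoes that bookkeeping. The region-to-count mapping in `REGIONS` is our reading of the figure layout. In particular, it assumes which upper overlap belongs to YH and Venter and which to YH and Watson. `summarize` is a hypothetical helper, not part of any library.

```python
# Region counts read from the three-way Venn diagram in Figure 11.4.
# Keys name the genomes that share the SNPs counted in that region.
# Assumption: 435,493 is the YH/Venter overlap and 509,175 the YH/Watson
# overlap, based on where the numbers sit in the figure layout.
REGIONS = {
    frozenset({"YH"}): 978_370,
    frozenset({"Venter"}): 924_333,
    frozenset({"Watson"}): 1_096_873,
    frozenset({"YH", "Venter"}): 435_493,
    frozenset({"YH", "Watson"}): 509_175,
    frozenset({"Venter", "Watson"}): 564_716,
    frozenset({"YH", "Venter", "Watson"}): 1_151_059,
}


def summarize(regions):
    """Split Venn-region counts into unique-per-person, shared, and total SNPs."""
    # Single-member regions hold SNPs found in only one man's genome.
    unique = {next(iter(k)): n for k, n in regions.items() if len(k) == 1}
    # Regions with two or three members hold SNPs shared between genomes.
    shared = sum(n for k, n in regions.items() if len(k) > 1)
    # Each distinct SNP falls in exactly one region, so the total is a plain sum.
    total = sum(regions.values())
    return unique, shared, total


unique, shared, total = summarize(REGIONS)
for person, n in sorted(unique.items()):
    print(f"unique to {person}: {n:,}")
print(f"shared by two or three: {shared:,}")  # 2,660,443
print(f"total distinct SNPs: {total:,}")      # 5,660,019
```

The three unique regions sum to 2,999,576 (about 1 million per man). The four overlap regions sum to 2,660,443, matching the "approximately 2.6 million" shared polymorphisms. The grand total of 5,660,019 is the "more than 5.6 million" differences from the RefSeq. None of these sums depend on the assumed assignment of the two upper overlaps.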


Figure 11.4  Comparison of three personal genomes. Single nucleotide substitutions in the genomes of J. Craig Venter, James D. Watson, and an anonymous Chinese man (YH), all relative to the human RefSeq. A substitution is counted once whether the individual is homozygous or heterozygous for that variant. Numbers of substitutions unique to each man's genome are in the nonoverlapping portion of each circle. Variants not in the human RefSeq but shared by two of the three individuals are shown in the double-overlap regions. The central three-way overlap indicates variants shared by all three men.

[Figure 11.4: three-circle Venn diagram (YH, Venter, Watson); region counts]
YH only: 978,370
Venter only: 924,333
Watson only: 1,096,873
YH and Venter: 435,493
YH and Watson: 509,175
Venter and Watson: 564,716
All three: 1,151,059

Extensive DNA Variation Distinguishes Individuals Within a Species

The genomes of James Watson, co-discoverer of the DNA double helix; J. Craig Venter, a pioneer of DNA sequencing; and an anonymous Chinese man together reveal more than 5.6 million single nucleotide differences from the standard human genome (the GenBank RefSeq; see Chapter 10) (Fig. 11.4). Each man's diploid genome contains about 1 million unique DNA polymorphisms (that is, sequence differences) not shared by either of the other men. The remaining approximately 2.6 million polymorphisms are shared by two or, in some cases, all three of these individuals.

Not only does no single wild-type human genome sequence exist, there is not even such a thing as a wild-type human genome length. Deletions, insertions, and duplications of DNA result in genome lengths that differ by as much as 1% in healthy people. For example, the genomes of Watson and Venter differ by small additions or subtractions of genetic material (insertions or deletions) at over 100,000 genomic sites.

Most DNA Polymorphisms Do Not Influence Phenotype

Some of the millions of DNA polymorphisms between the genomes of Watson and Venter must be responsible for the phenotypic differences that distinguish them as individuals. In reality, however, only a small fraction of these DNA sequence changes actually affect phenotype. Only about 5000 of the millions of differences between these two people alter the amino acid sequences of proteins. This makes sense because:

(1) less than 2% of the human genome consists of codons within genes;
(2) even when they occur, many mutations of codons are silent (that is, they don't change the amino acid); and
(3) if a particular mutation is not silent and has deleterious effects, natural selection will often eliminate it from the human population.

In addition to the approximately 5000 amino-acid-altering mutations, a few thousand other polymorphisms between these two genomes likely affect gene expression, for example the frequency of transcription or the efficiency with which primary transcripts are spliced to produce mRNA. But even after accounting for these, we are left with the conclusion that the vast majority of sequence differences between genomes are anonymous DNA polymorphisms that affect neither the nature nor the amount of any protein in the body. (You will see later that nonanonymous DNA polymorphisms do affect gene expression, and thus can affect phenotype.)

Figure 11.5 shows the actual distribution of polymorphisms that distinguish Watson and Venter from the human RefSeq within a 400 kb genomic region. This part of the genome includes the cystic fibrosis transmembrane conductance regulator gene (CFTR), mutations in which cause cystic fibrosis, and two other genes. You can see that almost all of the


Figure 11.5  SNP distribution in a 400 kb region. This part of chromosome 7 (base pairs 116,700,001 to 117,100,000) contains CFTR and two other genes. Vertical marks indicate locations at which a genome is either heterozygous or homozygous for a single nucleotide polymorphism (SNP) that differs from the human RefSeq. Two tracks show SNPs read from the personal genomes of Watson and Venter. The third track compiles all SNPs from all human genomes analyzed that had been deposited in the central SNP database as of 2009.