Page 60 - Genetics_From_Genes_to_Genomes_6th_FULL_Part3

354    Chapter 10   Genome Annotation


Genome Sequence Studies Affirm Evolution from a Common Ancestor

Comparisons of complete genomic sequences from nearly 10,000 species to date have resoundingly supported the ideas that began with Darwin and Mendel: All living organisms have similar genetic components for accomplishing basic cellular processes. This conclusion strongly supports the idea that we and other living organisms are all descendants of a single, fortuitous life-producing biochemistry. The similarity of basic genetic components also affirms that the analysis of appropriate biological systems in model organisms can provide fundamental insights into how the corresponding systems function in humans.

essential concepts

•  Even the most complex genomes have surprisingly few genes (about 27,000 in the human genome).
•  Gene density varies considerably within a genome, reflecting differences in intron size and in the spacing between genes.
•  In most regions of the genome, the orientations of individual genes, and thus the direction of gene transcription, appear to be chosen at random.
•  New genes can arise during evolution through: (i) exon shuffling, which can alter the domain structure of proteins; (ii) duplication and divergence that generates gene families; and (iii) de novo mutations in intergenic DNA sequences.
•  Combinatorial strategies at the DNA level and RNA level, as well as posttranslational modifications of proteins, allow the production of highly diversified gene products even from a single gene.
•  Genome comparisons affirm that all present life descended from a single common ancestor.

10.3   Bioinformatics: Information Technology and Genomes

learning objectives

1.  Explain the relevance of a species RefSeq to bioinformatic studies.
2.  Describe the uses of BLAST searches in comparative genomics.

At the time of this writing in 2016, the genomes of more than 8000 species including our own have already been characterized by DNA sequence analysis, and this number is increasing continually. The amount of sequence data available to researchers is staggering. Scientists must therefore rely on computers to store and help interpret this vast supply of information. The digital language used by computers for information storage and processing is ideally suited to handle the digital A, C, G, T code that exists naturally in genomes. These four values can be represented in two digits of binary code (00, 01, 10, and 11).

Keeping pace with the 1980s revolution in biological data generation fostered by the advent of automated DNA sequencing, a parallel revolution was occurring in information technology. The Internet came into existence along with personal computers that were linked together to establish rapid transmission of electronic data from one lab to another. It was a straightforward task to channel the output of DNA sequencing machines directly into electronic storage media, from which sequences were available for analysis and transmission to other scientists.

The GenBank database, established by the National Institutes of Health in 1982, still serves as the most widely used online repository of sequence data. The information is generated in molecular biology laboratories around the world, which deposit their sequences into GenBank electronically. From its establishment, the GenBank database has doubled in size roughly every 18 months, so that by 2016 it contained more than 300 billion annotated nucleotides of sequence information. One of the great powers of GenBank is that anyone in the world with an Internet connection can access this incredible storehouse of information easily.

Bioinformatics Provides Tools for Visualizing and Analyzing Genomes

Bioinformatics is the science of using computational methods (specialized software) to decipher the biological meaning of information contained within organismal systems. This section provides some examples of bioinformatics tools that can be accessed through any web browser to examine and interpret publicly available genome data.

The species RefSeq

Comparisons of experimental data involving DNA sequences generated by different laboratories depend on the use of a universally agreed-upon standard for analysis. This role is played by a species reference sequence, abbreviated as RefSeq. A RefSeq is a single, complete, annotated version of the species genome. RefSeqs are maintained by the National Center for Biotechnology Information (NCBI: http://www.ncbi.nlm.nih.gov), which was established in 1988 to oversee GenBank and other public databases of biological information and to develop bioinformatic applications for analyzing, systematizing,
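The point made in this section that the four nucleotides A, C, G, and T can each be represented in two binary digits can be illustrated with a short sketch. The text lists the four codes (00, 01, 10, 11) but does not say which base receives which code, so the assignment below is an assumption chosen only for illustration, and the function names (pack, unpack, to_bits) are hypothetical. Python is used here purely as an example language.

```python
# Assumed assignment of 2-bit codes to bases; the text gives the four
# codes but not the pairing, so this mapping is illustrative only.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    value = 0
    for base in seq.upper():
        # Shift earlier bases left by two bits, then append this base's code.
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])  # read the lowest two bits
        value >>= 2                         # discard them
    # Bases were read from the end of the sequence, so reverse the list.
    return "".join(reversed(bases))


def to_bits(seq: str) -> str:
    """Show the 2-bit code of each base, separated by spaces."""
    return " ".join(format(ENCODE[base], "02b") for base in seq.upper())


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    print(to_bits(seq))
    print(unpack(packed, len(seq)) == seq)
```

Under this scheme a 7-base sequence such as GATTACA occupies 14 bits, compared with 56 bits if each letter were stored as a one-byte character. The sequence length must be stored alongside the packed value, because leading A bases (code 00) would otherwise be indistinguishable from padding.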