Page 60 - Genetics_From_Genes_to_Genomes_6th_FULL_Part3
354 Chapter 10 Genome Annotation
Genome Sequence Studies Affirm Evolution from a Common Ancestor

Comparisons of complete genomic sequences from nearly 10,000 species to date have resoundingly supported the ideas that began with Darwin and Mendel: All living organisms have similar genetic components for accomplishing basic cellular processes. This conclusion strongly supports the idea that we and other living organisms are all descendants of a single, fortuitous life-producing biochemistry. The similarity of basic genetic components also affirms that the analysis of appropriate biological systems in model organisms can provide fundamental insights into how the corresponding systems function in humans.

essential concepts

• Even the most complex genomes have surprisingly few genes (about 27,000 in the human genome).
• Gene density varies considerably within a genome, reflecting differences in intron size and in the spacing between genes.
• In most regions of the genome, the orientations of individual genes and thus the direction of gene transcription appear to be chosen at random.
• New genes can arise during evolution through: (i) exon shuffling, which can alter the domain structure of proteins; (ii) duplication and divergence that generates gene families; and (iii) de novo mutations in intergenic DNA sequences.
• Combinatorial strategies at the DNA level and RNA level, as well as posttranslational modifications of proteins, allow the production of highly diversified gene products even from a single gene.
• Genome comparisons affirm that all present life descended from a single common ancestor.

10.3 Bioinformatics: Information Technology and Genomes

learning objectives

1. Explain the relevance of a species RefSeq to bioinformatic studies.
2. Describe the uses of BLAST searches in comparative genomics.

At the time of this writing in 2016, the genomes of more than 8000 species including our own have already been characterized by DNA sequence analysis, and this number is increasing continually. The amount of sequence data available to researchers is staggering. Scientists must therefore rely on computers to store and help interpret this vast supply of information. The digital language used by computers for information storage and processing is ideally suited to handle the digital A, C, G, T code that exists naturally in genomes. These four values can be represented in two digits of binary code (00, 01, 10, and 11).

Keeping pace with the 1980s revolution in biological data generation fostered by the advent of automated DNA sequencing, a parallel revolution was occurring in information technology. The Internet came into existence along with personal computers that were linked together to establish rapid transmission of electronic data from one lab to another. It was a straightforward task to channel the output of DNA sequencer machines directly into electronic storage media, from which sequences were available for analysis and transmission to other scientists.

The GenBank database, established by the National Institutes of Health in 1982, still serves as the most widely used online repository of sequence data. The information is generated in molecular biology laboratories around the world, which deposit their sequences into GenBank electronically. From its establishment, the GenBank database has doubled in size roughly every 18 months, so that by 2016 it contained more than 300 billion annotated nucleotides of sequence information. One of the great powers of GenBank is that anyone in the world with an Internet connection can access this incredible storehouse of information easily.

Bioinformatics Provides Tools for Visualizing and Analyzing Genomes

Bioinformatics is the science of using computational methods (specialized software) to decipher the biological meaning of information contained within organismal systems. This section provides some examples of bioinformatics tools that can be accessed through any web browser to examine and interpret publicly available genome data.

The species RefSeq

Comparisons of experimental data involving DNA sequences generated by different laboratories depend on the use of a universally agreed-upon standard for analysis. This role is played by a species reference sequence, abbreviated as RefSeq. A RefSeq is a single, complete, annotated version of the species genome. RefSeqs are maintained by the National Center for Biotechnology Information (NCBI: http://www.ncbi.nlm.nih.gov), which was established in 1988 to oversee GenBank and other public databases of biological information and to develop bioinformatic applications for analyzing, systematizing,