Page 61 - Genetics_From_Genes_to_Genomes_6th_FULL_Part3
and disseminating the data. A RefSeq need not be derived from a single individual, and it need not contain the most common genetic variants found in species members. Rather, it is simply an arbitrary, but well-characterized, example against which all newly obtained sequences from that species can be compared.

Visualizing genes and genomes

Several web-based programs have been developed that allow a user to examine visual representations of genome data. One such program is the UCSC Genome Browser (https://genome.ucsc.edu/) that visualizes RefSeq genes and their associated annotations, showing features such as exon/intron structure and the location of protein-coding regions. Fig. 10.3 showed an example of the Genome Browser output, focusing on a 100 kb region of the human genome containing four genes. The transcription units are indicated at the bottom of the figure with large blue arrows that depict the extent of the gene, the direction of transcription, and each gene’s exon/intron structure (exons represented as wider than the introns). Researchers can adjust their view of the browser to show many additional genomic features of interest, such as alternative splice variants, the location of repetitive DNA sequences, similarities with the genomes of other organisms, and the location of possible transcriptional regulatory elements.

BLAST Searches Automate the Finding of Homologous Sequences

Suppose that you have identified a gene, for example from the fruit fly Drosophila, that is of interest to you. You would like to know whether the human genome contains a homolog of this fly gene. One tool you could use is an NCBI program called BLAST (Basic Local Alignment Search Tool), which allows you to find nucleotide or amino acid sequences related to any given nucleotide or amino acid sequence. Figure 10.18 displays a typical output of a BLAST search, in this case looking for human proteins that share similarity with a Drosophila protein of interest. The Query is the sequence you already know; here, the amino acid sequence of the Drosophila protein written in the one-letter code. The Subject is the homologous sequence found by the BLAST program; in this case, the related human protein. The row between the Query and the Subject indicates the conserved amino acids, with a + symbol denoting conservative amino acid replacements (missense substitutions in which an amino acid is replaced by a different amino acid with similar chemical properties).

Figure 10.18 Output from a BLAST search. The program was asked to find a human protein related to a protein in Drosophila. The Query shows part of the sequence of the fly protein (from amino acids 688–720); the Subject (Sbjct) indicates the corresponding amino acids in the human protein found by the search. Some of these amino acids are identical in the fly and human proteins. Positions marked with a plus (+) are conservative substitutions in which the substituted amino acids have similar chemical properties. At some positions the amino acids are very different, suggesting that the identities of these particular amino acids are not crucial to protein function.
Query 688 GPLTASYK S EID KH LIRA LFQ TDDW R AAIK T QI 720
GPL A++ S E+K LIRA LFQ T++ R A A+ +I
Sbjct 583 GPLAAAFS S EVS KA LIRA LFQ TNER R AA AL AKI 615

To appreciate the power of bioinformatics programs such as the Genome Browser and BLAST search tool, you really need to access and use them yourself. Problems 23 and 24 at the end of this chapter involve some simple exercises that will place a few of these vast genomic databases at your disposal.

essential concepts

• Bioinformatics applications that are freely accessible online provide gateways for the exploration of genomic data.
• Genome browsers show the arrangement and structure of genes within RefSeq genomes.
• A BLAST search allows rapid, automated matching of particular DNA or amino acid sequences across multiple species for analysis of evolutionary relationships.

10.4 A Comprehensive Example: The Hemoglobin Genes

learning objectives

1. Discuss why it is advantageous for humans to produce different hemoglobins at different stages of development.
2. Explain how the clustering of hemoglobin genes impacts the cellular strategy to regulate their expression.
3. Predict the phenotypic severity of particular mutations in the α and β clusters.

The vivid red color of our blood arises from its life-sustaining ability to carry oxygen. This ability, in turn, derives from billions of red blood cells, each one packed with close to 280 million molecules of the protein pigment known as hemoglobin (Fig. 10.19a).

A normal adult hemoglobin molecule consists of four polypeptide chains, two alpha (α) and two beta (β) globins, each surrounding an iron-containing small molecular structure