Using Entrez to retrieve nucleotide and protein sequences
In this exercise we will retrieve
a DNA sequence from the GenBank database using the Entrez tool. Entrez is a
search and retrieval system that integrates information from databases at NCBI.
In this example we will look
for a gene that encodes the enzyme glutamate racemase (which converts
L-glutamate to D-glutamate) from the Gram-positive microorganism Bacillus
subtilis. We will assume that we don’t have any more information about
the gene we are looking for, not even its name.
(In some cases we may know
a lot more about the gene, including its name and accession number, which makes
the search a lot easier.)
Bring up the Entrez tool by
clicking here.
- Step 1: Set the Search menu entry to nucleotide to indicate that we want to
search for a nucleotide sequence.
- Step 2: First, we will pull out any entries containing ‘glutamate racemase’.
Type glutamate racemase into the text box after the ‘for’ label.
- Step 3: Click on the Go button and you will retrieve all the DNA sequence entries
containing ‘glutamate racemase’.
- Step 4: Now we will search for DNA sequences from Bacillus subtilis.
Type Bacillus subtilis into the text box after the ‘for’ label. However,
we will now limit our search to just the organism name (rather than all the
text in each entry) to cut down the number of hits. Click on the ‘limits’
button below the text box and set the ‘Limited to’ drop-down menu to Organism.
- Step 5: Click on the Go button and you will be presented with a large list
of DNA sequences from Bacillus subtilis. Make sure that the limits
box is not ticked (you’ll need to untick it) and click on the history button
below the text box.
- Step 6: We will now combine our two searches to give all the database entries
that contain the text glutamate racemase from the organism Bacillus subtilis.
The first search we did has been named #1 (glutamate racemase) and the second
(Bacillus subtilis), #2. To combine these two searches, type ‘#1
AND #2’ into the text box and press ‘Go’.
- Step 7: You will now have a narrowed-down list of four database entries from
Bacillus subtilis that contain the gene sequence of glutamate racemase.
Entries 1, 3 and 4 are large, multi-gene segments. Pick entry number two
by clicking on the underlined blue text, AB003685.
- Step 8: The database entry for glutamate racemase from B. subtilis will
now be displayed. The default display format is the easiest to read,
but most web-based tools require sequence data in FASTA format: a single
header line beginning with >, followed by lines of sequence. To display
the sequence in FASTA format, change the drop-down list after the VIEW button
to FASTA and press the VIEW button.
- Step 9: The sequence is now in FASTA format. Highlight the sequence
(starting at the >) and copy it into a new
Word document. We will need this for use with the other tools.
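If you would rather keep the copied sequence in a plain text file and read it with a script, the FASTA layout is simple to parse: a header line starting with > and then the sequence, possibly split over several lines. The following is only a minimal sketch in Python. The function name parse_fasta and the example record are made up for illustration, and the sequence shown is not the real AB003685 entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one continuous string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only; not the real AB003685 sequence.
example = """>example_id hypothetical description
ATGGAACAACC
GATAGGAGTC
"""

records = parse_fasta(example)
header, sequence = records[0]
print(header)
print(sequence)
```

Joining the sequence lines into one string makes it easy to check the length or paste the sequence into another tool without stray line breaks.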
There are many powerful ways
to retrieve sequences using Entrez – we’ve only scratched the surface. Feel
free to explore. Papers and protein sequences can be retrieved in much the same
fashion.
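One of those other ways is to script the search rather than click through it. NCBI offers a programmatic interface to Entrez called E-utilities, and the sketch below builds the web addresses for a search equivalent to the one in this exercise. It assumes the ESearch and EFetch endpoints with their db, term, retmax, id, rettype and retmode parameters, and it uses nuccore as the name of the nucleotide database. The function names esearch_url and efetch_fasta_url are invented for this example. The code only constructs the addresses; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an ESearch URL that returns the IDs of records matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(record_id, db="nuccore"):
    """Build an EFetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# The two searches combined with AND, the second limited to the organism field.
query = "glutamate racemase AND Bacillus subtilis[Organism]"

search_url = esearch_url(query)                 # nucleotide records
protein_url = esearch_url(query, db="protein")  # same search, protein records
fasta_url = efetch_fasta_url("AB003685")        # the entry chosen in the exercise

print(search_url)
print(protein_url)
print(fasta_url)
```

Changing only the db value (for example to protein, or to pubmed for papers) is what makes retrieval of other record types work "in much the same fashion". Pasting one of the printed addresses into a browser should show the raw result, assuming the service still accepts these parameters.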