Using Entrez to retrieve nucleotide and protein sequences
In this exercise we will retrieve
  a DNA sequence from the GenBank database using the Entrez tool. Entrez is a
  search and retrieval system that integrates information from the databases at
  NCBI (the US National Center for Biotechnology Information).
In this example we will look
  for a gene that encodes the enzyme glutamate racemase (which converts
  L-glutamate to D-glutamate) in the Gram-positive bacterium Bacillus
  subtilis. We will assume that we have no other information about
  the gene we are looking for, not even its name.
(In some cases we may know
  much more about the gene, including its name and accession number; this makes
  the search much easier.)
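When the accession number is already known, the search can be skipped entirely. The following is a minimal sketch, assuming NCBI's E-utilities web service (the programmatic interface to Entrez) at its documented base URL. It builds an EFetch request that should return a single nucleotide record as FASTA text. The accession used here, AB003685, is the one this exercise eventually arrives at. The function name `efetch_fasta_url` is hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (assumed; check NCBI's E-utilities documentation).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_fasta_url(accession: str, db: str = "nucleotide") -> str:
    """Build a (hypothetical helper) EFetch URL asking for one record as plain-text FASTA."""
    params = {
        "db": db,            # "nucleotide" for DNA, "protein" for protein records
        "id": accession,     # accession number of the record to fetch
        "rettype": "fasta",  # return the sequence in FASTA format
        "retmode": "text",   # as plain text rather than XML
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(efetch_fasta_url("AB003685"))
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the record in FASTA format. NCBI asks scripted users to identify themselves (the `tool` and `email` parameters) and to stay under about three requests per second without an API key.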
Bring up the Entrez search page
  on the NCBI website.
  
  - Step 1: Set the Search menu to Nucleotide to indicate that we want to
    search the nucleotide sequence database.
 
  - Step 2: First, we will pull out all entries containing ‘glutamate racemase’.
    Type glutamate racemase into the text box after the ‘for’ label.
 
  - Step 3: Click the Go button to retrieve all the DNA sequence entries
    containing ‘glutamate racemase’.
 
  - Step 4: Now we will search for DNA sequences from Bacillus subtilis.
    Type Bacillus subtilis into the text box after the ‘for’ label. This time,
    however, we will limit our search to the organism name (rather than all the
    text in each entry) to cut down the number of hits. Click the ‘Limits’
    button below the text box and set the ‘Limited to’ drop-down menu to Organism.
 
  - Step 5: Click the Go button and you will be presented with a large list
    of DNA sequences from Bacillus subtilis. Make sure the Limits box is not
    ticked (you will need to untick it), then click the History button below
    the text box.
 
  - Step 6: We will now combine our two searches to find all the database entries
    that contain the text glutamate racemase and come from the organism Bacillus
    subtilis. The first search (glutamate racemase) has been named #1 and the
    second (Bacillus subtilis) #2. To combine them, type ‘#1 AND #2’ into the
    text box and press Go.
 
  - Step 7: You will now have a much shorter list (four entries at the time of
    writing) of Bacillus subtilis database entries that contain the glutamate
    racemase gene sequence. Entries 1, 3 and 4 are large, multi-gene segments,
    so pick entry 2 by clicking its underlined blue accession link, AB003685.
 
  - Step 8: The database entry for glutamate racemase from B. subtilis will
    now be displayed. The default display format is the easiest to read, but most
    web-based tools require sequence data in FASTA format, a plain-text format in
    which a single header line beginning with > is followed by lines of sequence.
    To display the sequence in FASTA format, set the drop-down list after the VIEW
    button to FASTA and press the VIEW button.
 
  - Step 9: The sequence is now in FASTA format. Highlight the sequence (starting
    at the >) and copy it into a new Word document; we will need it for use with
    the other tools.
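For readers who prefer scripting, the same search can be approximated with NCBI's E-utilities. This is a sketch under stated assumptions. Instead of combining #1 and #2 on the History page, it writes the combined query directly and uses Entrez's [Organism] field tag to reproduce the Organism limit from Step 4. The helper names `esearch_url` and `parse_fasta` are hypothetical. A small parser is also included for the FASTA text copied in Step 9; it treats each > line as a header and joins the wrapped sequence lines that follow it.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (assumed; check NCBI's E-utilities documentation).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build a (hypothetical helper) ESearch URL returning IDs that match `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


# Steps 2-4: a free-text search (#1) and an organism-limited search (#2).
search_1 = "glutamate racemase"
search_2 = "Bacillus subtilis[Organism]"
# Step 6: the equivalent of typing "#1 AND #2" on the History page.
combined = f"({search_1}) AND ({search_2})"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; a repeated header overwrites the earlier one."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:]  # header text without the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are wrapped; join them later
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(esearch_url("nucleotide", combined))
```

Because GenBank keeps growing, the combined query may not return exactly the four entries described in Step 7. ESearch returns only record identifiers, which would then be passed to EFetch to get FASTA text for `parse_fasta`.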
 
There are many powerful ways 
  to retrieve sequences using Entrez – we’ve only scratched the surface. Feel 
  free to explore. Papers and protein sequences can be retrieved in much the same 
  fashion.