At the bottom of the EMBL entry is the DNA sequence, which in FASTA format should look something like this:
>embl|AF457141|AF457141 Mus musculus Pax6 paired-less isoform mRNA, complete cds.
ttaaactctgggcaggtcctcgcgtagaacccggttgtcagatctgctacttccccccga
gaagcggctttgagaagtgtgggaaccagcgccaccagactcacctgacaccccagcctc
ggctcacagatggctgccagcaacaggaaggagggggagagaacaccaactccatcagtt
ctaacggagaagactcggatgaagctcagatgcgacttcagctgaagcggaagctgcaaa
gaaatagaacatcttttacccaagagcagattgaggctctggagaaagagtttgagagga
cccattatccagatgtgtttgcccgggaaagactagcagccaaaatagatctacctgaag
caagaatacaggtatggttttctaatcgaagggccaaatggagaagagaagagaaactga
ggaaccagagaagacaggccagcaacactcctagtcacattcctatcagcagcagcttca
gtaccagtgtctaccagccaatcccacagcccaccacacctgtctcctccttcacatcag
gttccatgttgggccgaacagacaccgccctcaccaacacgtacagtgctttgccaccca
tgcccagcttcaccatggcaaacaacctgcctatgcaacccccagtccccagtcagacct
cctcatactcgtgcatgctgcccaccagcccgtcagtgaatgggcggagttatgatacct
acacccctccgcacatgcaaacacacatgaacagtcagcccatgggcacctcggggacca
cttcaacaggactcatttcacctggagtgtcagttcccgtccaagttcccgggagtgaac
ctgacatgtctcagtactggcctcgattacagtaaagagagaaggagagagcatgtgatc
gagagaggaaattgtgttcactctgccaatgactatgtggacacagcagttgggtattca
ggaaagaaagagaaatggcggt
FASTA is a widely used text format for storing sequences. Each record begins with a single-line description of the sequence, followed by one or more lines of sequence data. The sequence data can be nucleic acid (DNA or RNA) or amino acid (protein) sequence.
The description line is marked by a greater-than (">") symbol in the first column. By convention, all lines are 80 characters or shorter, each terminated by a line break.
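The structure described above (a ">" description line followed by wrapped sequence lines) can be read and written with a few lines of Python. This is a minimal sketch, not a full implementation: the function names `read_fasta` and `write_fasta` are hypothetical, and the default wrap width of 60 is an assumption chosen to match the example record above (well under the 80-character convention).

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept both LF and CRLF line endings
        if line.startswith(">"):
            # A '>' in the first column starts a new record.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif line.strip():
            # Any other non-blank line is sequence data for the current record.
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line.strip())
    if description is not None:
        yield description, "".join(chunks)


def write_fasta(description: str, sequence: str, width: int = 60) -> str:
    """Format one record, wrapping the sequence every `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    out = [">" + description]
    out += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(out) + "\n"
```

For example, `write_fasta("a x", "ACGTACG", width=4)` returns `">a x\nACGT\nACG\n"`, and passing an open file handle to `read_fasta` yields one pair per record, with the wrapped sequence lines joined back together. For real work, an established parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is a safer choice.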
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes. The nucleic acid codes are:
A --> adenosine
C --> cytidine
G --> guanine
T --> thymidine
U --> uridine
R --> G A (purine)
Y --> T C (pyrimidine)
K --> G T (keto)
M --> A C (amino)
S --> G C (strong)
W --> A T (weak)
B --> G T C
D --> G A T
H --> A C T
V --> G C A
N --> A G C T (any)
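The ambiguity codes in the table above map naturally onto sets of bases. The sketch below encodes that table and uses it to check a sequence and to test whether an ambiguous code can stand for a given base. The names `NUCLEIC_CODES`, `is_valid_nucleic` and `code_matches` are hypothetical, and treating U as equivalent to T (so RNA and DNA share the ambiguity codes) is an assumption of this sketch.

```python
from typing import Dict, FrozenSet

# Nucleic acid codes from the table above, mapped to the bases each can stand for.
# Assumption: U is treated as T so that the ambiguity codes apply to RNA as well.
_NUCLEIC_TABLE = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}
NUCLEIC_CODES: Dict[str, FrozenSet[str]] = {
    code: frozenset(bases) for code, bases in _NUCLEIC_TABLE.items()
}


def is_valid_nucleic(sequence: str) -> bool:
    """True if every character is an accepted nucleic acid code (case-insensitive)."""
    return all(ch in NUCLEIC_CODES for ch in sequence.upper())


def code_matches(code: str, base: str) -> bool:
    """True if the possibly ambiguous `code` can stand for the concrete `base`."""
    base = base.upper()
    if base not in ("A", "C", "G", "T", "U"):
        raise ValueError(f"not a concrete base: {base!r}")
    return NUCLEIC_CODES[base] <= NUCLEIC_CODES[code.upper()]
```

For instance, `code_matches("R", "A")` is true because R (purine) covers G and A, while `code_matches("Y", "A")` is false. Uppercasing first lets the check accept lowercase sequences such as the example record.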
The accepted amino acid codes are:
A  alanine
B  aspartate or asparagine
C  cysteine
D  aspartate
E  glutamate
F  phenylalanine
G  glycine
H  histidine
I  isoleucine
K  lysine
L  leucine
M  methionine
N  asparagine
P  proline
Q  glutamine
R  arginine
S  serine
T  threonine
U  selenocysteine
V  valine
W  tryptophan
Y  tyrosine
Z  glutamate or glutamine
X  any
*  translation stop
-  gap of indeterminate length
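The amino acid table above can be used in the same way, both to reject characters outside the accepted codes and to spell out a sequence residue by residue. This is an illustrative sketch; `AMINO_ACID_CODES`, `invalid_protein_chars` and `describe` are hypothetical names, and the table contains only the codes listed above.

```python
from typing import Dict, List, Tuple

# Accepted amino acid codes from the table above.
AMINO_ACID_CODES: Dict[str, str] = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def invalid_protein_chars(sequence: str) -> List[Tuple[int, str]]:
    """Return (position, character) for every character not in the table."""
    return [(i, ch) for i, ch in enumerate(sequence)
            if ch.upper() not in AMINO_ACID_CODES]


def describe(sequence: str) -> List[str]:
    """Translate each code into its name; raises KeyError on unknown codes."""
    return [AMINO_ACID_CODES[ch.upper()] for ch in sequence]
```

Note that A, C, G and T are also valid amino acid codes (alanine, cysteine, glycine, threonine), so a check like this cannot on its own tell a nucleotide sequence from a protein sequence; the caller has to know which kind of data the file holds.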