At the bottom is the DNA sequence, which should look something like this:
>embl|AF457141|AF457141 Mus musculus Pax6 paired-less isoform mRNA, complete cds.
ttaaactctgggcaggtcctcgcgtagaacccggttgtcagatctgctacttccccccga
gaagcggctttgagaagtgtgggaaccagcgccaccagactcacctgacaccccagcctc
ggctcacagatggctgccagcaacaggaaggagggggagagaacaccaactccatcagtt
ctaacggagaagactcggatgaagctcagatgcgacttcagctgaagcggaagctgcaaa
gaaatagaacatcttttacccaagagcagattgaggctctggagaaagagtttgagagga
cccattatccagatgtgtttgcccgggaaagactagcagccaaaatagatctacctgaag
caagaatacaggtatggttttctaatcgaagggccaaatggagaagagaagagaaactga
ggaaccagagaagacaggccagcaacactcctagtcacattcctatcagcagcagcttca
gtaccagtgtctaccagccaatcccacagcccaccacacctgtctcctccttcacatcag
gttccatgttgggccgaacagacaccgccctcaccaacacgtacagtgctttgccaccca
tgcccagcttcaccatggcaaacaacctgcctatgcaacccccagtccccagtcagacct
cctcatactcgtgcatgctgcccaccagcccgtcagtgaatgggcggagttatgatacct
acacccctccgcacatgcaaacacacatgaacagtcagcccatgggcacctcggggacca
cttcaacaggactcatttcacctggagtgtcagttcccgtccaagttcccgggagtgaac
ctgacatgtctcagtactggcctcgattacagtaaagagagaaggagagagcatgtgatc
gagagaggaaattgtgttcactctgccaatgactatgtggacacagcagttgggtattca
ggaaagaaagagaaatggcggt
FASTA is a widely used format for sequence files. A FASTA record begins with a single-line description of the sequence, followed by lines of sequence data. The sequence data can be a nucleic acid (DNA/RNA) sequence or an amino acid (protein) sequence.
The description line is indicated by a greater-than (">") symbol in the first column. Lines of text are usually 80 characters or shorter, each terminated by a line break.
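The layout described above can be read with a few lines of Python. This is only a minimal sketch, not the parser used by any particular tool: the function name parse_fasta is made up for illustration, and it assumes the whole file fits in memory and that any text before the first ">" line can be ignored.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text.

    parse_fasta is a hypothetical helper written for this example.
    """
    description = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data: join the wrapped lines back together.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# First two sequence lines of the entry shown above.
example = """>embl|AF457141|AF457141 Mus musculus Pax6 paired-less isoform mRNA, complete cds.
ttaaactctgggcaggtcctcgcgtagaacccggttgtcagatctgctacttccccccga
gaagcggctttgagaagtgtgggaaccagcgccaccagactcacctgacaccccagcctc
"""

records = list(parse_fasta(example))
```

Here records holds a single pair: the description without the leading ">" and the two sequence lines joined into one string.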
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes. The nucleic acid codes are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanine             W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
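The nucleic acid codes above translate directly into a lookup table. The sketch below is one possible way to use it, assuming case-insensitive input; the names IUPAC_NUCLEOTIDES, invalid_nucleotides and code_matches are invented for this example and do not come from any library.

```python
# Each code maps to the set of bases it can stand for, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}


def invalid_nucleotides(sequence: str) -> set:
    """Return the characters in sequence that are not in the table above."""
    return {ch for ch in sequence.upper() if ch not in IUPAC_NUCLEOTIDES}


def code_matches(code: str, base: str) -> bool:
    """Return True if the given base is one of those the code stands for."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]


unexpected = invalid_nucleotides("ttaaactctgggcaggtcctcgcg")  # empty set
purine_ok = code_matches("R", "g")  # True: R stands for G or A
```

A sequence containing a character outside the table, such as "ACGTXZ", would be reported with the offending characters X and Z.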
The accepted amino acid codes are:
A  alanine                    P  proline
B  aspartate or asparagine    Q  glutamine
C  cysteine                   R  arginine
D  aspartate                  S  serine
E  glutamate                  T  threonine
F  phenylalanine              U  selenocysteine
G  glycine                    V  valine
H  histidine                  W  tryptophan
I  isoleucine                 Y  tyrosine
K  lysine                     Z  glutamate or glutamine
L  leucine                    X  any
M  methionine                 *  translation stop
N  asparagine                 -  gap of indeterminate length
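The amino acid table can be used the same way, for example to spell out a short protein sequence residue by residue. Again this is only an illustrative sketch: AMINO_ACIDS and spell_out are hypothetical names, and any character missing from the table above (J and O, for instance) is simply reported as "not in table".

```python
# One-letter codes and their meanings, as listed in the table above.
AMINO_ACIDS = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def spell_out(sequence: str) -> list:
    """Return the meaning of each one-letter code in sequence, in order."""
    return [AMINO_ACIDS.get(ch, "not in table") for ch in sequence.upper()]


names = spell_out("MA*")
# names == ["methionine", "alanine", "translation stop"]
```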