
CSC8303 -- Bioinformatics Programming in Java

This is the sole assignment for the CSC8303 course. Do not worry if you are unsure of the techniques needed; they will become clear during the lectures and practicals.

You have been asked, as part of a larger bioinformatics project, to design a package that parses EMBL files. As your work will be used by other members of the lab, it is important that you test your API thoroughly. You should know how to find an EMBL file; please ask if you are not sure. You must comment your code and provide suitable documentation (5 marks). Finally, you must follow a suitable coding style, including appropriate variable names, indentation and spacing; up to 25% of the available marks will be awarded for clearly written code that conforms to standard coding conventions.

You are required to carry out all of steps 1-3 and one part of step 4.

EMBL files adhere to a clear, well-defined structure: each line starts with a two-character tag, followed by three spaces and then a value. For example, the first line of the file is the ID line, which includes the entry ID as well as the sequence length.
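The line layout described above can be sketched in Java as follows. `EmblLine` and `readEntryId` are hypothetical names chosen for illustration, and the code assumes an ID line of the form `ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.`, where the entry ID is the text before the first semicolon or space. It is a sketch under those assumptions, not a complete parser.

```java
import java.util.Scanner;

/**
 * Hypothetical helper (not part of any library) that splits an EMBL line
 * into its two-character tag and its value, assuming the layout
 * "TT   value": tag in columns 1-2, three spaces, value from column 6.
 */
class EmblLine {
    final String tag;
    final String value;

    EmblLine(String line) {
        // Lines shorter than two characters have no usable tag.
        this.tag = line.length() >= 2 ? line.substring(0, 2) : "";
        // Skip the tag and the three separating spaces (5 characters in total).
        this.value = line.length() > 5 ? line.substring(5).trim() : "";
    }

    /**
     * Extracts the entry ID from an ID line value such as
     * "X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.":
     * the ID is the text before the first ';' or whitespace.
     */
    String entryId() {
        return value.split("[;\\s]+")[0];
    }

    /** Reads lines until the ID line and returns its entry ID, or null if there is none. */
    static String readEntryId(Scanner scanner) {
        while (scanner.hasNextLine()) {
            EmblLine line = new EmblLine(scanner.nextLine());
            if (line.tag.equals("ID")) {
                return line.entryId();
            }
        }
        return null;
    }
}
```

For example, `EmblLine.readEntryId(new Scanner("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))` returns `"X56734"`. Taking a `Scanner` rather than a file path keeps the method easy to test with in-memory strings, while the main method from step 2 can pass in a `Scanner` built from the file.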

  1. Develop an API for representing the information in an EMBL file. This must include the EMBL ID, the species information as a list of taxon classifications, and the DNA sequence. In terms of Java classes, you need to produce at least EMBL.java and Sequence.java, with the EMBL class having a Sequence object as a field. (8 marks)
  2. Develop a class with a main method that takes the path of an EMBL file as a command-line parameter. This class should use the file path passed to the main method to create a Scanner object, and then parse the EMBL file into the classes you defined in step 1. (4 marks)
  3. The Sequence class should implement the java.lang.CharSequence interface and store the sequence as a List of java.lang.Character objects. In particular, charAt(int) should extract the appropriate Character from the list and return the equivalent char. (6 marks)
  4. Finally, carry out one of the following: (6 marks)
       OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Gorilla.
       OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

Hence, the list "Eukaryota", "Metazoa", "Chordata", "Craniata", "Vertebrata", "Euteleostomi", "Mammalia", "Eutheria", "Euarchontoglires", "Primates", "Catarrhini", "Hominidae" should be returned.
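Steps 3 and 4 can be sketched together in a single Java file. The `Sequence` class follows the requirements of step 3 (a `List<Character>` behind `java.lang.CharSequence`); its `String` constructor is an assumption for illustration, since the parser from step 2 would normally supply the residues. `Taxonomy`, `parseTaxa` and `commonTaxa` are hypothetical names; they assume, from the example above, that the task is to return the leading taxa two entries have in common, and that each entry's OC lines have already been joined into one string (in a real entry the OC data often spans several lines).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of step 3: residues stored as Character objects behind CharSequence. */
class Sequence implements CharSequence {
    private final List<Character> bases;

    /** Builds a sequence from a string of residues, boxing each char. */
    Sequence(String residues) {
        bases = new ArrayList<>(residues.length());
        for (int i = 0; i < residues.length(); i++) {
            bases.add(residues.charAt(i)); // autoboxed to Character
        }
    }

    /** Wraps an existing list; used by subSequence. */
    private Sequence(List<Character> bases) {
        this.bases = bases;
    }

    @Override
    public int length() {
        return bases.size();
    }

    @Override
    public char charAt(int index) {
        // List.get throws IndexOutOfBoundsException for a bad index, as the
        // CharSequence contract requires; the Character is unboxed to char.
        return bases.get(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > bases.size() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        // Copy the range so the new Sequence does not share the backing list.
        return new Sequence(new ArrayList<>(bases.subList(start, end)));
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder(bases.size());
        for (Character base : bases) {
            builder.append(base.charValue());
        }
        return builder.toString();
    }
}

/** Sketch of the step 4 example: taxa shared by two OC classifications. */
class Taxonomy {
    /** Splits a joined OC value such as "Eukaryota; Metazoa; Hominidae; Homo." into taxa. */
    static List<String> parseTaxa(String ocValue) {
        List<String> taxa = new ArrayList<>();
        for (String part : ocValue.split(";")) {
            String taxon = part.trim();
            if (taxon.endsWith(".")) {
                // The final taxon on the last OC line ends with a full stop.
                taxon = taxon.substring(0, taxon.length() - 1);
            }
            if (!taxon.isEmpty()) {
                taxa.add(taxon);
            }
        }
        return taxa;
    }

    /** Returns the leading taxa that both classifications share, in order. */
    static List<String> commonTaxa(List<String> first, List<String> second) {
        List<String> shared = new ArrayList<>();
        int limit = Math.min(first.size(), second.size());
        for (int i = 0; i < limit && first.get(i).equals(second.get(i)); i++) {
            shared.add(first.get(i));
        }
        return shared;
    }
}
```

Applied to the Gorilla and Homo OC values above, `Taxonomy.commonTaxa(Taxonomy.parseTaxa(gorilla), Taxonomy.parseTaxa(homo))` returns the twelve taxa from "Eukaryota" to "Hominidae". Copying the range in `subSequence` trades a little memory for a result that stays independent of the original list.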

Files should be uploaded as Java source files. Any supporting files you wish to include alongside the code should be .txt files.

