Methods
Sources of Data
For this study, protein sequences and control data of known variants for 7 genetic diseases were required.
Sources of Tools
For this study, 7 MSA tools, 2 MSA Benchmarks, 6 PON tools and 5 other tools were evaluated and used.
Alignments
To test which MSA gave the optimal result, every possible MSA size and combination of the input sequences had to be created. These were generated using the modified CombinationGenerator program. Each MSA consisted of H. sapiens and n other sequences, where n is greater than 1. The resulting MSAs were then run in conjunction with the PON tools.
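The enumeration described above can be sketched in a few lines. This is only an illustration in Python, not the modified CombinationGenerator program used in the study; the function name `msa_subsets` and the species list are hypothetical, and the sketch assumes that "n is greater than 1" means at least two sequences in addition to H. sapiens.

```python
from itertools import combinations


def msa_subsets(reference, others, min_others=2):
    """Yield every sequence set made of the reference plus n other
    sequences, for each n from min_others up to len(others)."""
    for n in range(min_others, len(others) + 1):
        for combo in combinations(others, n):
            # The reference sequence is always kept as the first member.
            yield (reference,) + combo


# Hypothetical input: three non-human sequences (labels are illustrative only).
species = ["M. musculus", "G. gallus", "D. rerio"]
subsets = list(msa_subsets("H. sapiens", species))

for subset in subsets:
    print(subset)
```

Each yielded tuple names the sequences that would go into one alignment run. The number of sets grows quickly with the number of available sequences, which is one reason an automated pipeline is needed.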
PON Benchmark
The PON benchmark consists of the Matthews Correlation Coefficient (MCC) and the PM% (the percentage of predicted mutations). The MCC takes into account both the true and the false predictions. It returns a value between -1 and +1, where +1 means that every prediction is correct, 0 means that the predictions are no better than random, and -1 means that every prediction is incorrect.
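For reference, the MCC can be computed from the four confusion-matrix counts using the standard definition. The sketch below is a minimal Python illustration, not the code used in the study; the function name `mcc` is hypothetical, and returning 0 when the denominator is zero is a common convention assumed here rather than something the study states.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from true/false positive/negative counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column of the confusion
        # matrix gives an undefined ratio, reported here as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(10, 10, 0, 0))   # all predictions correct
print(mcc(5, 5, 5, 5))     # predictions unrelated to the true classes
print(mcc(0, 0, 10, 10))   # all predictions incorrect
```

The three calls correspond to the three reference points of the scale: perfect agreement, chance-level prediction and total disagreement.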
Automation
The majority of this study was automated on a Linux system, using Bash scripts, Java classes and Perl modules to automate the optimization of the parameters.
Copyright (C) 2011 Jennifer D. Warrender, Newcastle University
Optimization of parameters for the assessment of Unclassified Disease Gene Sequence Variants