Methods

Sources of Data

For this study, protein sequences and control data of known variants for 7 genetic diseases were required.

Sources of Tools

For this study, 7 MSA tools, 2 MSA benchmarks, 6 PON tools and 5 other tools were evaluated and used.

Alignments

To test which MSA was optimal, every possible MSA size and combination of the input sequences had to be created. These were generated using the modified CombinationGenerator program. Each MSA consisted of H. sapiens and n other sequences, where n is greater than 1. The resulting set of different MSAs was then run in conjunction with the PON tools.
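The enumeration described above can be sketched in a few lines of Python. This is only an illustration, not the modified CombinationGenerator program itself: the function names and the example species list are hypothetical, and "n is greater than 1" is read here as "at least two other sequences".

```python
from itertools import combinations
from math import comb


def msa_sequence_sets(reference, others, min_others=2):
    """Yield every sequence set made of the reference plus n other sequences.

    Assumption: n ranges from min_others (2, i.e. "greater than 1")
    up to the total number of other sequences available.
    """
    for n in range(min_others, len(others) + 1):
        for subset in combinations(others, n):
            yield (reference,) + subset


def count_sequence_sets(num_others, min_others=2):
    """Number of sets produced: sum of C(num_others, n) over the allowed n."""
    return sum(comb(num_others, n) for n in range(min_others, num_others + 1))


# Hypothetical input: the species names are placeholders, not the study's data.
example_others = ["P. troglodytes", "M. musculus", "G. gallus", "D. rerio"]
example_sets = list(msa_sequence_sets("H. sapiens", example_others))
```

With four other sequences this gives C(4,2) + C(4,3) + C(4,4) = 11 sequence sets, each of which would be passed to an MSA tool. The count grows quickly as more sequences are added, which is one reason to automate the runs.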

PON Benchmark

The PON benchmark consists of the Matthews Correlation Coefficient (MCC) and the PM% (the percentage of predicted mutations). The MCC takes into account both the true and the false predictions. It returns a value between -1 and +1, where +1 means that 100% of the predictions are correct, 0 means that the predictions are no better than random (about 50% correct on a balanced dataset), and -1 means that 0% are correct.
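As a minimal sketch of how these two measures could be computed, the Python below uses the standard MCC definition from the counts of true positives (tp), true negatives (tn), false positives (fp) and false negatives (fn). The function names are hypothetical, and the PM% helper assumes that PM% is the share of variants for which a tool returned a prediction, which is an interpretation and not something the study states in detail.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def predicted_percentage(num_predicted, num_total):
    """PM% under the assumption stated above: predicted variants / all variants."""
    return 100.0 * num_predicted / num_total
```

For example, `mcc(50, 50, 0, 0)` gives 1.0 (every prediction correct), `mcc(25, 25, 25, 25)` gives 0.0, and `mcc(0, 0, 50, 50)` gives -1.0 (every prediction wrong).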

Automation

The majority of this study was automated on a Linux system, using Bash scripts, Java classes and Perl modules to automate the optimization of the parameters.

Optimization of parameters for the assessment of Unclassified Disease Gene Sequence Variants