Author, Title, Year, Journal/Proceedings, Reftype, DOI/URL
Amir, E.-A.D., Kalisman, N. and Keasar, C. Differentiable, multi-dimensional, knowledge-based energy terms for torsion angle probabilities and propensities 2008 Proteins: Structure, Function, and Bioinformatics
Vol. 72(1), pp. 62-73 
article DOI  
Abstract: Rotatable torsion angles are the major degrees of freedom in proteins. Adjacent angles are highly correlated and energy terms that rely on these correlations are intensively used in molecular modeling. However, the utility of torsion based terms is not yet fully exploited. Many of these terms do not capture the full scale of the correlations. Other terms, which rely on lookup tables, cannot be used in the context of force-driven algorithms because they are not fully differentiable. This study aims to extend the usability of torsion terms by presenting a set of high-dimensional and fully-differentiable energy terms that are derived from high-resolution structures. The set includes terms that describe backbone conformational probabilities and propensities, side-chain rotamer probabilities, and an elaborate term that couples all the torsion angles within the same residue. The terms are constructed by cubic spline interpolation with periodic boundary conditions that enable full differentiability and high computational efficiency. We show that the spline implementation does not compromise the accuracy of the original database statistics. We further show that the side-chain relevant terms are compatible with established rotamer probabilities. Despite their very local characteristics, the new terms are often able to identify native and native-like structures within decoy sets. Finally, force-based minimization of NMR structures with the new terms improves their torsion angle statistics with minor structural distortion (0.5 Å RMSD on average). The new terms are freely available in the MESHI molecular modeling package. The spline coefficients are also available as a documented MATLAB file.
BibTeX:
@article{Amir2008,
  author = {Amir, El-Ad David and Kalisman, Nir and Keasar, Chen},
  title = {Differentiable, multi-dimensional, knowledge-based energy terms for torsion angle probabilities and propensities},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2008},
  volume = {72},
  number = {1},
  pages = {62--73},
  note = {continues torsion angles estimation, Ramachandran plot + AA propensities, density function + spline interpolation},
  doi = {http://dx.doi.org/10.1002/prot.21896}
}
Andreeva, A., Howorth, D., Chandonia, J.-M., Brenner, S.E., Hubbard, T.J.P., Chothia, C. and Murzin, A.G. Data growth and its impact on the SCOP database: new developments 2008 Nucl. Acids Res.
Vol. 36(suppl1), pp. D419-D425 
article DOI  
Abstract: The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
BibTeX:
@article{Andreeva2008,
  author = {Andreeva, Antonina and Howorth, Dave and Chandonia, John-Marc and Brenner, Steven E. and Hubbard, Tim J. P. and Chothia, Cyrus and Murzin, Alexey G.},
  title = {Data growth and its impact on the SCOP database: new developments},
  journal = {Nucl. Acids Res.},
  year = {2008},
  volume = {36},
  number = {suppl1},
  pages = {D419--D425},
  doi = {http://dx.doi.org/10.1093/nar/gkm993}
}
Anfinsen, C. Principles that Govern the Folding of Protein Chains 1973 Science
Vol. 181(4096), pp. 223-30 
article DOI  
Abstract: Anfinsen's refolding experiment.
BibTeX:
@article{Anfinsen1973,
  author = {Anfinsen, Christian},
  title = {Principles that Govern the Folding of Protein Chains},
  journal = {Science},
  year = {1973},
  volume = {181},
  number = {4096},
  pages = {223--30},
  doi = {http://dx.doi.org/10.1126/science.181.4096.223}
}
Angeline, P. and Pollack, J. Evolutionary Module Acquisition 1993 Proceedings of the Second Annual Conference on Evolutionary Programming, pp. 154-163  inproceedings  
BibTeX:
@inproceedings{Angeline1993,
  author = {Peter Angeline and Jordan Pollack},
  title = {Evolutionary Module Acquisition},
  booktitle = {Proceedings of the Second Annual Conference on Evolutionary Programming},
  year = {1993},
  pages = {154--163}
}
Bacardit, J., Stout, M., Hirst, J., Valencia, A., Smith, R. and Krasnogor, N. Automated Alphabet Reduction for Protein Datasets 2009 BMC Bioinformatics
Vol. 10(1), pp. 6 
article DOI  
Abstract: BACKGROUND: We investigate automated and generic alphabet reduction techniques for protein structure prediction datasets. Reducing alphabet cardinality without losing key biochemical information opens the door to potentially faster machine learning, data mining and optimization applications in structural bioinformatics. Furthermore, reduced but informative alphabets often result in, e.g., more compact and human-friendly classification/clustering rules. In this paper we propose a robust and sophisticated alphabet reduction protocol based on mutual information and state-of-the-art optimization techniques. RESULTS: We applied this protocol to the prediction of two protein structural features: contact number and relative solvent accessibility. For both features we generated alphabets of two, three, four and five letters. The five-letter alphabets gave prediction accuracies statistically similar to that obtained using the full amino acid alphabet. Moreover, the automatically designed alphabets were compared against other reduced alphabets taken from the literature or human-designed, outperforming them. The differences between our alphabets and the alphabets taken from the literature were quantitatively analyzed. All the above process had been performed using a primary sequence representation of proteins. As a final experiment, we extrapolated the obtained five-letter alphabet to reduce a, much richer, protein representation based on evolutionary information for the prediction of the same two features. Again, the performance gap between the full representation and the reduced representation was small, showing that the results of our automated alphabet reduction protocol, even if they were obtained using a simple representation, are also able to capture the crucial information needed for state-of-the-art protein representations. CONCLUSION: Our automated alphabet reduction protocol generates competent reduced alphabets tailored specifically for a variety of protein datasets. This process is done without any domain knowledge, using information theory metrics instead. The reduced alphabets contain some unexpected (but sound) groups of amino acids, thus suggesting new ways of interpreting the data.
BibTeX:
@article{Bacardit2009,
  author = {Bacardit, Jaume and Stout, Michael and Hirst, Jonathan and Valencia, Alfonso and Smith, Robert and Krasnogor, Natalio},
  title = {Automated Alphabet Reduction for Protein Datasets},
  journal = {BMC Bioinformatics},
  year = {2009},
  volume = {10},
  number = {1},
  pages = {6},
  doi = {http://dx.doi.org/10.1186/1471-2105-10-6}
}
Bacardit, J., Stout, M., Hirst, J.D., Sastry, K., Llorà, X. and Krasnogor, N. Automated alphabet reduction method with evolutionary algorithms for protein structure prediction 2007 Proceedings of the 9th annual conference on Genetic and evolutionary computation, pp. 346-353  inproceedings DOI  
Abstract: This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are faster and easier learning process as well as the generation of more compact and human-readable classifiers. The dimensionality reduction procedure we propose consists on the reduction of the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, into a lower cardinality alphabet. This reduction comes about by a clustering of AA types accordingly to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the Mutual Information between the AA-based input attributes of the dataset and the protein structure feature that is being predicted. To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and afterwards the results of this process were fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features. BioHEL results are compared to two standard machine learning systems. Our results show that it is possible to reduce the size of the alphabet used for prediction from twenty to just three letters resulting in more compact, i.e. interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy accrued by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
BibTeX:
@inproceedings{Bacardit2007,
  author = {Bacardit, Jaume and Stout, Michael and Hirst, Jonathan D. and Sastry, Kumara and Llorà, Xavier and Krasnogor, Natalio},
  title = {Automated alphabet reduction method with evolutionary algorithms for protein structure prediction},
  booktitle = {Proceedings of the 9th annual conference on Genetic and evolutionary computation},
  year = {2007},
  pages = {346--353},
  doi = {http://dx.doi.org/10.1145/1276958.1277033}
}
Bacardit, J., Stout, M., Krasnogor, N., Hirst, J. and Blazewicz, J. Coordination Number Prediction using Learning Classifier Systems: Performance and Interpretability 2006 Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06), pp. 247-254  inproceedings DOI  
BibTeX:
@inproceedings{Bacardit2006,
  author = {Bacardit, J. and Stout, M. and Krasnogor, N. and Hirst, J.D. and Blazewicz, J.},
  title = {Coordination Number Prediction using Learning Classifier Systems: Performance and Interpretability},
  booktitle = {Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06)},
  publisher = {ACM Press},
  year = {2006},
  pages = {247--254},
  doi = {http://dx.doi.org/10.1145/1143997.1144041}
}
Barthel, D., Hirst, J.D., Blazewicz, J. and Krasnogor, N. ProCKSI: A Decision Support System for Protein (Structure) Comparison, Knowledge, Similarity and Information 2007 BMC Bioinformatics
Vol. 8(1), pp. 416 
article DOI  
Abstract: BACKGROUND: We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. RESULTS: We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure. CONCLUSION: Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available at http://www.procksi.net for academic and non-commercial use.
BibTeX:
@article{Barthel2007,
  author = {Daniel Barthel and Jonathan D. Hirst and Jacek Blazewicz and Natalio Krasnogor},
  title = {ProCKSI: A Decision Support System for Protein (Structure) Comparison, Knowledge, Similarity and Information},
  journal = {BMC Bioinformatics},
  year = {2007},
  volume = {8},
  number = {1},
  pages = {416},
  doi = {http://dx.doi.org/10.1186/1471-2105-8-416}
}
Battey, J.N.D., Kopp, J., Bordoli, L., Read, R.J., Clarke, N.D. and Schwede, T. Automated server predictions in CASP7 2007 Proteins: Structure, Function, and Bioinformatics
Vol. 69(S8), pp. 68-82 
article DOI  
Abstract: With each round of CASP (Critical Assessment of Techniques for Protein Structure Prediction), automated prediction servers have played an increasingly important role. Today, most protein structure prediction approaches in some way depend on automated methods for fold recognition or model building. The accuracy of server predictions has significantly increased over the last years, and, in CASP7, we observed a continuation of this trend. In the template-based modeling category, the best prediction server was ranked third overall, i.e. it outperformed all but two of the human participating groups. This server also ranked among the very best predictors in the free modeling category as well, being clearly beaten by only one human group. In the high accuracy (HA) subset of TBM, two of the top five groups were servers. This article summarizes the contribution of automated structure prediction servers in the CASP7 experiment, with emphasis on 3D structure prediction, as well as information on their prediction scope and public availability.
BibTeX:
@article{Battey2007,
  author = {Battey, James N. D. and Kopp, Jürgen and Bordoli, Lorenza and Read, Randy J. and Clarke, Neil D. and Schwede, Torsten},
  title = {Automated server predictions in CASP7},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2007},
  volume = {69},
  number = {S8},
  pages = {68--82},
  doi = {http://dx.doi.org/10.1002/prot.21761}
}
Battiti, R. and Brunato, M. Reactive Search: Machine Learning For Memory-Based Heuristics 2005 (DIT-05-058)  techreport  
BibTeX:
@techreport{Battiti2005,
  author = {Battiti, Roberto and Brunato, Mauro},
  title = {Reactive Search: Machine Learning For Memory-Based Heuristics},
  year = {2005},
  number = {DIT-05-058}
}
Ben-David, M., Noivirt-Brik, O., Paz, A., Prilusky, J., Sussman, J.L. and Levy, Y. Assessment of CASP8 structure predictions for template free targets 2009 Proteins: Structure, Function, and Bioinformatics
Vol. 77(S9), pp. 50-65 
article DOI  
Abstract: The biennial CASP experiment is a crucial way to evaluate, in an unbiased way, the progress in predicting novel 3D protein structures. In this article, we assess the quality of prediction of template free models, that is, ab initio prediction of 3D structures of proteins based solely on the amino acid sequences, that is, proteins that did not have significant sequence identity to any protein in the Protein Data Bank. There were 13 targets in this category and 102 groups submitted predictions. Analysis was based on the GDT_TS analysis, which has been used in previous CASP experiments, together with a newly developed method, the OK_Rank, as well as by visual inspection. There is no doubt that in recent years many obstacles have been removed on the long and elusive way to deciphering the protein-folding problem. Out of the 13 targets, six were predicted well by a number of groups. On the other hand, it must be stressed that for four targets, none of the models were judged to be satisfactory. Thus, for template free model prediction, as evaluated in this CASP, successes have been achieved for most targets; however, a great deal of research is still required, both in improving the existing methods and in development of new approaches.
BibTeX:
@article{Ben-David2009,
  author = {Ben-David, Moshe and Noivirt-Brik, Orly and Paz, Aviv and Prilusky, Jaime and Sussman, Joel L. and Levy, Yaakov},
  title = {Assessment of CASP8 structure predictions for template free targets},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2009},
  volume = {77},
  number = {S9},
  pages = {50--65},
  doi = {http://dx.doi.org/10.1002/prot.22591}
}
Benjamini, Y. and Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing 1995 Journal of the Royal Statistical Society. Series B (Methodological)
Vol. 57(1), pp. 289-300 
article URL 
Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses-the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
BibTeX:
@article{Benjamini1995,
  author = {Benjamini, Yoav and Hochberg, Yosef},
  title = {Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing},
  journal = {Journal of the Royal Statistical Society. Series B (Methodological)},
  year = {1995},
  volume = {57},
  number = {1},
  pages = {289--300},
  url = {http://www.jstor.org/stable/2346101}
}
Berman, H.M. The Protein Data Bank: a historical perspective 2008 Acta Crystallographica Section A
Vol. 64(1), pp. 88-95 
article DOI  
Abstract: The Protein Data Bank began as a grassroots effort in 1971. It has grown from a small archive containing a dozen structures to a major international resource for structural biology containing more than 40000 entries. The interplay of science, technology and attitudes about data sharing have all played a role in the growth of this resource.
BibTeX:
@article{Berman2008,
  author = {Berman, Helen M.},
  title = {The Protein Data Bank: a historical perspective},
  journal = {Acta Crystallographica Section A},
  year = {2008},
  volume = {64},
  number = {1},
  pages = {88--95},
  doi = {http://dx.doi.org/10.1107/S0108767307035623}
}
Birzele, F., Gewehr, J.E., Csaba, G. and Zimmer, R. Vorolign--fast structural alignment using Voronoi contacts 2007 Bioinformatics
Vol. 23(2), pp. e205-211 
article DOI  
Abstract: Summary: Vorolign, a fast and flexible structural alignment method for two or more protein structures is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi-contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. Results: The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. Availability: Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign Contact: fabian.birzele@ifi.lmu.de Supplementary information: Datasets used throughout the article are available at http://www.bio.ifi.lmu.de/Vorolign/supplement.html
BibTeX:
@article{Birzele2007,
  author = {Birzele, Fabian and Gewehr, Jan E. and Csaba, Gergely and Zimmer, Ralf},
  title = {Vorolign--fast structural alignment using Voronoi contacts},
  journal = {Bioinformatics},
  year = {2007},
  volume = {23},
  number = {2},
  pages = {e205--211},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btl294}
}
Blum, C. and Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison 2003 ACM Computing Surveys
Vol. 35(3), pp. 268-308 
article DOI  
BibTeX:
@article{Blum2003,
  author = {Blum, Christian and Roli, Andrea},
  title = {Metaheuristics in combinatorial optimization: Overview and conceptual comparison},
  journal = {ACM Computing Surveys},
  publisher = {ACM},
  year = {2003},
  volume = {35},
  number = {3},
  pages = {268--308},
  doi = {http://dx.doi.org/10.1145/937503.937505}
}
Boas, F.E. and Harbury, P.B. Potential energy functions for protein design 2007 Current Opinion in Structural Biology
Vol. 17(2), Theory and simulation / Macromolecular assemblages, pp. 199-204 
article DOI  
Abstract: Different potential energy functions have predominated in protein dynamics simulations, protein design calculations, and protein structure prediction. Clearly, the same physics applies in all three cases. The differences in potential energy functions reflect differences in how the calculations are performed. With improvements in computer power and algorithms, the same potential energy function should be applicable to all three problems. In this review, we examine energy functions currently used for protein design, and look to the molecular mechanics field for advances that could be used in the next generation of design algorithms. In particular, we focus on improved models of the hydrophobic effect, polarization and hydrogen bonding.
BibTeX:
@article{Boas2007,
  author = {Boas, F Edward and Harbury, Pehr B},
  title = {Potential energy functions for protein design},
  booktitle = {Theory and simulation / Macromolecular assemblages},
  journal = {Current Opinion in Structural Biology},
  year = {2007},
  volume = {17},
  number = {2},
  pages = {199--204},
  doi = {http://dx.doi.org/10.1016/j.sbi.2007.03.006}
}
Boniecki, M., Rotkiewicz, P., Skolnick, J. and Kolinski, A. Protein fragment reconstruction using various modeling techniques 2003 Journal of Computer-Aided Molecular Design
Vol. 17(11), pp. 725-738 
article DOI  
Abstract: Recently developed reduced models of proteins with knowledge-based force fields have been applied to a specific case of comparative modeling. From twenty high resolution protein structures of various structural classes, significant fragments of their chains have been removed and treated as unknown. The remaining portions of the structures were treated as fixed – i.e., as templates with an exact alignment. Then, the missed fragments were reconstructed using several modeling tools. These included three reduced types of protein models: the lattice SICHO (Side Chain Only) model, the lattice CABS (Cα + Cβ + Side group) model and an off-lattice model similar to the CABS model and called REFINER. The obtained reduced models were compared with more standard comparative modeling tools such as MODELLER and the SWISS-MODEL server. The reduced model results are qualitatively better for the higher resolution lattice models, clearly suggesting that these are now mature, competitive and complementary (in the range of sparse alignments) to the classical tools of comparative modeling. Comparison between the various reduced models strongly suggests that the essential ingredient for the successful and accurate modeling of protein structures is not the representation of conformational space (lattice, off-lattice, all-atom) but, rather, the specificity of the force fields used and, perhaps, the sampling techniques employed. These conclusions are encouraging for the future application of the fast reduced models in comparative modeling on a genomic scale.
BibTeX:
@article{Boniecki2003,
  author = {Boniecki, Michal and Rotkiewicz, Piotr and Skolnick, Jeffrey and Kolinski, Andrzej},
  title = {Protein fragment reconstruction using various modeling techniques},
  journal = {Journal of Computer-Aided Molecular Design},
  year = {2003},
  volume = {17},
  number = {11},
  pages = {725--738},
  doi = {http://dx.doi.org/10.1023/B:JCAM.0000017486.83645.a0}
}
Bonneau, R., Strauss, C.E.M., Rohl, C.A., Chivian, D., Bradley, P., Malmstrom, L., Robertson, T. and Baker, D. De Novo Prediction of Three-dimensional Structures for Major Protein Families 2002 Journal of Molecular Biology
Vol. 322(1), pp. 65-78 
article DOI  
Abstract: We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60% of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35%, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models. This performance is consistent with results from the fourth critical assessment of structure prediction (CASP4). Correct and incorrect predictions could be partially distinguished using a confidence function based on a combination of simulation convergence, protein length and the similarity of a given structure prediction to known protein structures. While the limited accuracy and reliability of the method precludes definitive conclusions, the Pfam models provide the only tertiary structure information available for the 12% of publicly available sequences represented by these large protein families.
BibTeX:
@article{Bonneau2002,
  author = {Bonneau, Richard and Strauss, Charlie E. M. and Rohl, Carol A. and Chivian, Dylan and Bradley, Phillip and Malmstrom, Lars and Robertson, Tim and Baker, David},
  title = {De Novo Prediction of Three-dimensional Structures for Major Protein Families},
  journal = {Journal of Molecular Biology},
  year = {2002},
  volume = {322},
  number = {1},
  pages = {65--78},
  doi = {http://dx.doi.org/10.1016/S0022-2836(02)00698-8}
}
Bonneau, R., Tsai, J., Ruczinski, I., Chivian, D., Rohl, C., Strauss, C. and Baker, D. Rosetta in CASP4: Progress in ab initio protein structure prediction 2001 Proteins: Structure, Function, and Genetics
Vol. 45(S5), pp. 119-26 
article DOI  
BibTeX:
@article{Bonneau2001,
  author = {Bonneau, Richard and Tsai, Jerry and Ruczinski, Ingo and Chivian, Dylan and Rohl, Carol and Strauss, Charlie and Baker, David},
  title = {Rosetta in CASP4: Progress in ab initio protein structure prediction},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2001},
  volume = {45},
  number = {S5},
  pages = {119--26},
  doi = {http://dx.doi.org/10.1002/prot.1170}
}
Borate, B., Chesler, E., Langston, M., Saxton, A. and Voy, B. Comparison of threshold selection methods for microarray gene co-expression matrices 2009 BMC Research Notes
Vol. 2(1), pp. 240 
article DOI  
Abstract: BACKGROUND: Network and clustering analyses of microarray co-expression correlation data often require application of a threshold to discard small correlations, thus reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data. FINDINGS: Six conceptually diverse methods - based on number of maximal cliques, correlation of control spots with expressed genes, top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power - were used to estimate a correlation threshold for three time-series microarray datasets. The validity of thresholds was tested by comparison to thresholds derived from Gene Ontology information. Stability and reliability of the best methods were evaluated with block bootstrapping. Two threshold methods, number of maximal cliques and spectral graph, used information in the correlation matrix structure and performed well in terms of stability. Comparison to Gene Ontology found thresholds from number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested. CONCLUSION: Threshold selection approaches based on network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pair-wise relationships.
BibTeX:
@article{Borate2009,
  author = {Borate, Bhavesh and Chesler, Elissa and Langston, Michael and Saxton, Arnold and Voy, Brynn},
  title = {Comparison of threshold selection methods for microarray gene co-expression matrices},
  journal = {BMC Research Notes},
  year = {2009},
  volume = {2},
  number = {1},
  pages = {240},
  doi = {http://dx.doi.org/10.1186/1756-0500-2-240}
}
Bourne, P.E. Structural Bioinformatics 2003 , pp. 499-505  inbook DOI  
BibTeX:
@inbook{Bourne2003,
  author = {Philip E. Bourne},
  title = {Structural Bioinformatics},
  publisher = {Wiley-Liss},
  year = {2003},
  pages = {499--505},
  doi = {http://dx.doi.org/10.1002/0471721204.ch24}
}
Bower, M.J., Cohen, F.E. and Dunbrack, R.L. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool 1997 Journal of Molecular Biology
Vol. 267(5), pp. 1268-1282 
article DOI  
Abstract: Modeling by homology is the most accurate computational method for translating an amino acid sequence into a protein structure. Homology modeling can be divided into two sub-problems, placing the polypeptide backbone and adding side-chains. We present a method for rapidly predicting the conformations of protein side-chains, starting from main-chain coordinates alone. The method involves using fewer than ten rotamers per residue from a backbone-dependent rotamer library and a search to remove steric conflicts. The method is initially tested on 299 high resolution crystal structures by rebuilding side-chains onto the experimentally determined backbone structures. A total of 77% of χ1 and 66% of χ1+2 dihedral angles are predicted within 40° of their crystal structure values. We then tested the method on the entire database of known structures in the Protein Data Bank. The predictive accuracy of the algorithm was strongly correlated with the resolution of the structures. In an effort to simulate a realistic homology modeling problem, 9424 homology models were created using three different modeling strategies. For prediction purposes, pairs of structures were identified which shared between 30% and 90% sequence identity. One strategy results in 82% of χ1 and 72% χ1+2 dihedral angles predicted within 40 degrees of the target crystal structure values, suggesting that movements of the backbone associated with this degree of sequence identity are not large enough to disrupt the predictive ability of our method for non-native backbones. These results compared favorably with existing methods over a comprehensive data set.
BibTeX:
@article{Bower1997,
  author = {Bower, Michael J. and Cohen, Fred E. and Dunbrack, Roland L.},
  title = {Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool},
  journal = {Journal of Molecular Biology},
  year = {1997},
  volume = {267},
  number = {5},
  pages = {1268--1282},
  doi = {http://dx.doi.org/10.1006/jmbi.1997.0926}
}
Bradley, P., Chivian, D., Meiler, J., Misura, K., Rohl, C., Schief, W., Wedemeyer, W., Schueler-Furman, O., Murphy, P., Schonbrun, J., Strauss, C. and Baker, D. Rosetta predictions in CASP5: Successes, failures, and prospects for complete automation 2003 Proteins: Structure, Function, and Genetics
Vol. 53(S6), pp. 457-68 
article DOI  
BibTeX:
@article{Bradley2003,
  author = {Bradley, Philip and Chivian, Dylan and Meiler, Jens and Misura, Kira and Rohl, Carol and Schief, William and Wedemeyer, William and Schueler-Furman, Ora and Murphy, Paul and Schonbrun, Jack and Strauss, Charles and Baker, David},
  title = {Rosetta predictions in CASP5: Successes, failures, and prospects for complete automation},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2003},
  volume = {53},
  number = {S6},
  pages = {457--68},
  doi = {http://dx.doi.org/10.1002/prot.10552}
}
Bradley, P., Malmström, L., Qian, B., Schonbrun, J., Chivian, D., Kim, D.E., Meiler, J., Misura, K.M.S. and Baker, D. Free modeling with Rosetta in CASP6 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 128-34 
article DOI  
BibTeX:
@article{Bradley2005,
  author = {Bradley, Philip and Malmström, Lars and Qian, Bin and Schonbrun, Jack and Chivian, Dylan and Kim, David E. and Meiler, Jens and Misura, Kira M. S. and Baker, David},
  title = {Free modeling with Rosetta in CASP6},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {128--34},
  doi = {http://dx.doi.org/10.1002/prot.20729}
}
Bradley, P., Misura, K.M.S. and Baker, D. Toward High-Resolution de Novo Structure Prediction for Small Proteins 2005 Science
Vol. 309(5742), pp. 1868-1871 
article DOI  
Abstract: The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle-like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.
BibTeX:
@article{Bradley2005a,
  author = {Bradley, Philip and Misura, Kira M. S. and Baker, David},
  title = {Toward High-Resolution de Novo Structure Prediction for Small Proteins},
  journal = {Science},
  year = {2005},
  volume = {309},
  number = {5742},
  pages = {1868--1871},
  doi = {http://dx.doi.org/10.1126/science.1113801}
}
Branton, D., Deamer, D.W., Marziali, A., Bayley, H., Benner, S.A., Butler, T., Di Ventra, M., Garaj, S., Hibbs, A., Huang, X., Jovanovich, S.B., Krstic, P.S., Lindsay, S., Ling, X.S., Mastrangelo, C.H., Meller, A., Oliver, J.S., Pershin, Y.V., Ramsey, J.M., Riehn, R., Soni, G.V., Tabard-Cossa, V., Wanunu, M., Wiggin, M. and Schloss, J.A. The potential and challenges of nanopore sequencing 2008 Nature Biotechnology
Vol. 26(10), pp. 1146-1153 
article DOI  
BibTeX:
@article{Branton2008,
  author = {Branton, Daniel and Deamer, David W and Marziali, Andre and Bayley, Hagan and Benner, Steven A and Butler, Thomas and Di Ventra, Massimiliano and Garaj, Slaven and Hibbs, Andrew and Huang, Xiaohua and Jovanovich, Stevan B and Krstic, Predrag S and Lindsay, Stuart and Ling, Xinsheng Sean and Mastrangelo, Carlos H and Meller, Amit and Oliver, John S and Pershin, Yuriy V and Ramsey, J Michael and Riehn, Robert and Soni, Gautam V and Tabard-Cossa, Vincent and Wanunu, Meni and Wiggin, Matthew and Schloss, Jeffery A},
  title = {The potential and challenges of nanopore sequencing},
  journal = {Nature Biotechnology},
  publisher = {Nature Publishing Group},
  year = {2008},
  volume = {26},
  number = {10},
  pages = {1146--1153},
  doi = {http://dx.doi.org/10.1038/nbt.1495}
}
Brooks, B., Bruccoleri, R., Olafson, B., States, D., Swaminathan, S. and Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations 1983 Journal of Computational Chemistry
Vol. 4(2), pp. 187-217 
article DOI  
BibTeX:
@article{Brooks1983,
  author = {Brooks, Bernard and Bruccoleri, Robert and Olafson, Barry and States, David and Swaminathan, S and Karplus, Martin},
  title = {CHARMM: A program for macromolecular energy, minimization, and dynamics calculations},
  journal = {Journal of Computational Chemistry},
  year = {1983},
  volume = {4},
  number = {2},
  pages = {187--217},
  doi = {http://dx.doi.org/10.1002/jcc.540040211}
}
Brylinski, M., Konieczny, L. and Roterman, I. Hydrophobic collapse in (in silico) protein folding 2006 Computational Biology and Chemistry
Vol. 30(4), pp. 255-267 
article DOI  
Abstract: A model of hydrophobic collapse, which is treated as the driving force for protein folding, is presented. This model is the superposition of three models commonly used in protein structure prediction: (1) `oil-drop' model introduced by Kauzmann, (2) a lattice model introduced to decrease the number of degrees of freedom for structural changes and (3) a model of the formation of hydrophobic core as a key feature in driving the folding of proteins. These three models together helped to develop the idea of a fuzzy-oil-drop as a model for an external force field of hydrophobic character mimicking the hydrophobicity-differentiated environment for hydrophobic collapse. All amino acids in the polypeptide interact pair-wise during the folding process (energy minimization procedure) and interact with the external hydrophobic force field defined by a three-dimensional Gaussian function. The value of the Gaussian function usually interpreted as a probability distribution is treated as a normalized hydrophobicity distribution, with its maximum in the center of the ellipsoid and decreasing proportionally with the distance versus the center. The fuzzy-oil-drop is elastic and changes its shape and size during the simulated folding procedure.
BibTeX:
@article{Brylinski2006,
  author = {Brylinski, Michal and Konieczny, Leszek and Roterman, Irena},
  title = {Hydrophobic collapse in (in silico) protein folding},
  journal = {Computational Biology and Chemistry},
  year = {2006},
  volume = {30},
  number = {4},
  pages = {255--267},
  doi = {http://dx.doi.org/10.1016/j.compbiolchem.2006.04.007}
}
Burke, E., Gustafson, S. and Kendall, G. Diversity in genetic programming: an analysis of measures and correlation with fitness 2004 Evolutionary Computation, IEEE Transactions on
Vol. 8(1), pp. 47-62 
article DOI  
Abstract: This paper examines measures of diversity in genetic programming. The goal is to understand the importance of such measures and their relationship with fitness. Diversity methods and measures from the literature are surveyed and a selected set of measures are applied to common standard problem instances in an experimental study. Results show the varying definitions and behaviors of diversity and the varying correlation between diversity and fitness during different stages of the evolutionary process. Populations in the genetic programming algorithm are shown to become structurally similar while maintaining a high amount of behavioral differences. Conclusions describe what measures are likely to be important for understanding and improving the search process and why diversity might have different meaning for different problem domains.
BibTeX:
@article{Burke2004,
  author = {Burke, E.K. and Gustafson, S. and Kendall, G.},
  title = {Diversity in genetic programming: an analysis of measures and correlation with fitness},
  journal = {Evolutionary Computation, IEEE Transactions on},
  year = {2004},
  volume = {8},
  number = {1},
  pages = {47--62},
  doi = {http://dx.doi.org/10.1109/TEVC.2003.819263}
}
Burke, E., Gustafson, S., Kendall, G. and Krasnogor, N. Advanced Population Diversity Measures in Genetic Programming 2002
Vol. 2439, 7th International Conference Parallel Problem Solving from Nature, pp. 341-350 
inproceedings DOI  
BibTeX:
@inproceedings{Burke2002,
  author = {E.K. Burke and S. Gustafson and G. Kendall and N. Krasnogor},
  title = {Advanced Population Diversity Measures in Genetic Programming},
  booktitle = {7th International Conference Parallel Problem Solving from Nature},
  publisher = {Springer Berlin / Heidelberg},
  year = {2002},
  volume = {2439},
  pages = {341--350},
  doi = {http://dx.doi.org/10.1007/3-540-45712-7_33}
}
Camoglu, O., Can, T. and Singh, A.K. Integrating multi-attribute similarity networks for robust representation of the protein space 2006 Bioinformatics
Vol. 22(13), pp. 1585-1592 
article DOI  
Abstract: Motivation: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. Results: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach. Availability: Source code is available upon request from the primary author. Contact: orhan@cs.ucsb.edu Supplementary Information: Supplementary data are available on Bioinformatics online.
BibTeX:
@article{Camoglu2006,
  author = {Camoglu, Orhan and Can, Tolga and Singh, Ambuj K.},
  title = {Integrating multi-attribute similarity networks for robust representation of the protein space},
  journal = {Bioinformatics},
  year = {2006},
  volume = {22},
  number = {13},
  pages = {1585--1592},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btl130}
}
Canutescu, A.A., Shelenkov, A.A. and Dunbrack, R.L., Jr. A graph-theory algorithm for rapid protein side-chain prediction 2003 Protein Sci
Vol. 12(9), pp. 2001-2014 
article DOI URL 
Abstract: Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34,342 side chains in <7 min of computer time. The total chi1 and chi1 + 2 dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well as addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
BibTeX:
@article{Canutescu2003,
  author = {Canutescu, Adrian A. and Shelenkov, Andrew A. and Dunbrack, Jr., Roland L.},
  title = {A graph-theory algorithm for rapid protein side-chain prediction},
  journal = {Protein Sci},
  year = {2003},
  volume = {12},
  number = {9},
  pages = {2001--2014},
  url = {http://dunbrack.fccc.edu/SCWRL3.php},
  doi = {http://dx.doi.org/10.1110/ps.03154503}
}
Caprara, A., Carr, R., Istrail, S., Lancia, G. and Walenz, B. 1001 Optimal PDB Structure Alignments: Integer Programming Methods for Finding the Maximum Contact Map Overlap 2004 Journal of Computational Biology
Vol. 11(1), pp. 27-52 
article DOI  
BibTeX:
@article{Caprara2004,
  author = {Caprara, Alberto and Carr, Robert and Istrail, Sorin and Lancia, Giuseppe and Walenz, Brian},
  title = {1001 Optimal PDB Structure Alignments: Integer Programming Methods for Finding the Maximum Contact Map Overlap},
  journal = {Journal of Computational Biology},
  year = {2004},
  volume = {11},
  number = {1},
  pages = {27--52},
  doi = {http://dx.doi.org/10.1089/106652704773416876}
}
Caprara, A. and Lancia, G. Structural alignment of large--size proteins via lagrangian relaxation 2002 RECOMB '02: Proceedings of the sixth annual international conference on Computational biology, pp. 100-108  inproceedings DOI  
Abstract: We illustrate a new approach to the Contact Map Overlap problem for the comparison of protein structures. The approach is based on formulating the problem as an integer linear program and then relaxing in a Lagrangian way a suitable set of constraints. This relaxation is solved by computing a sequence of simple alignment problems, each in quadratic time, and near--optimal Lagrangian multipliers are found by subgradient optimization. By our approach we achieved a substantial speedup over the best existing methods. We were able to solve optimally for the first time instances for PDB proteins with about 1000 residues and 2000 contacts. Moreover, within a few hours we compared 780 pairs in a testbed of 40 large proteins, finding the optimal solution in 150 cases. Finally, we compared 10,000 pairs of proteins from a test set of 269 proteins in the literature, which took a couple of days on a PC.
BibTeX:
@inproceedings{Caprara2002,
  author = {Caprara, Alberto and Lancia, Giuseppe},
  title = {Structural alignment of large--size proteins via lagrangian relaxation},
  booktitle = {RECOMB '02: Proceedings of the sixth annual international conference on Computational biology},
  publisher = {ACM Press},
  year = {2002},
  pages = {100--108},
  doi = {http://dx.doi.org/10.1145/565196.565209}
}
Carr, R., Hart, W., Krasnogor, N., Hirst, J., Burke, E.K. and Smith, J. Alignment Of Protein Structures With A Memetic Evolutionary Algorithm 2002 GECCO '02: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1027-1034  inproceedings  
BibTeX:
@inproceedings{Carr2002,
  author = {Robert Carr and William Hart and Natalio Krasnogor and Jonathan Hirst and Edmund K. Burke and James Smith},
  title = {Alignment Of Protein Structures With A Memetic Evolutionary Algorithm},
  booktitle = {GECCO '02: Proceedings of the Genetic and Evolutionary Computation Conference},
  publisher = {Morgan Kaufmann Publishers},
  year = {2002},
  pages = {1027--1034}
}
Carr, R., Lancia, G. and Istrail, S. Branch-and-Cut Algorithms for Independent Set Problems: Integrality Gap and an Application to Protein Structure Alignment 2000 (SAND2000-2171)  techreport  
BibTeX:
@techreport{Carr2000,
  author = {R. Carr and G. Lancia and S. Istrail},
  title = {Branch-and-Cut Algorithms for Independent Set Problems: Integrality Gap and an Application to Protein Structure Alignment},
  year = {2000},
  number = {SAND2000-2171}
}
Case, D., Cheatham, T., III, Darden, T., Gohlke, H., Luo, R., Merz, K., Jr., Onufriev, A., Simmerling, C., Wang, B. and Woods, R. The Amber biomolecular simulation programs 2005 Journal of Computational Chemistry
Vol. 26(16), pp. 1668-88 
article DOI  
BibTeX:
@article{Case2005,
  author = {Case, David and Cheatham, III, Thomas and Darden, Tom and Gohlke, Holger and Luo, Ray and Merz, Jr., Kenneth and Onufriev, Alexey and Simmerling, Carlos and Wang, Bing and Woods, Robert},
  title = {The Amber biomolecular simulation programs},
  journal = {Journal of Computational Chemistry},
  year = {2005},
  volume = {26},
  number = {16},
  pages = {1668--88},
  doi = {http://dx.doi.org/10.1002/jcc.20290}
}
Chang, C.-C. and Lin, C.-J. LIBSVM: A library for support vector machines 2011 ACM Transactions on Intelligent Systems and Technology
Vol. 2(3), pp. 27:1-27:27 
article DOI  
BibTeX:
@article{Chang2011,
  author = {Chang, Chih-Chung and Lin, Chih-Jen},
  title = {LIBSVM: A library for support vector machines},
  journal = {ACM Transactions on Intelligent Systems and Technology},
  year = {2011},
  volume = {2},
  number = {3},
  pages = {27:1--27:27},
  doi = {http://dx.doi.org/10.1145/1961189.1961199}
}
Chen, H. and Zhou, H.-X. Prediction of solvent accessibility and sites of deleterious mutations from protein sequence 2005 Nucleic Acids Research
Vol. 33(10), pp. 3193-3199 
article DOI  
Abstract: Residues that form the hydrophobic core of a protein are critical for its stability. A number of approaches have been developed to classify residues as buried or exposed. In order to optimize the classification, we have refined a suite of five methods over a large dataset and proposed a metamethod based on an ensemble average of the individual methods, leading to a two-state classification accuracy of 80%. Many studies have suggested that hydrophobic core residues are likely sites of deleterious mutations, so we wanted to see to what extent these sites can be predicted from the putative buried residues. Residues that were most confidently classified as buried were proposed as sites of deleterious mutations. This proposition was tested on six proteins for which sites of deleterious mutations have previously been identified by stability measurement or functional assay. Of the total of 130 residues predicted as sites of deleterious mutations, 104 (or 80%) were correct.
BibTeX:
@article{Chen2005,
  author = {Chen, Huiling and Zhou, Huan-Xiang},
  title = {Prediction of solvent accessibility and sites of deleterious mutations from protein sequence},
  journal = {Nucleic Acids Research},
  year = {2005},
  volume = {33},
  number = {10},
  pages = {3193--3199},
  doi = {http://dx.doi.org/10.1093/nar/gki633}
}
Cheng, J., Randall, A.Z., Sweredoski, M.J. and Baldi, P. SCRATCH: a protein structure and structural feature prediction server 2005 Nucl. Acids Res.
Vol. 33(suppl_2), pp. W72-76 
article DOI  
Abstract: SCRATCH is a server for predicting protein tertiary structure and structural features. The SCRATCH software suite includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure. The user simply provides an amino acid sequence and selects the desired predictions, then submits to the server. Results are emailed to the user. The server is available at http://www.igb.uci.edu/servers/psss.html.
BibTeX:
@article{Cheng2005,
  author = {Cheng, J. and Randall, A. Z. and Sweredoski, M. J. and Baldi, P.},
  title = {SCRATCH: a protein structure and structural feature prediction server},
  journal = {Nucl. Acids Res.},
  year = {2005},
  volume = {33},
  number = {suppl_2},
  pages = {W72--76},
  note = {Uses its own set of feature predictors (secondary structure, solvent accessibility, contact maps, domains). Started in CASP as baldi-group-server.},
  doi = {http://dx.doi.org/10.1093/nar/gki396}
}
Chew, L. and Kedem, K. Finding Consensus Shape for a Protein Family 2002 18th ACM Symp. on Computational Geometry  inproceedings DOI  
BibTeX:
@inproceedings{Chew2002,
  author = {L.P. Chew and K. Kedem},
  title = {Finding Consensus Shape for a Protein Family},
  booktitle = {18th ACM Symp. on Computational Geometry},
  year = {2002},
  doi = {http://dx.doi.org/10.1145/513400.513408}
}
Chiu, T.-L. and Goldstein, R.A. How to generate improved potentials for protein tertiary structure prediction: A lattice model study 2000 Proteins: Structure, Function, and Genetics
Vol. 41(2), pp. 157-163 
article DOI  
Abstract: Success in the protein structure prediction problem relies heavily on the choice of an appropriate potential function. One approach toward extracting these potentials from a database of known protein structures is to maximize the Z-score of the database proteins, which represents the ability of the potential to discriminate correct from random conformations. These optimization methods model the entire distribution of alternative structures, reducing their ability to concentrate on the lowest energy structures most competitive with the native state and resulting in an unfortunate tendency to underestimate the repulsive interactions. This leads to reduced accuracy and predictive ability. Using a lattice model, we demonstrate how we can weight the distribution to suppress the contributions of the high-energy conformations to the Z-score calculation. The result is a potential that is more accurate and more likely to yield correct predictions than other Z-score optimization methods as well as potentials of mean force. Proteins 2000;41:157-163. © 2000 Wiley-Liss, Inc.
BibTeX:
@article{Chiu2000,
  author = {Chiu, Ting-Lan and Goldstein, Richard A.},
  title = {How to generate improved potentials for protein tertiary structure prediction: A lattice model study},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2000},
  volume = {41},
  number = {2},
  pages = {157--163},
  doi = {http://dx.doi.org/10.1002/1097-0134(20001101)41:2<157::AID-PROT10>3.0.CO;2-W}
}
Chivian, D. CASP7 server ranking for FM category (GDT MM) 2006   webpage URL 
BibTeX:
@webpage{URL_CASP7-rank_Chivian,
  author = {David Chivian},
  title = {CASP7 server ranking for FM category (GDT MM)},
  year = {2006},
  url = {http://robetta.bakerlab.org/CASP7_eval/CASP7.FR_A-NF.Best-GDT_MM.html}
}
Chivian, D., Kim, D., Malmström, L., Schonbrun, J., Rohl, C. and Baker, D. Prediction of CASP6 structures using automated robetta protocols 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 157-66 
article DOI  
BibTeX:
@article{Chivian2005,
  author = {Chivian, Dylan and Kim, David and Malmström, Lars and Schonbrun, Jack and Rohl, Carol and Baker, David},
  title = {Prediction of CASP6 structures using automated robetta protocols},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {157--66},
  doi = {http://dx.doi.org/10.1002/prot.20733}
}
Coutsias, E.A., Seok, C. and Dill, K.A. Using quaternions to calculate RMSD 2004 Journal of Computational Chemistry
Vol. 25(15), pp. 1849-1857 
article DOI  
Abstract: A widely used way to compare the structures of biomolecules or solid bodies is to translate and rotate one structure with respect to the other to minimize the root-mean-square deviation (RMSD). We present a simple derivation, based on quaternions, for the optimal solid body transformation (rotation-translation) that minimizes the RMSD between two sets of vectors. We prove that the quaternion method is equivalent to the well-known formula due to Kabsch. We analyze the various cases that may arise, and give a complete enumeration of the special cases in terms of the arrangement of the eigenvalues of a traceless, 4x4 symmetric matrix. A key result here is an expression for the gradient of the RMSD as a function of model parameters. This can be useful, for example, in finding the minimum energy path of a reaction using the elastic band methods or in optimizing model parameters to best fit a target structure.
BibTeX:
@article{Coutsias2004,
  author = {Coutsias, Evangelos A. and Seok, Chaok and Dill, Ken A.},
  title = {Using quaternions to calculate RMSD},
  journal = {Journal of Computational Chemistry},
  year = {2004},
  volume = {25},
  number = {15},
  pages = {1849--1857},
  doi = {http://dx.doi.org/10.1002/jcc.20110}
}
Coy, S.P., Golden, B.L., Runger, G.C. and Wasil, E.A. Using Experimental Design to Find Effective Parameter Settings for Heuristics 2001 Journal of Heuristics
Vol. 7(1), pp. 77-97 
article DOI  
Abstract: In this paper, we propose a procedure, based on statistical design of experiments and gradient descent, that finds effective settings for parameters found in heuristics. We develop our procedure using four experiments. We use our procedure and a small subset of problems to find parameter settings for two new vehicle routing heuristics. We then set the parameters of each heuristic and solve 19 capacity-constrained and 15 capacity-constrained and route-length-constrained vehicle routing problems ranging in size from 50 to 483 customers. We conclude that our procedure is an effective method that deserves serious consideration by both researchers and operations research practitioners.
BibTeX:
@article{Coy2001,
  author = {Coy, Steven P. and Golden, Bruce L. and Runger, George C. and Wasil, Edward A.},
  title = {Using Experimental Design to Find Effective Parameter Settings for Heuristics},
  journal = {Journal of Heuristics},
  publisher = {Kluwer Academic Publishers},
  year = {2001},
  volume = {7},
  number = {1},
  pages = {77--97},
  doi = {http://dx.doi.org/10.1023/A:1026569813391}
}
Cozzetto, D., Giorgetti, A., Raimondo, D. and Tramontano, A. The Evaluation of Protein Structure Prediction Results 2008 Molecular Biotechnology
Vol. 39(1), pp. 1-8 
article DOI  
Abstract: Methods for protein structure prediction are flourishing and becoming widely available to both experimentalists and computational biologists. However, how good are they? What is their range of applicability and how can we know which method is better suited for the task at hand? These are the questions that this review tries to address, by describing the worldwide Critical Assessment of techniques for protein Structure Prediction (CASP) initiative and focusing on the specific problems of assessing the quality of a protein 3D model.
BibTeX:
@article{Cozzetto2008a,
  author = {Cozzetto, Domenico and Giorgetti, Alejandro and Raimondo, Domenico and Tramontano, Anna},
  title = {The Evaluation of Protein Structure Prediction Results},
  journal = {Molecular Biotechnology},
  publisher = {Humana Press Inc.},
  year = {2008},
  volume = {39},
  number = {1},
  pages = {1--8},
  doi = {http://dx.doi.org/10.1007/s12033-007-9023-6}
}
Cozzetto, D. and Tramontano, A. Advances and pitfalls in protein structure prediction 2008 Curr. Protein Pept. Sci.
Vol. 9(6), pp. 567-77 
article DOI  
BibTeX:
@article{Cozzetto2008,
  author = {Cozzetto, D. and Tramontano, A.},
  title = {Advances and pitfalls in protein structure prediction},
  journal = {Curr. Protein Pept. Sci.},
  year = {2008},
  volume = {9},
  number = {6},
  pages = {567--77},
  doi = {http://dx.doi.org/10.2174/138920308786733958}
}
Crescenzi, P., Goldman, D., Papadimitriou, C.H., Piccolboni, A. and Yannakakis, M. On the Complexity of Protein Folding 1998 Journal of Computational Biology
Vol. 5(3), pp. 423-466 
article URL 
BibTeX:
@article{Crescenzi1998,
  author = {Pierluigi Crescenzi and Deborah Goldman and Christos H. Papadimitriou and Antonio Piccolboni and Mihalis Yannakakis},
  title = {On the Complexity of Protein Folding},
  journal = {Journal of Computational Biology},
  year = {1998},
  volume = {5},
  number = {3},
  pages = {423--466},
  url = {http://citeseer.ist.psu.edu/31865.html}
}
Cristobal, S., Zemla, A., Fischer, D., Rychlewski, L. and Elofsson, A. A study of quality measures for protein threading models 2001 BMC Bioinformatics
Vol. 2(1), pp. 5 
article DOI  
Abstract: BACKGROUND: Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These measures include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them. RESULTS: Using a small set of 1340 models for 23 different targets we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Also the comparison between different measures is complicated as many measures are dependent on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment independent measures detect slightly more of the models with the correct fold, while alignment dependent measures agree better when selecting the best models for each target. Finally we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
CONCLUSIONS: We show that given a sufficient number of targets the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment independent and alignment dependent measures, as used in several recent studies.
BibTeX:
@article{Cristobal2001,
  author = {Cristobal, Susana and Zemla, Adam and Fischer, Daniel and Rychlewski, Leszek and Elofsson, Arne},
  title = {A study of quality measures for protein threading models},
  journal = {BMC Bioinformatics},
  year = {2001},
  volume = {2},
  number = {1},
  pages = {5},
  doi = {http://dx.doi.org/10.1186/1471-2105-2-5}
}
Cutello, V., Narzisi, G. and Nicosia, G. A multi-objective evolutionary approach to the protein structure prediction problem 2006 Journal of The Royal Society Interface
Vol. 3(6), pp. 139-151 
article DOI  
BibTeX:
@article{Cutello2006,
  author = {Vincenzo Cutello and Giuseppe Narzisi and Giuseppe Nicosia},
  title = {A multi-objective evolutionary approach to the protein structure prediction problem},
  journal = {Journal of The Royal Society Interface},
  year = {2006},
  volume = {3},
  number = {6},
  pages = {139--151},
  note = {Applies MOO for CHARMM27 energy (computed with TINKER).},
  doi = {http://dx.doi.org/10.1098/rsif.2005.0083}
}
Das, R., Qian, B., Raman, S., Vernon, R., Thompson, J., Bradley, P., Khare, S., Tyka, M.D., Bhat, D., Chivian, D., Kim, D.E., Sheffler, W.H., Malmström, L., Wollacott, A.M., Wang, C., Andre, I. and Baker, D. Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home 2007 Proteins: Structure, Function, and Bioinformatics
Vol. 69(S8), pp. 118-128 
article DOI  
Abstract: We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions. Proteins 2007. © 2007 Wiley-Liss, Inc.
BibTeX:
@article{Das2007,
  author = {Das, Rhiju and Qian, Bin and Raman, Srivatsan and Vernon, Robert and Thompson, James and Bradley, Philip and Khare, Sagar and Tyka, Michael D. and Bhat, Divya and Chivian, Dylan and Kim, David E. and Sheffler, William H. and Malmström, Lars and Wollacott, Andrew M. and Wang, Chu and Andre, Ingemar and Baker, David},
  title = {Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2007},
  volume = {69},
  number = {S8},
  pages = {118--128},
  doi = {http://dx.doi.org/10.1002/prot.21636}
}
Day, R.O., Lamont, G.B. and Pachter, R. Protein Structure Prediction by Applying an Evolutionary Algorithm 2003 Proceedings of the 17th International Symposium on Parallel and Distributed Processing, pp. 155.1  inproceedings DOI  
BibTeX:
@inproceedings{Day2003,
  author = {Day, Richard O. and Lamont, Gary B. and Pachter, Ruth},
  title = {Protein Structure Prediction by Applying an Evolutionary Algorithm},
  booktitle = {Proceedings of the 17th International Symposium on Parallel and Distributed Processing},
  publisher = {IEEE Computer Society},
  year = {2003},
  pages = {155.1},
  doi = {http://dx.doi.org/10.1109/IPDPS.2003.1213291}
}
Di Lena, P., Fariselli, P., Margara, L., Vassura, M. and Casadio, R. Fast overlapping of protein contact maps by alignment of eigenvectors 2010 Bioinformatics
Vol. 26(18), pp. 2250-2258 
article URL 
Abstract: Motivation: Searching for structural similarity is a key issue of protein functional annotation. The maximum contact map overlap (CMO) is one of the possible measures of protein structure similarity. Exact and approximate methods known to optimize the CMO are computationally expensive and this hampers their applicability to large-scale comparison of protein structures. Results: In this article, we describe a heuristic algorithm (Al-Eigen) for finding a solution to the CMO problem. Our approach relies on the approximation of contact maps by eigendecomposition. We obtain good overlaps of two contact maps by computing the optimal global alignment of few principal eigenvectors. Our algorithm is simple, fast and its running time is independent of the amount of contacts in the map. Experimental testing indicates that the algorithm is comparable to exact CMO methods in terms of the overlap quality, to structural alignment methods in terms of structure similarity detection and it is fast enough to be suited for large-scale comparison of protein structures. Furthermore, our preliminary tests indicate that it is quite robust to noise, which makes it suitable for structural similarity detection also for noisy and incomplete contact maps. Availability: Available at http://bioinformatics.cs.unibo.it/Al-Eigen Contact: dilena@cs.unibo.it Supplementary information: Supplementary data are available at Bioinformatics online.
BibTeX:
@article{DiLena2010,
  author = {Di Lena, Pietro and Fariselli, Piero and Margara, Luciano and Vassura, Marco and Casadio, Rita},
  title = {Fast overlapping of protein contact maps by alignment of eigenvectors},
  journal = {Bioinformatics},
  year = {2010},
  volume = {26},
  number = {18},
  pages = {2250--2258},
  url = {http://bioinformatics.oxfordjournals.org/content/26/18/2250.abstract}
}
Dill, K.A. Dominant forces in protein folding 1990 Biochemistry
Vol. 29(31), pp. 7133-7155 
article DOI  
BibTeX:
@article{Dill1990,
  author = {Dill, Ken A.},
  title = {Dominant forces in protein folding},
  journal = {Biochemistry},
  year = {1990},
  volume = {29},
  number = {31},
  pages = {7133--7155},
  doi = {http://dx.doi.org/10.1021/bi00483a001}
}
Dill, K.A. and Chan, H.S. From Levinthal to pathways to funnels 1997 Nat Struct Mol Biol
Vol. 4(1), pp. 10-19 
article DOI  
BibTeX:
@article{Dill1997,
  author = {Dill, Ken A. and Chan, Hue Sun},
  title = {From Levinthal to pathways to funnels},
  journal = {Nat Struct Mol Biol},
  year = {1997},
  volume = {4},
  number = {1},
  pages = {10--19},
  doi = {http://dx.doi.org/10.1038/nsb0197-10}
}
Dinner, A.R., Sali, A., Smith, L.J., Dobson, C.M. and Karplus, M. Understanding protein folding via free-energy surfaces from theory and experiment 2000 Trends in Biochemical Sciences
Vol. 25(7), pp. 331-339 
article DOI  
Abstract: The ability of protein molecules to fold into their highly structured functional states is one of the most remarkable evolutionary achievements of biology. In recent years, our understanding of the way in which this complex self-assembly process takes place has increased dramatically. Much of the reason for this advance has been the development of energy surfaces (landscapes), which allow the folding reaction to be described and visualized in a meaningful manner. Analysis of these surfaces, derived from the constructive interplay between theory and experiment, has led to the development of a unified mechanism for folding and a recognition of the underlying factors that control the rates and products of the folding process.
BibTeX:
@article{Dinner2000,
  author = {Dinner, Aaron R. and Sali, Andrej and Smith, Lorna J. and Dobson, Christopher M. and Karplus, Martin},
  title = {Understanding protein folding via free-energy surfaces from theory and experiment},
  journal = {Trends in Biochemical Sciences},
  year = {2000},
  volume = {25},
  number = {7},
  pages = {331--339},
  doi = {http://dx.doi.org/10.1016/S0968-0004(00)01610-8}
}
Djurdjevic, D.P. and Biggs, M.J. Ab initio protein fold prediction using evolutionary algorithms: Influence of design and control parameters on performance 2006 Journal of Computational Chemistry
Vol. 27(11), pp. 1177-1195 
article DOI  
Abstract: True ab initio prediction of protein 3D structure requires only the protein primary structure, a physicochemical free energy model, and a search method for identifying the free energy global minimum. Various characteristics of evolutionary algorithms (EAs) mean they are in principle well suited to the latter. Studies to date have been less than encouraging, however. This is because of the limited consideration given to EA design and control parameter issues. A comprehensive study of these issues was, therefore, undertaken for ab initio protein fold prediction using a full atomistic protein model. The performance and optimal control parameter settings of twelve EA designs were first established using a 15-residue polyalanine molecule - design aspects varied include the encoding alphabet, crossover operator, and replacement strategy. It can be concluded that real encoding and multipoint crossover are superior, while both generational and steady-state replacement strategies have merits. The scaling between the optimal control parameter settings and polyalanine size was also identified for both generational and steady-state designs based on real encoding and multipoint crossover. Application of the steady-state design to met-enkephalin indicated that these scalings are potentially transferable to real proteins. Comparison of the performance of the steady-state design for met-enkephalin with other ab initio methods indicates that EAs can be competitive provided the correct design and control parameter values are used. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 1177-1195, 2006
BibTeX:
@article{Djurdjevic2006,
  author = {Djurdjevic, Dusan P. and Biggs, Mark J.},
  title = {Ab initio protein fold prediction using evolutionary algorithms: Influence of design and control parameters on performance},
  journal = {Journal of Computational Chemistry},
  year = {2006},
  volume = {27},
  number = {11},
  pages = {1177--1195},
  doi = {http://dx.doi.org/10.1002/jcc.20440}
}
Dobson, C.M., Sali, A. and Karplus, M. Protein Folding: A Perspective from Theory and Experiment 1998 Angewandte Chemie International Edition
Vol. 37(7), pp. 868-893 
article DOI  
BibTeX:
@article{Dobson1998,
  author = {Christopher M. Dobson and Andrej Sali and Martin Karplus},
  title = {Protein Folding: A Perspective from Theory and Experiment},
  journal = {Angewandte Chemie International Edition},
  year = {1998},
  volume = {37},
  number = {7},
  pages = {868--893},
  doi = {http://dx.doi.org/10.1002/(SICI)1521-3773(19980420)37:7<868::AID-ANIE868>3.0.CO;2-H}
}
Duan, Y. and Kollman, P.A. Computational protein folding: From lattice to all-atom 2001 IBM Systems Journal
Vol. 40(2), pp. 297-309 
article DOI  
Abstract: Understanding the mechanism of protein folding is often referred to as the second half of genetics. Computational approaches have been instrumental in the efforts. Simplified models have been applied to understand the physical principles governing the folding processes and will continue to play important roles in the endeavor. Encouraging results have been obtained from all-atom molecular dynamics simulations of protein folding. A recent microsecond-length molecular dynamics simulation on a small protein, villin headpiece subdomain, with an explicit atomic-level representation of both protein and solvent, has marked the beginning of direct and realistic simulations of the folding processes. With growing computer power and increasingly accurate representations together with the advancement of experimental methods, such approaches will help us to achieve a detailed understanding of protein folding mechanisms.
BibTeX:
@article{Duan2001,
  author = {Y. Duan and P. A. Kollman},
  title = {Computational protein folding: From lattice to all-atom},
  journal = {IBM Systems Journal},
  year = {2001},
  volume = {40},
  number = {2},
  pages = {297--309},
  doi = {http://dx.doi.org/10.1147/sj.402.0297}
}
Dudley, J., Pouliot, Y., Chen, R., Morgan, A. and Butte, A. Translational bioinformatics in the cloud: an affordable alternative 2010 Genome Medicine
Vol. 2(8), pp. 51
article DOI  
Abstract: With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine.
BibTeX:
@article{Dudley2010,
  author = {Dudley, Joel and Pouliot, Yannick and Chen, Rong and Morgan, Alexander and Butte, Atul},
  title = {Translational bioinformatics in the cloud: an affordable alternative},
  journal = {Genome Medicine},
  year = {2010},
  volume = {2},
  number = {8},
  pages = {51},
  doi = {http://dx.doi.org/10.1186/gm172}
}
Dwork, C., Kumar, R., Naor, M. and Sivakumar, D. Rank aggregation methods for the Web 2001 Proceedings of the 10th international conference on World Wide Web, pp. 613-622  inproceedings DOI  
BibTeX:
@inproceedings{Dwork2001,
  author = {Dwork, Cynthia and Kumar, Ravi and Naor, Moni and Sivakumar, D.},
  title = {Rank aggregation methods for the Web},
  booktitle = {Proceedings of the 10th international conference on World Wide Web},
  publisher = {ACM},
  year = {2001},
  pages = {613--622},
  doi = {http://dx.doi.org/10.1145/371920.372165}
}
Earl, D.J. and Deem, M.W. Parallel tempering: Theory, applications, and new perspectives 2005 Physical Chemistry Chemical Physics
Vol. 7(23), pp. 3910-3916 
article DOI  
Abstract: We review the history of the parallel tempering simulation method. From its origins in data analysis, the parallel tempering method has become a standard workhorse of physicochemical simulations. We discuss the theory behind the method and its various generalizations. We mention a selected set of the many applications that have become possible with the introduction of parallel tempering, and we suggest several promising avenues for future research.
BibTeX:
@article{Earl2005,
  author = {Earl, David J. and Deem, Michael W.},
  title = {Parallel tempering: Theory, applications, and new perspectives},
  journal = {Physical Chemistry Chemical Physics},
  publisher = {The Royal Society of Chemistry},
  year = {2005},
  volume = {7},
  number = {23},
  pages = {3910--3916},
  doi = {http://dx.doi.org/10.1039/b509983h}
}
Etro, F. The Economic Impact of Cloud Computing on Business Creation, Employment and Output in Europe 2009 Review of Business and Economics
Vol. 54(2), pp. 179-208 
article URL 
BibTeX:
@article{Etro2009,
  author = {Federico Etro},
  title = {The Economic Impact of Cloud Computing on Business Creation, Employment and Output in Europe},
  journal = {Review of Business and Economics},
  year = {2009},
  volume = {54},
  number = {2},
  pages = {179--208},
  url = {http://www.intertic.org/Policy%20Papers/RBE.pdf}
}
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin, C.-J. LIBLINEAR: A Library for Large Linear Classification 2008 Journal of Machine Learning Research
Vol. 9, pp. 1871-1874 
article URL 
BibTeX:
@article{Fan2008,
  author = {Fan, Rong-En and Chang, Kai-Wei and Hsieh, Cho-Jui and Wang, Xiang-Rui and Lin, Chih-Jen},
  title = {LIBLINEAR: A Library for Large Linear Classification},
  journal = {Journal of Machine Learning Research},
  year = {2008},
  volume = {9},
  pages = {1871--1874},
  url = {http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf}
}
Fang, Q. and Shortle, D. A consistent set of statistical potentials for quantifying local side-chain and backbone interactions 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 60(1), pp. 90-96 
article DOI  
Abstract: The frequencies of occurrence of atom arrangements in high-resolution protein structures provide some of the most accurate quantitative measures of interaction energies in proteins. In this report we extend our development of a consistent set of statistical potentials for quantifying local interactions between side-chains and the polypeptide backbone, as well as nearby side-chains. Starting with φ/ψ/χ1 propensities that select for optimal interactions of the 20 amino acid side-chains with the 2 flanking peptide bonds, the following 3 new terms are added: (1) a distance-dependent interaction between the side-chain at i and the carbonyl oxygens and amide protons of the peptide units at i ± 2, i ± 3, and i ± 4; (2) a distance-dependent interaction between the side-chain at position i and side-chains at positions i + 1 through i + 4; and (3) an orientation-dependent interaction between the side-chain at position i and side-chains at i + 1 through i + 4. The relative strengths of these 4 pseudo free energy terms are estimated by the average information content of each scoring matrix and by assessing their performance in a simple fragment threading test. They vary from -0.4--0.5 kcal/mole per residue for φ/ψ/χ1 propensities to a range of -0.15--0.6 kcal/mole per residue for each of the other 3 terms. The combined energy function, containing no interactions between atoms more than 4 residues apart, identifies the correct structural fragment for randomly selected 15 mers over 40% of the time, after searching through 232,000 alternative conformations. For 14 out of 20 sets of all-atom Rosetta decoys analyzed, the native structure has a combined score lower than any of the 1700-1900 decoy conformations. The ability of this energy function to detect energetically important details of local structure is demonstrated by its power to distinguish high-resolution crystal structures from NMR solution structures. Proteins 2005. © 2005 Wiley-Liss, Inc.
BibTeX:
@article{Fang2005,
  author = {Fang, Qiaojun and Shortle, David},
  title = {A consistent set of statistical potentials for quantifying local side-chain and backbone interactions},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {60},
  number = {1},
  pages = {90--96},
  doi = {http://dx.doi.org/10.1002/prot.20482}
}
Fischer, D. Servers for protein structure prediction 2006 Current Opinion in Structural Biology
Vol. 16(2), Theory and simulation/Macromolecular assemblages - Joel Janin and Michael Levitt/Edward H Egelman and Andrew GW Leslie, pp. 178-182
article DOI  
Abstract: The 1990s cultivated a generation of protein structure human predictors. As a result of structural genomics and genome sequencing projects, and significant improvements in the performance of protein structure prediction methods, a generation of automated servers has evolved in the past few years. Servers for close and distant homology modeling are now routinely used by many biologists, and have already been applied to the experimental structure determination process itself, and to the interpretation and annotation of genome sequences. Because dozens of servers are currently available, it is hard for a biologist to know which server(s) to use; however, the state of the art of these methods is now assessed through the LiveBench and CAFASP experiments. Meta-servers -- servers that use the results of other autonomous servers to produce a consensus prediction -- have proven to be the best performers, and are already challenging all but a handful of expert human predictors. The difference in performance of the top ten autonomous (non-meta) servers is small and hard to assess using relatively small test sets. Recent experiments suggest that servers will soon free humans from most of the burden of protein structure prediction.
BibTeX:
@article{Fischer2006,
  author = {Fischer, Daniel},
  title = {Servers for protein structure prediction},
  booktitle = {Theory and simulation/Macromolecular assemblages - Joel Janin and Michael Levitt/Edward H Egelman and Andrew GW Leslie},
  journal = {Current Opinion in Structural Biology},
  year = {2006},
  volume = {16},
  number = {2},
  pages = {178--182},
  doi = {http://dx.doi.org/10.1016/j.sbi.2006.03.004}
}
Fischer, D. 3D-SHOTGUN: A novel, cooperative, fold-recognition meta-predictor 2003 Proteins: Structure, Function, and Genetics
Vol. 51(3), pp. 434-441 
article DOI  
Abstract: To gain a better understanding of the biological role of proteins encoded in genome sequences, knowledge of their three-dimensional (3D) structure and function is required. The computational assignment of folds is becoming an increasingly important complement to experimental structure determination. In particular, fold-recognition methods aim to predict approximate 3D models for proteins bearing no sequence similarity to any protein of known structure. However, fully automated structure-prediction methods can currently produce reliable models for only a fraction of these sequences. Using a number of semiautomated procedures, human expert predictors are often able to produce more and better predictions than automated methods. We describe a novel, fully automatic, fold-recognition meta-predictor, named 3D-SHOTGUN, which incorporates some of the strategies human predictors have successfully applied. This new method is reminiscent of the so-called cooperative algorithms of Computer Vision. The input to 3D-SHOTGUN are the top models predicted by a number of independent fold-recognition servers. The meta-predictor consists of three steps: (i) assembly of hybrid models, (ii) confidence assignment, and (iii) selection. We have applied 3D-SHOTGUN to an unbiased test set of 77 newly released protein structures sharing no sequence similarity to proteins previously released. Forty-six correct rank-1 predictions were obtained, 30 of which had scores higher than that of the first incorrect prediction - a significant improvement over the performance of all individual servers. Furthermore, the predicted hybrid models were, on average, more similar to their corresponding native structures than those produced by the individual servers. This opens the possibility of generating more accurate, full-atom homology models for proteins with no sequence similarity to proteins of known structure.
These improvements represent a step forward toward the wider applicability of fully automated structure-prediction methods at genome scales.
BibTeX:
@article{Fischer2003,
  author = {Fischer, Daniel},
  title = {3D-SHOTGUN: A novel, cooperative, fold-recognition meta-predictor},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2003},
  volume = {51},
  number = {3},
  pages = {434--441},
  doi = {http://dx.doi.org/10.1002/prot.10357}
}
Fischer, D. and Barret, C. CAFASP-1: Critical assessment of fully automated structure prediction methods 1999 Proteins: Structure, Function, and Genetics
Vol. 37(S3), pp. 209-217 
article  
BibTeX:
@article{Fischer1999,
  author = {Daniel Fischer and Christian Barret},
  title = {CAFASP-1: Critical assessment of fully automated structure prediction methods},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1999},
  volume = {37},
  number = {S3},
  pages = {209--217}
}
Fitzgerald, J.E., Jha, A.K., Colubri, A., Sosnick, T.R. and Freed, K.F. Reduced C-beta statistical potentials can outperform all-atom potentials in decoy identification 2007 Protein Science
Vol. 16(10), pp. 2123-2139 
article DOI  
Abstract: We developed a series of statistical potentials to recognize the native protein from decoys, particularly when using only a reduced representation in which each side chain is treated as a single Cbeta atom. Beginning with a highly successful all-atom statistical potential, the Discrete Optimized Protein Energy function (DOPE), we considered the implications of including additional information in the all-atom statistical potential and subsequently reducing to the Cbeta representation. One of the potentials includes interaction energies conditional on backbone geometries. A second potential separates sequence local from sequence nonlocal interactions and introduces a novel reference state for the sequence local interactions. The resultant potentials perform better than the original DOPE statistical potential in decoy identification. Moreover, even upon passing to a reduced Cbeta representation, these statistical potentials outscore the original (all-atom) DOPE potential in identifying native states for sets of decoys. Interestingly, the backbone-dependent statistical potential is shown to retain nearly all of the information content of the all-atom representation in the Cbeta representation. In addition, these new statistical potentials are combined with existing potentials to model hydrogen bonding, torsion energies, and solvation energies to produce even better performing potentials. The ability of the Cbeta statistical potentials to accurately represent protein interactions bodes well for computational efficiency in protein folding calculations using reduced backbone representations, while the extensions to DOPE illustrate general principles for improving knowledge-based potentials.
BibTeX:
@article{Fitzgerald2007,
  author = {Fitzgerald, James E. and Jha, Abhishek K. and Colubri, Andres and Sosnick, Tobin R. and Freed, Karl F.},
  title = {Reduced C-beta statistical potentials can outperform all-atom potentials in decoy identification},
  journal = {Protein Science},
  year = {2007},
  volume = {16},
  number = {10},
  pages = {2123--2139},
  doi = {http://dx.doi.org/10.1110/ps.072939707}
}
Folino, G., Shah, A. and Krasnogor, N. On the storage, management and analysis of (multi) similarity for large scale protein structure datasets in the grid 2009 22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS 2009), pp. 1-8  inproceedings DOI  
Abstract: Assessment of the (Multi) Similarity among a set of protein structures is achieved through an ensemble of protein structure comparison methods/algorithms. This leads to the generation of a multitude of data that varies both in type and size. After passing through standardization and normalization, this data is further used in consensus development, providing a domain-independent and highly reliable view of the assessment of (dis)similarities. This paper briefly describes some of the techniques used for the estimation of missing/invalid values resulting from the process of multi-comparison of very large scale datasets in a distributed/grid environment. This is followed by an empirical study on the storage capacity and query processing time required to cope with the results of such comparisons. In particular we investigate and compare the storage/query overhead of two commonly used database technologies such as the Hierarchical Data Format (HDF) (HDF5) and Relational Database Management System (RDBMS) (Oracle/SQL) in terms of our application deployed on the National Grid Service (NGS), UK. As the technologies explored under this investigation are quite generic in the science and engineering domain, our findings would also be beneficial for other scientific applications having related magnitude of data and functionality.
BibTeX:
@inproceedings{Folino2009,
  author = {Folino, G. and Shah, A.A. and Krasnogor, N.},
  title = {On the storage, management and analysis of (multi) similarity for large scale protein structure datasets in the grid},
  booktitle = {22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS 2009)},
  year = {2009},
  pages = {1--8},
  doi = {http://dx.doi.org/10.1109/CBMS.2009.5255328}
}
Fortnow, L. The status of the P versus NP problem 2009 Communications of the ACM
Vol. 52(9), pp. 78-86 
article DOI  
BibTeX:
@article{Fortnow2009,
  author = {Fortnow, Lance},
  title = {The status of the P versus NP problem},
  journal = {Communications of the ACM},
  publisher = {ACM},
  year = {2009},
  volume = {52},
  number = {9},
  pages = {78--86},
  doi = {http://dx.doi.org/10.1145/1562164.1562186}
}
Fujitsuka, Y., Chikenji, G. and Takada, S. SimFold energy function for de novo protein structure prediction: Consensus with Rosetta 2006 Proteins: Structure, Function, and Bioinformatics
Vol. 62(2), pp. 381-398
article DOI  
BibTeX:
@article{Fujitsuka2006,
  author = {Fujitsuka, Yoshimi and Chikenji, George and Takada, Shoji},
  title = {SimFold energy function for de novo protein structure prediction: Consensus with Rosetta},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2006},
  volume = {62},
  number = {2},
  pages = {381--398},
  doi = {http://dx.doi.org/10.1002/prot.20748}
}
Gagné, C. and Parizeau, M. Genericity in Evolutionary Computation Software Tools: Principles and Case-study 2006 International Journal on Artificial Intelligence Tools
Vol. 15(2), pp. 173-194 
article DOI  
Abstract: This paper deals with the need for generic software development tools in evolutionary computations (EC). These tools will be essential for the next generation of evolutionary algorithms where application designers and researchers will need to mix different combinations of traditional EC (e.g. genetic algorithms, genetic programming, evolutionary strategies, etc.), or to create new variations of these EC, in order to solve complex real world problems. Six basic principles are proposed to guide the development of such tools. These principles are then used to evaluate six freely available, widely used EC software tools. Finally, the design of Open BEAGLE, the framework developed by the authors, is presented in more detail.
BibTeX:
@article{Gagne2006,
  author = {Christian Gagné and Marc Parizeau},
  title = {Genericity in Evolutionary Computation Software Tools: Principles and Case-study},
  journal = {International Journal on Artificial Intelligence Tools},
  year = {2006},
  volume = {15},
  number = {2},
  pages = {173--194},
  doi = {http://dx.doi.org/10.1142/S021821300600262X}
}
Garcia, R. and Nickels, T. Uploadify - a multiple file upload plugin for jQuery   webpage URL 
BibTeX:
@webpage{URL_UPLOADIFY,
  author = {Ronnie Garcia and Travis Nickels},
  title = {Uploadify - a multiple file upload plugin for jQuery},
  url = {http://www.uploadify.com/}
}
Garey, M.R. and Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness 1979, pp. 338  book  
BibTeX:
@book{Garey1979,
  author = {Garey, Michael R. and Johnson, David S.},
  title = {Computers and Intractability: A Guide to the Theory of NP-Completeness},
  publisher = {W. H. Freeman & Co.},
  year = {1979},
  pages = {338}
}
Gendreau, M. and Potvin, J.-Y. Metaheuristics in Combinatorial Optimization 2005 Annals of Operations Research
Vol. 140(1), pp. 189-213 
article DOI  
Abstract: The emergence of metaheuristics for solving difficult combinatorial optimization problems is one of the most notable achievements of the last two decades in operations research. This paper provides an account of the most recent developments in the field and identifies some common issues and trends. Examples of applications are also reported for vehicle routing and scheduling problems.
BibTeX:
@article{Gendreau2005,
  author = {Gendreau, Michel and Potvin, Jean-Yves},
  title = {Metaheuristics in Combinatorial Optimization},
  journal = {Annals of Operations Research},
  year = {2005},
  volume = {140},
  number = {1},
  pages = {189--213},
  doi = {http://dx.doi.org/10.1007/s10479-005-3971-7}
}
Ginalski, K., Elofsson, A., Fischer, D. and Rychlewski, L. 3D-Jury: a simple approach to improve protein structure predictions 2003 Bioinformatics
Vol. 19(8), pp. 1015-1018 
article DOI  
Abstract: Motivation: Consensus structure prediction methods (meta-predictors) have higher accuracy than individual structure prediction algorithms (their components). The goal for the development of the 3D-Jury system is to create a simple but powerful procedure for generating meta-predictions using variable sets of models obtained from diverse sources. The resulting protocol should help to improve the quality of structural annotations of novel proteins. Results: The 3D-Jury system generates meta-predictions from sets of models created using variable methods. It is not necessary to know prior characteristics of the methods. The system is able to utilize immediately new components (additional prediction providers). The accuracy of the system is comparable with other well-tuned prediction servers. The algorithm resembles methods of selecting models generated using ab initio folding simulations. It is simple and offers a portable solution to improve the accuracy of other protein structure prediction protocols. Availability: The 3D-Jury system is available via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/) to the academic community. Contact: leszek@bioinfo.pl Supplementary information: 3D-Jury is coupled to the continuous online server evaluation program, LiveBench (http://BioInfo.PL/LiveBench/)
BibTeX:
@article{Ginalski2003,
  author = {Ginalski, Krzysztof and Elofsson, Arne and Fischer, Daniel and Rychlewski, Leszek},
  title = {3D-Jury: a simple approach to improve protein structure predictions},
  journal = {Bioinformatics},
  year = {2003},
  volume = {19},
  number = {8},
  pages = {1015--1018},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btg124}
}
Ginalski, K., Grishin, N.V., Godzik, A. and Rychlewski, L. Practical lessons from protein structure prediction 2005 Nucl. Acids Res.
Vol. 33(6), pp. 1874-1891 
article DOI  
Abstract: Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed.
BibTeX:
@article{Ginalski2005,
  author = {Ginalski, Krzysztof and Grishin, Nick V. and Godzik, Adam and Rychlewski, Leszek},
  title = {Practical lessons from protein structure prediction},
  journal = {Nucl. Acids Res.},
  year = {2005},
  volume = {33},
  number = {6},
  pages = {1874--1891},
  doi = {http://dx.doi.org/10.1093/nar/gki327}
}
Glover, F. Tabu Search -- Part II 1990 ORSA Journal on Computing
Vol. 2(1), pp. 4-32 
article  
BibTeX:
@article{Glover1990,
  author = {Fred Glover},
  title = {Tabu Search -- Part II},
  journal = {ORSA Journal on Computing},
  year = {1990},
  volume = {2},
  number = {1},
  pages = {4--32}
}
Glover, F. Tabu Search -- Part I 1989 ORSA Journal on Computing
Vol. 1(3), pp. 190-206 
article  
BibTeX:
@article{Glover1989,
  author = {Fred Glover},
  title = {Tabu Search -- Part I},
  journal = {ORSA Journal on Computing},
  year = {1989},
  volume = {1},
  number = {3},
  pages = {190--206}
}
Godzik, A., Skolnick, J. and Kolinski, A. Regularities in interaction patterns of globular proteins 1993 Protein Engineering Design and Selection
Vol. 6(8), pp. 801-810 
article DOI  
Abstract: The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches. Because it focuses on side chain interactions, it aids in the discovery, study and classification of similarities between interactions defining particular protein folds and offers new insights into the rules of protein structure. For example, there is a small number of characteristic patterns of interactions between protein supersecondary structural fragments, which can be seen in various non-related proteins. Furthermore, the overlap of the side chain contact maps of two proteins provides a new measure of protein structure similarity. As shown in several examples, alignments based on contact map overlaps are a powerful alternative to other structure-based alignments.
BibTeX:
@article{Godzik1993,
  author = {Godzik, Adam and Skolnick, Jeffrey and Kolinski, Andrzej},
  title = {Regularities in interaction patterns of globular proteins},
  journal = {Protein Engineering Design and Selection},
  year = {1993},
  volume = {6},
  number = {8},
  pages = {801--810},
  note = {Max-CMO measure definition.},
  doi = {http://dx.doi.org/10.1093/protein/6.8.801}
}
Goldberg, D.E. and Deb, K. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms 1991 Foundations of Genetic Algorithms, pp. 69-93  inproceedings  
BibTeX:
@inproceedings{Goldberg1991,
  author = {David E. Goldberg and Kalyanmoy Deb},
  title = {A Comparative Analysis of Selection Schemes Used in Genetic Algorithms},
  booktitle = {Foundations of Genetic Algorithms},
  publisher = {Morgan Kaufmann},
  year = {1991},
  pages = {69--93}
}
Goldman, D., Papadimitriou, C.H. and Istrail, S. Algorithmic Aspects of Protein Structure Similarity 1999 FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pp. 512-512  inproceedings DOI  
Abstract: We show that calculating contact map overlap (a measure of similarity of protein structures) is NP-hard, but can be solved in polynomial time for several interesting and relevant special cases. We identify an important special case of this problem corresponding to self-avoiding walks, and prove a decomposition theorem and a corollary approximation result for this special case. These are the first approximation algorithms with guaranteed error bounds, and NP-completeness results in the literature in the area of protein structure alignment/fold recognition for measures of structure similarity of practical interest.
BibTeX:
@inproceedings{Goldman1999,
  author = {Goldman, Deborah and Papadimitriou, Christos H. and Istrail, Sorin},
  title = {Algorithmic Aspects of Protein Structure Similarity},
  booktitle = {FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science},
  publisher = {IEEE Computer Society},
  year = {1999},
  pages = {512--512},
  doi = {http://dx.doi.org/10.1109/SFFCS.1999.814624}
}
Goldovsky, L., Janssen, P., Ahren, D., Audit, B., Cases, I., Darzentas, N., Enright, A.J., Lopez-Bigas, N., Peregrin-Alvarez, J.M., Smith, M., Tsoka, S., Kunin, V. and Ouzounis, C.A. CoGenT++: an extensive and extensible data environment for computational genomics 2005 Bioinformatics
Vol. 21(19), pp. 3806-3810 
article DOI  
Abstract: Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility. Description: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions--AllFuse, putative orthologs--OFAM, protein families--TRIBES, phylogenetic profiles--ProfUse and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction. Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing. Availability: The database and component downloads are accessible at http://cgg.ebi.ac.uk/cogentpp.html. Contact: ouzounis@ebi.ac.uk
BibTeX:
@article{Goldovsky2005,
  author = {Goldovsky, Leon and Janssen, Paul and Ahren, Dag and Audit, Benjamin and Cases, Ildefonso and Darzentas, Nikos and Enright, Anton J. and Lopez-Bigas, Nuria and Peregrin-Alvarez, Jose M. and Smith, Mike and Tsoka, Sophia and Kunin, Victor and Ouzounis, Christos A.},
  title = {CoGenT++: an extensive and extensible data environment for computational genomics},
  journal = {Bioinformatics},
  year = {2005},
  volume = {21},
  number = {19},
  pages = {3806--3810},
  doi = {http://dx.doi.org/10.1093/bioinformatics/bti579}
}
Gramm, J. A polynomial-time algorithm for the matching of crossing contact-map patterns 2004 Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Vol. 1(4), pp. 171-180 
article DOI  
Abstract: Contact maps are a model to capture the core information in the structure of biological molecules, e.g., proteins. A contact map consists of an ordered set S of elements (representing a protein's sequence of amino acids), and a set A of element pairs of S, called arcs (representing amino acids which are closely neighbored in the structure). Given two contact maps (S, A) and (S_p, A_p) with |A| ≥ |A_p|, the contact map pattern matching (CMPM) problem asks whether the "pattern" (S_p, A_p) "occurs" in (S, A), i.e., informally stated, whether there is a subset of |A_p| arcs in A whose arc structure coincides with A_p. CMPM captures the biological question of finding structural motifs in protein structures. In general, CMPM is NP-hard. In this paper, we show that CMPM is solvable in O(|A|^6 |A_p|) time when the pattern is {<, ≬}-structured, i.e., when each two arcs in the pattern are disjoint or crossing. Our algorithm extends to other closely related models. In particular, it answers an open question raised by Vialette that, rephrased in terms of contact maps, asked whether CMPM for {<, ≬}-structured patterns is NP-hard or solvable in polynomial time. Our result stands in sharp contrast to the NP-hardness of closely related problems. We provide experimental results which show that contact maps derived from real protein structures can be processed efficiently.
BibTeX:
@article{Gramm2004,
  author = {Gramm, J.},
  title = {A polynomial-time algorithm for the matching of crossing contact-map patterns},
  journal = {Computational Biology and Bioinformatics, IEEE/ACM Transactions on},
  year = {2004},
  volume = {1},
  number = {4},
  pages = {171--180},
  doi = {http://dx.doi.org/10.1109/TCBB.2004.35}
}
Greene, L.H., Lewis, T.E., Addou, S., Cuff, A., Dallman, T., Dibley, M., Redfern, O., Pearl, F., Nambudiry, R., Reid, A., Sillitoe, I., Yeats, C., Thornton, J.M. and Orengo, C.A. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution 2007 Nucl. Acids Res.
Vol. 35(suppl1), pp. D291-297 
article DOI  
Abstract: We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, that have been made easier by the provision of information rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ~2 million sequences in completed genomes and UniProt.
BibTeX:
@article{Greene2007,
  author = {Greene, Lesley H. and Lewis, Tony E. and Addou, Sarah and Cuff, Alison and Dallman, Tim and Dibley, Mark and Redfern, Oliver and Pearl, Frances and Nambudiry, Rekha and Reid, Adam and Sillitoe, Ian and Yeats, Corin and Thornton, Janet M. and Orengo, Christine A.},
  title = {The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution},
  journal = {Nucl. Acids Res.},
  year = {2007},
  volume = {35},
  number = {suppl1},
  pages = {D291--297},
  doi = {http://dx.doi.org/10.1093/nar/gkl959}
}
Gront, D. and Kolinski, A. Utility library for structural bioinformatics 2008 Bioinformatics
Vol. 24(4), pp. 584-585 
article DOI  
Abstract: Summary: In this Note we present a new software library for structural bioinformatics. The library contains programs, computing sequence- and profile-based alignments and a variety of structural calculations with user-friendly handling of various data formats. The software organization is very flexible. Algorithms are written in Java language and may be used by Java programs. Moreover the modules can be accessed from Jython (Python scripting language implemented in Java) scripts. Finally, the new version of BioShell delivers several utility programs that can do typical bioinformatics tasks from a command-line level. Availability: The software is available for download free of charge from its website: http://bioshell.chem.uw.edu.pl. This website provides also numerous examples, code snippets and API documentation. Contact: dgront@chem.uw.edu.pl
BibTeX:
@article{Gront2008,
  author = {Gront, Dominik and Kolinski, Andrzej},
  title = {Utility library for structural bioinformatics},
  journal = {Bioinformatics},
  year = {2008},
  volume = {24},
  number = {4},
  pages = {584--585},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btm627}
}
Gupta, N., Mangal, N. and Biswas, S. Evolution and similarity evaluation of protein structures in contact map space 2005 Proteins
Vol. 59(2), pp. 196-204 
article DOI  
Abstract: Prediction of fold from amino acid sequence of a protein has been an active area of research in the past few years, but the limited accuracy of existing techniques emphasizes the need to develop newer approaches to tackle this task. In this study, we use contact map prediction as an intermediate step in fold prediction from sequence. Contact map is a reduced graph-theoretic representation of proteins that models the local and global inter-residue contacts in the structure. We start with a population of random contact maps for the protein sequence and "evolve" the population to a "high-feasibility" configuration using a genetic algorithm. A neural network is employed to assess the feasibility of contact maps based on their 4 physically relevant properties. We also introduce 5 parameters, based on algebraic graph theory and physical considerations, that can be used to judge the structural similarity between proteins through contact maps. To predict the fold of a given amino acid sequence, we predict a contact map that will sufficiently approximate the structure of the corresponding protein. Then we assess the similarity of this contact map with the representative contact map of each fold; the fold that corresponds to the closest match is our predicted fold for the input sequence. We have found that our feasibility measure is able to differentiate between feasible and infeasible contact maps. Further, this novel approach is able to predict the folds from sequences significantly better than a random predictor.
BibTeX:
@article{Gupta2005,
  author = {Nitin Gupta and Nitin Mangal and Somenath Biswas},
  title = {Evolution and similarity evaluation of protein structures in contact map space},
  journal = {Proteins},
  year = {2005},
  volume = {59},
  number = {2},
  pages = {196--204},
  doi = {http://dx.doi.org/10.1002/prot.20415}
}
Hardin, C., Eastwood, M.P., Prentiss, M., Luthey-Schulten, Z. and Wolynes, P.G. Folding funnels: The key to robust protein structure prediction 2002 Journal of Computational Chemistry
Vol. 23(1), pp. 138-146 
article DOI  
Abstract: Natural proteins fold because their free energy landscapes are funneled to their native states. The degree to which a model energy function for protein structure prediction can avoid the multiple minima problem and reliably yield at least low-resolution predictions is also dependent on the topography of the energy landscape. We show that the degree of funneling can be quantitatively expressed in terms of a few averaged properties of the landscape. This allows us to optimize simplified energy functions for protein structure prediction even in the absence of homology information. Here we outline the optimization procedure in the context of associative memory energy functions originally introduced for tertiary structure recognition and demonstrate that even partially funneled landscapes lead to qualitatively correct, low-resolution predictions. © 2002 John Wiley & Sons, Inc. J Comput Chem 23: 138-146, 2002
BibTeX:
@article{Hardin2002,
  author = {Hardin, Corey and Eastwood, Michael P. and Prentiss, Michael and Luthey-Schulten, Z. and Wolynes, Peter G.},
  title = {Folding funnels: The key to robust protein structure prediction},
  journal = {Journal of Computational Chemistry},
  year = {2002},
  volume = {23},
  number = {1},
  pages = {138--146},
  doi = {http://dx.doi.org/10.1002/jcc.1162}
}
Harding, S., Miller, J.F. and Banzhaf, W. Evolution, development and learning using self-modifying cartesian genetic programming 2009 Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp. 699-706  inproceedings DOI  
BibTeX:
@inproceedings{Harding2009,
  author = {Harding, Simon and Miller, Julian F. and Banzhaf, Wolfgang},
  title = {Evolution, development and learning using self-modifying cartesian genetic programming},
  booktitle = {Proceedings of the 11th Annual conference on Genetic and evolutionary computation},
  year = {2009},
  pages = {699--706},
  doi = {http://dx.doi.org/10.1145/1569901.1569998}
}
Hasegawa, H. and Holm, L. Advances and pitfalls of protein structural alignment 2009 Current Opinion in Structural Biology
Vol. 19(3), pp. 341-348 
article DOI  
Abstract: Structure comparison opens a window into the distant past of protein evolution, which has been unreachable by sequence comparison alone. With 55000 entries in the Protein Data Bank and about 500 new structures added each week, automated processing, comparison, and classification are necessary. A variety of methods use different representations, scoring functions, and optimization algorithms, and they generate contradictory results even for moderately distant structures. Sequence mutations, insertions, and deletions are accommodated by plastic deformations of the common core, retaining the precise geometry of the active site, and peripheral regions may refold completely. Therefore structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments. Active research directions include both the search for fold invariant features and the modeling of structural transitions in evolution. Advances have been made in algorithmic robustness, multiple alignment, and speeding up database searches.
BibTeX:
@article{Hasegawa2009,
  author = {Hasegawa, Hitomi and Holm, Liisa},
  title = {Advances and pitfalls of protein structural alignment},
  journal = {Current Opinion in Structural Biology},
  year = {2009},
  volume = {19},
  number = {3},
  pages = {341--348},
  doi = {http://dx.doi.org/10.1016/j.sbi.2009.04.003}
}
Hayes, B. Prototeins 1998 American Scientist
Vol. 86(3), pp. 216- 
article DOI  
BibTeX:
@article{Hayes1998,
  author = {Brian Hayes},
  title = {Prototeins},
  journal = {American Scientist},
  year = {1998},
  volume = {86},
  number = {3},
  pages = {216--},
  doi = {http://dx.doi.org/10.1511/1998.3.216}
}
Hertz, A., Taillard, E. and de Werra, D. A Tutorial on Tabu Search 1995 Proc. of Giornate di Lavoro AIRO'95 (Enterprise Systems: Management of Technological and Organizational Changes), pp. 13-24  inproceedings URL 
BibTeX:
@inproceedings{Hertz1995,
  author = {A. Hertz and E. Taillard and D. de Werra},
  title = {A Tutorial on Tabu Search},
  booktitle = {Proc. of Giornate di Lavoro AIRO'95 (Enterprise Systems: Management of Technological and Organizational Changes)},
  year = {1995},
  pages = {13--24},
  url = {http://citeseer.ist.psu.edu/hertz92tutorial.html}
}
Hoad, T.C. and Zobel, J. Methods for identifying versioned and plagiarized documents 2003 Journal of the American Society for Information Science and Technology
Vol. 54(3), pp. 203-215 
article DOI  
Abstract: The widespread use of on-line publishing of text promotes storage of multiple versions of documents and mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others. We evaluate two families of methods for searching a collection to find documents that are coderivative, that is, are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identification of coderivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared to the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents. However, for fingerprinting parameters must be carefully chosen, and even so the identity measure is clearly superior.
BibTeX:
@article{Hoad2003,
  author = {Hoad, Timothy C. and Zobel, Justin},
  title = {Methods for identifying versioned and plagiarized documents},
  journal = {Journal of the American Society for Information Science and Technology},
  year = {2003},
  volume = {54},
  number = {3},
  pages = {203--215},
  doi = {http://dx.doi.org/10.1002/asi.10170}
}
Holm, L. and Park, J. DaliLite workbench for protein structure comparison 2000 Bioinformatics
Vol. 16(6), pp. 566-567 
article DOI  
Abstract: Summary: DaliLite is a program for pairwise structure comparison and for structure database searching. It is a standalone version of the search engine of the popular Dali server. A web interface is provided to view the results, multiple alignments and 3D superimpositions of structures. Availability: DaliLite has been ported to the Linux and Irix operating systems and can be compiled in many other UNIX operating systems. It is found at http://www.embl-ebi.ac.uk/dali/DaliLite. Contact: holm@embl-ebi.ac.uk
BibTeX:
@article{Holm2000,
  author = {Holm, Liisa and Park, Jong},
  title = {DaliLite workbench for protein structure comparison},
  journal = {Bioinformatics},
  year = {2000},
  volume = {16},
  number = {6},
  pages = {566--567},
  doi = {http://dx.doi.org/10.1093/bioinformatics/16.6.566}
}
Holm, L. and Sander, C. Mapping the Protein Universe 1996 Science
Vol. 273(5275), pp. 595-602 
article DOI  
BibTeX:
@article{Holm1996,
  author = {Holm, Liisa and Sander, Chris},
  title = {Mapping the Protein Universe},
  journal = {Science},
  year = {1996},
  volume = {273},
  number = {5275},
  pages = {595--602},
  doi = {http://dx.doi.org/10.1126/science.273.5275.595}
}
Hu, J., Goodman, E., Seo, K., Fan, Z. and Rosenberg, R. The Hierarchical Fair Competition (HFC) Framework for Sustainable Evolutionary Algorithms 2005 Evolutionary Computation
Vol. 13(2), pp. 241-277 
article DOI  
Abstract: Many current Evolutionary Algorithms (EAs) suffer from a tendency to converge prematurely or stagnate without progress for complex problems. This may be due to the loss of or failure to discover certain valuable genetic material or the loss of the capability to discover new genetic material before convergence has limited the algorithm's ability to search widely. In this paper, the Hierarchical Fair Competition (HFC) model, including several variants, is proposed as a generic framework for sustainable evolutionary search by transforming the convergent nature of the current EA framework into a non-convergent search process. That is, the structure of HFC does not allow the convergence of the population to the vicinity of any set of optimal or locally optimal solutions. The sustainable search capability of HFC is achieved by ensuring a continuous supply and the incorporation of genetic material in a hierarchical manner, and by culturing and maintaining, but continually renewing, populations of individuals of intermediate fitness levels. HFC employs an assembly-line structure in which subpopulations are hierarchically organized into different fitness levels, reducing the selection pressure within each subpopulation while maintaining the global selection pressure to help ensure the exploitation of the good genetic material found. Three EAs based on the HFC principle are tested - two on the even-10-parity genetic programming benchmark problem and a real-world analog circuit synthesis problem, and another on the HIFF genetic algorithm (GA) benchmark problem. The significant gain in robustness, scalability and efficiency by HFC, with little additional computing effort, and its tolerance of small population sizes, demonstrates its effectiveness on these problems and shows promise of its potential for improving other existing EAs for difficult problems. 
A paradigm shift from that of most EAs is proposed: rather than trying to escape from local optima or delay convergence at a local optimum, HFC allows the emergence of new optima continually in a bottom-up manner, maintaining low local selection pressure at all fitness levels, while fostering exploitation of high-fitness individuals through promotion to higher levels.
BibTeX:
@article{Hu2005,
  author = {Hu, Jianjun and Goodman, Erik and Seo, Kisung and Fan, Zhun and Rosenberg, Rondal},
  title = {The Hierarchical Fair Competition (HFC) Framework for Sustainable Evolutionary Algorithms},
  journal = {Evolutionary Computation},
  publisher = {MIT Press},
  year = {2005},
  volume = {13},
  number = {2},
  pages = {241--277},
  doi = {http://dx.doi.org/10.1162/1063656054088530}
}
Hubbard, T., Ailey, B., Brenner, S., Murzin, A. and Chothia, C. SCOP: a Structural Classification of Proteins database 1999 Nucl. Acids Res.
Vol. 27(1), pp. 254-256 
article DOI  
BibTeX:
@article{Hubbard1999,
  author = {Hubbard, TJ and Ailey, B and Brenner, SE and Murzin, AG and Chothia, C},
  title = {SCOP: a Structural Classification of Proteins database},
  journal = {Nucl. Acids Res.},
  year = {1999},
  volume = {27},
  number = {1},
  pages = {254--256},
  doi = {http://dx.doi.org/10.1093/nar/27.1.254}
}
Hung, L.-H., Ngan, S.-C., Liu, T. and Samudrala, R. PROTINFO: new algorithms for enhanced protein structure predictions 2005 Nucleic Acids Res
Vol. 33(suppl2), pp. W77-W80 
article DOI  
Abstract: We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at http://protinfo.compbio.washington.edu
BibTeX:
@article{Hung2005,
  author = {Hung, Ling-Hong and Ngan, Shing-Chung and Liu, Tianyun and Samudrala, Ram},
  title = {PROTINFO: new algorithms for enhanced protein structure predictions},
  journal = {Nucleic Acids Res},
  year = {2005},
  volume = {33},
  number = {suppl2},
  pages = {W77--W80},
  doi = {http://dx.doi.org/10.1093/nar/gki403}
}
Hunter, J.D. Matplotlib: A 2D Graphics Environment 2007 Computing in Science and Engineering
Vol. 9(3), pp. 90-95 
article DOI  
BibTeX:
@article{Matplotlib,
  author = {Hunter, John D.},
  title = {Matplotlib: A 2D Graphics Environment},
  booktitle = {Computing in Science \& Engineering},
  journal = {Computing in Science and Engineering},
  publisher = {IEEE Computer Society},
  year = {2007},
  volume = {9},
  number = {3},
  pages = {90--95},
  doi = {http://dx.doi.org/10.1109/MCSE.2007.55}
}
Jackson, S.E. How do small single-domain proteins fold? 1998 Folding and Design
Vol. 3(4), pp. R81-R91 
article DOI  
Abstract: Many small, monomeric proteins fold with simple two-state kinetics and show wide variation in folding rates, from microseconds to seconds. Thus, stable intermediates are not a prerequisite for the fast, efficient folding of proteins and may in fact be kinetic traps and slow the folding process. Using recent studies, can we begin to search for trends which may lead to a better understanding of the protein folding process?
BibTeX:
@article{Jackson1998,
  author = {Jackson, Sophie E.},
  title = {How do small single-domain proteins fold?},
  journal = {Folding and Design},
  year = {1998},
  volume = {3},
  number = {4},
  pages = {R81--R91},
  doi = {http://dx.doi.org/10.1016/S1359-0278(98)00033-9}
}
Jain, B. and Lappe, M. Joining Softassign and Dynamic Programming for the Contact Map Overlap Problem 2007
Bioinformatics Research and Development, Vol. 4414, pp. 410-423 
incollection DOI  
Abstract: Comparison of 3-dimensional protein folds is a core problem in molecular biology. The Contact Map Overlap (CMO) scheme provides one of the most common measures for protein structure similarity. Maximizing CMO is, however, NP-hard. To approximately solve CMO, we combine softassign and dynamic programming. Softassign approximately solves the maximum common subgraph (MCS) problem. Dynamic programming converts the MCS solution to a solution of the CMO problem. We present and discuss experiments using proteins with up to 1500 residues. The results indicate that the proposed method is extremely fast compared to other methods, scales well with increasing problem size, and is useful for comparing similar protein structures.
BibTeX:
@incollection{Jain2007,
  author = {Jain, Brijnesh and Lappe, Michael},
  title = {Joining Softassign and Dynamic Programming for the Contact Map Overlap Problem},
  booktitle = {Bioinformatics Research and Development},
  publisher = {Springer Berlin / Heidelberg},
  year = {2007},
  volume = {4414},
  pages = {410--423},
  doi = {http://dx.doi.org/10.1007/978-3-540-71233-6_32}
}
Jiang, H. and Doerge, R. Estimating the proportion of true null hypotheses for multiple comparisons 2008 Cancer Informatics
Vol. 6, pp. 25-32 
article URL 
BibTeX:
@article{Jiang2008,
  author = {Hongmei Jiang and R.W. Doerge},
  title = {Estimating the proportion of true null hypotheses for multiple comparisons},
  journal = {Cancer Informatics},
  publisher = {Libertas Academica},
  year = {2008},
  volume = {6},
  pages = {25--32},
  url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2623313/}
}
Jones, D.T., Bryson, K., Coleman, A., McGuffin, L.J., Sadowski, M.I., Sodhi, J.S. and Ward, J.J. Prediction of novel and analogous folds using fragment assembly and fold recognition 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 143-151 
article DOI  
BibTeX:
@article{Jones2005,
  author = {D. T. Jones and K. Bryson and A. Coleman and L. J. McGuffin and M. I. Sadowski and J. S. Sodhi and J. J. Ward},
  title = {Prediction of novel and analogous folds using fragment assembly and fold recognition},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {143--151},
  doi = {http://dx.doi.org/10.1002/prot.20731}
}
Jones, E., Oliphant, T., Peterson, P. and others SciPy: Open source scientific tools for Python 2001--   misc URL 
BibTeX:
@misc{SciPy,
  author = {Eric Jones and Travis Oliphant and Pearu Peterson and others},
  title = {SciPy: Open source scientific tools for Python},
  year = {2001--},
  url = {http://www.scipy.org/}
}
Jurkowski, W., Bryliński, M., Konieczny, L., Wiśniowski, Z. and Roterman, I. Conformational subspace in simulation of early-stage protein folding 2004 Proteins: Structure, Function, and Bioinformatics
Vol. 55(1), pp. 115-127 
article DOI  
BibTeX:
@article{Jurkowski2004,
  author = {Jurkowski, Wiktor and Bryliński, Michał and Konieczny, Leszek and Wiśniowski, Zdzisław and Roterman, Irena},
  title = {Conformational subspace in simulation of early-stage protein folding},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2004},
  volume = {55},
  number = {1},
  pages = {115--27},
  doi = {http://dx.doi.org/10.1002/prot.20002}
}
Kabsch, W. A discussion of the solution for the best rotation to relate two sets of vectors 1978 Acta Crystallographica Section A
Vol. 34(5), pp. 827-828 
article DOI  
BibTeX:
@article{Kabsch1978,
  author = {W. Kabsch},
  title = {A discussion of the solution for the best rotation to relate two sets of vectors},
  journal = {Acta Crystallographica Section A},
  year = {1978},
  volume = {34},
  number = {5},
  pages = {827--828},
  doi = {http://dx.doi.org/10.1107/S0567739478001680}
}
Karplus, K. and Karchin, R. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction 2003 Proteins: Structure, Function, and Genetics
Vol. 53(S6), pp. 491-496 
article DOI  
BibTeX:
@article{Karplus2003,
  author = {Kevin Karplus and Rachel Karchin},
  title = {Combining local-structure, fold-recognition, and new fold methods for protein structure prediction},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2003},
  volume = {53},
  number = {S6},
  pages = {491--496},
  doi = {http://dx.doi.org/10.1002/prot.10540}
}
Karplus, K. and Katzman, S. SAM-T04: What is new in protein-structure prediction for CASP6 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 135-142 
article DOI  
BibTeX:
@article{Karplus2005,
  author = {Kevin Karplus and Sol Katzman},
  title = {SAM-T04: What is new in protein-structure prediction for CASP6},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {135--142},
  doi = {http://dx.doi.org/10.1002/prot.20730}
}
Kmiecik, S. and Kolinski, A. Characterization of protein-folding pathways by reduced-space modeling 2007 Proceedings of the National Academy of Sciences
Vol. 104(30), pp. 12330-12335 
article DOI  
Abstract: Ab initio simulations of the folding pathways are currently limited to very small proteins. For larger proteins, some approximations or simplifications in protein models need to be introduced. Protein folding and unfolding are among the basic processes in the cell and are very difficult to characterize in detail by experiment or simulation. Chymotrypsin inhibitor 2 (CI2) and barnase are probably the best characterized experimentally in this respect. For these model systems, initial folding stages were simulated by using CA–CB–side chain (CABS), a reduced-space protein-modeling tool. CABS employs knowledge-based potentials that proved to be very successful in protein structure prediction. With the use of isothermal Monte Carlo (MC) dynamics, initiation sites with a residual structure and weak tertiary interactions were identified. Such structures are essential for the initiation of the folding process through a sequential reduction of the protein conformational space, overcoming the Levinthal paradox in this manner. Furthermore, nucleation sites that initiate a tertiary interactions network were located. The MC simulations correspond perfectly to the results of experimental and theoretical research and bring insights into CI2 folding mechanism: unambiguous sequence of folding events was reported as well as cooperative substructures compatible with those obtained in recent molecular dynamics unfolding studies. The correspondence between the simulation and experiment shows that knowledge-based potentials are not only useful in protein structure predictions but are also capable of reproducing the folding pathways. Thus, the results of this work significantly extend the applicability range of reduced models in the theoretical study of proteins.
BibTeX:
@article{Kmiecik2007,
  author = {Kmiecik, Sebastian and Kolinski, Andrzej},
  title = {Characterization of protein-folding pathways by reduced-space modeling},
  journal = {Proceedings of the National Academy of Sciences},
  year = {2007},
  volume = {104},
  number = {30},
  pages = {12330--12335},
  doi = {http://dx.doi.org/10.1073/pnas.0702265104}
}
Knight, W.R. A Computer Method for Calculating Kendall's Tau with Ungrouped Data 1966 Journal of the American Statistical Association
Vol. 61(314), pp. 436-439 
article  
Abstract: Adapting the usual manual methods of computing Kendall's tau to automatic computation results in a running time of order $N^2$. A method is described with running time of order $N \log N$.
BibTeX:
@article{Knight1966,
  author = {Knight, William R.},
  title = {A Computer Method for Calculating Kendall's Tau with Ungrouped Data},
  journal = {Journal of the American Statistical Association},
  publisher = {American Statistical Association},
  year = {1966},
  volume = {61},
  number = {314},
  pages = {436--439}
}
Kolinski, A. Protein modeling and structure prediction with a reduced representation. 2004 Acta Biochimica Polonica
Vol. 51(2), pp. 349-371 
article URL 
Abstract: Protein modeling could be done on various levels of structural details, from simplified lattice or continuous representations, through high resolution reduced models, employing the united atom representation, to all-atom models of the molecular mechanics. Here I describe a new high resolution reduced model, its force field and applications in the structural proteomics. The model uses a lattice representation with 800 possible orientations of the virtual alpha carbon-alpha carbon bonds. The sampling scheme of the conformational space employs the Replica Exchange Monte Carlo method. Knowledge-based potentials of the force field include: generic protein-like conformational biases, statistical potentials for the short-range conformational propensities, a model of the main chain hydrogen bonds and context-dependent statistical potentials describing the side group interactions. The model is more accurate than the previously designed lattice models and in many applications it is complementary and competitive in respect to the all-atom techniques. The test applications include: the ab initio structure prediction, multitemplate comparative modeling and structure prediction based on sparse experimental data. Especially, the new approach to comparative modeling could be a valuable tool of the structural proteomics. It is shown that the new approach goes beyond the range of applicability of the traditional methods of the protein comparative modeling.
BibTeX:
@article{Kolinski2004,
  author = {Andrzej Kolinski},
  title = {Protein modeling and structure prediction with a reduced representation.},
  journal = {Acta Biochimica Polonica},
  year = {2004},
  volume = {51},
  number = {2},
  pages = {349--371},
  url = {http://www.actabp.pl/html/2_2004/349.html}
}
Kolinski, A. and Bujnicki, J.M. Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 84-90 
article DOI  
Abstract: To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds) we used a novel combination of two very different approaches developed independently in our laboratories, which ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the "FRankenstein's Monster" approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated "de novo" by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. Second, a new implementation of the lattice-based protein modeling tool CABS was used to carry out folding guided by the above-mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of simulation were subject to the average linkage hierarchical clustering. For a representative decoy from each cluster, a full-atom model was rebuilt. Finally, five models were selected for submission based on combination of various criteria, including the size, density, and average energy of the corresponding cluster, and the visual evaluation of the full-atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best-performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories.
Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), due to the insufficient computational power, which precluded the possibility of full-atom refinement and energy-based evaluation.
BibTeX:
@article{Kolinski2005,
  author = {Kolinski, Andrzej and Bujnicki, Janusz M.},
  title = {Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {84--90},
  note = {FRankenstein's Monster + CABS modelling},
  doi = {http://dx.doi.org/10.1002/prot.20723}
}
Kolinski, A. and Skolnick, J. Assembly of protein structure from sparse experimental data: An efficient Monte Carlo model 1998 Proteins: Structure, Function, and Genetics
Vol. 32(4), pp. 475-494 
article DOI  
BibTeX:
@article{Kolinski1998,
  author = {Andrzej Kolinski and Jeffrey Skolnick},
  title = {Assembly of protein structure from sparse experimental data: An efficient Monte Carlo model},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1998},
  volume = {32},
  number = {4},
  pages = {475--494},
  doi = {http://dx.doi.org/10.1002/(SICI)1097-0134(19980901)32:4<475::AID-PROT6>3.0.CO;2-F}
}
Kolinski, A. and Skolnick, J. Monte Carlo simulations of protein folding. I. Lattice model and interaction scheme 1994 Proteins: Structure, Function, and Genetics
Vol. 18(4), pp. 338-52 
article DOI  
BibTeX:
@article{Kolinski1994,
  author = {Kolinski, Andrzej and Skolnick, Jeffrey},
  title = {Monte Carlo simulations of protein folding. I. Lattice model and interaction scheme},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1994},
  volume = {18},
  number = {4},
  pages = {338--52},
  doi = {http://dx.doi.org/10.1002/prot.340180405}
}
Kolodny, R., Koehl, P. and Levitt, M. Comprehensive Evaluation of Protein Structure Alignment Methods: Scoring by Geometric Measures 2005 Journal of Molecular Biology
Vol. 346(4), pp. 1173-1188 
article DOI  
Abstract: We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. 
This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.
BibTeX:
@article{Kolodny2005,
  author = {Kolodny, Rachel and Koehl, Patrice and Levitt, Michael},
  title = {Comprehensive Evaluation of Protein Structure Alignment Methods: Scoring by Geometric Measures},
  journal = {Journal of Molecular Biology},
  year = {2005},
  volume = {346},
  number = {4},
  pages = {1173--1188},
  doi = {http://dx.doi.org/10.1016/j.jmb.2004.12.032}
}
Koza, J.R. Scalable learning in genetic programming using automatic function definition 1994 Advances in Genetic Programming, pp. 99-117  incollection  
BibTeX:
@incollection{Koza1994,
  author = {John R. Koza},
  title = {Scalable learning in genetic programming using automatic function definition},
  booktitle = {Advances in Genetic Programming},
  publisher = {MIT Press},
  year = {1994},
  pages = {99--117}
}
Koza, J.R. 36 Human-Competitive Results Produced by Genetic Programming   webpage URL 
BibTeX:
@webpage{URL_GP_results,
  author = {John R. Koza},
  title = {36 Human-Competitive Results Produced by Genetic Programming},
  url = {http://www.genetic-programming.com/humancompetitive.html}
}
Koza, J.R. Genetic programming: on the programming of computers by means of natural selection and genetics 1992   book  
BibTeX:
@book{Koza1992,
  author = {Koza, John R.},
  title = {Genetic programming: on the programming of computers by means of natural selection and genetics},
  publisher = {MIT Press},
  year = {1992}
}
Kraskov, A., Stogbauer, H., Andrzejak, R.G. and Grassberger, P. Hierarchical Clustering Based on Mutual Information 2003   misc URL 
Abstract: Motivation: Clustering is a frequently used concept in a variety of bioinformatical applications. We present a new method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). Results: We use this both in the Shannon (probabilistic) version of information theory, where the "objects" are probability distributions represented by random samples, and in the Kolmogorov (algorithmic) version, where the "objects" are symbol sequences. We apply our method to the construction of mammal phylogenetic trees from mitochondrial DNA sequences and we reconstruct the fetal ECG from the output of independent components analysis (ICA) applied to the ECG of a pregnant woman. Availability: The programs for estimation of MI and for clustering (probabilistic version) are available at http://www.fz-juelich.de/nic/cs/software
BibTeX:
@misc{Kraskov2003,
  author = {Alexander Kraskov and Harald Stogbauer and Ralph G. Andrzejak and Peter Grassberger},
  title = {Hierarchical Clustering Based on Mutual Information},
  year = {2003},
  url = {http://arxiv.org/abs/q-bio/0311039}
}
Krasnogor, N. Self-Generating metaheuristics in bioinformatics: the protein structure comparison case 2004 Genetic Programming and Evolvable Machines
Vol. 5(2), pp. 181-201 
article  
BibTeX:
@article{Krasnogor2004b,
  author = {N. Krasnogor},
  title = {Self-Generating metaheuristics in bioinformatics: the protein structure comparison case},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2004},
  volume = {5},
  number = {2},
  pages = {181--201}
}
Krasnogor, N., Blackburn, B., Hirst, J. and Burke, E. Multimeme Algorithms for Protein Structure Prediction 2002
Vol. 2439, Parallel Problem Solving from Nature - PPSN VII, pp. 769-778 
inproceedings DOI  
Abstract: Despite intensive studies during the last 30 years researchers are yet far from the “holy grail” of blind structure prediction of the three dimensional native state of a protein from its sequence of amino acids. We introduce here a Multimeme Algorithm which is robust across a range of protein structure models and instances. New benchmark sequences for the triangular lattice in the HP model and Functional Model Proteins in two and three dimensions are included here with their known optima. As there is no favourite protein model nor exact energy potentials to describe proteins, robustness across a range of models must be present in any putative structure prediction algorithm. We demonstrate in this paper that while our algorithm presents this feature it remains, in terms of cost, competitive with other techniques.
BibTeX:
@inproceedings{Krasnogor2002,
  author = {N. Krasnogor and B. Blackburn and J.D. Hirst and E.K. Burke},
  title = {Multimeme Algorithms for Protein Structure Prediction},
  booktitle = {Parallel Problem Solving from Nature - PPSN VII},
  publisher = {Springer},
  year = {2002},
  volume = {2439},
  pages = {769--778},
  doi = {http://dx.doi.org/10.1007/3-540-45712-7_74}
}
Krasnogor, N., Hart, W., Smith, J. and Pelta, D. Protein Structure Prediction With Evolutionary Algorithms 1999 International Genetic and Evolutionary Computation Conference (GECCO99), pp. 1569-1601  inproceedings  
BibTeX:
@inproceedings{Krasnogor1999,
  author = {N. Krasnogor and W.E. Hart and J. Smith and D. Pelta},
  title = {Protein Structure Prediction With Evolutionary Algorithms},
  booktitle = {International Genetic and Evolutionary Computation Conference (GECCO99)},
  publisher = {Morgan Kaufmann},
  year = {1999},
  pages = {1569--1601}
}
Krasnogor, N., Lancia, G., Zemla, A., Hart, W., Carr, R., Hirst, J. and Burke, E. A Comparison of Computational Methods for the Maximum Contact Map Overlap of Protein Pairs 2003   unpublished  
BibTeX:
@unpublished{Krasnogor2003,
  author = {N. Krasnogor and G. Lancia and A. Zemla and W.E. Hart and R.D. Carr and J.D. Hirst and E.K. Burke},
  title = {A Comparison of Computational Methods for the Maximum Contact Map Overlap of Protein Pairs},
  year = {2003},
  note = {submitted}
}
Krasnogor, N. and Pelta, D.A. Measuring the similarity of protein structures by means of the universal similarity metric 2004 Bioinformatics
Vol. 20(7), pp. 1015-1021 
article DOI  
Abstract: Motivation: As an increasing number of protein structures become available, the need for algorithms that can quantify the similarity between protein structures increases as well. Thus, the comparison of proteins' structures, and their clustering according to a given similarity measure, is at the core of today's biomedical research. In this paper, we show how an algorithmic information theory inspired Universal Similarity Metric (USM) can be used to calculate similarities between protein pairs. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient. Results: Structural similarity between proteins in four different datasets was measured using the USM. The sample employed represented alpha, beta, alpha-beta, tim-barrel, globins and serpine protein types. The use of the proposed metric allows for a correct measurement of similarity and classification of the proteins in the four datasets. Availability: All the scripts and programs used for the preparation of this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. In that web-page the reader will find a brief description on how to use the various scripts and programs. Supplementary information: The protein datasets used are collected in http://www.cs.nott.ac.uk/~nxk/USM/datasets.html. The calculated similarity values for the proteins used in this paper can be found in http://www.cs.nott.ac.uk/~nxk/USM/similar.html. The clustering of the dataset based on these similarity values can be found in http://www.cs.nott.ac.uk/~nxk/USM/clustering.html
BibTeX:
@article{Krasnogor2004,
  author = {Krasnogor, N. and Pelta, D. A.},
  title = {Measuring the similarity of protein structures by means of the universal similarity metric},
  journal = {Bioinformatics},
  year = {2004},
  volume = {20},
  number = {7},
  pages = {1015--1021},
  doi = {http://dx.doi.org/10.1093/bioinformatics/bth031}
}
Krintz, C., Bunch, C., Chohan, N., Chohan, J., Garg, N., Hubert, M., Kupferman, J., Lakhina, P., Li, Y., Mehta, G., Mostafa, N., Nomura, Y. and Park, S.H. AppScale - open source implementation of the Google App Engine   webpage URL 
BibTeX:
@webpage{URL_APPSCALE,
  author = {Chandra Krintz and Chris Bunch and Navraj Chohan and Jovan Chohan and Nupur Garg and Matt Hubert and Jonathan Kupferman and Puneet Lakhina and Yiming Li and Gaurav Mehta and Nagy Mostafa and Yoshihide Nomura and Soo Hwan Park},
  title = {AppScale - open source implementation of the Google App Engine},
  url = {http://appscale.cs.ucsb.edu/}
}
Kryshtafovych, A. and Fidelis, K. Protein structure prediction and model quality assessment 2009 Drug Discovery Today
Vol. 14(7-8), pp. 386-393 
article DOI  
Abstract: Protein structures have proven to be a crucial piece of information for biomedical research. Of the millions of currently sequenced proteins only a small fraction is experimentally solved for structure and the only feasible way to bridge the gap between sequence and structure data is computational modeling. Half a century has passed since it was shown that the amino acid sequence of a protein determines its shape, but a method to translate the sequence code reliably into the 3D structure still remains to be developed. This review summarizes modern protein structure prediction techniques with the emphasis on comparative modeling, and describes the recent advances in methods for theoretical model quality assessment.
BibTeX:
@article{Kryshtafovych2009,
  author = {Kryshtafovych, Andriy and Fidelis, Krzysztof},
  title = {Protein structure prediction and model quality assessment},
  journal = {Drug Discovery Today},
  year = {2009},
  volume = {14},
  number = {7-8},
  pages = {386--393},
  doi = {http://dx.doi.org/10.1016/j.drudis.2008.11.010}
}
Kryshtafovych, A., Krysko, O., Daniluk, P., Dmytriv, Z. and Fidelis, K. Protein structure prediction center in CASP8 2009 Proteins
Vol. 77(S9), pp. 5-9 
article DOI  
Abstract: We present an outline of the Critical Assessment of Protein Structure Prediction (CASP) infrastructure implemented at the University of California, Davis, Protein Structure Prediction Center. The infrastructure supports selection and validation of prediction targets, collection of predictions, standard evaluation of submitted predictions, and presentation of results. The Center also supports information exchange relating to CASP experiments and structure prediction in general. Technical aspects of conducting the CASP8 experiment and relevant statistics are also provided.
BibTeX:
@article{Kryshtafovych2009a,
  author = {Kryshtafovych, A. and Krysko, O. and Daniluk, P. and Dmytriv, Z. and Fidelis, K.},
  title = {Protein structure prediction center in CASP8},
  journal = {Proteins},
  publisher = {Wiley Subscription Services, Inc., A Wiley Company},
  year = {2009},
  volume = {77},
  number = {S9},
  pages = {5--9},
  doi = {http://dx.doi.org/10.1002/prot.22517}
}
Kryshtafovych, A., Venclovas, C., Fidelis, K. and Moult, J. Progress over the first decade of CASP experiments 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 225-36 
article DOI  
BibTeX:
@article{Kryshtafovych2005,
  author = {Kryshtafovych, Andriy and Venclovas, Ceslovas and Fidelis, Krzysztof and Moult, John},
  title = {Progress over the first decade of CASP experiments},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {225--36},
  doi = {http://dx.doi.org/10.1002/prot.20740}
}
Krzywinski, M., Birol, I., Jones, S.J. and Marra, M.A. Hive plots -- rational approach to visualizing networks 2012 Briefings in Bioinformatics
Vol. 13(5), Brief Bioinform, pp. 627-644 
article DOI URL 
Abstract: Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative and comparable network layouts. Hive plots depict network structure transparently, are simple to understand and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well and is amenable to a plugin for existing tools.
BibTeX:
@article{Krzywinski2012,
  author = {Krzywinski, Martin and Birol, Inanc and Jones, Steven JM and Marra, Marco A},
  title = {Hive plots -- rational approach to visualizing networks},
  booktitle = {Brief Bioinform},
  journal = {Briefings in Bioinformatics},
  year = {2012},
  volume = {13},
  number = {5},
  pages = {627--644},
  url = {www.hiveplot.net},
  doi = {http://dx.doi.org/10.1093/bib/bbr069}
}
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J. and Marra, M.A. Circos: An information aesthetic for comparative genomics 2009 Genome Research
Vol. 19(9), pp. 1639-1645 
article DOI  
Abstract: We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
BibTeX:
@article{Krzywinski2009,
  author = {Krzywinski, Martin and Schein, Jacqueline and Birol, Inanc and Connors, Joseph and Gascoyne, Randy and Horsman, Doug and Jones, Steven J. and Marra, Marco A.},
  title = {Circos: An information aesthetic for comparative genomics},
  journal = {Genome Research},
  year = {2009},
  volume = {19},
  number = {9},
  pages = {1639--1645},
  doi = {http://dx.doi.org/10.1101/gr.092759.109}
}
Kurowski, M.A. and Bujnicki, J.M. GeneSilico protein structure prediction meta-server 2003 Nucl. Acids Res.
Vol. 31(13), pp. 3305-3307 
article DOI  
Abstract: Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
BibTeX:
@article{Kurowski2003,
  author = {Kurowski, Michal A. and Bujnicki, Janusz M.},
  title = {GeneSilico protein structure prediction meta-server},
  journal = {Nucl. Acids Res.},
  year = {2003},
  volume = {31},
  number = {13},
  pages = {3305--3307},
  doi = {http://dx.doi.org/10.1093/nar/gkg557}
}
Laguna, M. Implementing and testing the tabu cycle and conditional probability methods 2006 Computers & Operations Research
Vol. 33(9), Part Special Issue: Anniversary Focused Issue of Computers & Operations Research on Tabu Search, pp. 2495-2507 
article DOI  
Abstract: The purpose of this paper is to describe the implementation and testing of the tabu cycle method and two variants of the conditional probability method. These methods were originally described in Glover and Laguna [Tabu search. Boston: Kluwer Academic Publishers; 1997] but have been largely ignored in the tabu search literature. For the purpose of testing, we employ a single-attribute implementation of a short-term memory procedure for the solution of a single machine scheduling problem. Computational experiments that employ instances with up to 200 jobs reveal the usefulness of the tabu cycle and the conditional probability methods as viable alternatives for managing the short-term memory in a tabu search implementation.
BibTeX:
@article{Laguna2006,
  author = {Manuel Laguna},
  title = {Implementing and testing the tabu cycle and conditional probability methods},
  booktitle = {Part Special Issue: Anniversary Focused Issue of Computers \& Operations Research on Tabu Search},
  journal = {Computers \& Operations Research},
  year = {2006},
  volume = {33},
  number = {9},
  pages = {2495--2507},
  doi = {http://dx.doi.org/10.1016/j.cor.2005.07.008}
}
Laguna, M., Marti, R. and Campos, V. Intensification and diversification with elite tabu search solutions for the linear ordering problem 1999 Computers & Operations Research
Vol. 26(12), pp. 1217-1230 
article DOI  
BibTeX:
@article{Laguna1999,
  author = {Manuel Laguna and Rafael Marti and Vicente Campos},
  title = {Intensification and diversification with elite tabu search solutions for the linear ordering problem},
  journal = {Computers \& Operations Research},
  year = {1999},
  volume = {26},
  number = {12},
  pages = {1217--1230},
  doi = {http://dx.doi.org/10.1016/S0305-0548(98)00104-X}
}
Lancia, G., Carr, R., Walenz, B. and Istrail, S. 101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem 2001 RECOMB '01: Proceedings of the fifth annual international conference on Computational biology, pp. 193-202  inproceedings DOI  
BibTeX:
@inproceedings{Lancia2001,
  author = {Lancia, Giuseppe and Carr, Robert and Walenz, Brian and Istrail, Sorin},
  title = {101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem},
  booktitle = {RECOMB '01: Proceedings of the fifth annual international conference on Computational biology},
  publisher = {ACM Press},
  year = {2001},
  pages = {193--202},
  doi = {http://dx.doi.org/10.1145/369133.369199}
}
Lancichinetti, A., Radicchi, F., Ramasco, J.J. and Fortunato, S. Finding Statistically Significant Communities in Networks 2011 PLoS ONE
Vol. 6(4), pp. e18961 
article DOI  
Abstract: Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable to detect clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has a comparable performance as the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
BibTeX:
@article{Lancichinetti2011,
  author = {Lancichinetti, Andrea and Radicchi, Filippo and Ramasco, José J. and Fortunato, Santo},
  title = {Finding Statistically Significant Communities in Networks},
  journal = {PLoS ONE},
  publisher = {Public Library of Science},
  year = {2011},
  volume = {6},
  number = {4},
  pages = {e18961},
  doi = {http://dx.doi.org/10.1371/journal.pone.0018961}
}
Latek, D., Ekonomiuk, D. and Kolinski, A. Protein structure prediction: Combining de novo modeling with sparse experimental data 2007 Journal of Computational Chemistry
Vol. 9999(9999), pp. - 
article DOI  
BibTeX:
@article{Latek2007,
  author = {Latek, Dorota and Ekonomiuk, Dariusz and Kolinski, Andrzej},
  title = {Protein structure prediction: Combining de novo modeling with sparse experimental data},
  journal = {Journal of Computational Chemistry},
  year = {2007},
  volume = {9999},
  number = {9999},
  pages = {--},
  doi = {http://dx.doi.org/10.1002/jcc.20657}
}
Laursen, O. Flot - Javascript plotting library for jQuery   webpage URL 
BibTeX:
@webpage{URL_FLOT,
  author = {Ole Laursen},
  title = {Flot - Javascript plotting library for jQuery},
  url = {http://code.google.com/p/flot/}
}
Lazaridis, T. and Karplus, M. Effective energy function for proteins in solution 1999 Proteins: Structure, Function, and Genetics
Vol. 35(2), pp. 133-152 
article DOI  
Abstract: A Gaussian solvent-exclusion model for the solvation free energy is developed. It is based on theoretical considerations and parametrized with experimental data. When combined with the CHARMM 19 polar hydrogen energy function, it provides an effective energy function (EEF1) for proteins in solution. The solvation model assumes that the solvation free energy of a protein molecule is a sum of group contributions, which are determined from values for small model compounds. For charged groups, the self-energy contribution is accounted for primarily by the exclusion model. Ionic side-chains are neutralized, and a distance-dependent dielectric constant is used to approximate the charge-charge interactions in solution. The resulting EEF1 is subjected to a number of tests. Molecular dynamics simulations at room temperature of several proteins in their native conformation are performed, and stable trajectories are obtained. The deviations from the experimental structures are similar to those observed in explicit water simulations. The calculated enthalpy of unfolding of a polyalanine helix is found to be in good agreement with experimental data. Results reported elsewhere show that EEF1 clearly distinguishes correctly from incorrectly folded proteins, both in static energy evaluations and in molecular dynamics simulations and that unfolding pathways obtained by high-temperature molecular dynamics simulations agree with those obtained by explicit water simulations. Thus, this energy function appears to provide a realistic first approximation to the effective energy hypersurface of proteins. Proteins 1999;35:133-152. © 1999 Wiley-Liss, Inc.
BibTeX:
@article{Lazaridis1999,
  author = {Lazaridis, Themis and Karplus, Martin},
  title = {Effective energy function for proteins in solution},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1999},
  volume = {35},
  number = {2},
  pages = {133--152},
  doi = {http://dx.doi.org/10.1002/(SICI)1097-0134(19990501)35:2<133::AID-PROT1>3.0.CO;2-N}
}
Leluk, J., Konieczny, L. and Roterman, I. Search for structural similarity in proteins. 2003 Bioinformatics
Vol. 19(1), pp. 117-124 
article DOI  
Abstract: MOTIVATION: The expanding protein sequence and structure databases await methods allowing rapid similarity search. Geometric parameters, namely the dihedral angle between two sequential peptide bond planes (V) and the radius of curvature (R) as they appear in pentapeptide fragments in polypeptide chains, are proposed for use in evaluating structural similarity in proteins (VeaR). The parabolic (empirical) function expressing the radius of curvature's dependence on the V-angle in model polypeptides is altered in real proteins in a form characteristic for a particular protein. This can be used as a criterion for judging similarity. RESULTS: A structural comparison of proteins representing a wide spectrum of structures was assessed versus sequence similarity analysis based on the genetic semihomology algorithm. The term 'consensus structure', analogous to 'consensus sequence', was introduced for the serpine family. AVAILABILITY: Semihom (sequence comparison) is freely available on request from J. Leluk. VeaR (structural comparison) is freely available on request from I. Roterman.
BibTeX:
@article{Leluk2003,
  author = {Jacek Leluk and Leszek Konieczny and Irena Roterman},
  title = {Search for structural similarity in proteins.},
  journal = {Bioinformatics},
  year = {2003},
  volume = {19},
  number = {1},
  pages = {117--124},
  doi = {http://dx.doi.org/10.1093/bioinformatics/19.1.117}
}
Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals 1966 Soviet Physics Dokl.
Vol. 10(8), pp. 707-710 
article  
BibTeX:
@article{Levenshtein1966,
  author = {Levenshtein, V. I.},
  title = {Binary codes capable of correcting deletions, insertions, and reversals},
  journal = {Soviet Physics Dokl.},
  year = {1966},
  volume = {10},
  number = {8},
  pages = {707--710}
}
Levinthal, C. How to Fold Graciously 1969 Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House, pp. 22-24  inproceedings URL 
BibTeX:
@inproceedings{Levinthal1969,
  author = {Cyrus Levinthal},
  title = {How to Fold Graciously},
  booktitle = {Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House},
  publisher = {University of Illinois Press},
  year = {1969},
  pages = {22--24},
  url = {http://www-wales.ch.cam.ac.uk/~mark/levinthal/levinthal.html}
}
Li, S.C. and Li, M. On the Complexity of the Crossing Contact Map Pattern Matching Problem 2006
Vol. 4175, Algorithms in Bioinformatics, pp. 231-241
incollection DOI  
Abstract: Contact maps are concepts that are often used to represent structural information in molecular biology. The contact map pattern matching (CMPM) problem is to decide if a contact map (called the pattern) is a substructure of another contact map (called the target). In general, the problem is NP-hard, but when there are restrictions on the form of the pattern, the problem can, in some cases, be solved in polynomial time. In particular, a polynomial time algorithm has been proposed [1] for the case when the patterns are so-called crossing contact maps. In this paper we show that the problem is actually NP-hard, and show a flaw in the proposed polynomial-time algorithm. Through the same method, we also show that a related problem, namely, the 2-interval pattern matching problem with $\{<, \between\}$-structured patterns and disjoint interval ground set, is NP-hard.
BibTeX:
@incollection{Li2006,
  author = {Li, Shuai Cheng and Li, Ming},
  title = {On the Complexity of the Crossing Contact Map Pattern Matching Problem},
  booktitle = {Algorithms in Bioinformatics},
  publisher = {Springer},
  year = {2006},
  volume = {4175},
  pages = {231--241},
  doi = {http://dx.doi.org/10.1007/11851561_22}
}
Lindberg, M.O. and Oliveberg, M. Malleability of protein folding pathways: a simple reason for complex behaviour 2007 Current Opinion in Structural Biology
Vol. 17(1), Folding and binding / Protein-nucleic interactions, pp. 21-29
article DOI  
Abstract: Although the structures of native proteins are generally unique, the pathways by which they form are often free to vary. Some proteins fold by a multitude of different pathways, whereas others seem restricted to only one choice. An explanation for this variation in folding behaviour has recently emerged from studies of transition state changes: the number of accessible pathways is linked to the number of nucleation motifs contained within the native topology. We refer to these nucleation motifs as 'foldons', as they approach the size of an independent cooperative unit. Thus, with respect to pathway malleability and the composition of the folding funnel, proteins can be seen as modular assemblies of competing foldons. For the split β-α-β fold, these foldons are two-strand-helix motifs coupled by spatial overlap.
BibTeX:
@article{Lindberg2007,
  author = {Lindberg, Magnus O and Oliveberg, Mikael},
  title = {Malleability of protein folding pathways: a simple reason for complex behaviour},
  booktitle = {Folding and binding / Protein-nucleic interactions},
  journal = {Current Opinion in Structural Biology},
  year = {2007},
  volume = {17},
  number = {1},
  pages = {21--29},
  doi = {http://dx.doi.org/10.1016/j.sbi.2007.01.008}
}
Liwo, A., Oldziej, S., Czaplewski, C., Kozlowska, U. and Scheraga, H. Parametrization of Backbone-Electrostatic and Multibody Contributions to the UNRES Force Field for Protein-Structure Prediction from Ab Initio Energy Surfaces of Model Systems 2004 J. Phys. Chem. B
Vol. 108(27), pp. 9421-9438 
article DOI  
Abstract: The multibody terms pertaining to the correlation between backbone-local and backbone-electrostatic interactions in the UNRES force field for energy-based protein-structure prediction, developed in our laboratory, were reparametrized on the basis of the results of high-level ab initio calculations on relevant model systems. MP2/6-31G(d,p) ab initio calculations were carried out to evaluate the energy surfaces of pairs consisting of N-acetyl-N'-methylacetamide molecules (AcNHMe, which model a regular peptide group) and N-acetyl-N',N'-dimethylacetamide molecules (AcNMe2, which model a peptide group preceding proline) at various intermolecular distances and orientations. For each pair, the calculated ab initio energy surface was subsequently fitted by a sum of Coulombic and Lennard-Jones components. Then, the restricted free-energy (RFE) surfaces of pairs of free peptide groups as well as the RFE factors corresponding to the coupling of backbone-local and backbone-electrostatic interactions in model tetrapeptides were calculated by numerical integration, with the use of the ab initio-fitted simplified energy functions and the ab initio energy maps of model terminally blocked amino acid residues calculated recently (Ołdziej, S.; Kozłowska, U.; Liwo, A.; Scheraga, H. A. J. Phys. Chem. B, in press, 2003). Next, analytical expressions based on Kubo's generalized cumulant theory from our previous work were fitted to the resulting RFE surfaces to parametrize the backbone-electrostatic and multibody terms in the UNRES force field. The computed coefficients of the cumulant-based expressions are different from those derived earlier, which had been based on the ECEPP/3 force field.
To complete the force-field parametrization, the weights of the energy terms were determined, and the coefficients of the cumulant-based expressions were refined simultaneously by using our recently developed method of hierarchical optimization of a protein energy landscape using the protein 1IGD. The resulting force field was able to predict significant portions of the structures of proteins with , as well as both and structure correctly.
BibTeX:
@article{Liwo2004,
  author = {Liwo, A. and Oldziej, S. and Czaplewski, C. and Kozlowska, U. and Scheraga, H.A.},
  title = {Parametrization of Backbone-Electrostatic and Multibody Contributions to the UNRES Force Field for Protein-Structure Prediction from Ab Initio Energy Surfaces of Model Systems},
  journal = {J. Phys. Chem. B},
  year = {2004},
  volume = {108},
  number = {27},
  pages = {9421--9438},
  doi = {http://dx.doi.org/10.1021/jp030844f}
}
Longabaugh, W. Combing the hairball with BioFabric: a new approach for visualization of large networks 2012 BMC Bioinformatics
Vol. 13(1), pp. 275 
article DOI  
Abstract: BACKGROUND: The analysis of large, complex networks is an important aspect of ongoing biological research. Yet there is a need for entirely new, scalable approaches for network visualization that can provide more insight into the structure and function of these complex networks. RESULTS: To address this need, we have developed a software tool named BioFabric, which uses a novel network visualization technique that depicts nodes as one-dimensional horizontal lines arranged in unique rows. This is in distinct contrast to the traditional approach that represents nodes as discrete symbols that behave essentially as zero-dimensional points. BioFabric then depicts each edge in the network using a vertical line assigned to its own unique column, which spans between the source and target rows, i.e. nodes. This method of displaying the network allows a full-scale view to be organized in a rational fashion; interesting network structures, such as sets of nodes with similar connectivity, can be quickly scanned and visually identified in the full network view, even in networks with well over 100,000 edges. This approach means that the network is being represented as a fundamentally linear, sequential entity, where the horizontal scroll bar provides the basic navigation tool for browsing the entire network. CONCLUSIONS: BioFabric provides a novel and powerful way of looking at any size of network, including very large networks, using horizontal lines to represent nodes and vertical lines to represent edges. It is freely available as an open-source Java application.
BibTeX:
@article{Longabaugh2012,
  author = {Longabaugh, William},
  title = {Combing the hairball with BioFabric: a new approach for visualization of large networks},
  journal = {BMC Bioinformatics},
  year = {2012},
  volume = {13},
  number = {1},
  pages = {275},
  doi = {http://dx.doi.org/10.1186/1471-2105-13-275}
}
Lotan, I., Schwarzer, F., Halperin, D. and Latombe, J.-C. Algorithm and Data Structures for Efficient Energy Maintenance during Monte Carlo Simulation of Proteins 2004 Journal of Computational Biology
Vol. 11(5), pp. 902-932 
article DOI  
BibTeX:
@article{Lotan2004,
  author = {Lotan, Itay and Schwarzer, Fabian and Halperin, Dan and Latombe, Jean-Claude},
  title = {Algorithm and Data Structures for Efficient Energy Maintenance during Monte Carlo Simulation of Proteins},
  journal = {Journal of Computational Biology},
  year = {2004},
  volume = {11},
  number = {5},
  pages = {902--932},
  doi = {http://dx.doi.org/10.1089/cmb.2004.11.902}
}
Luke, S. Essentials of Metaheuristics 2009   book URL 
BibTeX:
@book{Luke2009,
  author = {Sean Luke},
  title = {Essentials of Metaheuristics},
  publisher = {self-published},
  year = {2009},
  url = {http://cs.gmu.edu/~sean/book/metaheuristics/}
}
Luke, S. and Panait, L. A survey and comparison of tree generation algorithms 2001 Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 81-88  inproceedings URL 
BibTeX:
@inproceedings{Luke2001,
  author = {Sean Luke and Liviu Panait},
  title = {A survey and comparison of tree generation algorithms},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001)},
  publisher = {Morgan Kaufmann},
  year = {2001},
  pages = {81--88},
  url = {http://en.scientificcommons.org/453130}
}
Lyngsø, R.B. and Pedersen, C.N.S. Protein folding in the 2D HP model 2000 Proceedings of the 1st journees ouvertes: biologie, informatique et mathematiques (JOBIM 00)  inproceedings  
Abstract: We study folding algorithms in the two dimensional Hydrophobic-Hydrophilic model for protein structure formation. We consider three generalizations of the best known approximation algorithm. We show that two of the generalizations do not improve the worst case approximation ratio. The third generalization seems to be better. The analysis leads to an interesting combinatorial problem.
BibTeX:
@inproceedings{Lyngsoe2000,
  author = {Lyngsø, R. B. and Pedersen, C. N. S.},
  title = {Protein folding in the 2D HP model},
  booktitle = {Proceedings of the 1st journees ouvertes: biologie, informatique et mathematiques (JOBIM 00)},
  year = {2000}
}
MacKerell Jr., A.D. Empirical force fields for biological macromolecules: Overview and issues 2004 Journal of Computational Chemistry
Vol. 25(13), pp. 1584-1604 
article DOI  
BibTeX:
@article{MacKerell2004,
  author = {MacKerell, Jr., Alexander D.},
  title = {Empirical force fields for biological macromolecules: Overview and issues},
  journal = {Journal of Computational Chemistry},
  year = {2004},
  volume = {25},
  number = {13},
  pages = {1584--1604},
  doi = {http://dx.doi.org/10.1002/jcc.20082}
}
Margraf, T., Schenk, G. and Torda, A.E. The SALAMI protein structure search server 2009 Nucl. Acids Res.
Vol. 37(suppl_2), pp. W480-484 
article DOI  
Abstract: Protein structures often show similarities to another which would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
BibTeX:
@article{Margraf2009,
  author = {Margraf, Thomas and Schenk, Gundolf and Torda, Andrew E.},
  title = {The SALAMI protein structure search server},
  journal = {Nucl. Acids Res.},
  year = {2009},
  volume = {37},
  number = {suppl_2},
  pages = {W480--484},
  note = {The method is based on probabilistic secondary structure profiles applied to windows of +/- 6 residues. The server compares the target protein against the PDB (using precomputed, weekly updated profiles). Server URL: http://public.zbh.uni-hamburg.de/salami/},
  doi = {http://dx.doi.org/10.1093/nar/gkp431}
}
McGeoch, C. Analyzing algorithms by simulation: variance reduction techniques and simulation speedups 1992 ACM Comput. Surv.
Vol. 24(2), pp. 195-212 
article DOI  
BibTeX:
@article{McGeoch1992,
  author = {Catherine McGeoch},
  title = {Analyzing algorithms by simulation: variance reduction techniques and simulation speedups},
  journal = {ACM Comput. Surv.},
  publisher = {ACM Press},
  year = {1992},
  volume = {24},
  number = {2},
  pages = {195--212},
  doi = {http://dx.doi.org/10.1145/130844.130853}
}
McKinnon, K.I.M. Convergence of the Nelder-Mead simplex method to a nonstationary point 1999 SIAM Journal on Optimization
Vol. 9, pp. 148-158 
article  
BibTeX:
@article{McKinnon1999,
  author = {K. I. M. McKinnon},
  title = {Convergence of the Nelder-Mead simplex method to a nonstationary point},
  journal = {SIAM Journal on Optimization},
  year = {1999},
  volume = {9},
  pages = {148--158}
}
Mereghetti, P., Ganadu, M., Papaleo, E., Fantucci, P. and De Gioia, L. Validation of protein models by a neural network approach 2008 BMC Bioinformatics
Vol. 9(1), pp. 66 
article DOI  
Abstract: BACKGROUND: The development and improvement of reliable computational methods designed to evaluate the quality of protein models is relevant in the context of protein structure refinement, which has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction. RESULTS: In this contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) which is able to consistently discriminate between correct and incorrect protein models. In particular, the method is based on neural networks that use as input 15 structural parameters, which include energy, solvent accessible surface, hydrophobic contacts and secondary structure content. The results obtained with AIDE on a set of decoy structures were evaluated using statistical indicators such as Pearson correlation coefficients, Znat, fraction enrichment, as well as ROC plots. It turned out that AIDE performances are comparable and often complementary to available state-of-the-art learning-based methods. CONCLUSION: In light of the results obtained with AIDE, as well as its comparison with available learning-based methods, it can be concluded that AIDE can be successfully used to evaluate the quality of protein structures. The use of AIDE in combination with other evaluation tools is expected to further enhance protein refinement efforts.
BibTeX:
@article{Mereghetti2008,
  author = {Mereghetti, Paolo and Ganadu, Maria and Papaleo, Elena and Fantucci, Piercarlo and De Gioia, Luca},
  title = {Validation of protein models by a neural network approach},
  journal = {BMC Bioinformatics},
  year = {2008},
  volume = {9},
  number = {1},
  pages = {66},
  doi = {http://dx.doi.org/10.1186/1471-2105-9-66}
}
Miller, J.F. An empirical study of the efficiency of learning boolean functions using a Cartesian Genetic Programming approach 1999
Vol. 2, Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1135-1142
inproceedings URL 
Abstract: A new form of Genetic Programming (GP) called Cartesian Genetic Programming (CGP) is proposed in which programs are represented by linear integer chromosomes in the form of connections and functionalities of a rectangular array of primitive functions. The effectiveness of this approach is investigated for boolean even-parity functions (3,4,5), and the 2-bit multiplier. The minimum number of evaluations required to give a 0.99 probability of evolving a target function is used to measure the efficiency of the new approach. It is found that extremely low populations are most effective. A simple probabilistic hillclimber (PH) is devised which proves to be even more effective. For these boolean functions either method appears to be much more efficient than the GP and Evolutionary Programming (EP) methods reported. The efficacy of the PH suggests that boolean function learning may not be an appropriate problem for testing the effectiveness of GP and EP.
BibTeX:
@inproceedings{Miller1999,
  author = {Julian F. Miller},
  title = {An empirical study of the efficiency of learning boolean functions using a Cartesian Genetic Programming approach},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
  year = {1999},
  volume = {2},
  pages = {1135--1142},
  url = {http://citeseer.ist.psu.edu/153431.html}
}
Misura, K.M.S., Chivian, D., Rohl, C.A., Kim, D.E. and Baker, D. Physically realistic homology models built with ROSETTA can be more accurate than their templates 2006 Proceedings of the National Academy of Sciences
Vol. 103(14), pp. 5361-5366 
article DOI  
Abstract: We have developed a method that combines the ROSETTA de novo protein folding and refinement protocol with distance constraints derived from homologous structures to build homology models that are frequently more accurate than their templates. We test this method by building complete-chain models for a benchmark set of 22 proteins, each with 1 or 2 candidate templates, for a total of 39 test cases. We use structure-based and sequence-based alignments for each of the test cases. All atoms, including hydrogens, are represented explicitly. The resulting models contain approximately the same number of atomic overlaps as experimentally determined crystal structures and maintain good stereochemistry. The most accurate models can be identified by their energies, and in 22 of 39 cases a model that is more accurate than the template over aligned regions is one of the 10 lowest-energy models.
BibTeX:
@article{Misura2006,
  author = {Misura, Kira M. S. and Chivian, Dylan and Rohl, Carol A. and Kim, David E. and Baker, David},
  title = {Physically realistic homology models built with ROSETTA can be more accurate than their templates},
  journal = {Proceedings of the National Academy of Sciences},
  year = {2006},
  volume = {103},
  number = {14},
  pages = {5361--5366},
  doi = {http://dx.doi.org/10.1073/pnas.0509355103}
}
Montana, D.J. Strongly Typed Genetic Programming 1995 Evolutionary Computation
Vol. 3(2), pp. 199-230 
article DOI  
BibTeX:
@article{Montana1995,
  author = {David J. Montana},
  title = {Strongly Typed Genetic Programming},
  journal = {Evolutionary Computation},
  year = {1995},
  volume = {3},
  number = {2},
  pages = {199--230},
  doi = {http://dx.doi.org/10.1162/evco.1995.3.2.199}
}
Moult, J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction 2005 Current Opinion in Structural Biology
Vol. 15(3), pp. 285-289 
article DOI  
Abstract: For the past ten years, CASP (Critical Assessment of Structure Prediction) has monitored the state of the art in modeling protein structure from sequence. During this period, there has been substantial progress in both comparative modeling of structure (using information from an evolutionarily related structural template) and template-free modeling. The quality of comparative models depends on the closeness of the evolutionary relationship on which they are based. Template-free modeling, although still very approximate, now produces topologically near correct models for some small proteins. Current major challenges are refining comparative models so that they match experimental accuracy, obtaining accurate sequence alignments for models based on remote evolutionary relationships, and extending template-free modeling methods so that they produce more accurate models, handle parts of comparative models not available from a template and deal with larger structures.
BibTeX:
@article{Moult2005a,
  author = {Moult, John},
  title = {A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction},
  journal = {Current Opinion in Structural Biology},
  year = {2005},
  volume = {15},
  number = {3},
  pages = {285--289},
  doi = {http://dx.doi.org/10.1016/j.sbi.2005.05.011}
}
Moult, J. and Fidelis, K. Critical assessment of methods of protein structure prediction (CASP) - Round 6 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 61(S7), pp. 3-7 
article DOI  
BibTeX:
@article{Moult2005,
  author = {John Moult and Krzysztof Fidelis},
  title = {Critical assessment of methods of protein structure prediction (CASP) - Round 6},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {61},
  number = {S7},
  pages = {3--7},
  doi = {http://dx.doi.org/10.1002/prot.20716}
}
Murtagh, F. and Contreras, P. Algorithms for hierarchical clustering: an overview 2012 Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Vol. 2(1), pp. 86-97 
article DOI  
Abstract: We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. © 2011 Wiley Periodicals, Inc.
BibTeX:
@article{Murtagh2012,
  author = {Murtagh, Fionn and Contreras, Pedro},
  title = {Algorithms for hierarchical clustering: an overview},
  journal = {Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
  year = {2012},
  volume = {2},
  number = {1},
  pages = {86--97},
  doi = {http://dx.doi.org/10.1002/widm.53}
}
Nelder, J. and Mead, R. A simplex method for function minimization 1965 The Computer Journal
Vol. 7, pp. 308-313 
article  
BibTeX:
@article{Nelder1964,
  author = {Nelder, J.A. and Mead, R.},
  title = {A simplex method for function minimization},
  journal = {The Computer Journal},
  year = {1965},
  volume = {7},
  pages = {308--313}
}
Newman, A. A new algorithm for protein folding in the HP model 2002 SODA 2002: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 876-884  inproceedings  
BibTeX:
@inproceedings{Newman2002,
  author = {Alantha Newman},
  title = {A new algorithm for protein folding in the HP model},
  booktitle = {SODA 2002: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms},
  publisher = {Society for Industrial and Applied Mathematics},
  year = {2002},
  pages = {876--884}
}
Offman, M.N., Fitzjohn, P.W. and Bates, P.A. Developing a move-set for protein model refinement 2006 Bioinformatics
Vol. 22(15), pp. 1838-1845 
article DOI  
Abstract: Motivation: A wide variety of methods for the construction of an atomic model for a given amino acid sequence are known, the more accurate being those that use experimentally determined structures as templates. However, far fewer methods are aimed at refining these models. The approach presented here carefully blends models created by several different means, in an attempt to combine the good quality regions from each into a final, more refined, model. Results: We describe here a number of refinement operators (collectively, 'move-set') that enable a relatively large region of conformational space to be searched. This is used within a genetic algorithm that reshuffles and repacks structural components. The utility of the move-set is demonstrated by introducing a cost function, containing both physical and other components guiding the input structures towards the target structure. We show that our move-set has the potential to improve the conformation of models and that this improvement can be beyond even the best template for some comparative modelling targets. Availability: The POPULUS software package and the source code are available at http://bmm.cancerresearchuk.org/~offman01/populus.html Contact: paul.bates@cancer.org.uk
BibTeX:
@article{Offman2006,
  author = {Offman, Marc N. and Fitzjohn, Paul W. and Bates, Paul A.},
  title = {Developing a move-set for protein model refinement},
  journal = {Bioinformatics},
  year = {2006},
  volume = {22},
  number = {15},
  pages = {1838--1845},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btl192}
}
Oliphant, T.E. Python for Scientific Computing 2007 Computing in Science and Engineering
Vol. 9(3), pp. 10-20 
article DOI  
BibTeX:
@article{NumPy,
  author = {Oliphant, Travis E.},
  title = {Python for Scientific Computing},
  journal = {Computing in Science and Engineering},
  year = {2007},
  volume = {9},
  number = {3},
  pages = {10--20},
  doi = {http://dx.doi.org/10.1109/MCSE.2007.58}
}
Ortiz, A.R., Strauss, C.E. and Olmea, O. MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison 2002 Protein Science
Vol. 11(11), pp. 2606-2621 
article DOI  
Abstract: Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures, via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at .
BibTeX:
@article{Ortiz2002,
  author = {Angel R. Ortiz and Charlie E.M. Strauss and Osvaldo Olmea},
  title = {MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison},
  journal = {Protein Science},
  year = {2002},
  volume = {11},
  number = {11},
  pages = {2606--2621},
  note = {MAMMOTH uses URMS to build a similarity matrix, then uses Needleman--Wunsch to align the backbones (with gap penalties) and the MaxSub heuristic to find a subset of residues within a distance of <= 4 A.},
  doi = {http://dx.doi.org/10.1110/ps.0215902}
}
Overdijk, M. and Laforge, G. Gaelyk - lightweight Groovy toolkit for Google App Engine   webpage URL 
BibTeX:
@webpage{URL_GAELYK,
  author = {Marcel Overdijk and Guillaume Laforge},
  title = {Gaelyk - lightweight Groovy toolkit for Google App Engine},
  url = {http://gaelyk.appspot.com/}
}
Pande, V.S., Baker, I., Chapman, J., Elmer, S.P., Khaliq, S., Larson, S.M., Rhee, Y.M., Shirts, M.R., Snow, C.D., Sorin, E.J. and Zagrovic, B. Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing 2003 Biopolymers
Vol. 68(1), pp. 91-109 
article DOI  
Abstract: Atomistic simulations of protein folding have the potential to be a great complement to experimental studies, but have been severely limited by the time scales accessible with current computer hardware and algorithms. By employing a worldwide distributed computing network of tens of thousands of PCs and algorithms designed to efficiently utilize this new many-processor, highly heterogeneous, loosely coupled distributed computing paradigm, we have been able to simulate hundreds of microseconds of atomistic molecular dynamics. This has allowed us to directly simulate the folding mechanism and to accurately predict the folding rate of several fast-folding proteins and polymers, including a nonbiological helix, polypeptide
BibTeX:
@article{Pande2003,
  author = {Pande, Vijay S. and Baker, Ian and Chapman, Jarrod and Elmer, Sidney P. and Khaliq, Siraj and Larson, Stefan M. and Rhee, Young Min and Shirts, Michael R. and Snow, Christopher D. and Sorin, Eric J. and Zagrovic, Bojan},
  title = {Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing},
  journal = {Biopolymers},
  year = {2003},
  volume = {68},
  number = {1},
  pages = {91--109},
  doi = {http://dx.doi.org/10.1002/bip.10219}
}
Pandit, S.B., Zhang, Y. and Skolnick, J. TASSER-Lite: An Automated Tool for Protein Comparative Modeling 2006 Biophys. J.
Vol. 91(11), pp. 4180-4190 
article DOI  
Abstract: This study involves the development of a rapid comparative modeling tool for homologous sequences by extension of the TASSER methodology, developed for tertiary structure prediction. This comparative modeling procedure was validated on a representative benchmark set of proteins in the Protein Data Bank composed of 901 single domain proteins (41-200 residues) having sequence identities between 35-90% with respect to the template. Using a Monte Carlo search scheme with the length of runs optimized for weakly/nonhomologous proteins, TASSER often provides appreciable improvement in structure quality over the initial template. However, on average, this requires ~29 h of CPU time per sequence. Since homologous proteins are unlikely to require the extent of conformational search as weakly/nonhomologous proteins, TASSER's parameters were optimized to reduce the required CPU time to ~17 min, while retaining TASSER's ability to improve structure quality. Using this optimized TASSER (TASSER-Lite), we find an average improvement in the aligned region of ~10% in root mean-square deviation from native over the initial template. Comparison of TASSER-Lite with the widely used comparative modeling tool MODELLER showed that TASSER-Lite yields final models that are closer to the native. TASSER-Lite is provided on the web at http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html.
BibTeX:
@article{Pandit2006,
  author = {Pandit, Shashi Bhushan and Zhang, Yang and Skolnick, Jeffrey},
  title = {TASSER-Lite: An Automated Tool for Protein Comparative Modeling},
  journal = {Biophys. J.},
  year = {2006},
  volume = {91},
  number = {11},
  pages = {4180--4190},
  doi = {http://dx.doi.org/10.1529/biophysj.106.084293}
}
Papoian, G.A., Ulander, J., Eastwood, M.P., Luthey-Schulten, Z. and Wolynes, P.G. Water in protein structure prediction. 2004 Proc Natl Acad Sci U S A
Vol. 101(10), pp. 3352-3357 
article DOI  
Abstract: Proteins have evolved to use water to help guide folding. A physically motivated, nonpairwise-additive model of water-mediated interactions added to a protein structure prediction Hamiltonian yields marked improvement in the quality of structure prediction for larger proteins. Free energy profile analysis suggests that long-range water-mediated potentials guide folding and smooth the underlying folding funnel. Analyzing simulation trajectories gives direct evidence that water-mediated interactions facilitate native-like packing of supersecondary structural elements. Long-range pairing of hydrophilic groups is an integral part of protein architecture. Specific water-mediated interactions are a universal feature of biomolecular recognition landscapes in both folding and binding.
BibTeX:
@article{Papoian2004,
  author = {Garegin A Papoian and Johan Ulander and Michael P Eastwood and Zaida Luthey-Schulten and Peter G Wolynes},
  title = {Water in protein structure prediction.},
  journal = {Proc Natl Acad Sci U S A},
  year = {2004},
  volume = {101},
  number = {10},
  pages = {3352--3357},
  doi = {http://dx.doi.org/10.1073/pnas.0307851100}
}
Pearl, F.M.G., Bennett, C.F., Bray, J.E., Harrison, A.P., Martin, N., Shepherd, A., Sillitoe, I., Thornton, J. and Orengo, C.A. The CATH database: an extended protein family resource for structural and functional genomics 2003 Nucl. Acids Res.
Vol. 31(1), pp. 452-455 
article DOI  
Abstract: The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB) contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domains boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.
BibTeX:
@article{Pearl2003,
  author = {Pearl, F. M. G. and Bennett, C. F. and Bray, J. E. and Harrison, A. P. and Martin, N. and Shepherd, A. and Sillitoe, I. and Thornton, J. and Orengo, C. A.},
  title = {The CATH database: an extended protein family resource for structural and functional genomics},
  journal = {Nucl. Acids Res.},
  year = {2003},
  volume = {31},
  number = {1},
  pages = {452--455},
  doi = {http://dx.doi.org/10.1093/nar/gkg062}
}
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. and Duchesnay, E. Scikit-learn: Machine Learning in Python 2011 Journal of Machine Learning Research
Vol. 12, pp. 2825-2830 
article URL 
BibTeX:
@article{scikit-learn,
  author = {Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  title = {Scikit-learn: Machine Learning in Python},
  journal = {Journal of Machine Learning Research},
  year = {2011},
  volume = {12},
  pages = {2825--2830},
  url = {http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html}
}
Pelta, D., Gonzalez, J. and Moreno Vega, M. A simple and fast heuristic for protein structure comparison 2008 BMC Bioinformatics
Vol. 9(1), pp. 161 
article DOI  
Abstract: BACKGROUND: Protein structure comparison is a key problem in bioinformatics. There exist several methods for doing protein comparison, being the solution of the Maximum Contact Map Overlap problem (MAX-CMO) one of the alternatives available. Although this problem may be solved using exact algorithms, researchers require approximate algorithms that obtain good quality solutions using less computational resources than the formers. RESULTS: We propose a variable neighborhood search metaheuristic for solving MAX-CMO. We analyze this strategy in two aspects: 1) from an optimization point of view the strategy is tested on two different datasets, obtaining an error of 3.5% (over 2702 pairs) and 1.7% (over 161 pairs) with respect to optimal values; thus leading to high accurate solutions in a simpler and less expensive way than exact algorithms; 2) in terms of protein structure classification, we conduct experiments on three datasets and show that is feasible to detect structural similarities at SCOP's family and CATH's architecture levels using normalized overlap values. Some limitations and the role of normalization are outlined for doing classification at SCOP's fold level. CONCLUSION: We designed, implemented and tested a new tool for solving MAX-CMO, based on a well-known metaheuristic technique. The good balance between solution's quality and computational effort makes it a valuable tool. Moreover, to the best of our knowledge, this is the first time the MAX-CMO measure is tested at SCOP's fold and CATH's architecture levels with encouraging results. Software is available for download at http://modo.ugr.es/jrgonzalez/msvns4maxcmo.
BibTeX:
@article{Pelta2008,
  author = {Pelta, David and Gonzalez, Juan and Moreno Vega, Marcos},
  title = {A simple and fast heuristic for protein structure comparison},
  journal = {BMC Bioinformatics},
  year = {2008},
  volume = {9},
  number = {1},
  pages = {161},
  doi = {http://dx.doi.org/10.1186/1471-2105-9-161}
}
Pelta, D., Krasnogor, N., Bousono-Calzon, C., Verdegay, J., Hirst, J. and Burke, E. A fuzzy sets based generalization of contact maps for the overlap of protein structures 2005 Fuzzy Sets and Systems
Vol. 152(2), pp. 103-123 
article DOI  
BibTeX:
@article{Pelta2005,
  author = {D.A. Pelta and N. Krasnogor and C. Bousono-Calzon and J.L. Verdegay and J. Hirst and E.K. Burke},
  title = {A fuzzy sets based generalization of contact maps for the overlap of protein structures},
  journal = {Fuzzy Sets and Systems},
  year = {2005},
  volume = {152},
  number = {2},
  pages = {103--123},
  doi = {http://dx.doi.org/10.1016/j.fss.2004.10.017}
}
Perkins, A. and Langston, M. Threshold selection in gene co-expression networks using spectral graph theory techniques 2009 BMC Bioinformatics
Vol. 10(Suppl 11), pp. S4 
article DOI  
Abstract: BACKGROUND: Gene co-expression networks are often constructed by computing some measure of similarity between expression levels of gene transcripts and subsequently applying a high-pass filter to remove all but the most likely biologically-significant relationships. The selection of this expression threshold necessarily has a significant effect on any conclusions derived from the resulting network. Many approaches have been taken to choose an appropriate threshold, among them computing levels of statistical significance, accepting only the top one percent of relationships, and selecting an arbitrary expression cutoff. RESULTS: We apply spectral graph theory methods to develop a systematic method for threshold selection. Eigenvalues and eigenvectors are computed for a transformation of the adjacency matrix of the network constructed at various threshold values. From these, we use a basic spectral clustering method to examine the set of gene-gene relationships and select a threshold dependent upon the community structure of the data. This approach is applied to two well-studied microarray data sets from Homo sapiens and Saccharomyces cerevisiae. CONCLUSION: This method presents a systematic, data-based alternative to using more artificial cutoff values and results in a more conservative approach to threshold selection than some other popular techniques such as retaining only statistically-significant relationships or setting a cutoff to include a percentage of the highest correlations.
BibTeX:
@article{Perkins2009,
  author = {Perkins, Andy and Langston, Michael},
  title = {Threshold selection in gene co-expression networks using spectral graph theory techniques},
  journal = {BMC Bioinformatics},
  year = {2009},
  volume = {10},
  number = {Suppl 11},
  pages = {S4},
  doi = {http://dx.doi.org/10.1186/1471-2105-10-S11-S4}
}
Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L. and Schulten, K. Scalable molecular dynamics with NAMD 2005 Journal of Computational Chemistry
Vol. 26, pp. 1781-1802 
article URL 
BibTeX:
@article{Phillips2005,
  author = {James C. Phillips and Rosemary Braun and Wei Wang and James Gumbart and Emad Tajkhorshid and Elizabeth Villa and Christophe Chipot and Robert D. Skeel and Laxmikant Kale and Klaus Schulten},
  title = {Scalable molecular dynamics with NAMD},
  journal = {Journal of Computational Chemistry},
  year = {2005},
  volume = {26},
  pages = {1781--1802},
  url = {http://www.ks.uiuc.edu/Research/namd/}
}
Di Pierro, M. web2py web framework   webpage URL 
BibTeX:
@webpage{URL_WEB2PY,
  author = {Massimo {Di Pierro}},
  title = {web2py web framework},
  url = {http://www.web2py.com/}
}
Poleksic, A. Algorithms for optimal protein structure alignment 2009 Bioinformatics
Vol. 25(21), pp. 2751-2756 
article DOI  
Abstract: Motivation: Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult, due to the infinite number of possible superpositions of two structures. Unlike the sequence alignment problem, which has a polynomial time solution, the structural alignment problem has not been even classified as solvable. Results: We study one of the most widely used measures of protein structural similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. We prove that, for any two proteins, this measure can be optimized for all but finitely many distance cutoffs. Our method leads to a series of algorithms for optimizing other structure similarity measures, including the measures commonly used in protein structure prediction experiments. We also present a polynomial time algorithm for finding a near-optimal superposition of two proteins. Aside from having a relatively low cost, the algorithm for near-optimal solution returns a superposition of provable quality. In other words, the difference between the score of the returned superposition and the score of an optimal superposition can be explicitly computed and used to determine whether the returned superposition is, in fact, the best superposition. Contact: poleksic@cs.uni.edu Supplementary information: Supplementary data are available at Bioinformatics online.
BibTeX:
@article{Poleksic2009,
  author = {Poleksic, Aleksandar},
  title = {Algorithms for optimal protein structure alignment},
  journal = {Bioinformatics},
  year = {2009},
  volume = {25},
  number = {21},
  pages = {2751--2756},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btp530}
}
Poli, R., Langdon, W.B. and McPhee, N.F. A field guide to genetic programming 2008   book URL 
Abstract: Genetic programming (GP) is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until high-fitness solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.
This unique overview of this exciting technique is written by three of the most active scientists in GP. See www.gp-field-guide.org.uk for more information on the book.
BibTeX:
@book{Poli2008,
  author = {Riccardo Poli and William B. Langdon and Nicholas Freitag McPhee},
  title = {A field guide to genetic programming},
  publisher = {Published via http://lulu.com.},
  year = {2008},
  note = {with contributions by J.R. Koza},
  url = {http://www.gp-field-guide.org.uk}
}
Ponder, J.W. and Case, D.A. Force fields for protein simulations. 2003 Adv Protein Chem
Vol. 66, pp. 27-85 
article  
BibTeX:
@article{Ponder2003,
  author = {Jay W Ponder and David A Case},
  title = {Force fields for protein simulations.},
  journal = {Adv Protein Chem},
  year = {2003},
  volume = {66},
  pages = {27--85}
}
Rahmati, S. and Glasgow, J.I. Comparing protein contact maps via Universal Similarity Metric: an improvement in the noise-tolerance 2009 International Journal of Computational Biology and Drug Design
Vol. 2(2), pp. 149-167 
article DOI  
Abstract: Comparing protein structures based on their contact maps similarity is an important problem in molecular biology. One motivation to seek fast algorithms for comparing contact maps is devising systems for reconstructing three-dimensional structure of proteins from their predicted contact maps. In this paper, we propose an algorithm to apply the Universal Similarity Metric (USM) to contact map comparison problem in a two-dimensional space. The major advantage of this algorithm is the highly improved noise-tolerance of the metric in comparison with its previous one-dimensional implementations. This is the first successful attempt to apply the USM to two-dimensional objects, without reducing their dimensionality.
BibTeX:
@article{Rahmati2009,
  author = {Rahmati, Sara and Glasgow, Janice I.},
  title = {Comparing protein contact maps via Universal Similarity Metric: an improvement in the noise-tolerance},
  journal = {International Journal of Computational Biology and Drug Design},
  year = {2009},
  volume = {2},
  number = {2},
  pages = {149--167},
  doi = {http://dx.doi.org/10.1504/IJCBDD.2009.028821}
}
Rardin, R.L. and Uzsoy, R. Experimental Evaluation of Heuristic Optimization Algorithms: A Tutorial 2001 Journal of Heuristics
Vol. 7(3), pp. 261-304 
article DOI  
Abstract: Heuristic optimization algorithms seek good feasible solutions to optimization problems in circumstances where the complexities of the problem or the limited time available for solution do not allow exact solution. Although worst case and probabilistic analysis of algorithms have produced insight on some classic models, most of the heuristics developed for large optimization problem must be evaluated empirically—by applying procedures to a collection of specific instances and comparing the observed solution quality and computational burden.
BibTeX:
@article{Rardin2001,
  author = {Rardin, Ronald L. and Uzsoy, Reha},
  title = {Experimental Evaluation of Heuristic Optimization Algorithms: A Tutorial},
  journal = {Journal of Heuristics},
  year = {2001},
  volume = {7},
  number = {3},
  pages = {261--304},
  doi = {http://dx.doi.org/10.1023/A:1011319115230}
}
Rocha, J. and Alberich, R. The Significance of the ProtDeform Score for Structure Prediction and Alignment 2011 PLoS ONE
Vol. 6(6), pp. e20889 
article DOI  
Abstract: Background: When a researcher uses a program to align two proteins and gets a score, one of her main concerns is how often the program gives a similar score to pairs that are or are not in the same fold. This issue was analysed in detail recently for the program TM-align with its associated TM-score. It was shown that because the TM-score is length independent, it allows a P-value and a hit probability to be defined depending only on the score. Also, it was found that the TM-scores of gapless alignments closely follow an Extreme Value Distribution (EVD).
The program ProtDeform for structural protein alignment was developed recently and is characterised by the ability to propose different transformations of different protein regions. Our goal is to analyse its associated score to allow a researcher to have objective reasons to prefer one aligner over another, and carry out a better interpretation of the output.
Results: The study on the ProtDeform score reveals that it is length independent in a wider score range than TM-scores and that PD-scores of gapless (random) alignments also approximately follow an EVD. On the CASP8 predictions, PD-scores and TM-scores, with respect to native structures, are highly correlated (0.95), and show that around a fifth of the predictions have a quality as low as 99.5% of the random scores. Using the Gold Standard benchmark, ProtDeform has lower probabilities of error than TM-align both at a similar speed. The analysis is extended to homology discrimination showing that, again, ProtDeform offers higher hit probabilities than TM-align. Finally, we suggest using three different P-values according to the three different contexts: Gapless alignments, optimised alignments for fold discrimination and that for superfamily discrimination.
In conclusion, PD-scores are at the very least as valuable for prediction scoring as TM-scores, and on the protein classification problem, even more reliable.
BibTeX:
@article{Rocha2011,
  author = {Rocha, Jairo and Alberich, Ricardo},
  title = {The Significance of the ProtDeform Score for Structure Prediction and Alignment},
  journal = {PLoS ONE},
  publisher = {Public Library of Science},
  year = {2011},
  volume = {6},
  number = {6},
  pages = {e20889},
  doi = {http://dx.doi.org/10.1371/journal.pone.0020889}
}
Rocha, J., Rosselló, F. and Segura, J. Compression ratios based on the Universal Similarity Metric still yield protein distances far from CATH distances 2006 q-bio/0603007  unpublished URL 
Abstract: Kolmogorov complexity has inspired several alignment-free distance measures, based on the comparison of lengths of compressions, which have been applied successfully in many areas. One of these measures, the so-called Universal Similarity Metric (USM), has been used by Krasnogor and Pelta to compare simple protein contact maps, showing that it yielded good clustering on four small datasets. We report an extensive test of this metric using a much larger and representative protein dataset: the domain dataset used by Sierk and Pearson to evaluate seven protein structure comparison methods and two protein sequence comparison methods. One result is that Krasnogor-Pelta method has less domain discriminant power than any one of the methods considered by Sierk and Pearson when using these simple contact maps. In another test, we found that the USM based distance has low agreement with the CATH tree structure for the same benchmark of Sierk and Pearson. In any case, its agreement is lower than the one of a standard sequential alignment method, SSEARCH. Finally, we manually found lots of small subsets of the database that are better clustered using SSEARCH than USM, to confirm that Krasnogor-Pelta's conclusions were based on datasets that were too small. Comment: 11 pages; It replaces the former "The Universal Similarity Metric does not detect domain similarity." This version reports on more extensive tests
BibTeX:
@unpublished{Rocha2006,
  author = {Rocha, Jairo and Rosselló, Francesc and Segura, Joan},
  title = {Compression ratios based on the Universal Similarity Metric still yield protein distances far from CATH distances},
  journal = {q-bio/0603007},
  year = {2006},
  note = {submitted},
  url = {http://arXiv.org/abs/q-bio/0603007}
}
Rocha, J., Segura, J., Wilson, R.C. and Dasgupta, S. Flexible structural protein alignment by a sequence of local transformations 2009 Bioinformatics
Vol. 25(13), pp. 1625-1631 
article DOI  
Abstract: Motivation: Throughout evolution, homologous proteins have common regions that stay semi-rigid relative to each other and other parts that vary in a more noticeable way. In order to compare the increasing number of structures in the PDB, flexible geometrical alignments are needed, that are reliable and easy to use.
Results: We present a protein structure alignment method whose main feature is the ability to consider different rigid transformations at different sites, allowing for deformations beyond a global rigid transformation. The performance of the method is comparable with that of the best ones from 10 aligners tested, regarding both the quality of the alignments with respect to hand curated ones, and the classification ability. An analysis of some structure pairs from the literature that need to be matched in a flexible fashion are shown. The use of a series of local transformations can be exported to other classifiers, and a future golden protein similarity measure could benefit from it.
Availability: A public server for the program is available at http://dmi.uib.es/ProtDeform/. Contact: jairo@uib.es Supplementary information: All data used, results and examples are available at http://dmi.uib.es/people/jairo/bio/ProtDeform.
BibTeX:
@article{Rocha2009,
  author = {Rocha, Jairo and Segura, Joan and Wilson, Richard C. and Dasgupta, Swagata},
  title = {Flexible structural protein alignment by a sequence of local transformations},
  journal = {Bioinformatics},
  year = {2009},
  volume = {25},
  number = {13},
  pages = {1625--1631},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btp296}
}
Rodaebel, T. and Glanzner, F. TyphoonAE - environment to run Google App Engine (Python) applications   webpage URL 
BibTeX:
@webpage{URL_TYPHOONAE,
  author = {Tobias Rodaebel and Florian Glanzner},
  title = {TyphoonAE - environment to run Google App Engine (Python) applications},
  url = {http://code.google.com/p/typhoonae/}
}
Rohl, C.A., Strauss, C.E.M., Misura, K.M.S. and Baker, D. Protein Structure Prediction Using Rosetta 2004
Vol. 383, Numerical Computer Methods, Part D, pp. 66-93 
incollection DOI  
BibTeX:
@incollection{Rohl2004,
  author = {Rohl, Carol A. and Strauss, Charlie E. M. and Misura, Kira M. S. and Baker, David},
  title = {Protein Structure Prediction Using Rosetta},
  booktitle = {Numerical Computer Methods, Part D},
  publisher = {Academic Press},
  year = {2004},
  volume = {383},
  pages = {66--93},
  doi = {http://dx.doi.org/10.1016/S0076-6879(04)83004-0}
}
Rose, G., Fleming, P., Banavar, J. and Maritan, A. A backbone-based theory of protein folding 2006 Proceedings of the National Academy of Sciences
Vol. 103(45), pp. 16623-16633 
article DOI  
BibTeX:
@article{Rose2006,
  author = {Rose, George and Fleming, Patrick and Banavar, Jayanth and Maritan, Amos},
  title = {A backbone-based theory of protein folding},
  journal = {Proceedings of the National Academy of Sciences},
  year = {2006},
  volume = {103},
  number = {45},
  pages = {16623--16633},
  doi = {http://dx.doi.org/10.1073/pnas.0606843103}
}
Rychlewski, L. and Fischer, D. LiveBench-8: The large-scale, continuous assessment of automated protein structure prediction 2005 Protein Science
Vol. 14(1), pp. 240-245 
article DOI  
Abstract: We present the results of the evaluation of the latest LiveBench-8 experiment. These results provide a snapshot view of the state of the art in automated protein structure prediction, just before the 2004 CAFASP-4/CASP-6 experiments begin. The last CAFASP/CASP experiments demonstrated that automated meta-predictors entail a significant advance in the field, already challenging most human expert predictors. LiveBench-8 corroborates the superior performance of meta-predictors, which are able to produce useful predictions for over one-half of the test targets. More importantly, LiveBench-8 identifies a handful of recently developed autonomous (nonmeta) servers that perform at the very top, suggesting that further progress in the individual methods has recently been obtained.
BibTeX:
@article{Rychlewski2005,
  author = {Rychlewski, Leszek and Fischer, Daniel},
  title = {LiveBench-8: The large-scale, continuous assessment of automated protein structure prediction},
  journal = {Protein Science},
  year = {2005},
  volume = {14},
  number = {1},
  pages = {240--245},
  doi = {http://dx.doi.org/10.1110/ps.04888805}
}
Sammoud, O., Sorlin, S., Solnon, C. and Ghédira, K. A Comparative Study of Ant Colony Optimization and Reactive Search for Graph Matching Problems 2006 (3906)6th European Conference on Evolutionary Computation in Combinatorial Optimization (EvoCOP 2006), pp. 234-24  inproceedings URL 
BibTeX:
@inproceedings{Sammoud2006,
  author = {Olfa Sammoud and Sébastien Sorlin and Christine Solnon and Khaled Ghédira},
  title = {A Comparative Study of Ant Colony Optimization and Reactive Search for Graph Matching Problems},
  booktitle = {6th European Conference on Evolutionary Computation in Combinatorial Optimization (EvoCOP 2006)},
  publisher = {Springer},
  year = {2006},
  number = {3906},
  pages = {234-24},
  url = {http://liris.cnrs.fr/publis/?id=2363}
}
Santana, R., Larranaga, P. and Lozano, J. Protein Folding in Simplified Models With Estimation of Distribution Algorithms 2008 IEEE Transactions on Evolutionary Computation
Vol. 12(4), pp. 418-438 
article DOI  
Abstract: Simplified lattice models have played an important role in protein structure prediction and protein folding problems. These models can be useful for an initial approximation of the protein structure, and for the investigation of the dynamics that govern the protein folding process. Estimation of distribution algorithms (EDAs) are efficient evolutionary algorithms that can learn and exploit the search space regularities in the form of probabilistic dependencies. This paper introduces the application of different variants of EDAs to the solution of the protein structure prediction problem in simplified models, and proposes their use as a simulation tool for the analysis of the protein folding process. We develop new ideas for the application of EDAs to the bidimensional and tridimensional (2-d and 3-d) simplified protein folding problems. This paper analyzes the rationale behind the application of EDAs to these problems, and elucidates the relationship between our proposal and other population-based approaches proposed for the protein folding problem. We argue that EDAs are an efficient alternative for many instances of the protein structure prediction problem and are indeed appropriate for a theoretical analysis of search procedures in lattice models. All the algorithms introduced are tested on a set of difficult 2-d and 3-d instances from lattice models. Some of the results obtained with EDAs are superior to the ones obtained with other well-known population-based optimization algorithms.
BibTeX:
@article{Santana2008,
  author = {Santana, R. and Larranaga, P. and Lozano, J.A.},
  title = {Protein Folding in Simplified Models With Estimation of Distribution Algorithms},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2008},
  volume = {12},
  number = {4},
  pages = {418--438},
  doi = {http://dx.doi.org/10.1109/TEVC.2007.906095}
}
Shah, A., Folino, G. and Krasnogor, N. Toward High-Throughput, Multicriteria Protein-Structure Comparison and Analysis 2010 IEEE Transactions on NanoBioscience
Vol. 9(2), pp. 144-155 
article DOI  
Abstract: Protein-structure comparison (PSC) is an essential component of biomedical research as it impacts on, e.g., drug design, molecular docking, protein folding and structure prediction algorithms as well as being essential to the assessment of these predictions. Each of these applications, as well as many others where molecular comparison plays an important role, requires a different notion of similarity that naturally leads to the multicriteria PSC (MC-PSC) problem. Protein (Structure) Comparison, Knowledge, Similarity, and Information (ProCKSI) (www.procksi.org) provides algorithmic solutions for the MC-PSC problem by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods. Current MC-PSC works well for moderately sized datasets and it is time consuming as it provides public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. In this paper, we take a key step into that direction by means of a high-throughput distributed reimplementation of ProCKSI for very large datasets. The core of the proposed framework lies in the design of an innovative distributed algorithm that runs on each compute node in a cluster/grid environment to perform structure comparison of a given subset of input structures using some of the most popular PSC methods [e.g., universal similarity metric (USM), maximum contact map overlap (MaxCMO), fast alignment and search tool (FAST), distance alignment (DaliLite), combinatorial extension (CE), template modeling alignment (TMAlign)]. We follow this with a procedure of distributed consensus building. Thus, the new algorithms proposed here achieve ProCKSI's similarity assessment quality but with a fraction of the time required by it. Our results show that the proposed distributed method can be used efficiently to compare: 1) a particular protein against a very large protein structures dataset (target-against-all comparison), and 2) a particular very large-scale dataset against itself or against another very large-scale dataset (all-against-all comparison). We conclude the paper by enumerating some of the outstanding challenges for real-time MC-PSC.
BibTeX:
@article{Shah2010,
  author = {Shah, A.A. and Folino, G. and Krasnogor, N.},
  title = {Toward High-Throughput, Multicriteria Protein-Structure Comparison and Analysis},
  journal = {IEEE Transactions on NanoBioscience},
  year = {2010},
  volume = {9},
  number = {2},
  pages = {144--155},
  doi = {http://dx.doi.org/10.1109/TNB.2010.2043851}
}
Shen, M.-y. and Sali, A. Statistical potential for assessment and prediction of protein structures 2006 Protein Science
Vol. 15(11), pp. 2507-2524 
article DOI  
Abstract: Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using the probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
BibTeX:
@article{Shen2006,
  author = {Shen, Min-yi and Sali, Andrej},
  title = {Statistical potential for assessment and prediction of protein structures},
  journal = {Protein Science},
  year = {2006},
  volume = {15},
  number = {11},
  pages = {2507--2524},
  doi = {http://dx.doi.org/10.1110/ps.062416606}
}
Shindyalov, I. and Bourne, P. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path 1998 Protein Eng.
Vol. 11(9), pp. 739-747 
article DOI  
Abstract: A new algorithm is reported which builds an alignment between two protein structures. The algorithm involves a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs) rather than the more conventional techniques using dynamic programming and Monte Carlo optimization. AFPs, as the name suggests, are pairs of fragments, one from each protein, which confer structure similarity. AFPs are based on local geometry, rather than global features such as orientation of secondary structures and overall topology. Combinations of AFPs that represent possible continuous alignment paths are selectively extended or discarded thereby leading to a single optimal alignment. The algorithm is fast and accurate in finding an optimal structure alignment and hence suitable for database scanning and detailed analysis of large protein families. The method has been tested and compared with results from Dali and VAST using a representative sample of similar structures. Several new structural similarities not detected by these other methods are reported. Specific one-on-one alignments and searches against all structures as found in the Protein Data Bank (PDB) can be performed via the web at http://cl.sdsc.edu/ce.html
BibTeX:
@article{Shindyalov1998,
  author = {Shindyalov, IN and Bourne, PE},
  title = {Protein structure alignment by incremental combinatorial extension (CE) of the optimal path},
  journal = {Protein Eng.},
  year = {1998},
  volume = {11},
  number = {9},
  pages = {739--747},
  doi = {http://dx.doi.org/10.1093/protein/11.9.739}
}
Shirts, M. and Pande, V. COMPUTING: Screen Savers of the World Unite! 2000 Science
Vol. 290(5498), pp. 1903-4 
article DOI  
BibTeX:
@article{Shirts2000,
  author = {Shirts, Michael and Pande, Vijay},
  title = {COMPUTING: Screen Savers of the World Unite!},
  journal = {Science},
  year = {2000},
  volume = {290},
  number = {5498},
  pages = {1903--4},
  doi = {http://dx.doi.org/10.1126/science.290.5498.1903}
}
Shortle, D., Simons, K.T. and Baker, D. Clustering of low-energy conformations near the native structures of small proteins 1998 Proceedings of the National Academy of Sciences of the United States of America
Vol. 95(19), pp. 11158-11162 
article URL 
Abstract: Recent experimental studies of the denatured state and theoretical analyses of the folding landscape suggest that there are a large multiplicity of low-energy, partially folded conformations near the native state. In this report, we describe a strategy for predicting protein structure based on the working hypothesis that there are a greater number of low-energy conformations surrounding the correct fold than there are surrounding low-energy incorrect folds. To test this idea, 12 ensembles of 500 to 1,000 low-energy structures for 10 small proteins were analyzed by calculating the rms deviation of the Cα coordinates between each conformation and every other conformation in the ensemble. In all 12 cases, the conformation with the greatest number of conformations within 4-Å rms deviation was closer to the native structure than were the majority of conformations in the ensemble, and in most cases it was among the closest 1 to 5%. These results suggest that, to fold efficiently and retain robustness to changes in amino acid sequence, proteins may have evolved a native structure situated within a broad basin of low-energy conformations, a feature which could facilitate the prediction of protein structure at low resolution.
BibTeX:
@article{Shortle1998,
  author = {Shortle, David and Simons, Kim T. and Baker, David},
  title = {Clustering of low-energy conformations near the native structures of small proteins},
  journal = {Proceedings of the National Academy of Sciences of the United States of America},
  year = {1998},
  volume = {95},
  number = {19},
  pages = {11158--11162},
  url = {http://www.pnas.org/content/95/19/11158.full}
}
Siew, N. and Fischer, D. Convergent evolution of protein structure prediction and computer chess tournaments: CASP, Kasparov, and CAFASP 2001 IBM Syst. J.
Vol. 40(2), pp. 410-425 
article DOI  
Abstract: Predicting the three-dimensional structure of a protein from its amino acid sequence is one of the most important current problems of modern biology. The CASP (Critical Assessment of Structure Prediction) blind prediction experiments aim to assess the prediction capabilities in the field. A limitation of CASP is that predictions are prepared and filed by humans using programs, and thus, what is being evaluated is the performance of the predicting groups rather than the performance of the programs themselves. To address this limitation, the Critical Assessment of Fully Automated Structure Prediction (CAFASP) experiment was initiated in 1998. In CAFASP, the participants are programs or Internet servers, and what is evaluated are their automatic results without allowing any human intervention. In this paper, we review in brief the current state of protein structure prediction and describe what has been learned from the CAFASP1 experiment, the evolution toward CAFASP2, and how we foresee the future of automated structure prediction. We observe that the histories of "in silico" structure prediction experiments and computer chess tournaments show some striking similarities as well as some differences. We question whether the major advances in automated protein structure prediction stem from novel insights of the protein folding problem, of protein evolution and function, or merely from the technical advances in the ways the evolutionary information available in the biological databases is exploited. We conclude with a speculation about the future, where interesting chess might only be observed in computer games and where the interpretation of the information encoded in the human genome may be achieved mainly through in silico biology.
BibTeX:
@article{Siew2001,
  author = {Siew, N. and Fischer, D.},
  title = {Convergent evolution of protein structure prediction and computer chess tournaments: CASP, Kasparov, and CAFASP},
  journal = {IBM Syst. J.},
  year = {2001},
  volume = {40},
  number = {2},
  pages = {410--425},
  doi = {http://dx.doi.org/10.1147/sj.402.0410}
}
Silva, S. and Costa, E. Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories 2009 Genetic Programming and Evolvable Machines
Vol. 10(2), pp. 141-179 
article DOI  
Abstract: Bloat is an excess of code growth without a corresponding improvement in fitness. This is a serious problem in Genetic Programming, often leading to the stagnation of the evolutionary process. Here we provide an extensive review of all the past and current theories regarding why bloat occurs. After more than 15 years of intense research, recent work is shedding new light on what may be the real reasons for the bloat phenomenon. We then introduce Dynamic Limits, our new approach to bloat control. It implements a dynamic limit that can be raised or lowered, depending on the best solution found so far, and can be applied either to the depth or size of the programs being evolved. Four problems were used as a benchmark to study the efficiency of Dynamic Limits. The quality of the results is highly dependent on the type of limit used: depth or size. The depth variants performed very well across the set of problems studied, achieving similar fitness to the baseline technique while using significantly smaller trees. Unlike many other methods available so far, Dynamic Limits does not require specific genetic operators, modifications in fitness evaluation or different selection schemes, nor does it add any parameters to the search process. Furthermore, its implementation is simple and its efficiency does not rely on the usage of a static upper limit. The results are discussed in the context of the newest bloat theory.
BibTeX:
@article{Silva2009,
  author = {Silva, Sara and Costa, Ernesto},
  title = {Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2009},
  volume = {10},
  number = {2},
  pages = {141--179},
  doi = {http://dx.doi.org/10.1007/s10710-008-9075-9}
}
Simons, K.T., Ruczinski, I., Kooperberg, C., Fox, B.A., Bystroff, C. and Baker, D. Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins 1999 Proteins: Structure, Function, and Genetics
Vol. 34(1), pp. 82-95 
article DOI  
Abstract: We describe the development of a scoring function based on the decomposition P(structure|sequence) ∝ P(sequence|structure) * P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core, the second term, universal sequence-independent features, such as the assembly of β-strands into β-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of ≈30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence|structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209-225) and P(structure) focused on secondary structure packing preferences; while several additional features had some discriminatory power on their own, they did not provide any additional discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367-392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure. Proteins 1999;34:82-95. © 1999 Wiley-Liss, Inc.
BibTeX:
@article{Simons1999,
  author = {Simons, Kim T. and Ruczinski, Ingo and Kooperberg, Charles and Fox, Brian A. and Bystroff, Chris and Baker, David},
  title = {Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1999},
  volume = {34},
  number = {1},
  pages = {82--95},
  doi = {http://dx.doi.org/10.1002/(SICI)1097-0134(19990101)34:1<82::AID-PROT7>3.0.CO;2-A}
}
Skolnick, J. Putting the pathway back into protein folding 2005 Proc Natl Acad Sci USA
Vol. 102(7), pp. 2265-2266 
article DOI  
BibTeX:
@article{Skolnick2005,
  author = {Jeffrey Skolnick},
  title = {Putting the pathway back into protein folding},
  journal = {Proc Natl Acad Sci USA},
  year = {2005},
  volume = {102},
  number = {7},
  pages = {2265--2266},
  doi = {http://dx.doi.org/10.1073/pnas.0500128102}
}
Skolnick, J., Zhang, Y., Arakaki, A., Kolinski, A., Boniecki, M., Szilágyi, A. and Kihara, D. TOUCHSTONE: A unified approach to protein structure prediction 2003 Proteins: Structure, Function, and Genetics
Vol. 53(S6), pp. 469-79 
article DOI  
BibTeX:
@article{Skolnick2003,
  author = {Skolnick, Jeffrey and Zhang, Yang and Arakaki, Adrian and Kolinski, Andrzej and Boniecki, Michal and Szilágyi, András and Kihara, Daisuke},
  title = {TOUCHSTONE: A unified approach to protein structure prediction},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2003},
  volume = {53},
  number = {S6},
  pages = {469--79},
  doi = {http://dx.doi.org/10.1002/prot.10551}
}
Sokal, R.R. and Michener, C.D. A Statistical Method for Evaluating Systematic Relationships 1958 The University of Kansas Science Bulletin
Vol. 38, pp. 1409-1438 
article URL 
Abstract: Starting with correlation coefficients (based on numerous characters) among species of a systematic unit, the authors developed a method for grouping species, and regrouping the resultant assemblages, to form a classificatory hierarchy most easily expressed as a treelike diagram of relationships. The details of the method are described, using as an example a group of bees. The resulting classification was similar to that previously established by classical systematic methods, although some taxonomic changes were made in view of the new light thrown on relationships. The method is time consuming, although practical in isolated cases, with punched-card machines such as were used; it becomes generally practical with increasingly widely available digital computers.
BibTeX:
@article{Sokal1958,
  author = {Sokal, Robert R. and Michener, Charles D.},
  title = {A Statistical Method for Evaluating Systematic Relationships},
  journal = {The University of Kansas Science Bulletin},
  year = {1958},
  volume = {38},
  pages = {1409--1438},
  url = {http://archive.org/details/cbarchive_121235_astatisticalmethodforevaluatin1902}
}
Sorlin, S. and Solnon, C. Reactive tabu search for measuring graph similarity 2005 5th IAPR-TC-15 workshop on Graph-based Representations in Pattern Recognition, pp. 172-182  inproceedings URL 
BibTeX:
@inproceedings{Sorlin2005,
  author = {Sébastien Sorlin and Christine Solnon},
  title = {Reactive tabu search for measuring graph similarity},
  booktitle = {5th IAPR-TC-15 workshop on Graph-based Representations in Pattern Recognition},
  publisher = {Springer-Verlag},
  year = {2005},
  pages = {172--182},
  url = {http://liris.cnrs.fr/publis/?id=1525}
}
Stasko, J., Catrambone, R., Guzdial, M. and McDonald, K. An evaluation of space-filling information visualizations for depicting hierarchical structures 2000 International Journal of Human-Computer Studies
Vol. 53(5), pp. 663-694 
article DOI  
Abstract: A variety of information visualization tools have been developed recently, but relatively little effort has been made to evaluate the effectiveness and utility of the tools. This article describes results from two empirical studies of two visualization tools for depicting hierarchies, in particular, computer file and directory structures. The two tools examined implement space-filling methodologies, one rectangular, the Treemap method, and one circular, the Sunburst method. Participants performed typical file/directory search and analysis tasks using the two tools. In general, performance trends favored the Sunburst tool with respect to correct task performance, particularly on initial use. Performance with Treemap tended to improve over time and use, suggesting a greater learning cost that was partially recouped over time. Each tool afforded somewhat different search strategies, which also appeared to influence performance. Finally, participants strongly preferred the Sunburst tool, citing better ability to convey structure and hierarchy.
BibTeX:
@article{Stasko2000,
  author = {Stasko, John and Catrambone, Richard and Guzdial, Mark and McDonald, Kevin},
  title = {An evaluation of space-filling information visualizations for depicting hierarchical structures},
  journal = {International Journal of Human-Computer Studies},
  year = {2000},
  volume = {53},
  number = {5},
  pages = {663--694},
  doi = {http://dx.doi.org/10.1006/ijhc.2000.0420}
}
Steuer, R., Kurths, J., Daub, C.O., Weise, J. and Selbig, J. The mutual information: Detecting and evaluating dependencies between variables 2002 Bioinformatics
Vol. 18(suppl2), pp. S231-240 
article  
Abstract: Motivation: Clustering co-expressed genes usually requires the definition of `distance' or `similarity' between measured datasets, the most common choices being Pearson correlation or Euclidean distance. With the size of available datasets steadily increasing, it has become feasible to consider other, more general, definitions as well. One alternative, based on information theory, is the mutual information, providing a general measure of dependencies between variables. While the use of mutual information in cluster analysis and visualization of large-scale gene expression data has been suggested previously, the earlier studies did not focus on comparing different algorithms to estimate the mutual information from finite data. Results: Here we describe and review several approaches to estimate the mutual information from finite datasets. Our findings show that the algorithms used so far may be quite substantially improved upon. In particular when dealing with small datasets, finite sample effects and other sources of potentially misleading results have to be taken into account. Contact: steuer@agnld.uni-potsdam.de
BibTeX:
@article{Steuer2002,
  author = {Steuer, R. and Kurths, J. and Daub, C. O. and Weise, J. and Selbig, J.},
  title = {The mutual information: Detecting and evaluating dependencies between variables},
  journal = {Bioinformatics},
  year = {2002},
  volume = {18},
  number = {suppl2},
  pages = {S231--240}
}
Stout, M., Bacardit, J., Hirst, J. and Krasnogor, N. Prediction of Residue Exposure and Contact Number for Simplified HP Lattice Model Proteins Using Learning Classifier Systems 2006 7th International FLINS Conference on Applied Artificial Intelligence  conference URL 
BibTeX:
@conference{Stout2006,
  author = {Stout, M. and Bacardit, J. and Hirst, J.D. and Krasnogor, N.},
  title = {Prediction of Residue Exposure and Contact Number for Simplified HP Lattice Model Proteins Using Learning Classifier Systems},
  booktitle = {7th International FLINS Conference on Applied Artificial Intelligence},
  year = {2006},
  url = {http://citeseer.ist.psu.edu/stout06prediction.html}
}
Stout, M., Bacardit, J., Hirst, J., Krasnogor, N. and Blazewicz, J. From HP Lattice Models to Real Proteins: Coordination Number Prediction using Learning Classifier Systems 2006 4th European Workshop on Evolutionary Computation and Machine Learning in Bioinformatics  conference URL 
BibTeX:
@conference{Stout2006a,
  author = {Stout, M. and Bacardit, J. and Hirst, J.D. and Krasnogor, N. and Blazewicz, J.},
  title = {From HP Lattice Models to Real Proteins: Coordination Number Prediction using Learning Classifier Systems},
  booktitle = {4th European Workshop on Evolutionary Computation and Machine Learning in Bioinformatics},
  year = {2006},
  url = {http://citeseer.ist.psu.edu/stout06from.html}
}
Stout, M., Bacardit, J., Hirst, J., Smith, R. and Krasnogor, N. Prediction of topological contacts in proteins using learning classifier systems 2009 Soft Computing - A Fusion of Foundations, Methodologies and Applications
Vol. 13(3), pp. 245-258 
article DOI  
Abstract: Evolutionary based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from the Delaunay tessellations, Gabriel graphs and relative neighborhood graphs as well as minimum spanning trees. Several standard machine learning algorithms were compared to a state-of-the-art learning method, a learning classifier system (LCS), that is capable of generating compact and interpretable rule sets. Predictions were performed for various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS produces results with good predictive performance and generates competent yet simple and interpretable classification rules.
BibTeX:
@article{Stout2008a,
  author = {Stout, Michael and Bacardit, Jaume and Hirst, Jonathan and Smith, Robert and Krasnogor, Natalio},
  title = {Prediction of topological contacts in proteins using learning classifier systems},
  journal = {Soft Computing - A Fusion of Foundations, Methodologies and Applications},
  year = {2009},
  volume = {13},
  number = {3},
  pages = {245--258},
  doi = {http://dx.doi.org/10.1007/s00500-008-0318-8}
}
Stout, M., Bacardit, J., Hirst, J.D. and Krasnogor, N. Prediction of recursive convex hull class assignments for protein residues 2008 Bioinformatics
Vol. 24(7), pp. 916-923 
article DOI  
Abstract: Motivation: We introduce a new method for designating the location of residues in folded protein structures based on the recursive convex hull (RCH) of a point set of atomic coordinates. The RCH can be calculated with an efficient and parameterless algorithm. Results: We show that residue RCH class contains information complementary to widely studied measures such as solvent accessibility (SA), residue depth (RD) and to the distance of residues from the centroid of the chain, the residues' exposure (Exp). RCH is more conserved for related structures across folds and correlates better with changes in thermal stability of mutants than the other measures. Further, we assess the predictability of these measures using three types of machine-learning technique: decision trees (C4.5), Naive Bayes and Learning Classifier Systems (LCS) showing that RCH is more easily predicted than the other measures. As an exemplar application of predicted RCH class (in combination with other measures), we show that RCH is potentially helpful in improving prediction of residue contact numbers (CN). Contact: nxk@cs.nott.ac.uk Supplementary Information: For Supplementary data please refer to Datasets: www.infobiotic.net/datasets, RCH Prediction Servers: www.infobiotic.net
BibTeX:
@article{Stout2008,
  author = {Stout, Michael and Bacardit, Jaume and Hirst, Jonathan D. and Krasnogor, Natalio},
  title = {Prediction of recursive convex hull class assignments for protein residues},
  journal = {Bioinformatics},
  year = {2008},
  volume = {24},
  number = {7},
  pages = {916--923},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btn050}
}
Strickland, D.M., Barnes, E. and Sokol, J.S. Optimal Protein Structure Alignment Using Maximum Cliques 2005 Operations Research
Vol. 53(3), pp. 389-402 
article DOI  
Abstract: In biology, the protein structure alignment problem answers the question of how similar two proteins are. Proteins with strong physical similarities in their tertiary (folded) structure often have similar functions, so understanding physical similarity could be a key to developing protein-based medical treatments. One of the models for protein structure alignment is the maximum contact map overlap (CMO) model. The CMO model of protein structure alignment can be cast as a maximum clique problem on an appropriately defined graph. We exploit properties of these protein-based maximum clique problems to develop specialized preprocessing techniques and show how they can be used to more quickly solve contact map overlap instances to optimality.
BibTeX:
@article{Strickland2005,
  author = {Strickland, Dawn M. and Barnes, Earl and Sokol, Joel S.},
  title = {Optimal Protein Structure Alignment Using Maximum Cliques},
  journal = {Operations Research},
  year = {2005},
  volume = {53},
  number = {3},
  pages = {389--402},
  doi = {http://dx.doi.org/10.1287/opre.1040.0189}
}
Summa, C.M., Levitt, M. and DeGrado, W.F. An Atomic Environment Potential for use in Protein Structure Prediction 2005 Journal of Molecular Biology
Vol. 352(4), pp. 986-1001 
article DOI  
BibTeX:
@article{Summa2005,
  author = {Summa, Christopher M. and Levitt, Michael and DeGrado, William F.},
  title = {An Atomic Environment Potential for use in Protein Structure Prediction},
  journal = {Journal of Molecular Biology},
  year = {2005},
  volume = {352},
  number = {4},
  pages = {986--1001},
  doi = {http://dx.doi.org/10.1016/j.jmb.2005.07.054}
}
Syswerda, G. A Study of Reproduction in Generational and Steady State Genetic Algorithms 1990 Foundations of Genetic Algorithms, pp. 94-101  inproceedings  
BibTeX:
@inproceedings{Syswerda1990,
  author = {Gilbert Syswerda},
  title = {A Study of Reproduction in Generational and Steady State Genetic Algorithms},
  booktitle = {Foundations of Genetic Algorithms},
  publisher = {Morgan Kaufmann},
  year = {1990},
  pages = {94--101}
}
Tan, C.-W. and Jones, D. Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction 2008 BMC Bioinformatics
Vol. 9(1), pp. 94 
article DOI  
Abstract: BACKGROUND: We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. RESULTS: Results have shown that the best performing neural network is the one that uses input information comprising of PSI-BLAST 1 profiles of residue pairs, pairwise distance and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. CONCLUSION: This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination.
BibTeX:
@article{Tan2008,
  author = {Tan, Ching-Wai and Jones, David},
  title = {Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction},
  journal = {BMC Bioinformatics},
  year = {2008},
  volume = {9},
  number = {1},
  pages = {94},
  doi = {http://dx.doi.org/10.1186/1471-2105-9-94}
}
Tsai, J., Bonneau, R., Morozov, A.V., Kuhlman, B., Rohl, C.A. and Baker, D. An improved protein decoy set for testing energy functions for protein structure prediction 2003 Proteins: Structure, Function, and Bioinformatics
Vol. 53(1), pp. 76 
article DOI  
BibTeX:
@article{Tsai2003,
  author = {Jerry Tsai and Richard Bonneau and Alexandre V. Morozov and Brian Kuhlman and Carol A. Rohl and David Baker},
  title = {An improved protein decoy set for testing energy functions for protein structure prediction},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2003},
  volume = {53},
  number = {1},
  pages = {76},
  doi = {http://dx.doi.org/10.1002/prot.10454}
}
Unger, R. Applications of Evolutionary Computation in Chemistry 2004
Vol. 110, pp. 153-175 
inbook DOI  
Abstract: Predicting the three-dimensional structure of proteins from their linear sequence is one of the major challenges in modern biology. It is widely recognized that one of the major obstacles in addressing this question is that the "standard" computational approaches are not powerful enough to search for the correct structure in the huge conformational space. Genetic algorithms, a cooperative computational method, have been successful in many difficult computational tasks. Thus, it is not surprising that in recent years several studies were performed to explore the possibility of using genetic algorithms to address the protein structure prediction problem. In this review, a general framework of how genetic algorithms can be used for structure prediction is described. Using this framework, the significant studies that were published in recent years are discussed and compared. Applications of genetic algorithms to the related question of protein alignments are also mentioned. The rationale of why genetic algorithms are suitable for protein structure prediction is presented, and future improvements that are still needed are discussed.
BibTeX:
@inbook{Unger2004,
  author = {Ron Unger},
  title = {Applications of Evolutionary Computation in Chemistry},
  publisher = {Springer},
  year = {2004},
  volume = {110},
  pages = {153--175},
  doi = {http://dx.doi.org/10.1007/b13936}
}
Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E. and Berendsen, H.J.C. GROMACS: Fast, flexible, and free 2005 Journal of Computational Chemistry
Vol. 26(16), pp. 1701-1718 
article DOI  
BibTeX:
@article{Van2005,
  author = {Van Der Spoel, David and Lindahl, Erik and Hess, Berk and Groenhof, Gerrit and Mark, Alan E. and Berendsen, Herman J. C.},
  title = {GROMACS: Fast, flexible, and free},
  journal = {Journal of Computational Chemistry},
  year = {2005},
  volume = {26},
  number = {16},
  pages = {1701--1718},
  doi = {http://dx.doi.org/10.1002/jcc.20291}
}
Vanneschi, L., Tomassini, M., Collard, P. and Clergue, M. Fitness Distance Correlation in Structural Mutation Genetic Programming 2003
Vol. 2610, EuroGP, pp. 455-464 
inproceedings DOI  
Abstract: A new kind of mutation for genetic programming based on the structural distance operators for trees is presented in this paper. We firstly describe a new genetic programming process based on these operators (we call it structural mutation genetic programming). Then we use structural distance to calculate the fitness distance correlation coefficient and we show that this coefficient is a reasonable measure to express problem difficulty for structural mutation genetic programming for the considered set of problems, i.e. unimodal trap functions, royal trees and MAX problem.
BibTeX:
@inproceedings{Vanneschi2003,
  author = {Leonardo Vanneschi and Marco Tomassini and Philippe Collard and Manuel Clergue},
  title = {Fitness Distance Correlation in Structural Mutation Genetic Programming},
  booktitle = {EuroGP},
  publisher = {Springer},
  year = {2003},
  volume = {2610},
  pages = {455--464},
  doi = {http://dx.doi.org/10.1007/3-540-36599-0_43}
}
Vapnik, V.N. The nature of statistical learning theory 1995   book  
BibTeX:
@book{Vapnik1995,
  author = {Vapnik, Vladimir N.},
  title = {The nature of statistical learning theory},
  publisher = {Springer-Verlag New York},
  year = {1995}
}
Wagner, G.P. and Altenberg, L. Complex Adaptations and the Evolution of Evolvability 1996 Evolution
Vol. 50(3), pp. 967-97 
article URL 
BibTeX:
@article{Wagner1996,
  author = {Gunter P. Wagner and Lee Altenberg},
  title = {Complex Adaptations and the Evolution of Evolvability},
  journal = {Evolution},
  year = {1996},
  volume = {50},
  number = {3},
  pages = {967--97},
  url = {http://dynamics.org/Altenberg/PAPERS/CAEE/}
}
Wallin, S., Farwer, J. and Bastolla, U. Testing similarity measures with continuous and discrete protein models 2003 Proteins: Structure, Function, and Genetics
Vol. 50(1), pp. 144-157 
article DOI  
Abstract: There are many ways to define the distance between two protein structures, thus assessing their similarity. Here, we investigate and compare the properties of five different distance measures, including the standard root-mean-square deviation (cRMSD). The performance of these measures is studied from different perspectives with two different protein models, one continuous and the other discrete. Using the continuous model, we examine the correlation between energy and native distance, and the ability of the different measures to discriminate between the two possible topologies of a three-helix bundle. Using the discrete model, we perform fits to real protein structures by minimizing different distance measures. The properties of the fitted structures are found to depend strongly on the distance measure used and the scale considered. We find that the cRMSD measure very effectively describes long-range features but is less effective with short-range features, and it correlates weakly with energy. A stronger correlation with energy and a better description of short-range properties is obtained when we use measures based on intramolecular distances. Proteins 2003;50:144-157. © 2002 Wiley-Liss, Inc.
BibTeX:
@article{Wallin2003,
  author = {Wallin, Stefan and Farwer, Jochen and Bastolla, Ugo},
  title = {Testing similarity measures with continuous and discrete protein models},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2003},
  volume = {50},
  number = {1},
  pages = {144--157},
  doi = {http://dx.doi.org/10.1002/prot.10271}
}
Wattenberg, M. Visual Exploration of Multivariate Graphs 2006 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 811-819  inproceedings DOI  
BibTeX:
@inproceedings{Wattenberg2006,
  author = {Wattenberg, Martin},
  title = {Visual Exploration of Multivariate Graphs},
  booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
  year = {2006},
  pages = {811--819},
  doi = {http://dx.doi.org/10.1145/1124772.1124891}
}
Wernisch, L., Hery, S. and Wodak, S.J. Automatic Protein Design with All Atom Force-fields by Exact and Heuristic Optimization 2000 Journal of Molecular Biology
Vol. 301, pp. 713-736 
article URL 
Abstract: A fully automatic procedure for predicting the amino acid sequences compatible with a given target structure is described. It is based on the CHARMM package, and uses an all atom force-field and rotamer libraries to describe and evaluate side-chain types and conformations. Sequences are ranked by a quantity akin to the free energy of folding, which incorporates hydration effects. Exact (Branch and Bound) and heuristic optimisation procedures are used to identify highly scoring sequences from an astronomical number of possibilities. These sequences include the minimum free energy sequence, as well as all amino acid sequences whose free energy lies within a specified window from the minimum. Several applications of our procedure are illustrated. Prediction of side-chain conformations for a set of ten proteins yields results comparable to those of established side-chain placement programs. Applications to sequence optimisation comprise the re-design of the protein cores of c-Crk SH3 domain, the B1 domain of protein G and Ubiquitin, and of surface residues of the SH3 domain. In all calculations, no restrictions are imposed on the amino acid composition and identical parameter settings are used for core and surface residues. The best scoring sequences for the protein cores are virtually identical to wild-type. They feature no more than one to three mutations in a total of 11-16 variable positions. Tests suggest that this is due to the balance between various contributions in the force-field rather than to overwhelming influence from packing constraints. The effectiveness of our force-field is further supported by the sequence predictions for surface residues of the SH3 domain. More mutations are predicted than in the core, seemingly in order to optimise the network of complementary interactions between polar and charged groups. 
This appears to be an important energetic requirement in absence of the partner molecules with which the SH3 domain interacts, which were not included in the calculations. Finally, a detailed comparison between the sequences generated by the heuristic and exact optimisation algorithms, commends a note of caution concerning the efficiency of heuristic procedures in exploring sequence space.
BibTeX:
@article{Wernisch2000,
  author = {L. Wernisch and S. Hery and S. J. Wodak},
  title = {Automatic Protein Design with All Atom Force-fields by Exact and Heuristic Optimization},
  journal = {Journal of Molecular Biology},
  year = {2000},
  volume = {301},
  pages = {713--736},
  url = {http://www.ingentaconnect.com/content/ap/mb/2000/00000301/00000003/art03984}
}
Wheelan, S.J., Marchler-Bauer, A. and Bryant, S.H. Domain size distributions can predict domain boundaries 2000 Bioinformatics
Vol. 16(7), pp. 613-618 
article DOI  
Abstract: Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries. Results: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis. Availability: The DGS algorithm, for 'Domain Guess by Size', is available as a web service at http://www.ncbi.nlm.nih.gov/dgs. This site also provides the DGS source code. Contact: bryant@ncbi.nlm.nih.gov
BibTeX:
@article{Wheelan2000,
  author = {Wheelan, S. J. and Marchler-Bauer, A. and Bryant, S. H.},
  title = {Domain size distributions can predict domain boundaries},
  journal = {Bioinformatics},
  year = {2000},
  volume = {16},
  number = {7},
  pages = {613--618},
  doi = {http://dx.doi.org/10.1093/bioinformatics/16.7.613}
}
Widera, P., Bacardit, J., Krasnogor, N., García-Martínez, C. and Lozano, M. Evolutionary symbolic discovery for bioinformatics, systems and synthetic biology 2010 Proceedings of the 12th annual Conference on Genetic and Evolutionary Computation (GECCO 2010), pp. 1991-1998  inproceedings DOI  
Abstract: Symbolic regression and modeling are tightly linked in many Bioinformatics, Systems and Synthetic Biology problems. In this paper we briefly overview two problems, and the approaches we have used to tackle them, that can be deemed to represent this entwining of regression and modeling, namely, the evolutionary discovery of (1) effective energy functions for protein structure prediction and (2) models that capture biological behavior at the gene, signaling and metabolic networks level. These problems are not, strictly speaking, "regression problems" but they do share several characteristics with the latter, namely, a symbolic representation of a solution is sought, this symbolic representation must be human understandable and the results obtained by the symbolic and human interpretable solution must fit the available data without over-learning.
BibTeX:
@inproceedings{Widera2010a,
  author = {Widera, Paweł and Bacardit, Jaume and Krasnogor, Natalio and García-Martínez, Carlos and Lozano, Manuel},
  title = {Evolutionary symbolic discovery for bioinformatics, systems and synthetic biology},
  booktitle = {Proceedings of the 12th annual Conference on Genetic and Evolutionary Computation (GECCO 2010)},
  publisher = {ACM},
  year = {2010},
  pages = {1991--1998},
  doi = {http://dx.doi.org/10.1145/1830761.1830842}
}
Widera, P., Garibaldi, J. and Krasnogor, N. GP challenge: evolving energy function for protein structure prediction 2010 Genetic Programming and Evolvable Machines
Vol. 11(1), pp. 61-88 
article DOI  
Abstract: One of the key elements in protein structure prediction is the ability to distinguish between good and bad candidate structures. This distinction is made by estimation of the structure energy. The energy function used in the best state-of-the-art automatic predictors competing in the most recent CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiment is defined as a weighted sum of a set of energy terms designed by experts. We hypothesised that combining these terms more freely will improve the prediction quality. To test this hypothesis, we designed a genetic programming algorithm to evolve the protein energy function. We compared the predictive power of the best evolved function and a linear combination of energy terms featuring weights optimised by the Nelder–Mead algorithm. The GP based optimisation outperformed the optimised linear function. We have made the data used in our experiments publicly available in order to encourage others to further investigate this challenging problem by using GP and other methods, and to attempt to improve on the results presented here.
BibTeX:
@article{Widera2010,
  author = {Widera, Paweł and Garibaldi, Jonathan and Krasnogor, Natalio},
  title = {GP challenge: evolving energy function for protein structure prediction},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2010},
  volume = {11},
  number = {1},
  pages = {61--88},
  doi = {http://dx.doi.org/10.1007/s10710-009-9087-0}
}
Widera, P., Garibaldi, J. and Krasnogor, N. Evolutionary design of the energy function for protein structure prediction 2009 IEEE Congress on Evolutionary Computation (CEC 2009), pp. 1305-1312  inproceedings DOI  
Abstract: Automatic protein structure predictors use the notion of energy to guide the search towards good candidate structures. The energy functions used by the state-of-the-art predictors are defined as a linear combination of several energy terms designed by human experts. We hypothesised that the energy based guidance could be more accurate if the terms were combined more freely. To test this hypothesis, we designed a genetic programming algorithm to evolve the protein energy function. Using several different fitness functions we examined the potential of the evolutionary approach on a set of candidate structures generated during the protein structure prediction process. Although our algorithms were able to improve over the random walk, the fitness of the best individuals was far from the optimum. We discuss the shortcomings of our initial algorithm design and the possible directions for further research.
BibTeX:
@inproceedings{Widera2009,
  author = {Widera, P. and Garibaldi, J.M. and Krasnogor, N.},
  title = {Evolutionary design of the energy function for protein structure prediction},
  booktitle = {IEEE Congress on Evolutionary Computation (CEC 2009)},
  year = {2009},
  pages = {1305--1312},
  doi = {http://dx.doi.org/10.1109/CEC.2009.4983095}
}
Widera, P. and Krasnogor, N. Protein Models Comparator: Scalable Bioinformatics Computing on the Google App Engine Platform 2011 CoRR
Vol. abs/1102.4293 
article URL 
BibTeX:
@article{Widera2011,
  author = {Paweł Widera and Natalio Krasnogor},
  title = {Protein Models Comparator: Scalable Bioinformatics Computing on the Google App Engine Platform},
  journal = {CoRR},
  year = {2011},
  volume = {abs/1102.4293},
  url = {http://arxiv.org/abs/1102.4293}
}
Wieser, M.E. Atomic weights of the elements 2005 (IUPAC Technical Report) 2006 Pure and Applied Chemistry
Vol. 78(11), pp. 2051-2066 
article DOI  
BibTeX:
@article{Wieser2006,
  author = {M. E. Wieser},
  title = {Atomic weights of the elements 2005 (IUPAC Technical Report)},
  journal = {Pure and Applied Chemistry},
  year = {2006},
  volume = {78},
  number = {11},
  pages = {2051--2066},
  doi = {http://dx.doi.org/10.1351/pac200678112051}
}
Wu, S., Skolnick, J. and Zhang, Y. Ab initio modeling of small proteins by iterative TASSER simulations. 2007 BMC Biol
Vol. 5(1), pp. 17 
article DOI  
Abstract: BACKGROUND: Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in the ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method for the ab initio modeling and examine systemically its ability to fold small single-domain proteins. RESULTS: We developed I-TASSER by iteratively implementing the TASSER method, which is used in the folding test of three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time by I-TASSER was much shorter (150 CPU days vs. 5 CPU hours). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å. CONCLUSIONS: Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. 
Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or is similar within a lower computational time. These data, together with the significant performance of automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progresses in automated ab initio model generation. The I-TASSER server is freely available for academic users (http://zhang.bioinformatics.ku.edu/I-TASSER).
BibTeX:
@article{Wu2007,
  author = {Sitao Wu and Jeffrey Skolnick and Yang Zhang},
  title = {Ab initio modeling of small proteins by iterative TASSER simulations.},
  journal = {BMC Biol},
  year = {2007},
  volume = {5},
  number = {1},
  pages = {17},
  doi = {http://dx.doi.org/10.1186/1741-7007-5-17}
}
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G., Ng, A., Liu, B., Yu, P., Zhou, Z.-H., Steinbach, M., Hand, D. and Steinberg, D. Top 10 algorithms in data mining 2008 Knowledge and Information Systems
Vol. 14(1), pp. 1-37 
article DOI  
Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, k-NN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
BibTeX:
@article{Wu2008,
  author = {Wu, Xindong and Kumar, Vipin and Ross Quinlan, J. and Ghosh, Joydeep and Yang, Qiang and Motoda, Hiroshi and McLachlan, Geoffrey and Ng, Angus and Liu, Bing and Yu, Philip and Zhou, Zhi-Hua and Steinbach, Michael and Hand, David and Steinberg, Dan},
  title = {Top 10 algorithms in data mining},
  journal = {Knowledge and Information Systems},
  year = {2008},
  volume = {14},
  number = {1},
  pages = {1--37},
  doi = {http://dx.doi.org/10.1007/s10115-007-0114-2}
}
Xie, W. and Sahinidis, N.V. A Branch-and-Reduce Algorithm for the Contact Map Overlap Problem 2006
Vol. 3909, RECOMB, pp. 516-529 
inproceedings DOI  
BibTeX:
@inproceedings{Xie2006,
  author = {Xie, Wei and Sahinidis, Nikolaos V.},
  title = {A Branch-and-Reduce Algorithm for the Contact Map Overlap Problem},
  booktitle = {RECOMB},
  publisher = {Springer},
  year = {2006},
  volume = {3909},
  pages = {516--529},
  doi = {http://dx.doi.org/10.1007/11732990_43}
}
Xie, W. and Sahinidis, N.V. A Reduction-Based Exact Algorithm for the Contact Map Overlap Problem 2007 Journal of Computational Biology
Vol. 14(5), pp. 637-654 
article DOI  
Abstract: Aligning proteins based on their structural similarity is a fundamental problem in molecular biology with applications in many settings, including structure classification, database search, function prediction, and assessment of folding prediction methods. Structural alignment can be done via several methods, including contact map overlap (CMO) maximization that aligns proteins in a way that maximizes the number of common residue contacts. In this paper, we develop a reduction-based exact algorithm for the CMO problem. Our approach solves CMO directly rather than after transformation to other combinatorial optimization problems. We exploit the mathematical structure of the problem in order to develop a number of efficient lower bounding, upper bounding, and reduction schemes. Computational experiments demonstrate that our algorithm runs significantly faster than existing exact algorithms and solves some hard CMO instances that were not solved in the past. In addition, the algorithm produces protein clusters that are in excellent agreement with the SCOP classification. An implementation of our algorithm is accessible as an on-line server at http://eudoxus.scs.uiuc.edu/cmos/cmos.html.
BibTeX:
@article{Xie2007,
  author = {Wei Xie and Nikolaos V. Sahinidis},
  title = {A Reduction-Based Exact Algorithm for the Contact Map Overlap Problem},
  journal = {Journal of Computational Biology},
  year = {2007},
  volume = {14},
  number = {5},
  pages = {637--654},
  doi = {http://dx.doi.org/10.1089/cmb.2007.R007}
}
Xu, J. and Berger, B. Fast and accurate algorithms for protein side-chain packing 2006 Journal of the ACM
Vol. 53(4), pp. 533-557 
article DOI  
BibTeX:
@article{Xu2006,
  author = {Jinbo Xu and Bonnie Berger},
  title = {Fast and accurate algorithms for protein side-chain packing},
  journal = {Journal of the ACM},
  publisher = {ACM Press},
  year = {2006},
  volume = {53},
  number = {4},
  pages = {533--557},
  doi = {http://dx.doi.org/10.1145/1162349.1162350}
}
Xu, J., Jiao, F. and Berger, B. A Parameterized Algorithm for Protein Structure Alignment 2007 Journal of Computational Biology
Vol. 14(5), pp. 564-577 
article DOI  
Abstract: This paper proposes a parameterized polynomial time approximation scheme (PTAS) for aligning two protein structures, in the case where one protein structure is represented by a contact map graph and the other by a contact map graph or a distance matrix. If the sequential order of alignment is not required, the time complexity is polynomial in the protein size and exponential with respect to two parameters Du/Dl and Dc/Dl, which usually can be treated as constants. In particular, Du is the distance threshold determining if two residues are in contact or not, Dc is the maximally allowed distance between two matched residues after two proteins are superimposed, and Dl is the minimum inter-residue distance in a typical protein. This result clearly demonstrates that the computational hardness of the contact map based protein structure alignment problem is related not to protein size but to several parameters modeling the problem. The result is achieved by decomposing the protein structure using tree decomposition and discretizing the rigid-body transformation space. Preliminary experimental results indicate that on a Linux PC, it takes from ten minutes to one hour to align two proteins with approximately 100 residues.
BibTeX:
@article{Xu2007,
  author = {Xu, Jinbo and Jiao, Feng and Berger, Bonnie},
  title = {A Parameterized Algorithm for Protein Structure Alignment},
  journal = {Journal of Computational Biology},
  publisher = {Mary Ann Liebert, Inc., publishers},
  year = {2007},
  volume = {14},
  number = {5},
  pages = {564--577},
  doi = {http://dx.doi.org/10.1089/cmb.2007.R003}
}
Ye, Y. and Godzik, A. Flexible structure alignment by chaining aligned fragment pairs allowing twists 2003 Bioinformatics
Vol. 19(suppl_2), pp. ii246-255 
article DOI  
Abstract: Motivation: Protein structures are flexible and undergo structural rearrangements as part of their function, and yet most existing protein structure comparison methods treat them as rigid bodies, which may lead to incorrect alignment. Results: We have developed the Flexible structure AlignmenT by Chaining AFPs (Aligned Fragment Pairs) with Twists (FATCAT), a new method for structural alignment of proteins. The FATCAT approach simultaneously addresses the two major goals of flexible structure alignment; optimizing the alignment and minimizing the number of rigid-body movements (twists) around pivot points (hinges) introduced in the reference protein. In contrast, currently existing flexible structure alignment programs treat the hinge detection as a post-process of a standard rigid body alignment. We illustrate the advantages of the FATCAT approach by several examples of comparison between proteins known to adopt different conformations, where the FATCAT algorithm achieves more accurate structure alignments than current methods, while at the same time introducing fewer hinges. Contacts: adam@burnham.org
BibTeX:
@article{Ye2003,
  author = {Ye, Yuzhen and Godzik, Adam},
  title = {Flexible structure alignment by chaining aligned fragment pairs allowing twists},
  journal = {Bioinformatics},
  year = {2003},
  volume = {19},
  number = {suppl_2},
  pages = {ii246--255},
  doi = {http://dx.doi.org/10.1093/bioinformatics/btg1086}
}
Yona, G. and Kedem, K. The URMS-RMS Hybrid Algorithm for Fast and Sensitive Local Protein Structure Alignment 2005 Journal of Computational Biology
Vol. 12(1), pp. 12-32 
article DOI  
Abstract: We present an efficient and sensitive hybrid algorithm for local structure alignment of a pair of 3D protein structures. The hybrid algorithm employs both the URMS (unit-vector root mean squared) metric and the RMS metric. Our algorithm searches efficiently the transformation space using a fast screening protocol; initial transformations (rotations) are identified using the URMS algorithm. These rotations are then clustered and an RMS-based dynamic programming algorithm is invoked to find the maximal local similarities for representative rotations of the clusters. Statistical significance of the alignments is estimated using a model that accounts for both the score of the match and the RMS. We tested our algorithm over the SCOP classification of protein domains. Our algorithm performs very well; its main advantages are that (1) it combines the advantages of the RMS and the URMS metrics, (2) it searches extensively the transformation space, (3) it detects complex similarities and structural repeats, and (4) its results are symmetric. The software is available for download at biozon.org/ftp/software/urms/.
BibTeX:
@article{Yona2005,
  author = {Yona, Golan and Kedem, Klara},
  title = {The URMS-RMS Hybrid Algorithm for Fast and Sensitive Local Protein Structure Alignment},
  journal = {Journal of Computational Biology},
  year = {2005},
  volume = {12},
  number = {1},
  pages = {12--32},
  doi = {http://dx.doi.org/10.1089/cmb.2005.12.12}
}
Yu, H.-F., Huang, F.-L. and Lin, C.-J. Dual coordinate descent methods for logistic regression and maximum entropy models 2011 Machine Learning
Vol. 85(1-2), pp. 41-75 
article DOI  
Abstract: Most optimization methods for logistic regression or maximum entropy solve the primal problem. They range from iterative scaling, coordinate descent, quasi-Newton, and truncated Newton. Less efforts have been made to solve the dual problem. In contrast, for linear support vector machines (SVM), methods have been shown to be very effective for solving the dual problem. In this paper, we apply coordinate descent methods to solve the dual form of logistic regression and maximum entropy. Interestingly, many details are different from the situation in linear SVM. We carefully study the theoretical convergence as well as numerical issues. The proposed method is shown to be faster than most state of the art methods for training logistic regression and maximum entropy.
BibTeX:
@article{Yu2011,
  author = {Yu, Hsiang-Fu and Huang, Fang-Lan and Lin, Chih-Jen},
  title = {Dual coordinate descent methods for logistic regression and maximum entropy models},
  journal = {Machine Learning},
  year = {2011},
  volume = {85},
  number = {1-2},
  pages = {41--75},
  doi = {http://dx.doi.org/10.1007/s10994-010-5221-8}
}
Zemla, A. LGA: a method for finding 3D similarities in protein structures 2003 Nucl. Acids Res.
Vol. 31(13), pp. 3370-3374 
article DOI  
Abstract: We present the LGA (Local-Global Alignment) method, designed to facilitate the comparison of protein structures or fragments of protein structures in sequence dependent and sequence independent modes. The LGA structure alignment program is available as an online service at http://PredictionCenter.llnl.gov/local/lga. Data generated by LGA can be successfully used in a scoring function to rank the level of similarity between two structures and to allow structure classification when many proteins are being analyzed. LGA also allows the clustering of similar fragments of protein structures.
BibTeX:
@article{Zemla2003,
  author = {Zemla, Adam},
  title = {LGA: a method for finding 3D similarities in protein structures},
  journal = {Nucl. Acids Res.},
  year = {2003},
  volume = {31},
  number = {13},
  pages = {3370--3374},
  doi = {http://dx.doi.org/10.1093/nar/gkg571}
}
Zemla, A., Venclovas, C., Moult, J. and Fidelis, K. Processing and analysis of CASP3 protein structure predictions 1999 Proteins: Structure, Function, and Genetics
Vol. 37(S3), pp. 22-29 
article DOI  
BibTeX:
@article{Zemla1999,
  author = {Zemla, Adam and Venclovas, Ceslovas and Moult, John and Fidelis, Krzysztof},
  title = {Processing and analysis of CASP3 protein structure predictions},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {1999},
  volume = {37},
  number = {S3},
  pages = {22--29},
  doi = {http://dx.doi.org/10.1002/(SICI)1097-0134(1999)37:3+<22::AID-PROT5>3.0.CO;2-W}
}
Zhang, J. and Zhang, Y. A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction 2010 PLoS ONE
Vol. 5(10), pp. e15386 
article DOI  
Abstract: Background: An accurate potential function is essential to attack protein folding and structure prediction problems. The key to developing efficient knowledge-based potential functions is to design reference states that can appropriately counteract generic interactions. The reference states of many knowledge-based distance-dependent atomic potential functions were derived from non-interacting particles such as ideal gas, however, which ignored the inherent sequence connectivity and entropic elasticity of proteins.
Methodology: We developed a new pair-wise distance-dependent, atomic statistical potential function (RW), using an ideal random-walk chain as reference state, which was optimized on CASP models and then benchmarked on nine structural decoy sets. Second, we incorporated a new side-chain orientation-dependent energy term into RW (RWplus) and found that the side-chain packing orientation specificity can further improve the decoy recognition ability of the statistical potential.
Significance: RW and RWplus demonstrate a significantly better ability than the best performing pair-wise distance-dependent atomic potential functions in both native and near-native model selections. It has higher energy-RMSD and energy-TM-score correlations compared with other potentials of the same type in real-life structure assembly decoys. When benchmarked with a comprehensive list of publicly available potentials, RW and RWplus shows comparable performance to the state-of-the-art scoring functions, including those combining terms from multiple resources. These data demonstrate the usefulness of random-walk chain as reference states which correctly account for sequence connectivity and entropic elasticity of proteins. It shows potential usefulness in structure recognition and protein folding simulations. The RW and RWplus potentials, as well as the newly generated I-TASSER decoys, are freely available at http://zhanglab.ccmb.med.umich.edu/RW.
BibTeX:
@article{Zhang2010,
  author = {Zhang, Jian and Zhang, Yang},
  title = {A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction},
  journal = {PLoS ONE},
  publisher = {Public Library of Science},
  year = {2010},
  volume = {5},
  number = {10},
  pages = {e15386},
  doi = {http://dx.doi.org/10.1371/journal.pone.0015386}
}
Zhang, Y. I-TASSER: Fully automated protein structure prediction in CASP8 2009 Proteins: Structure, Function, and Bioinformatics
Vol. 77(S9), pp. 100-113 
article DOI  
Abstract: The I-TASSER algorithm for 3D protein structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of human ones but the human predictions incorporate more diverse templates from other servers which improve the human predictions in some of the distant homology targets. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the accuracy of the sequence based contact predictions is on average lower than that from template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly or unaligned regions, are important to improve the global and local packing in these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability of removing steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is the model selection where the best models could not be appropriately recognized when the correct templates are detected only by the minority of the threading algorithms. There are also problems related with domain-splitting and mirror image recognition which mainly influences the performance of I-TASSER modeling in the FM-based structure predictions. Proteins 2009. © 2009 Wiley-Liss, Inc.
BibTeX:
@article{Zhang2009,
  author = {Zhang, Yang},
  title = {I-TASSER: Fully automated protein structure prediction in CASP8},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2009},
  volume = {77},
  number = {S9},
  pages = {100--113},
  doi = {http://dx.doi.org/10.1002/prot.22588}
}
Zhang, Y. Progress and challenges in protein structure prediction 2008 Current Opinion in Structural Biology
Vol. 18(3), Nucleic acids / Sequences and topology, pp. 342-348 
article DOI  
Abstract: Depending on whether similar structures are found in the PDB library, the protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect the structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolutions. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
BibTeX:
@article{Zhang2008,
  author = {Zhang, Yang},
  title = {Progress and challenges in protein structure prediction},
  booktitle = {Nucleic acids / Sequences and topology},
  journal = {Current Opinion in Structural Biology},
  year = {2008},
  volume = {18},
  number = {3},
  pages = {342--348},
  doi = {http://dx.doi.org/10.1016/j.sbi.2008.02.004}
}
Zhang, Y. CASP7 server ranking for FM category (TM-Score) 2006   webpage URL 
BibTeX:
@webpage{URL_CASP7-rank_Zhang,
  author = {Zhang, Yang},
  title = {CASP7 server ranking for FM category (TM-Score)},
  year = {2006},
  url = {http://zhang.bioinformatics.ku.edu/casp7/24.html}
}
Zhang, Y., Arakaki, A.K. and Skolnick, J. TASSER: an automated method for the prediction of protein tertiary structures in CASP6. 2005 Proteins
Vol. 61(Suppl 7), pp. 91-98 
article DOI  
Abstract: The recently developed TASSER (Threading/ASSembly/Refinement) method is applied to predict the tertiary structures of all CASP6 targets. TASSER is a hierarchical approach that consists of template identification by the threading program PROSPECTOR_3, followed by tertiary structure assembly via rearranging continuous template fragments. Assembly occurs using parallel hyperbolic Monte Carlo sampling under the guide of an optimized, reduced force field that includes knowledge-based statistical potentials and spatial restraints extracted from threading alignments. Models are automatically selected from the Monte Carlo trajectories in the low-temperature replicas using the clustering program SPICKER. For all 90 CASP targets/domains, PROSPECTOR_3 generates initial alignments with an average root-mean-square deviation (RMSD) to native of 8.4 Å with 79% coverage. After TASSER reassembly, the average RMSD decreases to 5.4 Å over the same aligned residues; the overall cumulative TM-score increases from 39.44 to 52.53. Despite significant improvements over the PROSPECTOR_3 template alignment observed in all target categories, the overall quality of the final models is essentially dictated by the quality of threading templates: The average TM-scores of TASSER models in the three categories are, respectively, 0.79 [comparative modeling (CM), 43 targets/domains], 0.47 [fold recognition (FR), 37 targets/domains], and 0.30 [new fold (NF), 10 targets/domains]. This highlights the need to develop novel (or improved) approaches to identify very distant targets as well as better NF algorithms.
BibTeX:
@article{Zhang2005,
  author = {Zhang, Yang and Arakaki, Adrian K. and Skolnick, Jeffrey},
  title = {TASSER: an automated method for the prediction of protein tertiary structures in CASP6},
  journal = {Proteins},
  year = {2005},
  volume = {61},
  number = {Suppl 7},
  pages = {91--98},
  doi = {http://dx.doi.org/10.1002/prot.20724}
}
Zhang, Y., Hubner, I.A., Arakaki, A.K., Shakhnovich, E. and Skolnick, J. On the origin and highly likely completeness of single-domain protein structures 2006 PNAS
Vol. 103(8), pp. 2605-2610 
article DOI  
Abstract: The size and origin of the protein fold universe is of fundamental and practical importance. Analyzing randomly generated, compact sticky homopolypeptide conformations constructed in generic simplified and all-atom protein models, all have similar folds in the library of solved structures, the Protein Data Bank, and conversely, all compact, single-domain protein structures in the Protein Data Bank have structural analogues in the compact model set. Thus, both sets are highly likely complete, with the protein fold universe arising from compact conformations of hydrogen-bonded, secondary structures. Because side chains are represented by their Cβ atoms, these results also suggest that the observed protein folds are insensitive to the details of side-chain packing. Sequence specificity enters both in fine-tuning the structure and thermodynamically stabilizing a given fold with respect to the set of alternatives. Scanning the models against a three-dimensional active-site library, close geometric matches are frequently found. Thus, the presence of active-site-like geometries also seems to be a consequence of the packing of compact, secondary structural elements. These results have significant implications for the evolution of protein structure and function.
Review: The supplementary materials describe the new I-TASSER hydrogen-bonding potential. It extends the previous, purely geometrical/knowledge-based TOUCHSTONE-II potential with a more accurate statistics-based one.
BibTeX:
@article{Zhang2006,
  author = {Zhang, Yang and Hubner, Isaac A. and Arakaki, Adrian K. and Shakhnovich, Eugene and Skolnick, Jeffrey},
  title = {On the origin and highly likely completeness of single-domain protein structures},
  journal = {PNAS},
  year = {2006},
  volume = {103},
  number = {8},
  pages = {2605--2610},
  doi = {http://dx.doi.org/10.1073/pnas.0509379103}
}
Zhang, Y., Kihara, D. and Skolnick, J. Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding 2002 Proteins: Structure, Function, and Genetics
Vol. 48(2), pp. 192-201 
article DOI  
Abstract: Among the major difficulties in protein structure prediction is the roughness of the energy landscape that must be searched for the global energy minimum. To address this issue, we have developed a novel Monte Carlo algorithm called parallel hyperbolic sampling (PHS) that logarithmically flattens local high-energy barriers and, therefore, allows the simulation to tunnel more efficiently through energetically inaccessible regions to low-energy valleys. Here, we show the utility of this approach by applying it to the SICHO (SIde-CHain-Only) protein model. For the same CPU time, the parallel hyperbolic sampling method can identify much lower energy states and explore a larger region phase space than the commonly used replica sampling (RS) Monte Carlo method. By clustering the simulated structures obtained in the PHS implementation of the SICHO model, we can successfully predict, among a representative benchmark 65 proteins set, 50 cases in which one of the top 5 clusters have a root-mean-square deviation (RMSD) from the native structure below 6.5 Å. Compared with our previous calculations that used RS as the conformational search procedure, the number of successful predictions increased by four and the CPU cost is reduced. By comparing the structure clusters produced by both PHS and RS, we find a strong correlation between the quality of predicted structures and the minimum relative RMSD (mrRMSD) of structures clusters identified by using different search engines. This mrRMSD correlation may be useful in blind prediction as an indicator of the likelihood of successful folds.
BibTeX:
@article{Zhang2002,
  author = {Zhang, Yang and Kihara, Daisuke and Skolnick, Jeffrey},
  title = {Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding},
  journal = {Proteins: Structure, Function, and Genetics},
  year = {2002},
  volume = {48},
  number = {2},
  pages = {192--201},
  doi = {http://dx.doi.org/10.1002/prot.10141}
}
Zhang, Y., Kolinski, A. and Skolnick, J. TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction 2003 Biophys. J.
Vol. 85(2), pp. 1145-1164 
article DOI  
Abstract: We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 x 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36-120 residues) with predicted structures having a RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger-size proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR where homologous proteins were excluded from the data base. With these threading-based restraints, the program can fold 83/125 test proteins (36-174 residues) with structures having a RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
Review: Chirality - property of being dissymmetric. It describes an object that is non-superimposable on its mirror image. http://tinyurl.com/2fwpln
BibTeX:
@article{Zhang2003,
  author = {Zhang, Yang and Kolinski, Andrzej and Skolnick, Jeffrey},
  title = {TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction},
  journal = {Biophys. J.},
  year = {2003},
  volume = {85},
  number = {2},
  pages = {1145--1164},
  doi = {http://dx.doi.org/10.1016/S0006-3495(03)74551-2}
}
Zhang, Y. and Skolnick, J. The protein structure prediction problem could be solved using the current PDB library 2005 PNAS
Vol. 102(4), pp. 1029-1034 
article DOI  
Abstract: For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.
BibTeX:
@article{Zhang2005a,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {The protein structure prediction problem could be solved using the current PDB library},
  journal = {PNAS},
  year = {2005},
  volume = {102},
  number = {4},
  pages = {1029--1034},
  doi = {http://dx.doi.org/10.1073/pnas.0407152101}
}
Zhang, Y. and Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score 2005 Nucl. Acids Res.
Vol. 33(7), pp. 2302-2309 
article DOI  
Abstract: We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and Dynamic Programming (DP). The algorithm is ~4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these most often-used methods. TM-align is applied to an all-against-all structure comparison of 10 515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff <95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation, RMSD, of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions. The TM-align program is freely downloadable at http://bioinformatics.buffalo.edu/TM-align.
BibTeX:
@article{Zhang2005b,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {TM-align: a protein structure alignment algorithm based on the TM-score},
  journal = {Nucl. Acids Res.},
  year = {2005},
  volume = {33},
  number = {7},
  pages = {2302--2309},
  doi = {http://dx.doi.org/10.1093/nar/gki524}
}
Zhang, Y. and Skolnick, J. Tertiary Structure Predictions on a Comprehensive Benchmark of Medium to Large Size Proteins 2004 Biophys. J.
Vol. 87(4), pp. 2647-2655 
article DOI  
Abstract: We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures through rearranging the rigid fragments from threading templates guided by a reduced Cα and side-chain based potential consistent with threading based tertiary restraints. Predictions were generated for 745 proteins 201-300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases, the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native < 6.5 Å, with >70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has a RMSD < 6.5 Å. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD < 5.5 Å. For all representative proteins less than or equal to 300 residues that have corresponding multiple NMR structures in the Protein Data Bank, ≈20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for nonhomologous large size proteins can be automatically generated on a proteomic scale, and the application of this approach to structural as well as functional genomics represent promising applications of TASSER.
Review: Describes the TASSER method, including a new on/off-lattice model called CAS. Mentions changes in the force field since TOUCHSTONE-II but does not describe them or give any useful reference.
BibTeX:
@article{Zhang2004,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {Tertiary Structure Predictions on a Comprehensive Benchmark of Medium to Large Size Proteins},
  journal = {Biophys. J.},
  year = {2004},
  volume = {87},
  number = {4},
  pages = {2647--2655},
  doi = {http://dx.doi.org/10.1529/biophysj.104.045385}
}
Zhang, Y. and Skolnick, J. Automated structure prediction of weakly homologous proteins on a genomic scale 2004 PNAS
Vol. 101(20), pp. 7594-7599 
article DOI  
Abstract: We have developed TASSER, a hierarchical approach to protein structure prediction that consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments guided by an optimized Cα and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases, the templates identified by our threading algorithm PROSPECTOR_3 have a rms deviation from native <6.5 Å with ≈80% alignment coverage. After template reassembly, this number increases to 1,172. This shows significant and systematic improvement of the final models with respect to the initial template alignments. Furthermore, significant improvements in loop modeling are demonstrated. We then apply TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; ≈920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results from our unprecedented comprehensive folding benchmark on all protein categories provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.
Review: Zhang2004 gives a broader description of the method. Although the I-TASSER articles (Wu2007) link here, no force-field modifications are described.
BibTeX:
@article{Zhang2004a,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {Automated structure prediction of weakly homologous proteins on a genomic scale},
  journal = {PNAS},
  year = {2004},
  volume = {101},
  number = {20},
  pages = {7594--7599},
  doi = {http://dx.doi.org/10.1073/pnas.0305695101}
}
Zhang, Y. and Skolnick, J. Scoring function for automated assessment of protein structure template quality 2004 Proteins: Structure, Function, and Bioinformatics
Vol. 57(4), pp. 702-710 
article DOI  
BibTeX:
@article{Zhang2004b,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {Scoring function for automated assessment of protein structure template quality},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2004},
  volume = {57},
  number = {4},
  pages = {702--710},
  doi = {http://dx.doi.org/10.1002/prot.20264}
}
Zhang, Y. and Skolnick, J. SPICKER: A clustering approach to identify near-native protein folds 2004 J. Comput. Chem.
Vol. 25(6), pp. 865-871 
article DOI  
Abstract: We have developed SPICKER, a simple and efficient strategy to identify near-native folds by clustering protein structures generated during computer simulations. In general, the most populated clusters tend to be closer to the native conformation than the lowest energy structures. To assess the generality of the approach, we applied SPICKER to 1489 representative benchmark proteins ≤200 residues that cover the PDB at the level of 35% sequence identity; each contains up to 280,000 structure decoys generated using the recently developed TASSER (Threading ASSembly Refinement) algorithm. The best of the top five identified folds has a root-mean-square deviation from native (RMSD) in the top 1.4% of all decoys. For 78% of the proteins, the difference in RMSD from native to the identified models and RMSD from native to the absolutely best individual decoy is below 1 Å; the majority belong to the targets with converged conformational distributions. Although native fold identification from divergent decoy structures remains a challenge, our overall results show significant improvement over our previous clustering algorithms.
BibTeX:
@article{Zhang2004c,
  author = {Zhang, Yang and Skolnick, Jeffrey},
  title = {SPICKER: A clustering approach to identify near-native protein folds},
  journal = {J. Comput. Chem.},
  publisher = {Wiley Subscription Services, Inc., A Wiley Company},
  year = {2004},
  volume = {25},
  number = {6},
  pages = {865--871},
  doi = {http://dx.doi.org/10.1002/jcc.20011}
}
Zhu, J. and Weng, Z. FAST: A novel protein structure alignment algorithm 2005 Proteins: Structure, Function, and Bioinformatics
Vol. 58(3), pp. 618-627 
article DOI  
Abstract: We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intra-molecular residue-residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and facilitate the detection of the global optimum. In order to test the overall accuracy of FAST, we determined its sensitivity and specificity with the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database. The overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue-residue correspondence. Typical pairwise alignments take FAST less than a second with a Pentium III 1.2GHz CPU. FAST software and a web server are available at http://biowulf.bu.edu/FAST/
BibTeX:
@article{Zhu2005,
  author = {Zhu, Jianhua and Weng, Zhiping},
  title = {FAST: A novel protein structure alignment algorithm},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  year = {2005},
  volume = {58},
  number = {3},
  pages = {618--627},
  doi = {http://dx.doi.org/10.1002/prot.20331}
}
Zwanzig, R., Szabo, A. and Bagchi, B. Levinthal's paradox 1992 Proc. Natl. Acad. Sci. USA
Vol. 89(1), pp. 20-22 
article DOI  
Abstract: Illustration of Levinthal's paradox.
BibTeX:
@article{Zwanzig1992,
  author = {Zwanzig, Robert and Szabo, Attila and Bagchi, Biman},
  title = {Levinthal's paradox},
  journal = {Proc. Natl. Acad. Sci. USA},
  year = {1992},
  volume = {89},
  number = {1},
  pages = {20--22},
  doi = {http://dx.doi.org/10.1073/pnas.89.1.20}
}
Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques 2005   book DOI  
BibTeX:
@book{Burke2005,
  title = {Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques},
  publisher = {Springer},
  year = {2005},
  doi = {http://dx.doi.org/10.1007/0-387-28356-0}
}
Handbook of Metaheuristics 2003   book DOI  
BibTeX:
@book{Glover2003,
  title = {Handbook of Metaheuristics},
  publisher = {Springer},
  year = {2003},
  doi = {http://dx.doi.org/10.1007/b101874}
}
CMOS contact maps data sets   webpage URL 
BibTeX:
@webpage{URL_CMOS,
  title = {CMOS contact maps data sets},
  url = {http://eudoxus.scs.uiuc.edu/cgi-bin/cmosinstr.cgi}
}
Google App Engine   webpage URL 
BibTeX:
@webpage{URL_GAE,
  title = {Google App Engine},
  url = {http://code.google.com/appengine/}
}
Google App Engine project roadmap   webpage URL 
BibTeX:
@webpage{URL_GAE_ROADMAP,
  title = {Google App Engine project roadmap},
  url = {http://code.google.com/appengine/docs/roadmap.html}
}
TINKER   webpage URL 
BibTeX:
@webpage{URL_TINKER,
  title = {TINKER},
  url = {http://dasher.wustl.edu/tinker/}
}
Folding@home client statistics 2010   webpage URL 
BibTeX:
@webpage{URL_FH_STATS,
  title = {Folding@home client statistics},
  year = {2010},
  url = {http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats}
}
Protein Data Bank statistics 2010   webpage URL 
BibTeX:
@webpage{URL_PDB_STATS,
  title = {Protein Data Bank statistics},
  year = {2010},
  url = {http://www.pdb.org/pdb/statistics/holdings.do}
}