2019 Mississippi IDeA Conference

A36 Joseph Luttrell IV (Room Grand Ballroom C)

02 Aug 19
11:00 AM - 12:15 PM

RFcon: A Web-Based Software Package for Sequence-Based Residue-Residue Contact Prediction


Joseph Luttrell IV1, Tong Liu2, Chaoyang Zhang1, Zheng Wang2

1School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, MS

2Department of Computer Science, University of Miami, Coral Gables, FL


Predicting contacting residue pairs in proteins is a challenging problem that has great potential to advance many areas of protein research. For example, contact prediction can help identify important functional regions of proteins and reduce the search space of possible contacts when predicting the structure of complex proteins. Considering the growing gap between the large number of available protein sequences and the relatively low number of experimentally determined protein structures, these predictions are becoming increasingly important. Here, we have developed and benchmarked a set of machine learning methods for performing residue-residue contact prediction using only the amino acid sequence of the target protein as input. These methods were based on random forests, deep networks (stacked denoising autoencoders), support vector machines, and direct-coupling analysis. According to our own evaluations performed on targets from the CASP11 dataset at a resolution of +/− two residues, our random forest models were our top-performing predictors and achieved average top-10 prediction accuracy scores of 85.13% (short range), 74.49% (medium range), and 54.49% (long range). Our best-performing deep network predictors were our ensemble models, which achieved average top-10 prediction accuracy scores of 75.51% (short range), 60.26% (medium range), and 43.85% (long range) using the same evaluation. These results suggest that our models achieved performance comparable to that of methods developed by other CASP11 groups. Due to the complexity of contact prediction problems, the community can benefit from exploring a variety of different contact prediction methods. Therefore, we have released a C++ implementation of our direct-coupling analysis method as a standalone software package, along with the source code for the prediction methods used by our RFcon webserver.
Furthermore, our work has produced a useful tool with a simple web interface that delivers contact predictions to users without requiring a lengthy installation process. All of this is freely available to the public at http://dna.cs.miami.edu/RFcon/
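
For readers unfamiliar with the evaluation reported above, the following is a minimal sketch of how a top-10 accuracy per range could be computed. It is not taken from the RFcon source code. The sequence-separation bins (short 6-11, medium 12-23, long 24 or more) are the commonly used CASP-style bins and are an assumption here, as is the reading of "+/− two residues" as counting a predicted pair correct when a true contact lies within two positions of either residue. All names (`RANGES`, `is_hit`, `top_n_accuracy`) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j

# Assumed CASP-style sequence-separation bins; the abstract does not define them.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def in_range(pair: Pair, lo: int, hi: Optional[int]) -> bool:
    """True if the sequence separation |i - j| falls inside the bin."""
    sep = abs(pair[0] - pair[1])
    return sep >= lo and (hi is None or sep <= hi)


def is_hit(pair: Pair, true_contacts: Set[Pair], tolerance: int) -> bool:
    """Assumed tolerance rule: a prediction counts as correct if any true
    contact lies within `tolerance` positions of both predicted residues."""
    i, j = pair
    shifts = range(-tolerance, tolerance + 1)
    return any((i + di, j + dj) in true_contacts for di in shifts for dj in shifts)


def top_n_accuracy(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    range_name: str,
    n: int = 10,
    tolerance: int = 2,
) -> float:
    """Fraction of the n highest-scoring predictions in a range that are hits.

    If fewer than n predictions fall in the range, the fraction is taken
    over the predictions that are available; an empty range gives 0.0.
    """
    lo, hi = RANGES[range_name]
    candidates = [p for p in scores if in_range(p, lo, hi)]
    top = sorted(candidates, key=lambda p: scores[p], reverse=True)[:n]
    if not top:
        return 0.0
    hits = sum(is_hit(p, true_contacts, tolerance) for p in top)
    return hits / len(top)
```

Averaging the value returned by `top_n_accuracy` over all targets in a benchmark set would give one per-range score of the kind quoted in the abstract, under the assumptions stated above.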