Three-Stage Prediction of Protein Beta-Sheets by Neural Networks, Alignments, and Graph Algorithms

Jianlin Cheng and Pierre Baldi

Abstract

Protein beta-sheets play a fundamental role in protein structure, function, evolution, and bio-engineering. Accurate prediction and assembly of protein beta-sheets, however, remain challenging because protein beta-sheets require formation of hydrogen bonds between linearly distant residues. Previous approaches for predicting beta-sheet topological features, such as beta-strand alignments, in general have not exploited the global covariation and constraints characteristic of beta-sheet architectures. We propose a modular approach to the problem of predicting/assembling protein beta-sheets in a chain by integrating both local and global constraints in three steps. The first step uses recursive neural networks to predict pairing probabilities for all pairs of inter-strand beta-residues from profile, secondary structure, and solvent accessibility information. The second step applies dynamic programming techniques to these probabilities to derive binding pseudo-energies and optimal alignments between all pairs of beta-strands. Finally, the third step uses graph matching algorithms to predict the beta-sheet architecture of the protein by optimizing the global pseudo-energy while enforcing strong global beta-strand pairing constraints. The approach is evaluated using cross-validation methods on a large non-homologous dataset and yields significant improvements over previous methods.
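To make the second and third steps more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the BETApro implementation: the residue pairing probabilities from step one are taken as a given matrix `prob`, an exhaustive search over gap-free shifts stands in for the dynamic programming step, and a simple greedy selection stands in for the graph matching step. The function names `best_alignment` and `assemble`, the gap-free restriction, and the `max_partners=2` limit are all hypothetical choices made for this example.

```python
from itertools import combinations


def best_alignment(prob, strand_a, strand_b):
    """Return (pseudo_energy, direction, shift) of the best gap-free alignment.

    prob[i][j] is the (given) pairing probability of residues i and j.
    strand_a and strand_b are lists of residue indices.
    Assumption: the pseudo-energy is the sum of pairing probabilities
    of the residue pairs brought together by the alignment.
    """
    best = (0.0, None, None)
    candidates = (("parallel", strand_b), ("antiparallel", strand_b[::-1]))
    for direction, b in candidates:
        # Slide strand b along strand a; shift is where b's first residue lands.
        for shift in range(-(len(b) - 1), len(strand_a)):
            energy = 0.0
            for k, j in enumerate(b):
                pos = shift + k
                if 0 <= pos < len(strand_a):
                    energy += prob[strand_a[pos]][j]
            if energy > best[0]:
                best = (energy, direction, shift)
    return best


def assemble(prob, strands, max_partners=2):
    """Greedily pick strand pairs by decreasing pseudo-energy.

    Assumption: each strand may pair with at most max_partners strands;
    this stands in for the global pairing constraints.
    """
    scored = []
    for a, b in combinations(range(len(strands)), 2):
        energy, direction, shift = best_alignment(prob, strands[a], strands[b])
        if direction is not None:  # skip pairs with no pairing signal at all
            scored.append((energy, a, b, direction, shift))
    scored.sort(key=lambda item: item[0], reverse=True)

    partners = [0] * len(strands)
    chosen = []
    for energy, a, b, direction, shift in scored:
        if partners[a] < max_partners and partners[b] < max_partners:
            chosen.append((a, b, direction, shift, energy))
            partners[a] += 1
            partners[b] += 1
    return chosen
```

The greedy choice keeps the sketch short; it accepts the highest-scoring strand pair first and rejects any later pair that would exceed a strand's partner limit, so it does not guarantee a globally optimal pseudo-energy.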

Download BETApro 1.0 (Linux version). For installation instructions, see readme.txt in the zip file or click here. BETApro depends on SSpro 4.0 (a secondary structure predictor). You can download SSpro here.

[PDF] Download the paper from the Bioinformatics website, or view a quick PowerPoint overview.

The full dataset (BetaSheet916) used in the paper.

BetaSheet916 is split randomly and evenly into ten folds for cross-validation.
Fold 1, Fold 2, Fold 3, Fold 4, Fold 5, Fold 6, Fold 7, Fold 8, Fold 9, Fold 10
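For readers who want to build a comparable split on their own data, the following Python sketch shows one way to divide a list of chains randomly and evenly into ten folds and to iterate over the resulting train/test partitions. The function names, the `seed` parameter, and the round-robin assignment are assumptions made for this example; the fold files listed above remain the reference split used in the paper.

```python
import random


def split_into_folds(items, n_folds=10, seed=0):
    """Shuffle items and deal them round-robin into n_folds lists.

    Fold sizes differ by at most one item. The seed is a hypothetical
    parameter, included only to make the shuffle reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validation_splits(folds):
    """Yield (train, test) pairs, holding out each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 916 chains of the dataset.
    chains = [f"chain_{k}" for k in range(916)]
    folds = split_into_folds(chains)
    for train, test in cross_validation_splits(folds):
        print(len(train), len(test))
```

With 916 items and ten folds, this assignment produces six folds of 92 items and four folds of 91.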


Questions or need help? Please send email to prigor@ics.uci.edu, pfbaldig@ics.uci.edu, or jianlinc@uci.edu.