|Topic:||Structure-based prediction of protein-peptide binding regions using Random Forest.|
|Details:|| Protein-peptide interactions are among the most important biological interactions and play a crucial
role in many diseases, including cancer (1). Therefore, knowledge of these interactions provides
invaluable insights into cellular processes, functional mechanisms, and drug discovery (2).
Protein-peptide interactions can be analyzed by studying the structures of protein-peptide
complexes. Thus, predicting peptide-binding sites computationally can improve the
efficiency and cost-effectiveness of experimental studies. Here, we established a machine learning
method called SPRINT-Str (Structure-based prediction of protein-Peptide Residue-level
Interaction) that uses structural information to predict protein-peptide binding regions.
The initial dataset of protein-peptide complex structures was obtained from BioLiP (3). After
removing redundant chains with sequence identity above 30%, the final dataset consists of
1,242 protein-peptide complexes, which are divided into a training set and an independent test set
containing 1,116 and 125 proteins, respectively. Several structure-based features and the most
discriminative sequence-based features reported in SPRINT (4) were extracted and integrated
by a Random Forest (RF) classifier (5) to predict binding residues. The predicted binding
residues were then grouped into binding sites using the Density-Based Spatial Clustering of
Applications with Noise (DBSCAN) algorithm (6). The largest binding site of each protein was
then selected by applying restrictions to the predicted binding sites.
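
The clustering step can be illustrated with a minimal sketch. It is not the SPRINT-Str implementation: the abstract gives neither the DBSCAN parameters nor the restrictions applied to the predicted sites, so the `eps` and `min_pts` values, the use of one 3D coordinate per residue, the toy residue IDs, and the rule "keep the cluster with the most residues" are all assumptions made for illustration. The helper names `dbscan` and `largest_binding_site` are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... or -1 for noise."""
    n = len(points)
    # eps-neighbourhood of every point (a point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


def largest_binding_site(residues, eps=6.0, min_pts=3):
    """residues: list of (residue_id, (x, y, z)) for predicted binding residues.

    eps (in angstroms) and min_pts are assumed values, not taken from the source.
    Returns the residue IDs of the largest cluster, or [] if all are noise.
    """
    coords = [xyz for _, xyz in residues]
    labels = dbscan(coords, eps, min_pts)
    clusters = {}
    for (res_id, _), label in zip(residues, labels):
        if label != -1:
            clusters.setdefault(label, []).append(res_id)
    return max(clusters.values(), key=len) if clusters else []


# Toy input: two spatial groups of predicted residues plus one isolated residue.
residues = [
    ("A10", (0.0, 0.0, 0.0)), ("A11", (3.0, 0.0, 0.0)),
    ("A12", (0.0, 3.0, 0.0)), ("A14", (3.0, 3.0, 0.0)),
    ("A50", (30.0, 30.0, 30.0)), ("A51", (33.0, 30.0, 30.0)),
    ("A52", (30.0, 33.0, 30.0)),
    ("A90", (100.0, 0.0, 0.0)),
]
site = largest_binding_site(residues)
print(site)  # ['A10', 'A11', 'A12', 'A14']
```

In this toy input the four residues near the origin form the largest cluster, the three residues near (30, 30, 30) form a second cluster, and the isolated residue is labelled as noise. For real structures, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (with its `eps` and `min_samples` parameters) would be the more practical choice.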