TY - JOUR
T1 - DeepKcrot: A deep-learning architecture for general and species-specific lysine crotonylation site prediction
AU - Wei, Xilin
AU - Sha, Yutong
AU - Zhao, Yiming
AU - He, Ningning
AU - Li, Lei
N1 - Publisher Copyright: © 2021 Copernicus GmbH. All rights reserved.
PY - 2021
Y1 - 2021
AB - Lysine crotonylation (Kcrot), a post-translational modification (PTM) originally identified on histone proteins, is involved in diverse biological processes. Several conventional machine-learning (ML) predictors have been developed based on Kcrot sites from histone proteins. Recently, thousands of Kcrot sites have been experimentally verified on non-histone proteins from multiple species. Accordingly, a few predictors have been developed to predict Kcrot sites in specific organisms (i.e., humans and papaya). Nevertheless, comparisons of the crotonylomes of different organisms are lacking. Here, we collected around 20,000 Kcrot sites experimentally identified in four different species as the benchmark data set. We present a deep-learning (DL) architecture, dubbed DeepKcrot, for predicting Kcrot sites in the proteomes of various species. DeepKcrot includes species-specific and general classifiers built on a convolutional neural network with the word embedding (CNNWE) encoding approach. CNNWE outperforms both traditional ML-based and other DL-based classifiers in ten-fold cross-validation and independent tests, regardless of the size of the training set. Additionally, the cross-species performance of each species-specific predictor is lower than its self-species performance, whereas cross-species performance generally increases with the size of the training data set. Moreover, predictors developed from non-histone Kcrot sites fail to predict histone Kcrot sites, suggesting that Kcrot-containing peptides from non-histone and histone proteins have significantly different characteristics and that data integration is required. Overall, DeepKcrot is an efficient prediction tool, freely available at http://www.bioinfogo.org/deepkcrot.
KW - Convolutional neural network
KW - Deep learning
KW - Lysine crotonylation
KW - Non-histone protein
KW - Random forest
UR - https://www.scopus.com/pages/publications/85103288734
U2 - 10.1109/ACCESS.2021.3068413
DO - 10.1109/ACCESS.2021.3068413
M3 - Article
AN - SCOPUS:85103288734
SN - 2169-3536
VL - 9
SP - 49504
EP - 49513
JO - IEEE Access
JF - IEEE Access
M1 - 3068413
ER -