The compendium of computational resources for protein PTMs

    Protein post-translational modifications (PTMs), also called covalent modifications, play important roles in all kinds of biological processes and pathways. To date, more than 350 types of PTMs have been discovered (DeltaMass). However, only a few of them are well studied, e.g., phosphorylation, glycosylation, ubiquitination/ubiquitylation, and sumoylation. Besides experimental methods, numerous computational predictors have been developed and are popular for their convenience, accuracy, and speed. Here, we collected 233 databases and computational tools for PTM analyses. Based on previous reports (Huber and Hardin, 2004; Jensen, 2004; Mann and Jensen, 2003; Walsh, et al., 2005; Walsh and Jefferis, 2006), we roughly classified the computational tools into two groups by PTM type: Attachment of chemical groups, and Peptide cleavage / Protein sorting / subcellular localization (see CBS).

    We apologize that computational studies without web links to databases or tools are not included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please contact Dr. Yu Xue to add, remove or update one or more of the web links below.

 

Index:

1. Online Databases
    <1> General PTMs databases
    <2> Phosphorylation databases
    <3> Other databases
2. Computational tools
    <1> General PTMs
    <2> Attachment of chemical groups
            A. Phosphorylation
                <a> Prediction of non-specific or organism-specific phosphorylation sites
                <b> Prediction of kinase-specific phosphorylation sites or phospho-binding motifs
                <c> Detecting potential phosphorylation sites from mass spectrometry data
                <d> Other tools
            B. O-glycosylation
            C. GPI Modification Site
            D. N-terminal myristoylation
            E. Prenylation
            F. Sumoylation
            G. Palmitoylation
            H. Methylation
            I. Acetylation
            J. Ubiquitination/ubiquitylation
            K. Sulfation
            L. Cysteine disulfide
            M. Serine Pyruvoylation
            N. Nitration
            O. S-nitrosylation
            P. Pupylation
            Q. Glutathionylation
    <3> Peptide cleavage / Protein sorting / subcellular localization
            A. Signal peptide cleavage sites
            B. Caspase substrates cleavage
            C. Proteasomal cleavage

========================================================================================

1. Online Databases

<1> General PTMs databases

1. RESID: a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications (Garavelli, 1999; Garavelli, 2001; Garavelli, et al., 2004).

2. UNIMOD: aims to create a community-supported, comprehensive database of protein modifications for mass spectrometry applications (The UniProt Consortium, 2008).

3. DeltaMass: a database of protein post-translational modifications.

4. PSI-MOD: The protein modification ontology (PSI-MOD). The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification (Montecchi-Palazzi, et al., 2008).

5. Swiss-Prot knowledge base: for each protein annotation, the "Amino acid modifications" part of the "Sequence annotation (Features)" section collects the post-translational modification information of the protein (Farriol-Mathis, et al., 2004).

6. HPRD: The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome (Keshava Prasad, et al., 2009; Mishra, et al., 2006; Peri, et al., 2003; Peri, et al., 2004).

7. PhosphoSitePlus: a new version of PhosphoSite, this web-based database collects protein modification sites, including protein phosphorylation sites, from the scientific literature as well as high-throughput discovery programs. Currently, PhosphoSitePlus contains 66,045 phosphorylation sites (Hornbeck, et al., 2014).

8. dbPTM 3.0: integrates experimentally verified PTMs from several databases and annotates the predicted PTMs on Swiss-Prot proteins (Lu, et al., 2013).

9. SysPTM 2.0: a systematic resource for proteomic research on post-translational modifications (Li, et al., 2014).

10. PTMfunc: a repository of functional predictions for protein post-translational modifications (Beltrao, et al., 2012).

11. PTMCode: a resource of known and predicted functional associations between protein post-translational modifications within and between interacting proteins (Minguez, et al., 2014).


<2> Phosphorylation databases

1. EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases (Wang, et al., 2014).

2. PhosSNP: systematic analysis of genetic polymorphisms that influence protein phosphorylation (Ren, et al., 2010).

3. Phospho.ELM 8.1 (PhosphoBase): contains 4,384 experimentally verified phosphorylated proteins from different species with 2,166 tyrosine, 13,320 serine and 2,766 threonine sites. All instances were manually collected from scientific literature (Diella, et al., 2004; Diella, et al., 2008).

4. PhosphoNET: PhosphoNET presently holds data on more than 26,000 phosphorylation sites in over 5350 human proteins that have been collected from the scientific literature and other reputable websites. It features direct links to several other useful websites, and will continue to expand as a useful portal for phosphoproteomics information.

5. PHOSIDA: a phosphorylation site database, integrates thousands of high-confidence in vivo phosphosites identified by mass spectrometry-based proteomics in various species. For each phosphosite, PHOSIDA lists matching kinase motifs, predicted secondary structures, conservation patterns, and its dynamic regulation upon stimulus. Using support vector machines, PHOSIDA also predicts non-specific phosphosites (Gnad, et al., 2007).

6. PhosphoPep v2.0: contains MS-derived phosphorylation data from 4 different organisms, including fly (Drosophila melanogaster), human (Homo sapiens), worm (Caenorhabditis elegans), and yeast (Saccharomyces cerevisiae) (Bodenmiller, et al., 2008).

7. PhosPhAt 4.0: contains information on Arabidopsis phosphorylation sites which were identified by mass spectrometry in large scale experiments from different research groups (Zulawski, et al., 2013).

8. P(3)DB: provides a database of protein phosphorylation data from multiple plants (Yao, et al., 2014).

9. PhosphoPOINT (Obsolete): is a comprehensive human kinase interactome and phospho-protein database, containing 4195 phospho-proteins with a total of 15,738 phosphorylation sites (Yang, et al., 2008).

10. NetworKIN: a method for predicting in vivo kinase-substrate relationships that augments consensus motifs with context for kinases and phosphoproteins. It is a valuable resource that opens the door to the computational discovery of phospho-regulatory networks (Linding, et al., 2007; Linding, et al., 2008).

11. Phospho3D: is a database of three-dimensional structures of phosphorylation sites which stores information retrieved from the phospho.ELM database and which is enriched with structural information and annotations at the residue level (Zanzoni, et al., 2011).

12. PepCyber:P~Pep 1.1: is a database of human protein-protein interactions mediated by 10 classes of phosphoprotein binding domains (PPBDs) (Gong, et al., 2008).

13. Kinase.com: explores the functions, evolution and diversity of protein kinases (Manning, et al., 2002).

14. PhosphoGRID: An online database of experimentally verified in vivo protein phosphorylation sites in the model eukaryotic organism Saccharomyces cerevisiae (Sadowski, et al., 2013).

15. RegPhos: A knowledge-based system that lets users input a group of genes/proteins and explore the associated phosphorylation network together with subcellular localization information (Huang, et al., 2014).

16. LymPHOS: A web-oriented DataBase containing peptidic and protein sequences and spectrometric information on the PhosphoProteome of human T-Lymphocytes (Ovelleiro, et al., 2009).


<3> Other databases

1. CPLA: an integrated database of protein lysine acetylation (Liu, et al., 2011).

2. CPLM: a database of protein lysine modifications (Liu, et al., 2014).

3. UUCD: a family-based database of ubiquitin and ubiquitin-like conjugation (Gao, et al., 2013).

4. Casbase (Obsolete): an online relational database containing an array of biological data on caspases and their substrates (Wee, et al., 2007).

5. CASBAH: The CAspase Substrate dataBAse Homepage (Luthi and Martin, 2007).

6. O-GlycBase: is a revised database of O- and C-glycosylated proteins. Version 6.00 has 242 glycoprotein entries. The criteria for inclusion are at least one experimentally verified O- or C-glycosylation site (Gupta, et al., 1999).

7. GlyProt: It is estimated that over 50% of all proteins are glycosylated, but most of the 3D protein structures stored in the PDB have no attached glycans. GlyProt can connect N-glycans in silico to a given 3D protein structure (Bohne-Lang and von der Lieth, 2005).

8. CSS: Carbohydrate Structure Suite, tools for analysis of carbohydrate 3D structures (Lutteke, et al., 2005).

9. SWEET-DB: is an attempt to use modern web techniques to annotate and/or cross-reference carbohydrate-related data collections which allow glycoscientists to find important data for compounds of interest in a compact and well-structured representation (Loss, et al., 2002).

10. SPdb: SPdb is a signal peptide database containing signal sequences of archaea, prokaryotes and eukaryotes. The signal-associated data is stored in a MySQL relational database and provided as DNA and protein sequences (Choo, et al., 2005).

11. OSCTdb/BALOSCTdb: The OSCTdb contains 161 oxidation-susceptible cysteines, 301 oxidation-non-susceptible cysteines, and a total of 100 polypeptides. To achieve balance between oxidation-susceptible and oxidation-non-susceptible cysteine thiols, the number of oxidation-non-susceptible cysteines was randomly reduced to 161 to create a balanced OSCTdb (BALOSCTdb) (Sanchez, et al., 2008).

12. DSDBASE: is a database on disulphide bonds in proteins that provides information on native disulphides and those which are stereochemically possible between pairs of residues in a protein (Vinayagam, et al., 2004).

13. Analycys (Obsolete): a database for mapping paired cysteine mutations and for suggesting a disulphide profile for the disulphide bond-rich families of SCOP (1.67) (Thangudu, et al., 2007).

14. dbGSH: a database of S-glutathionylation (Chen, et al., 2014).

15. hUbiquitome: A database of experimentally verified ubiquitination cascades in human (Du, et al., 2011).

16. mUbiSiDa: a comprehensive database for protein ubiquitination sites in mammals (Chen, et al., 2014).

17. E3NET: A web-based system that provides a comprehensive collection of available E3-substrate specificities and a systematic framework for the analysis of E3-mediated regulatory networks of diverse cellular functions (Han, et al., 2012).

18. DUDE: A database of ubiquitin (UB) and deubiquitinating (DUB) enzymes across multiple eukaryotic genomes (Hutchins, et al., 2013).

19. plantsUPS: The Database of Plants Ubiquitin Proteasome System (Su, et al., 2009).

20. PupDB: A database of pupylated proteins and pupylation sites (Tung, et al., 2012).

21. UbiProt: A database holding a significant volume of data on various protein substrates of ubiquitylation (Chernorudskiy, et al., 2007).

22. dbSNO: A database that integrates the experimentally verified cysteine S-nitrosylation sites from multiple species (Chen, et al., 2014).

23. dbOGAP: A database of O-GlcNAcylated proteins and sites based on experimental data curated from the literature as well as from collaborating labs (Wang, et al., 2011).

24. UniPep: A database for human N-linked glycosites (Zhang, et al., 2006).

25. GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources (Cooper, et al., 2003).

26. GlycoProtDB: A glycoprotein database providing information on Asn (N)-glycosylated proteins and their glycosylated site(s) (Kaji, et al., 2012).
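
The balancing step described in the OSCTdb/BALOSCTdb entry above (randomly reducing the 301 oxidation-non-susceptible cysteines to match the 161 susceptible ones) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the placeholder site identifiers and the fixed random seed are hypothetical and are not taken from the database or its authors.

```python
import random

def balance_by_downsampling(positives, negatives, seed=0):
    """Randomly shrink the larger class so both classes have equal size."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)

# Placeholder identifiers standing in for real cysteine sites (hypothetical).
susceptible = [f"ox_cys_{i}" for i in range(161)]
non_susceptible = [f"non_ox_cys_{i}" for i in range(301)]

pos, neg = balance_by_downsampling(susceptible, non_susceptible)
print(len(pos), len(neg))  # 161 161
```

Sampling without replacement keeps every retained site unique, so the balanced set is a strict subset of the original data.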


2. Computational tools


<1> General PTMs

1. PROSITE: consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them (de Castro, et al., 2006; Hulo, et al., 2008).

2. ELM: is a resource for predicting functional sites in eukaryotic proteins (Puntervoll, et al., 2003).

3. Minimotif Miner: analyzes protein queries for the presence of short functional motifs that, in at least one protein, have been demonstrated to be involved in post-translational modifications (PTMs), binding to other proteins, nucleic acids, or small molecules, or protein trafficking (Balla, et al., 2006; Rajasekaran, et al., 2009).

4. AutoMotif 4.0: allows for identification of PTM (post-translational modification) sites, including phosphorylation sites, in proteins. For the AutoMotif Server 2.0, a support vector machine (SVM) was trained for each type of PTM separately on proteins of the Swiss-Prot database (version 4.0) (Plewczynski, et al., 2012).

5. FindMod: is a tool that can predict potential protein post-translational modifications (PTM) and find potential single amino acid substitutions in peptides (Wilkins, et al., 1999).

6. TermiNator: TermiNator predicts N-terminal methionine excision, N-terminal acetylation, N-terminal myristoylation and S-palmitoylation of either prokaryotic or eukaryotic proteins originating from organellar or nuclear genomes. TermiNator also relates the predicted N-terminus to protein half-life (5-220 hours) and relative translation efficiency (relative value 1-5) of eukaryotic proteins (Martinez, et al., 2008).


<2> Attachment of chemical groups


A. Phosphorylation

 <a> Prediction of non-specific or organism-specific phosphorylation sites

1. NetPhos 2.0: produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins (Blom, et al., 1999).

2. CRP: Cleaved Radioactivity of Phosphopeptides. CRP performs an in silico proteolytic cleavage of the sequence and reports the predicted Edman cycles in which radioactivity would be observed if a given serine, threonine or tyrosine were phosphorylated (Mackey, et al., 2003).

3. DISPHOS 1.3: uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites, and predicts serine, threonine and tyrosine phosphorylation sites in proteins (Iakoucheva, et al., 2004).

4. NetPhosYeast 1.0: predicts serine and threonine phosphorylation sites in yeast proteins (Ingrell, et al., 2007).

5. NetPhosBac 1.0: predicts serine and threonine phosphorylation sites in bacterial proteins (Miller, et al., 2009).

6. PhosphoSVM: Prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine (Dou, et al., 2014).

7. cksaap_phsite (Obsolete): Prediction of protein phosphorylation sites by using the composition of k-spaced amino acid pairs (Zhao, et al., 2012).

8. PhosphoRice: A meta-predictor of rice-specific phosphorylation sites, constructed by integrating the phosphorylation site predictors NetPhos2.0, NetPhosK, Kinasephos, Scansite, Disphos and Predphosphos, with parameters selected by restricted grid search and random search (Que, et al., 2012).

9. PHOSFER: A random forest-based method for applying phosphorylation data from other organisms to enhance the accuracy of predictions in a target organism (Trost and Kusalik, 2013).

10. DAPPLE: An alternative (to machine-learning approaches) method for predicting phosphorylation sites in an organism of interest (Trost, et al., 2013).
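
The idea behind the CRP entry above (digest the sequence in silico, then report the Edman cycle at which each serine, threonine or tyrosine would be released) can be sketched as below. This is not CRP's implementation: the function names, the toy sequence and the simplified trypsin rule (cleave after K or R unless the next residue is P) are assumptions made for illustration.

```python
def tryptic_peptides(sequence):
    """Return (start_index, peptide) pairs for a simplified tryptic digest."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":  # assumed rule: no cleavage before proline
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):  # trailing peptide without a C-terminal K/R
        peptides.append((start, sequence[start:]))
    return peptides

def edman_cycles(sequence, residues="STY"):
    """Map each S/T/Y (1-based protein position) to (residue, peptide, cycle)."""
    cycles = {}
    for start, pep in tryptic_peptides(sequence):
        for offset, aa in enumerate(pep):
            if aa in residues:
                # Edman degradation removes one residue per cycle from the
                # peptide N-terminus, so the cycle is the 1-based offset.
                cycles[start + offset + 1] = (aa, pep, offset + 1)
    return cycles

cycles = edman_cycles("MASKRPTEYLKSGR")  # toy sequence, not a real protein
for pos, (aa, pep, cycle) in sorted(cycles.items()):
    print(f"{aa}{pos}: peptide {pep}, cycle {cycle}")
```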


<b> Prediction of kinase-specific phosphorylation sites or phospho-binding motifs

1. GPS 1.10: The old version of GPS. We designed a novel algorithm, GPS (Group-based Phosphorylation sites Prediction), and constructed an easy-to-use web server for experimentalists (Xue, et al., 2005; Zhou, et al., 2004).

2. GPS 2.1: The current version of the GPS system. We renamed the tool the Group-based Prediction System. The GPS 2.1 software is implemented in Java and can predict kinase-specific phosphorylation sites for 408 human protein kinases in a hierarchy (Xue, et al., 2008).

3. iGPS: GPS algorithm with the interaction filter, or in vivo GPS (Song, et al., 2012).

4. GPS-Polo: Prediction of phosphorylation and phospho-binding sites for Polo-like kinases (Liu, et al., 2013).

5. PPSP 1.0: We also developed another online program for prediction of kinase-specific phosphorylation sites, implemented with Bayesian Decision Theory (BDT) (Xue, et al., 2006).

6. PKIS: Identification of the protein kinases responsible for experimentally verified P-sites through the composition of monomer spectrum (CMS) encoding strategy and support vector machines (SVMs) (Zou, et al., 2013).

7. phos_pred: Prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest (Fan, et al., 2014).

8. SubPhos: Proteomic Analysis and Prediction of Human Phosphorylation Sites in Subcellular Level Reveals Subcellular Specificity (Chen, et al., 2014).

9. PhosphoMotif Finder: contains known kinase/phosphatase substrate as well as binding motifs that are curated from the published literature. It reports the PRESENCE of any literature-derived motif in the query sequence (Amanchy, et al., 2007).

10. PSEA: prediction of phosphorylation sites based on the Phosphorylation Set Enrichment Analysis (PSEA) method (Suo, et al., 2014).

11. NetPhorest: is a non-redundant collection of 125 sequence-based classifiers for linear motifs in phosphorylation-dependent signaling. The collection contains both family-based and gene-specific classifiers (Miller, et al., 2008).

12. Predikin & PredikinDB 2.0: consists of two components: (i) PredikinDB, a database of phosphorylation sites that links substrates to kinase sequences and (ii) a Perl module, which provides methods to classify protein kinases, reliably identify substrate-determining residues, generate scoring matrices and score putative phosphorylation sites in query sequences (Saunders, et al., 2008; Saunders and Kobe, 2008).

13. Scan-X: is a software tool designed to find motifs (identified using motif-x) within any sequence data set. The first large scale scan was performed using all available human, mouse, fly and yeast phosphorylation and acetylation data to perform a scan for undiscovered sites (Schwartz, et al., 2009).

14. ScanSite 3.0: searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains (Obenauer, et al., 2003).

15. NetPhosK 1.0: produces neural network predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src (Blom, et al., 2004).

16. KinasePhos 1.0: predicts kinase-specific phosphorylation sites within given protein sequences. A profile Hidden Markov Model (HMM) is learned for each group of sequences surrounding the phosphorylated residues (Huang, et al., 2005).

17. KinasePhos 2.0: New version of the kinase-specific phosphorylation site prediction tool, based on sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for an SVM (support vector machine) (Wong, et al., 2007).

18. PhoScan: predicts kinase-specific phosphorylation sites from sequence features using a log-odds ratio approach (Li, et al., 2007).

19. pkaPS: Prediction of protein kinase A phosphorylation sites using the simplified kinase binding model (Neuberger, et al., 2007).

20. CRPhos 0.8: Prediction of kinase-specific phosphorylation sites using conditional random fields. Its source code is free for academic research and can be compiled on Linux/Unix (Dang, et al., 2008).

21. MetaPredPS: Meta-predictors make predictions by organizing and processing the predictions produced by several other predictors in a defined problem domain (Wan, et al., 2008).

22. SMALI: searches for peptide ligands in human proteins that are likely to bind to SH2 domains (Huang, et al., 2008; Li, et al., 2008).

23. PredPhospho (Obsolete): an SVM-based tool that predicts kinase-specific phosphorylation sites (Kim, et al., 2004).

24. PhosphoPICK: predicting kinase substrates using cellular context information (Patrick, et al., 2014).

25. Musite: A tool for global prediction of general and kinase-specific phosphorylation sites (Gao, et al., 2010).
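
Several entries above score candidate sites against kinase-specific sequence preferences; the PhoScan entry names a log-odds ratio approach. A generic position-specific log-odds score can be sketched as follows. This is a textbook construction, not PhoScan's actual model: the function names, the pseudocount, the uniform background and the toy peptides are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_matrix(peptides, background, pseudocount=1.0):
    """Per-position log2(observed frequency / background frequency)."""
    matrix = []
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return matrix

def score(peptide, matrix):
    """Sum the log-odds of the residues observed at each position."""
    return sum(matrix[i][aa] for i, aa in enumerate(peptide))

# Toy aligned 5-mers with the phospho-acceptor at index 3 (made-up examples).
known_sites = ["RRASL", "RKASV", "RRGSL", "KRASI"]
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}  # assumed background
pssm = log_odds_matrix(known_sites, uniform)

print(score("RRASL", pssm) > score("GDESP", pssm))  # True
```

A positive total means the peptide resembles the training sites more than the background does; a real predictor would also need a score threshold calibrated on held-out data.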


<c> Detecting potential phosphorylation sites from mass spectrometry data

1. PhosphoScore: is a phosphorylation assignment program that is compatible with all levels of tandem mass spectrometry spectra (MSn) generated through the Bioworks/Sequest platform. The program utilizes a cost function which takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore was written in Java (Ruttenberg, et al., 2008).

2. Ascore: measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra (Beausoleil, et al., 2006).

3. Colander: a probability-based support vector machine algorithm for automatic screening for CID spectra of phosphopeptides prior to database search (Lu, et al., 2008).

4. DeBunker: an SVM-based program that automatically validates phosphopeptide identifications from tandem mass spectra (Lu, et al., 2007).

5. APIVASE 2.5: was developed for phosphopeptide validation by combining the information obtained from MS2 spectra and their corresponding neutral loss MS3 spectra (Jiang, et al., 2008).

6. InsPecT: a new scoring function was developed for phosphorylated peptide tandem mass spectra for ion-trap instruments, without the need for manual validation (Payne, et al., 2008).

7. Phosphopeptide FDR Estimator: is designed for analysis of phosphopeptide LC-MS/MS data (Du, et al., 2008).


<d> Other tools

1. Motif-X: is a software tool designed to extract overrepresented patterns from any sequence data set. The algorithm is an iterative strategy which builds successive motifs through comparison to a dynamic statistical background (Schwartz and Gygi, 2005).

2. MoDL: finds multiple motifs in a set of phosphorylated peptides (Ritz, et al., 2009).

3. PhosphoBlast: allows the user to submit a protein query to search against the curated dataset of phosphorylated peptides (Wang and Klemke, 2008).

4. RLIMS-P: is a rule-based text-mining program specifically designed to extract protein phosphorylation information on protein kinase, substrate and phosphorylation sites from the abstracts (Hu, et al., 2005; Yuan, et al., 2006).

5. KEA: Kinase enrichment analysis (KEA) is a web-based tool with an underlying database providing users with the ability to link lists of mammalian proteins/genes with the kinases that phosphorylate them (Lachmann and Ma'ayan, 2009).

6. pptm: Literature mining of protein phosphorylation using dependency parse trees (Wang, et al., 2014).

7. PhosphoSiteAnalyzer: A bioinformatics tool for analyzing (quantitative) phosphoproteome datasets (Bennetzen, et al., 2012).
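
The Motif-X entry above describes extracting overrepresented patterns by comparison to a statistical background. One step of such a search (picking the single most overrepresented residue/position pair) might look like the sketch below. It is a simplified illustration rather than the Motif-X algorithm itself: the function names, the binomial test, the add-one smoothing of background frequencies and the toy peptides are assumptions.

```python
from collections import Counter
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def most_enriched_pair(foreground, background, centre):
    """Return (p_value, position, residue) for the most overrepresented pair."""
    best = None
    n = len(foreground)
    for pos in range(len(foreground[0])):
        if pos == centre:  # the modified residue is fixed by construction
            continue
        fg = Counter(seq[pos] for seq in foreground)
        bg = Counter(seq[pos] for seq in background)
        for residue, k in fg.items():
            # Add-one smoothing (assumption) keeps unseen residues at p > 0.
            p = (bg[residue] + 1) / (len(background) + 20)
            candidate = (binomial_tail(n, k, p), pos, residue)
            if best is None or candidate < best:
                best = candidate
    return best

# Toy aligned 5-mers centred on a serine at index 2 (made-up examples).
foreground = ["ARSPK", "GRSPL", "LRSAE", "KRSPD"]
background = ["AGSLK", "LESAD", "KVSGE", "DASPR", "EKSVL", "GLSKA"]

p_value, position, residue = most_enriched_pair(foreground, background, centre=2)
print(position, residue)  # 1 R
```

An iterative procedure in the spirit of the entry would then keep only the sequences that match the selected pair, in both the foreground and the background, and repeat the search; that loop is omitted here.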


B. O-glycosylation

1. DictyOGlyc: A sequence based prediction server for GlcNAc O-glycosylations in D. discoideum proteins (Gupta, et al., 1999).

2. NetCGlyc: NetCGlyc 1.0 produces neural network predictions of C-mannosylation sites in mammalian proteins (Julenius, 2007).

3. NetOGlyc 3.1: The NetOglyc server produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins (Julenius, et al., 2005).

4. NetGlycate: NetGlycate 1.0 server predicts glycation of amino groups of lysines in mammalian proteins (Johansen, et al., 2006).

5. NetNGlyc: The NetNglyc server predicts N-Glycosylation sites in human proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons.

6. ISOGlyP: ISOGlyP is an algorithm to predict potential O-glycosylation sites in proteins.

7. YinOYang 1.2: The YinOYang WWW server produces neural network predictions for O-β-GlcNAc attachment sites in eukaryotic protein sequences. This server can also use NetPhos to mark possible phosphorylated sites and hence identify "Yin-Yang" sites (Gupta and Brunak, 2002).

8. Oglyc: Predicting O-glycosylation sites in mammalian proteins by using SVMs (Li, et al., 2006).

9. GPP: This server predicts the location of N-linked and O-linked glycosylation sites from amino acid sequence (Hamby and Hirst, 2008).

10. CKSAAP_OGlySite: This predictor is based on an improved encoding method to predict mucin-type O-linked glycosylation sites in mammalian proteins (Chen, et al., 2008).

11. EnsembleGly (Obsolete): A Server for Prediction of O-, N-, and C-Linked Glycosylation Sites with Ensemble Learning (Caragea, et al., 2007).

12. Glycofragment: The purpose of this site is to find the main fragments (Y- and B-ions) of oligosaccharides which should occur in MS-spectra (Lohmann and von der Lieth, 2004).

13. GlycoSearchMS: A web tool to support the interpretation of mass spectra of complex carbohydrates (Lohmann and von der Lieth, 2004).

14. GlySeq: A tool to crosslink the PDB and carbohydrate databases and to check the integrity of carbohydrate 3D structures (Lutteke, et al., 2005).

15. GlycoMod: GlycoMod is a tool that can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses (Cooper, et al., 2001).
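
The NetNGlyc entry above refers to Asn-Xaa-Ser/Thr sequons. Locating candidate sequons in a sequence is a simple pattern scan, sketched below. The sketch only finds the sequons; it does not reproduce the neural network step that judges their sequence context. The function name and toy sequence are hypothetical, and excluding proline at the Xaa position is a common convention assumed here rather than something stated in the entry.

```python
import re

# Asn-Xaa-Ser/Thr, with Xaa taken as any residue except proline (assumption).
# The lookahead makes the scan report overlapping sequons as well.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based Asn position, sequon) for every candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]

hits = find_sequons("MNNTSANPTKNGS")  # toy sequence, not a real protein
print(hits)  # [(2, 'NNT'), (3, 'NTS'), (11, 'NGS')]
```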


C. GPI Modification

1. big-PI Predictor: Prediction of potential GPI-modification sites in proprotein sequences (Eisenhaber, et al., 1998; Eisenhaber, et al., 1999; Eisenhaber, et al., 1999).

2. big-PI Plant Predictor: Sensitive prediction from sequence- and genome-wide studies for Arabidopsis and rice (Eisenhaber, et al., 2003).

3. DGPI (Obsolete): Detection/prediction of GPI cleavage site (GPI-anchor) in a protein (DGPI).

4. GPI-SOM: Identification of GPI anchor attachment signals by a Kohonen self-organizing map (Fankhauser and Maser, 2005).

5. PredGPI: PredGPI is a prediction system for GPI-anchored proteins. It is based on a support vector machine (SVM) for the discrimination of the anchoring signal, and on a Hidden Markov Model (HMM) for the prediction of the most probable omega-site (Pierleoni, et al., 2008).

6. FragAnchor: FragAnchor is based on the tandem use of a Neural Network predictor and a Hidden Markov Model predictor. The Neural Network is used to select the potential GPI-anchored sequences and the Hidden Markov Model classifies the selected sequences according to four different levels of precision (highly probable, probable, weakly probable, potential false positive). The Hidden Markov Model also proposes up to three possible locations for the anchor/cleavage site (Poisson, et al., 2007).

7. MemType-2L: A web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM (Chou and Shen, 2007).


D. N-terminal myristoylation

1. Myristoylator: Myristoylator predicts N-terminal myristoylation of proteins by neural networks (Bologna, et al., 2004).

2. NMT: The enzyme myristoylCoA:protein N-myristoyltransferase (NMT) recognizes certain characteristics within the N-termini of substrate proteins and finally attaches the lipid moiety to a required N-terminal glycine (Maurer-Stroh, et al., 2002; Maurer-Stroh, et al., 2002).

3. PlantsP: predicts N-terminal myristoylation sites in plant proteins (Podell and Gribskov, 2004).


E. Prenylation

1. PrePS: PrePS can predict partially overlapping substrate specificities, which is of medical importance in the case of understanding cellular action of FT inhibitors as anticancer and anti-parasite agents (Maurer-Stroh and Eisenhaber, 2005; Maurer-Stroh, et al., 2007).


F. Sumoylation

1. SUMOsp 1.0: SUMOylation Sites Prediction, based on a manually curated dataset, integrating the results of two methods, GPS and MotifX, which were originally designed for phosphorylation site prediction (Xue, et al., 2006).

2. SUMOsp 2.0: an updated web server for more efficient prediction of sumoylation sites (Ren, et al., 2009).

3. GPS-SUMO: prediction of sumoylation sites and SUMO-interaction motifs based on the fourth-generation GPS algorithm integrated with Particle Swarm Optimization (PSO) (Ren, et al., 2014).

4. SSP 1.0: SUMOylation Sites Prediction, based on a manually curated dataset, integrating the results of two methods, GPS and MotifX, which were originally designed for phosphorylation site prediction (Zhou, et al., 2005).

5. SUMOpre (Obsolete): A novel method for high accuracy sumoylation site prediction from protein sequences (Xu, et al., 2008).

6. SUMOplot: The SUMOplot Analysis Program predicts and scores sumoylation sites in your protein.

7. PSFS-SUMO: Predicting the protein SUMO modification sites based on Properties Sequential Forward Selection (PSFS) (Liu, et al., 2007).


G. Palmitoylation

1. CSS-Palm 1.0: a novel and comprehensive system for Palmitoylation Site Prediction with a Clustering and Scoring Strategy (Zhou, et al., 2006).

2. NBA-Palm 1.0: a computational web service for Prediction of Palmitoylation Sites Implemented in a Naive Bayesian Algorithm (Xue, et al., 2006).

3. CSS-Palm 2.0: CSS-Palm 2.0 can predict potential palmitoylation sites for ~1,000 proteins (with an average length of ~1,000 aa) within five minutes (Ren, et al., 2008).

4. CKSAAP-Palm: Prediction of palmitoylation sites using the composition of K-spaced amino acid pairs (Wang, et al., 2009).

5. PPWMs: Improved prediction of palmitoylation sites using PWMs and SVM (Li, et al., 2011).


H. Methylation

1. MeMo: MeMo is the first protein methylation prediction server based on SVM (support vector machine) (Chen, et al., 2006).

2. BPB-PPMS: An in silico online tool for identification of potential methylation sites from protein sequences (Shao, et al., 2009).

3. MASA: Incorporating structural characteristics for identification of protein methylation sites (Shien, et al., 2009).

4. PLMLA: prediction of lysine methylation and lysine acetylation by combining multiple features (Shi, et al., 2012).

5. PMeS: prediction of methylation sites based on enhanced feature encoding scheme (Shi, et al., 2012).


I. Acetylation

1. PAIL: Prediction of Acetylation on Internal Lysines, a novel online predictor of protein acetylation sites (Li, et al., 2006).

2. NetAcet: A server that predicts substrates of N-acetyltransferase A (NatA). The method was trained on yeast data but, as mentioned in the article describing the method, it obtains similar performance values on mammalian substrates acetylated by NatA orthologs (Kiemer, et al., 2005).

3. LysAcet: A server for lysine acetylation prediction using the support vector machine (SVM) method along with a coding schema for protein sequence coupling patterns (Li, et al., 2009).

4. ASEB: KAT-specific acetylation site prediction with the ASEB method (Wang, et al., 2012).

5. SSPKA: In silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features (Li, et al., 2014).

6. LAceP: Lysine acetylation prediction with a logistic regression method (Hou, et al., 2014).

7. AcetylAAVs: Proteome-wide analysis of amino acid variations that influence protein lysine acetylation (Suo, et al., 2013).

8. BRABSB-PHKA: Prediction of potential Human Lysine(K) Acetylation(PHKA) sites based on Bi-Relative Binomial Score Bayes (BRBSB) combined with support vector machines (SVMs) (Shao,et al., 2012).

9. EnsemblePail: Lysine acetylation sites prediction using an ensemble of support vector machine classifiers (Xu,et al., 2010).


J. Ubiquitination/ubiquitylation

1. CKSAAP_UbSite: Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs (Chen, et al., 2011).

2. UbiProber: Incorporating key positions and amino acids features to identify general and species-specific ubiquitin conjugation sites (Chen, et al., 2013).

3. UbiPred (Obsolete): A web server that predicts ubiquitylation sites, each with a prediction score, to help biologists identify promising sites for experimental verification (Tung and Ho, 2008).


K. Sulfation

1. GPS-TSP: prediction of tyrosine sulfation sites with GPS algorithm (Pan, et al., 2014).

2. Sulfinator: The Sulfinator predicts tyrosine sulfation sites in protein sequences (Monigatti, et al., 2002).

3. SulfoSite: To computationally predict sulfation sites within given protein sequences (Lee, et al., 2006).

4. PredSulSite: prediction of protein tyrosine sulfation sites with multiple features and analysis (Huang, et al., 2012).


L. Cysteine disulfide

1. COPA: Prediction of reversibly oxidized protein cysteine thiols using protein structure properties (Sanchez, et al., 2008).

2. DISULFIND: A disulfide bonding state and cysteine connectivity prediction server (Ceroni, et al., 2006).

3. MetalDetector: a web server for predicting metal-binding sites and disulfide bridges in proteins from sequence (Lippi, et al., 2008).

4. disulfide (Obsolete): Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure (Song, et al., 2007).

5. DiANNA: A web server for disulfide connectivity prediction (Ferre and Clote, 2005; Ferre and Clote, 2005).

6. Disulfide (Obsolete): Improving disulfide connectivity prediction with sequential distance between oxidized cysteines (Tsai, et al., 2005).

7. GDAP (Obsolete): a web tool for genome-wide protein disulfide bond prediction (O'Connor and Yeates, 2004).

8. Cyspred: Predictor of the bonding state of cysteines in proteins (Fariselli, et al., 1999).

9. CysState: Given a protein sequence, this server predicts the bonding state of its cysteines. It returns a single prediction for all the cysteines at once, based on the observation that in most proteins, cysteines tend to share the same state.

10. Dipro: Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching (Cheng, et al., 2006).


M. Serine Pyruvoylation

1. pyrupred: prediction of post-translational pyruvoyl residue modification sites with feature selection based on a Random Forest (Jiang, et al., 2013).


N. Nitration

1. GPS-YNO2: prediction of tyrosine nitration sites based on GPS algorithm (Liu, et al., 2011).


O. S-nitrosylation

1. GPS-SNO: prediction of protein S-nitrosylation sites based on GPS algorithm (Xue, et al., 2010).

2. iSNO-PseAAC: Predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition (Xu, et al., 2013).

3. iSNO-AAPair: A predictor of cysteine S-nitrosylation sites that takes into account the coupling effects for all pairs formed by the nearest residues and by the next-nearest residues along protein chains (Xu, et al., 2013).


P. Pupylation

1. GPS-PUP: prediction of pupylation sites in prokaryotic proteins based on GPS algorithm (Liu, et al., 2011).

2. PupPred: prediction of pupylation sites (Chen, et al., 2013).

3. iPUP: prediction of pupylation sites based on the composition of k-spaced amino acid pairs and support vector machines (Tung, et al., 2013).


Q. Glutathionylation

1. SGDB: Prediction of S-glutathionylation sites (Sun, et al., 2013).


<3> Peptide cleavage / Protein sorting / subcellular localization


A. Signal peptide cleavage sites

1. ChloroP: A web server that predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites (Emanuelsson, et al., 1999).

2. PCLR: a method of predicting chloroplast localization of proteins in plant cells. The prediction algorithm was trained using principal component logistic regression with stepwise variable selection (Schein, et al., 2001).

3. LipoP: a web server that predicts lipoproteins and discriminates between lipoprotein signal peptides, other signal peptides and N-terminal membrane helices in Gram-negative bacteria (Juncker, et al., 2003).

4. SpLip: a program that predicts lipoproteins in spirochetal genomes (Setubal, et al., 2006).

5. PRED-LIPO: Prediction of Lipoprotein and Secretory Signal Peptides in Gram-positive Bacteria with Hidden Markov Models (Bagos, et al., 2008).

6. MITOPROT: Prediction of mitochondrial targeting sequences (Claros and Vincens, 1996; Guda, et al., 2004).

7. PATS: PATS identifies amino acid sequences that are potentially targeted to the apicoplast matrix of Plasmodium falciparum. Note that secondary analysis of candidate sequences is required for confirmation (Waller, et al., 1998; Zuegge, et al., 2001).

8. PlasMit: Prediction of mitochondrial transit peptides in Plasmodium falciparum (Zuegge, et al., 2001).

9. Predotar: A prediction service for identifying putative N-terminal targeting sequences (Small, et al., 2004).

10. PTS1: Prediction of peroxisomal targeting signal 1 containing proteins from amino acid sequence (Neuberger, et al., 2003; Neuberger, et al., 2003).

11. PTS1Prowler: a predictor to screen several additional eukaryotic genomes to revise previously estimated numbers of peroxisomal proteins (Hawkins, et al., 2007).

12. PeroxisomeDB: a database for the peroxisomal proteome, functional genomics and disease (Schluter, et al., 2007).

13. PeroxiP: In silico prediction of the peroxisomal proteome in fungi, plants and animals (Emanuelsson, et al., 2003).

14. SignalP: a web server that predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models (Nielsen, et al., 1999; Bendtsen, et al., 2004; Emanuelsson, et al., 2007; Nielsen, et al., 1997; Menne, et al., 2000).

15. TatP 1.0: a web server that predicts the presence and location of twin-arginine signal peptide cleavage sites in bacteria. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of two artificial neural networks. The output can optionally be postfiltered with regular expressions (Bendtsen, et al., 2005).

16. ProP 1.0: a web server that predicts arginine and lysine propeptide cleavage sites in eukaryotic protein sequences using an ensemble of neural networks. Furin-specific prediction is the default. It is also possible to perform a general proprotein convertase (PC) prediction (Duckert, et al., 2004).

17. TargetP 1.1: a web server that predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP) (Nielsen, et al., 1997; Emanuelsson, et al., 2000).

18. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs, and amino acid composition (Hoglund, et al., 2006).

19. NetCorona 1.0: a web server that predicts coronavirus 3C-like proteinase (or protease) cleavage sites using artificial neural networks on amino acid sequences. Every potential site is scored, and a list is compiled in addition to a graphical representation. Refer to the publication for more detailed information and performance values (Kiemer, et al., 2004).

20. NetPicoRNA 1.0: a web server that produces neural network predictions of cleavage sites of picornaviral proteases. The method is described in detail in the reference article (Blom, et al., 1996).

21. SecretomeP 2.0: a web server that produces ab initio predictions of non-classical (i.e., not signal peptide-triggered) protein secretion. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction (Bendtsen, et al., 2004; Bendtsen, et al., 2005).

22. NetNES 1.1: a web server that predicts leucine-rich nuclear export signals (NES) in eukaryotic proteins using a combination of neural networks and hidden Markov models (la Cour, et al., 2004).

23. PeptideCutter: a web server that predicts potential sites cleaved by proteases or chemicals in a given protein sequence. PeptideCutter returns the query sequence with the possible cleavage sites mapped on it and/or a table of cleavage site positions (Wilkins, et al., 1999).

24. SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology (Viklund, et al., 2008).

25. HECTAR: a method to predict subcellular targeting in heterokonts (Gschloessl, et al., 2008).

26. Signal-BLAST: High-performance signal peptide prediction based on sequence alignment techniques (Frank and Sippl, 2008).

27. PredSL: a tool for the N-terminal sequence-based prediction of protein subcellular localization (Petsalaki, et al., 2006).

28. iPSORT: Extensive feature detection of N-terminal protein sorting signals (Bannai, et al., 2002).

29. Phobius in Sweden; Phobius in UK: A combined transmembrane topology and signal peptide predictor (Kall, et al., 2004; Kall, et al., 2007).

30. PrediSi: prediction of signal peptides and their cleavage positions (Hiller, et al., 2004).

31. RPSP (Obsolete): Prediction of signal peptides in protein sequences by neural networks (Plewczynski, et al., 2008).

32. Signal-3L: A 3-layer approach for predicting signal peptides (Shen and Chou, 2007).

33. Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides (Chou and Shen, 2007).

34. HIVcleave: Predicting HIV protease cleavage sites in proteins (Chou, 1996; Shen and Chou, 2008).

35. TISs-ST (Obsolete): a web server to evaluate polymorphic translation initiation sites and their reflections on the secretory targets (Vicentini and Menossi, 2007).
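
As an illustration of the kind of rule-based cleavage site mapping described for PeptideCutter (entry 23), the sketch below applies a simplified trypsin rule: cleave after K or R unless the next residue is P. This is a textbook simplification and not PeptideCutter's actual rule set, which handles further exceptions; the function names and the example sequence are hypothetical.

```python
def trypsin_cleavage_sites(sequence):
    """Return 1-based positions of residues after which cleavage is predicted.

    Simplified rule (assumption): cleave C-terminal to K or R,
    except when the following residue is P.
    """
    sites = []
    for i, residue in enumerate(sequence[:-1]):  # the last residue has no bond after it
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    return sites

def digest(sequence):
    """Split the sequence into peptides at the predicted cleavage sites."""
    bounds = [0] + trypsin_cleavage_sites(sequence) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

# Hypothetical query sequence.
peptides = digest("MKRPLAKGR")
```

Here the R at position 3 is followed by P and is therefore skipped, so the sequence is split only after positions 2 and 7.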


B. Caspase substrates cleavage

1. Cascleave: A Java-based tool for the high-throughput in silico identification of cleavage sites for various caspases from the amino acid sequences of their substrates (Wang, et al., 2014).

2. CASVM (Obsolete): Server for SVM Prediction of Caspase Substrates Cleavage Sites (Wee, et al., 2006; Wee, et al., 2007).

3. GraBCas: A bioinformatics tool for score-based prediction of Caspase- and Granzyme B-cleavage sites in protein sequences (Backes, et al., 2005).

4. PEPS (Obsolete): A tool for the prediction of caspase cleavage sites, based on the consensus motifs for caspase substrates. The software can be obtained from the author upon request (Lohmuller, et al., 2003).

5. CaSPredictor (Obsolete): A software tool for the prediction of caspase cleavage sites which utilizes position-specific scoring matrices together with indices based on neighboring PEST sequences. The software can be obtained from the author upon request (Garay-Malpartida, et al., 2005).
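
Consensus-motif prediction of the kind mentioned for PEPS (entry 4) can be sketched as a simple pattern scan. The D-x-x-D pattern used here is a commonly cited simplification for caspase-3-like substrates, with cleavage after the second Asp; it is an assumption for illustration and not the motif set used by PEPS or any other tool listed above. The names CASPASE_MOTIF and scan_caspase_motif are hypothetical.

```python
import re

# Assumed simplified consensus: Asp, any two residues, Asp (cleavage after the last Asp).
# The lookahead lets overlapping motifs be reported as separate hits.
CASPASE_MOTIF = re.compile(r"(?=(D..D))")

def scan_caspase_motif(sequence):
    """Return (cleavage_position, motif) pairs; the position is the 1-based index of the P1 Asp."""
    hits = []
    for match in CASPASE_MOTIF.finditer(sequence):
        hits.append((match.start() + 4, match.group(1)))
    return hits

# Hypothetical substrate fragment containing a DEVD motif.
hits = scan_caspase_motif("MADEVDGSK")
```

A pure motif scan reports every match regardless of context, which is one reason the other tools in this list add scoring matrices, SVMs, or structural features on top of the sequence pattern.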


C. Calpain substrates cleavage

1. GPS-CCD: Software for prediction of calpain cleavage sites (Liu, et al., 2011).

2. LabCaS: Labeling calpain substrate cleavage sites from amino acid sequences using conditional random fields (Fan, et al., 2013).


D. Proteasomal cleavage

1. NetChop 3.1: a web server that produces neural network predictions for cleavage sites of the human proteasome (Nielsen, et al., 2005).

2. PepCleave: Precise score for the prediction of peptides cleaved by the proteasome (Ginodi, et al., 2008).

3. MAPPP: MHC class I antigenic peptide processing prediction (Hakenberg, et al., 2003).

4. Paproc: a prediction tool for cleavages by human and yeast 20S proteasomes, based on experimental cleavage data (Nussbaum, et al., 2001).

5. Pcleavage: an SVM-based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences (Bhasin and Raghava, 2005).