An Evaluation of beta-turn prediction methods

DATASET

PREDICTION METHODS

ORIGINAL PARAMETERS AND THRESHOLDS

THRESHOLD DEPENDENT MEASURES

RESULTS

REFERENCES



The beta-turn is an important element of protein structure. Over the past three decades, numerous beta-turn prediction methods have been developed using a variety of strategies. At present it is difficult to say which method performs best, because the methods were developed on different data sets. It is therefore important to evaluate the performance of beta-turn prediction methods on a common data set.

We evaluate the performance of six beta-turn prediction methods. The original parameters available in the literature are used to test the methods on a set of 426 non-homologous protein chains. The neural-network-based method BTPRED performs significantly better than the statistical methods. We also train, test and evaluate all methods except BTPRED and GORBTURN on the new data set using sevenfold cross-validation. The performance of all methods improves significantly when secondary structure information is incorporated. Both threshold-dependent and threshold-independent (ROC) measures are used for evaluation.





DATASET

The representative protein chains are selected so that no two chains have more than 25% sequence identity. Only chains determined by X-ray crystallography at 2.0 Å resolution or better and containing at least one beta-turn are used in the analysis. The PDB codes of the 426 protein chains used are listed below. This is the same data set as described by Guruprasad and Rajkumar (2000).

_____________________________________________________________________________________________________________________________________________________________

119l 153l 1aliA 1a1x 1a28B 1a2pA 1a2yA 1a2zA 1a34A 1a62 1a68 1a6q 1a7tA 1a8e 1a8i 1a9s 1aac 1aba
1ad2 1adoA 1af7 1afwA 1agjA 1agqD 1ah7 1aho 1aj2 1ajj 1ajsA 1ak0 1ak1 1ako 1akz 1al3 1alo 1alu 1alvA 1aly
1amm 1amp 1amuA 1amx 1anf 1aocA 1aohA 1aol 1aop 1aoqA 1aozA 1apyB 1aq0A 1aq6A 1aqb 1aqzB 1arb 1arv 1at0 1at1A
1atzB 1avmA 1awd 1awsA 1axn 1ay1 1azo 1ba1 1bbpA 1bdmB 1bdo 1bebA 1benB 1bfd 1bfg 1bftA 1bgc 1bgp 1bkf 1bkrA
1brt 1btkB 1btn 1bv1 1byb 1c52 1cbn 1cem 1ceo 1cewI 1cex 1cfb 1chd 1chmA 1ckaA 1c1c 1cnv 1cpcB 1cpo 1cseE
1cseI 1csh 1csn 1ctj 1cydA 1dad 1dkzA 1dokA 1dorA 1dosA 1dun 1dupA 1dxy 1eca 1ecl 1ecpA 1ede 1edg 1edmB 1edt
1erv 1ezm 1fdr 1fds 1fit 1fleI 1fmtB 1fna 1fua 1furA 1fus 1fvkA 1fwcA 1g3p 1gai 1garA 1gd1O 1gdoA 1gifA 1gky
1gnd 1gotB 1gotG 1gsa 1guqA 1gvp 1ha1 1havA 1hcrA 1hfc 1hgxA 1hoe 1hsbA 1htrP 1hxn 1iakA 1idaA 1idk 1ido 1ifc
1igd 1iibA 1iso 1isuA 1ixh 1jdw 1jer 1jetA 1jfrA 1jpc 1kid 1knb 1kpf 1kptA 1kuh 1kveA 1kveB 1kvu 1kwaB 1lam
1latB 1lbu 1lcl 1lis 1lit 1lki 1lkkA 1lmb3 1lml 1lt5D 1ltsA 1lucB 1mai 1mbd 1mkaA 1mldA 1mml 1molA 1mpgA 1mrj
1mrp 1msc 1msi 1msk 1mtyB 1mtyD 1mtyG 1mucA 1mugA 1mwe 1mzm 1nar 1nbaB 1nbcA 1nciB 1neu 1nfn 1nif 1nls 1nox
1np1A 1npk 1nulB 1nwpA 1nxb 1ois 1onc 1onrA 1opd 1opy 1orc 1ospO 1ovaA 1oyc 1pcfA 1pda 1pdo 1pgs 1phe 1phnA
1php 1pii 1ple 1pmi 1pne 1pnkB 1poa 1poc 1pot 1ppn 1ppt 1prxB 1ptq 1pty 1pud 1qba 1qnf 1r69 1ra9 1rcf
1rec 1regY 1reqD 1rgeA 1rhs 1rie 1rmg 1rro 1rss 1rsy 1rvaA 1ryp1 1ryp2 1rypF 1rypI 1rypJ 1sbp 1sfp 1sftB 1sgpI
1skz 1sltA 1sluA 1smd 1spuA 1sra 1stmA 1svb 1svpA 1tadC 1tca 1tfe 1thv 1thx 1tib 1tif 1tml 1trkA 1tsp 1tvxA
1tys 1uae 1ubi 1uch 1unkA 1urnA 1uxy 1v39 1vcaA 1vcc 1vhh 1vid 1vif 1vin 1vjs 1vls 1vpsA 1vsd 1vwlB 1wab
1wba 1wdcA 1wer 1whi 1who 1whtB 1wpoB 1xgsA 1xikA 1xjo 1xnb 1xsoA 1xyzA 1yaiC 1yasA 1ycc 1yer 1ytbA 1yveI 1zin
256bA 2a0b 2abk 2acy 2arcA 2ayh 2baa 2bbkH 2bbkL 2bopA 2cba 2ccyA 2chsA 2ctc 2cyp 2dri 2end 2eng 2erl 2fdn
2fha 2fivA 2gdm 2hbg 2hft 2hmzA 2hpdA 2hts 2ilb 2ilk 2kinA 2kinB 2lbd 2mcm 2msbB 2nacA 2pgd 2phy 2pia 2pii
2plc 2por 2pspA 2pth 2rn2 2rspB 2sak 2scpA 2sicI 2sil 2sn3 2sns 2tgi 2tysA 2vhbB 2wea 3b5c 3chy 3cla 3cox
3cyr 3daaA 3grs 3lzt 3nul 3pcgM 3pte 3sdhA 3seb 3tss 3vub 4bcl 4mt2 4pgaA 4xis 5csmA 5hpgA 5icb 5p2I 5pti
5ptp 6cel 6gsvA 7ahlA 7rsa 8abp 8rucI 8rxnA

___________________________________________________________________________________________________________________________________________________________________
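
The identifiers in the list above appear to combine a four-character PDB entry code with an optional one-character chain label (for example, 1a2pA would be chain A of entry 1a2p, while 119l carries no chain label). A minimal sketch under that assumption splits the identifiers and groups chains by entry. The function names are hypothetical and are not part of the original study.

```python
def split_chain_id(identifier):
    """Split an identifier such as '1a2pA' into ('1a2p', 'A').

    Assumes a 4-character PDB entry code followed by an optional
    1-character chain label; '119l' becomes ('119l', None).
    """
    if len(identifier) not in (4, 5):
        raise ValueError(f"unexpected identifier length: {identifier!r}")
    entry = identifier[:4]
    chain = identifier[4:] or None
    return entry, chain


def group_by_entry(identifiers):
    """Map each PDB entry code to the list of chain labels used from it."""
    groups = {}
    for ident in identifiers:
        entry, chain = split_chain_id(ident)
        groups.setdefault(entry, []).append(chain)
    return groups


# A few identifiers copied from the list above.
sample = "119l 1a2pA 1kveA 1kveB 1ryp1 1ryp2".split()
groups = group_by_entry(sample)
print(groups)
```

Grouping by entry makes it easy to see which structures contribute more than one chain to the data set.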





PREDICTION METHODS EVALUATED

Chou-Fasman algorithm
Thornton's algorithm
1-4 & 2-3 Correlation Model
Sequence Coupled Model
GORBTURN
BTPRED





ORIGINAL PARAMETERS AND THRESHOLDS

Chou-Fasman algorithm

Original conformational parameters and positional frequencies for helix, β-sheet and β-turn residues.

Original Threshold = 0.000075
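
The threshold of 0.000075 matches the cutoff of 0.75 × 10^-4 that Chou and Fasman (1979) apply to the product of the four positional frequencies of a tetrapeptide. The sketch below illustrates only that product test. The full rule also compares averaged conformational parameters, which is omitted here. The frequency values and the default for unlisted residues are made-up placeholders, not the published parameters.

```python
from math import prod

THRESHOLD = 0.000075  # original threshold quoted above

# HYPOTHETICAL positional frequencies: one dict per turn position (i .. i+3).
# The real tables are published by Chou and Fasman and are not reproduced here.
TOY_FREQ = [
    {"N": 0.16, "G": 0.10, "P": 0.10, "A": 0.06},
    {"N": 0.08, "G": 0.09, "P": 0.30, "A": 0.08},
    {"N": 0.19, "G": 0.19, "P": 0.03, "A": 0.04},
    {"N": 0.09, "G": 0.15, "P": 0.07, "A": 0.06},
]
DEFAULT_FREQ = 0.05  # placeholder for residues missing from the toy tables


def turn_products(sequence, freq=TOY_FREQ, default=DEFAULT_FREQ):
    """Return (start index, frequency product) for every 4-residue window."""
    results = []
    for i in range(len(sequence) - 3):
        window = sequence[i:i + 4]
        product = prod(freq[k].get(res, default) for k, res in enumerate(window))
        results.append((i, product))
    return results


def predicted_turn_starts(sequence, threshold=THRESHOLD):
    """Start indices of windows whose product exceeds the threshold."""
    return [i for i, p in turn_products(sequence) if p > threshold]


print(predicted_turn_starts("NPGG"))
print(predicted_turn_starts("AAAA"))
```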

Thornton's algorithm

Original conformational parameters and positional frequencies:

Type I beta-turns

Type II beta-turns

Original Threshold for Type I turn = 4.0

Original Threshold for Type II turn = 2.7

1-4 & 2-3 Correlation Model

Original probabilities and conditional probabilities:

P1(X1)

P2(X2)

P3(X3|X2)

P4(X4|X1)

Original Threshold = 0.1875

Sequence Coupled Model

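Original probabilities and conditional probabilities for turns:

$P^{+}_{i}(R_{i})$

$P^{+}_{i+1}(R_{i+1} \mid R_{i})$

$P^{+}_{i+2}(R_{i+2} \mid R_{i+1})$

$P^{+}_{i+3}(R_{i+3} \mid R_{i+2})$

Original probabilities and conditional probabilities for non-turns:

$P^{-}_{i}(R_{i})$

$P^{-}_{i+1}(R_{i+1} \mid R_{i})$

$P^{-}_{i+2}(R_{i+2} \mid R_{i+1})$

$P^{-}_{i+3}(R_{i+3} \mid R_{i+2})$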

Original Threshold = 0
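
The turn (+) and non-turn (-) probability tables, together with a threshold of 0, are consistent with a rule that scores a tetrapeptide once with each set of tables and predicts a turn when the difference between the two scores is positive. The sketch below assumes that reading. All probabilities, including the default for missing entries, are invented placeholders, and the function names are hypothetical.

```python
def chain_score(tetrapeptide, first, conditional, default=0.05):
    """P_i(R_i) * P_{i+1}(R_{i+1}|R_i) * P_{i+2}(R_{i+2}|R_{i+1}) * P_{i+3}(R_{i+3}|R_{i+2}).

    `first` maps a residue to its probability at position i.
    `conditional` is a list of three dicts keyed by (residue, previous residue).
    """
    score = first.get(tetrapeptide[0], default)
    for k in range(1, 4):
        previous, current = tetrapeptide[k - 1], tetrapeptide[k]
        score *= conditional[k - 1].get((current, previous), default)
    return score


def is_turn(tetrapeptide, turn_tables, nonturn_tables, threshold=0.0):
    """Predict a turn when (turn score - non-turn score) exceeds the threshold."""
    delta = chain_score(tetrapeptide, *turn_tables) - chain_score(tetrapeptide, *nonturn_tables)
    return delta > threshold


# HYPOTHETICAL toy tables (first-position dict, list of three conditional dicts).
TURN_TABLES = (
    {"N": 0.2},
    [{("P", "N"): 0.4}, {("G", "P"): 0.5}, {("G", "G"): 0.3}],
)
NONTURN_TABLES = (
    {"A": 0.1},
    [{}, {}, {}],
)

print(is_turn("NPGG", TURN_TABLES, NONTURN_TABLES))
print(is_turn("AAAA", TURN_TABLES, NONTURN_TABLES))
```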

GORBTURN

Original positional frequencies for Type I turns

Original positional frequencies for Type II turns



New Threshold: A new threshold value is chosen at which the sensitivity and specificity are nearly equal.
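
One way to pick such a threshold is to scan the candidate values and keep the one with the smallest gap between sensitivity and specificity. The sketch below assumes that a residue is predicted as a turn when its score is strictly above the threshold and that both classes are present. The scores, labels and function names are illustrative only.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score > threshold means 'turn'."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores, labels):
    """Candidate threshold (taken from the observed scores) that minimises
    |sensitivity - specificity|; the lowest such value wins on ties."""
    best_gap, best_threshold = None, None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_threshold = gap, t
    return best_threshold


# Toy data: 1 = turn residue, 0 = non-turn residue.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
chosen = balanced_threshold(scores, labels)
print(chosen, sensitivity_specificity(scores, labels, chosen))
```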





THRESHOLD DEPENDENT MEASURES

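The following four measures are used to assess the performance of the prediction methods. Here p is the number of correctly predicted turn residues, n the number of correctly predicted non-turn residues, o the number of non-turn residues predicted as turn (over-predictions), u the number of turn residues predicted as non-turn (under-predictions), and t = p + n + o + u the total number of residues.

Qtotal, the percentage of correctly predicted residues, defined as

$$Q_{total} = \frac{p + n}{t} \times 100$$

MCC, the Matthews Correlation Coefficient, defined as

$$MCC = \frac{pn - ou}{\sqrt{(p + o)(p + u)(n + o)(n + u)}}$$

Qpredicted, the probability of correct prediction (the percentage of predicted turn residues that are correct), defined as

$$Q_{predicted} = \frac{p}{p + o} \times 100$$

Qobserved, the percent coverage (the percentage of observed turn residues that are correctly predicted), defined as

$$Q_{observed} = \frac{p}{p + u} \times 100$$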
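
The four measures can be computed from per-residue confusion counts. The sketch below uses the usual definitions and assumes these are the ones intended: tp is the number of turn residues predicted as turn, tn the number of non-turn residues predicted as non-turn, fp the over-predictions and fn the under-predictions. The function name and the example counts are illustrative.

```python
from math import sqrt


def turn_metrics(tp, tn, fp, fn):
    """Return Qtotal, Qpredicted, Qobserved (percentages) and MCC."""
    total = tp + tn + fp + fn
    q_total = 100.0 * (tp + tn) / total
    # Guard against empty denominators (no predicted or no observed turns).
    q_predicted = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    q_observed = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "Qtotal": q_total,
        "Qpredicted": q_predicted,
        "Qobserved": q_observed,
        "MCC": mcc,
    }


# Made-up counts for illustration only.
metrics = turn_metrics(tp=40, tn=120, fp=20, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because non-turn residues outnumber turn residues, Qtotal can be high even when few turns are found, which is why MCC is reported alongside it.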





RESULTS

Table 1 Results of 7-fold cross-validation at original thresholds.

Methods                        Qtotal        Qpredicted    Qobserved     MCC
Chou-Fasman algorithm          74.9 (65.2)   46.1 (37.6)   16.9 (63.5)   0.16 (0.26)
Thornton's algorithm           74.5 (68.0)   44.0 (38.6)   16.7 (52.4)   0.15 (0.23)
1-4 & 2-3 Correlation Model    63.2 (59.1)   35.3 (32.4)   60.4 (61.9)   0.21 (0.17)
Sequence Coupled Model         50.6 (53.3)   31.7 (32.4)   88.4 (72.8)   0.23 (0.17)

*Values in brackets are calculated using the original parameters of the methods.
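
The text does not say how the chains were assigned to folds. A minimal sketch of a sevenfold split, assuming a simple round-robin assignment of whole chains, is shown below. Each fold serves once as the test set while the remaining six are used for training. The chain names and the function name are placeholders.

```python
def k_fold_splits(items, k=7):
    """Yield (train, test) lists for k-fold cross-validation.

    Items are dealt round-robin into k folds; this assignment scheme is an
    assumption, not the procedure used in the original study.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder names standing in for the 426 chains of the data set.
chains = [f"chain{i}" for i in range(426)]
splits = list(k_fold_splits(chains, k=7))
print([len(test) for _, test in splits])
```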



[Figure: Results of cross-validation in terms of (a) Qtotal, (b) Qpredicted, (c) Qobserved and (d) MCC.]



[Figure: Effect of different secondary structure prediction methods on the performance of BTPRED.]

Table 2 ROC values for different methods without cross-validation.

Methods                        ROC (without cross-validation)
Chou-Fasman algorithm          0.69
Thornton's algorithm           0.66
1-4 & 2-3 Correlation Model    0.64
Sequence Coupled Model         0.64

Table 3 ROC values for different methods with cross-validation.

Methods                        ROC (with cross-validation)
Chou-Fasman algorithm          0.59
Thornton's algorithm           0.57
1-4 & 2-3 Correlation Model    0.67
Sequence Coupled Model         0.70
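
Tables 2 and 3 report a single ROC value per method. The sketch below reads this as the area under the ROC curve, which is an assumption because the text does not define it. The area is computed as the fraction of (turn, non-turn) residue pairs in which the turn residue receives the higher score, with ties counted as one half. This is equivalent to integrating the ROC curve over all thresholds. The scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 (turn) or 0 (non-turn). Requires at least one of each class.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
auc = roc_auc(scores, labels)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so it does not depend on any single threshold.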



[Figure: ROC with and without cross-validation.]





REFERENCES

Chou,P.Y. and Fasman,G.D. (1974) Conformational parameters for amino acids in helical, beta-sheet and random coil regions calculated from proteins. Biochemistry, 13, 211-222.

Chou,P.Y. and Fasman,G.D. (1979) Prediction of beta-turns. Biophys. J., 26, 367-384.

Chou,K.C. (1997) Prediction of beta-turns. J. Pept. Res., 49, 120-144.

Chou,K.C. and Blinn,J.R. (1997) Classification and prediction of beta-turn types. J. Protein Chem., 16, 575-595.

Chou,K.C. (2000) Prediction of tight turns and their types in proteins. Anal. Biochem., 286, 1-16.

Cohen,F.E., Abarbanel,R.M., Kuntz,I.D. and Fletterick,R.J. (1986) Turn prediction in proteins using a pattern-matching approach. Biochemistry, 25, 266-275.

Deleo,J.M. (1993) Proceedings of the Second International Symposium on Uncertainty Modelling and Analysis, pp. 318-325. IEEE Computer Society Press, College Park, MD.

Garnier,J., Osguthorpe,D.J. and Robson,B. (1978) Analysis and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 120, 97-120.

Gibrat,J.-F., Garnier,J. and Robson,B. (1987) J. Mol. Biol., 198, 425-433.

Guruprasad,K. and Rajkumar,S. (2000) Beta- and gamma-turns in proteins revisited: A new set of amino acid dependent positional preferences and potential. J. Biosci., 25(2), 143-156.

Hutchinson,G. and Thornton,J.M. (1996) PROMOTIF-a program to identify and analyze structural motifs in proteins. Protein Sci., 5, 212-220.

Hutchinson,G. and Thornton,J.M. (1994) Revised set of potentials for beta-turn formation in proteins. Protein Sci., 3, 2207-2216.

Jones,D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292(2), 195-202.

Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.

King,R.D. and Sternberg,M.J.E. (1996) Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci., 5(11), 2298-2310.

Matthews,B.W. (1975) Biochim. Biophys. Acta, 405, 442-451.

McGregor,M.J., Flores,T.P. and Sternberg,M.J. (1989) Prediction of beta-turns in proteins using neural networks. Protein Eng., 2, 521-526.

McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16(4), 404-405.

Ouali,M. and King,R.D. (2000) Cascaded multiple classifiers for secondary structure prediction. Protein Sci., 9, 1162-1176.

Rost,B. and Sander,C. (1993) Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. U.S.A., 90, 7558-7562.

Rost,B. and Sander,C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72.

Shepherd,A.J., Gorse,D. and Thornton,J.M. (1999) Prediction of the location and type of beta-turns in proteins using neural networks. Protein Sci., 8, 1045-1055.

Wilmot,C.M. and Thornton,J.M. (1988) Analysis and prediction of the different types of beta-turns in proteins. J. Mol. Biol., 203, 221-232.

Wilmot,C.M. and Thornton,J.M. (1990) Beta-turns and their distortion: a proposed new nomenclature. Protein Eng., 3(6), 479-493.

Zhang,C.T. and Chou,K.C. (1997) Prediction of beta-turns in proteins by 1-4 & 2-3 Correlation Model. Biopolymers, 41, 673-702.