1-4 & 2-3 Correlation model

The pattern of residues is important for ß-turn prediction. This is equivalent to saying that the correlations between different amino acids play an important role in ß-turn prediction. In view of this, a model called the 1-4 & 2-3 correlation model was proposed to predict ß-turns in proteins. When a tetrapeptide folds into a ß-turn, the interaction between its 1st and 4th residues and the interaction between its 2nd and 3rd residues become significant. In particular, an H-bond may form between the backbone C=O of the 1st residue and the backbone NH of the 4th residue. The 1-4 & 2-3 correlation model therefore reflects, in some sense, the essence of the intrachain interactions of a ß-turn.

The ß-turn structure involves four consecutive residues, and hence can be generally expressed by

X1X2X3X4

where X1 represents the amino acid at the 1st position, X2 represents the amino acid at the 2nd position, and so forth. Since there are 20 different amino acids, the number of possible tetrapeptides is 20*20*20*20 = 1.6*10^5. Tetrapeptides can be classified into two categories: the ß-turn set, denoted by S(t), and the non-ß-turn set, denoted by S(nt). An attributive function is used to describe the relevance of a tetrapeptide to the ß-turn set S(t).

The attributive function Ø, which describes the intrachain interactions between the 1st and 4th residues and between the 2nd and 3rd residues, has the following form:

Ø(X1X2X3X4)=gP1(X1)P2(X2)P3(X3|X2)P4(X4|X1)

where g = 10^4 is an amplifying factor that brings the values into a range that is easier to handle. P1(X1) is the probability of amino acid X1 occurring at the 1st position, P2(X2) is the probability of amino acid X2 occurring at the 2nd position, P3(X3|X2) is the probability of amino acid X3 occurring at the 3rd position given that X2 has occurred at the 2nd position, and P4(X4|X1) is the probability of amino acid X4 occurring at the 4th position given that X1 has occurred at the 1st position. All these probabilities can be derived from a training set consisting of tetrapeptides known to fold into a ß-turn in proteins.
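The estimation step and the attributive function can be sketched in Python as follows. This is a minimal illustration, assuming the probabilities are taken as plain relative frequencies; the function names and the four-peptide toy training set are hypothetical and are not part of the original model description.

```python
from collections import Counter

G = 10**4  # amplifying factor g from the text


def estimate_probabilities(turn_tetrapeptides):
    """Estimate P1, P2, P3(X3|X2) and P4(X4|X1) as relative frequencies.

    Assumption: simple counting over the training set, with no smoothing.
    """
    n = len(turn_tetrapeptides)
    c1 = Counter(t[0] for t in turn_tetrapeptides)           # residue at position 1
    c2 = Counter(t[1] for t in turn_tetrapeptides)           # residue at position 2
    c23 = Counter((t[1], t[2]) for t in turn_tetrapeptides)  # (X2, X3) pairs
    c14 = Counter((t[0], t[3]) for t in turn_tetrapeptides)  # (X1, X4) pairs
    p1 = {a: k / n for a, k in c1.items()}
    p2 = {a: k / n for a, k in c2.items()}
    p3 = {(x2, x3): k / c2[x2] for (x2, x3), k in c23.items()}
    p4 = {(x1, x4): k / c1[x1] for (x1, x4), k in c14.items()}
    return p1, p2, p3, p4


def phi(tetra, p1, p2, p3, p4, g=G):
    """Attributive function: g * P1(X1) * P2(X2) * P3(X3|X2) * P4(X4|X1)."""
    x1, x2, x3, x4 = tetra
    return (g * p1.get(x1, 0.0) * p2.get(x2, 0.0)
            * p3.get((x2, x3), 0.0) * p4.get((x1, x4), 0.0))


# Hypothetical toy training set, for illustration only.
toy_training_set = ["DPDG", "DPNG", "SPDG", "NGSG"]
probs = estimate_probabilities(toy_training_set)
score = phi("DPDG", *probs)
```

Residues or residue pairs never seen in the training set receive probability 0 in this sketch, so any tetrapeptide containing them scores 0.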

The larger the Ø of a tetrapeptide, the closer it is to the ß-turn set S(t), and hence the higher its propensity to form a ß-turn in proteins. For a given tetrapeptide, when the attributive function Ø is greater than or equal to the threshold value T, the tetrapeptide is predicted to form a ß-turn; otherwise, it is not. The criterion for predicting whether a given tetrapeptide forms a ß-turn in proteins can thus be expressed through the value of D, as follows:

a tetrapeptide will form a ß-turn, if its D>=0,

a tetrapeptide will not form a ß-turn, otherwise.

where the discriminant function D is given by

D(X1X2X3X4)=Ø(X1X2X3X4)-T

where the value of the threshold T can be determined via an optimization procedure.
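The text does not specify the optimization procedure, so the following Python sketch only shows one simple possibility: scanning the observed Ø values as candidate thresholds and keeping the one that classifies the most labelled tetrapeptides correctly. The function names and the example scores are hypothetical.

```python
def discriminant(phi_value, threshold):
    """D = Ø - T, as defined in the text."""
    return phi_value - threshold


def predicts_turn(phi_value, threshold):
    """A tetrapeptide is predicted to form a ß-turn when D >= 0."""
    return discriminant(phi_value, threshold) >= 0


def choose_threshold(turn_scores, non_turn_scores):
    """Assumed procedure: pick the candidate T with the highest accuracy.

    Candidates are the observed Ø values; ties keep the smallest T.
    """
    candidates = sorted(set(turn_scores) | set(non_turn_scores))
    total = len(turn_scores) + len(non_turn_scores)
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = (sum(predicts_turn(s, t) for s in turn_scores)
                   + sum(not predicts_turn(s, t) for s in non_turn_scores))
        acc = correct / total
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up Ø values for members of S(t) and S(nt), for illustration only.
turn_scores = [3.6, 2.1, 5.0, 0.8]
non_turn_scores = [0.2, 0.5, 1.0, 2.5]
best_t, best_acc = choose_threshold(turn_scores, non_turn_scores)
```

Other objectives, such as balancing the error rates on S(t) and S(nt), would fit the same scanning loop.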

Table 3: The probabilities and conditional probabilities derived from the training set S(t)

1. The probabilities P1(X1)

A C D E F G H I K L M N P Q R S T V W Y
0.06 0.04 0.10 0.04 0.03 0.11 0.04 0.03 0.04 0.05 0.01 0.09 0.05 0.03 0.02 0.11 0.05 0.05 0.01 0.04

2. The probabilities P2(X2)

A C D E F G H I K L M N P Q R S T V W Y
0.09 0.02 0.07 0.04 0.02 0.09 0.01 0.02 0.10 0.02 0.01 0.05 0.13 0.04 0.04 0.11 0.07 0.04 0.00 0.03

3. The conditional probabilities P3(X3|X2) (rows: X2; columns: X3)

A C D E F G H I K L M N P Q R S T V W Y
A 0.08 0.03 0.18 0.00 0.00 0.28 0.03 0.03 0.08 0.00 0.00 0.10 0.00 0.00 0.03 0.10 0.03 0.03 0.03 0.00
C 0.14 0.00 0.00 0.00 0.00 0.14 0.00 0.00 0.00 0.14 0.00 0.29 0.00 0.00 0.14 0.00 0.00 0.14 0.00 0.00
D 0.09 0.00 0.06 0.18 0.00 0.15 0.00 0.06 0.06 0.00 0.00 0.15 0.00 0.03 0.06 0.12 0.03 0.03 0.00 0.00
E 0.05 0.00 0.15 0.05 0.00 0.05 0.05 0.00 0.05 0.10 0.00 0.05 0.05 0.00 0.10 0.10 0.10 0.05 0.05 0.00
F 0.14 0.00 0.00 0.14 0.00 0.43 0.00 0.00 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.14
G 0.05 0.03 0.20 0.03 0.05 0.08 0.03 0.00 0.05 0.03 0.03 0.03 0.03 0.05 0.05 0.17 0.05 0.03 0.03 0.03
H 0.00 0.33 0.00 0.00 0.00 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33
I 0.00 0.00 0.22 0.00 0.00 0.11 0.11 0.11 0.11 0.11 0.00 0.11 0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.00
K 0.02 0.00 0.16 0.02 0.00 0.27 0.09 0.02 0.07 0.02 0.00 0.09 0.04 0.00 0.02 0.00 0.04 0.02 0.00 0.11
L 0.00 0.09 0.00 0.00 0.09 0.09 0.00 0.00 0.00 0.00 0.00 0.18 0.18 0.00 0.00 0.00 0.09 0.00 0.00 0.27
M 0.00 0.00 0.17 0.00 0.00 0.17 0.17 0.00 0.17 0.00 0.00 0.17 0.00 0.00 0.00 0.17 0.00 0.00 0.00 0.00
N 0.05 0.00 0.00 0.10 0.10 0.29 0.00 0.00 0.00 0.00 0.00 0.10 0.05 0.00 0.05 0.10 0.14 0.00 0.00 0.05
P 0.02 0.03 0.12 0.10 0.00 0.24 0.02 0.00 0.05 0.02 0.00 0.10 0.02 0.02 0.03 0.12 0.02 0.03 0.02 0.03
Q 0.00 0.06 0.19 0.06 0.06 0.25 0.00 0.00 0.06 0.06 0.00 0.12 0.00 0.00 0.00 0.00 0.12 0.00 0.00 0.00
R 0.00 0.00 0.06 0.17 0.06 0.11 0.00 0.00 0.00 0.22 0.00 0.22 0.00 0.00 0.06 0.00 0.00 0.00 0.06 0.06
S 0.02 0.04 0.10 0.02 0.08 0.16 0.02 0.02 0.06 0.02 0.00 0.08 0.02 0.02 0.04 0.10 0.10 0.04 0.02 0.06
T 0.03 0.00 0.15 0.00 0.00 0.12 0.03 0.00 0.15 0.00 0.00 0.09 0.00 0.06 0.06 0.18 0.03 0.00 0.00 0.09
V 0.11 0.11 0.16 0.00 0.00 0.05 0.05 0.00 0.00 0.05 0.00 0.11 0.00 0.00 0.05 0.26 0.05 0.00 0.00 0.00
W 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
Y 0.07 0.00 0.07 0.07 0.00 0.21 0.00 0.00 0.07 0.00 0.00 0.07 0.14 0.00 0.07 0.14 0.00 0.00 0.00 0.07

4. The conditional probabilities P4(X4|X1) (rows: X1; columns: X4)

A C D E F G H I K L M N P Q R S T V W Y
A 0.04 0.07 0.07 0.07 0.04 0.11 0.04 0.04 0.04 0.00 0.04 0.07 0.00 0.00 0.04 0.19 0.04 0.11 0.00 0.00
C 0.00 0.12 0.12 0.06 0.06 0.12 0.06 0.06 0.00 0.06 0.06 0.06 0.00 0.00 0.00 0.00 0.06 0.06 0.06 0.00
D 0.05 0.00 0.07 0.02 0.02 0.23 0.02 0.02 0.09 0.05 0.00 0.05 0.05 0.02 0.07 0.14 0.07 0.02 0.00 0.02
E 0.06 0.00 0.00 0.18 0.06 0.24 0.00 0.00 0.00 0.06 0.00 0.06 0.00 0.06 0.00 0.06 0.06 0.06 0.00 0.12
F 0.07 0.00 0.00 0.00 0.36 0.07 0.00 0.14 0.14 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.00 0.00
G 0.06 0.00 0.02 0.02 0.04 0.18 0.00 0.06 0.08 0.06 0.02 0.02 0.04 0.04 0.02 0.08 0.04 0.10 0.02 0.08
H 0.00 0.00 0.21 0.05 0.00 0.11 0.00 0.00 0.00 0.16 0.05 0.00 0.00 0.00 0.05 0.11 0.00 0.11 0.11 0.05
I 0.00 0.07 0.27 0.00 0.00 0.07 0.00 0.07 0.20 0.07 0.00 0.07 0.00 0.00 0.00 0.07 0.07 0.00 0.00 0.07
K 0.17 0.06 0.00 0.11 0.06 0.11 0.06 0.00 0.00 0.06 0.11 0.11 0.00 0.11 0.00 0.00 0.00 0.00 0.00 0.06
L 0.09 0.00 0.04 0.09 0.00 0.04 0.04 0.04 0.13 0.13 0.00 0.00 0.04 0.04 0.09 0.00 0.09 0.04 0.00 0.09
M 0.00 0.00 0.00 0.17 0.00 0.33 0.00 0.00 0.17 0.00 0.00 0.00 0.17 0.00 0.00 0.00 0.00 0.00 0.17 0.00
N 0.10 0.03 0.08 0.08 0.00 0.15 0.00 0.05 0.03 0.00 0.00 0.05 0.05 0.05 0.00 0.05 0.08 0.05 0.10 0.05
P 0.09 0.00 0.00 0.04 0.04 0.17 0.00 0.00 0.04 0.04 0.00 0.00 0.00 0.17 0.04 0.04 0.00 0.13 0.09 0.09
Q 0.07 0.07 0.00 0.00 0.00 0.40 0.00 0.00 0.07 0.07 0.00 0.07 0.07 0.00 0.00 0.13 0.00 0.07 0.00 0.00
R 0.20 0.00 0.00 0.00 0.00 0.40 0.00 0.00 0.00 0.10 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.00 0.10 0.10
S 0.06 0.06 0.08 0.02 0.00 0.14 0.00 0.04 0.08 0.02 0.00 0.04 0.00 0.04 0.00 0.10 0.12 0.04 0.04 0.10
T 0.00 0.00 0.04 0.00 0.04 0.16 0.04 0.04 0.12 0.12 0.00 0.08 0.04 0.00 0.04 0.12 0.04 0.04 0.04 0.04
V 0.00 0.00 0.05 0.05 0.00 0.14 0.05 0.00 0.14 0.09 0.00 0.05 0.14 0.05 0.00 0.18 0.00 0.05 0.00 0.05
W 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.17 0.00 0.33 0.00 0.00 0.17 0.00 0.00 0.00
Y 0.11 0.06 0.00 0.00 0.00 0.11 0.00 0.06 0.06 0.06 0.00 0.11 0.00 0.06 0.06 0.06 0.11 0.06 0.00 0.11
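As a worked example of reading Table 3, the following Python sketch computes Ø for the tetrapeptide DPDG. The tetrapeptide is chosen here purely for illustration, and the helper parse_row is hypothetical; the numbers are the relevant rows copied from the table above.

```python
AMINO_ACIDS = "A C D E F G H I K L M N P Q R S T V W Y".split()


def parse_row(text):
    """Map each amino-acid column header to the value in one table row."""
    values = [float(v) for v in text.split()]
    if len(values) != len(AMINO_ACIDS):
        raise ValueError("expected 20 values, one per amino acid")
    return dict(zip(AMINO_ACIDS, values))


# Rows copied from Table 3.
P1 = parse_row("0.06 0.04 0.10 0.04 0.03 0.11 0.04 0.03 0.04 0.05 "
               "0.01 0.09 0.05 0.03 0.02 0.11 0.05 0.05 0.01 0.04")
P2 = parse_row("0.09 0.02 0.07 0.04 0.02 0.09 0.01 0.02 0.10 0.02 "
               "0.01 0.05 0.13 0.04 0.04 0.11 0.07 0.04 0.00 0.03")
# Row P of P3(X3|X2): X2 = P.
P3_GIVEN_P = parse_row("0.02 0.03 0.12 0.10 0.00 0.24 0.02 0.00 0.05 0.02 "
                       "0.00 0.10 0.02 0.02 0.03 0.12 0.02 0.03 0.02 0.03")
# Row D of P4(X4|X1): X1 = D.
P4_GIVEN_D = parse_row("0.05 0.00 0.07 0.02 0.02 0.23 0.02 0.02 0.09 0.05 "
                       "0.00 0.05 0.05 0.02 0.07 0.14 0.07 0.02 0.00 0.02")

g = 10**4  # amplifying factor
# Ø(DPDG) = g * P1(D) * P2(P) * P3(D|P) * P4(G|D)
#         = 10^4 * 0.10 * 0.13 * 0.12 * 0.23
phi_dpdg = g * P1["D"] * P2["P"] * P3_GIVEN_P["D"] * P4_GIVEN_D["G"]
print(round(phi_dpdg, 3))  # 3.588
```

Whether this value predicts a ß-turn depends on the threshold T, which is not stated in this section.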
