Protein backbone angle restraints from searching a database for chemical shift and sequence homology
Gabriel Cornilescu, Frank Delaglio and Ad Bax
Laboratory of Chemical Physics
National Institute of Diabetes and Digestive
and Kidney Diseases
National Institutes of Health
Bethesda, Maryland 20892-0520
A Reference Guide to this software can be found at: http://spin.niddk.nih.gov/bax/software/TALOS
The TALOS software is part of the NMRPipe package; download instructions can be found at:
http://spin.niddk.nih.gov/NMRPipe
Contact:
delaglio@nih.gov
Although most of the earlier reports on
the relation between chemical shift and protein structure focus on 1Hα
and 1HN, with the advent of heteronuclear
isotopic enrichment additional chemical shifts have become accessible and
offer the potential to make the relation between chemical shift and structure
more quantitative. The secondary 13Cα and 13Cβ
chemical shifts of a given residue were found to correlate closely with
its phi and psi torsion angles (Ando et al., 1984; Saito, 1986; Spera and
Bax, 1991), and thereby also with secondary structure (Wishart et al.,
1991). Methods have been developed to obtain backbone torsion angle restraints
and secondary structure information from either 1Hα and 13Cα (Luginbühl
et al., 1995), or 13Cα, 13Cβ, 13C', and 1Hα (Wishart and
Sykes, 1994). The empirical correlation between phi and psi backbone torsion
angles and the 13Cα and 13Cβ chemical
shifts also was found useful for identification of N-terminal helix-capping
boxes (Gronenborn and Clore, 1994). This same group also introduced an
effective method for incorporating the empirical secondary
13Cα and 13Cβ chemical
shift profiles into the structure calculation protocol (Kuszewski et al.,
1995; Celda et al., 1995). Ab initio calculations (de Dios and Oldfield,
1993) confirm that the backbone phi and psi torsion angles strongly affect
13Cα and 13Cβ shielding,
and the use of experimental 13Cα, 13Cβ and 1Hα shifts, in
conjunction with residue-specific chemical shift surfaces from ab initio
methods, has been proposed as a tool for structure refinement (Pearson
et al., 1995). Beger and Bolton (1997) proposed an approach to obtain the
most probable phi and psi angles from correlation maps between backbone
chemical shifts of 13Cα, 13Cβ, 1Hα, 1HN
and 15N of a given residue and its backbone
torsion angles. They also showed that this information considerably improves
structural quality when used in cases where only a very small number of
NOE restraints is available.
The similarity in secondary chemical shifts
in homologous proteins also has been well recognized (Redfield and Robertson,
1991). Wishart et al. (1997) developed an elegant approach to utilize this
similarity during the resonance assignment process. However, a minimum
of ca 30% sequence identity is quoted as the requirement for making
this procedure reliable.
Here, we describe a hybrid approach which
utilizes both sequence and chemical shift homology to predict the most
likely backbone angles for a given residue. The idea is based on the notion
that if a string of adjacent amino acids shows high similarity in secondary
chemical shifts with a string of amino acids in a database, the central
residues in the two strings are likely to have similar backbone torsion
angles. In particular, when qualitative similarity in the residue types
of the two strings is used as an additional criterion, the approach becomes
remarkably robust. In essence, this is a generalization of the idea that
helix-capping boxes can be identified best by combined use of their characteristic
patterns of chemical shifts and the residue types involved (Gronenborn
& Clore, 1994).
When using collections of chemical shifts
of proteins reported by different groups, it is critical to ensure that
the same chemical shift referencing convention is used for all these proteins.
This is particularly important for 13C
and 15N, where a wide variety of direct
and indirect referencing methods have been used. Rather than relying on
the information supplied with the deposited chemical shift data, we evaluate
the need for applying a correction to these shifts by calculating how much,
on average, the secondary shifts (calculated by subtracting the random
coil shifts of Spera and Bax, 1991) deviate from the corresponding secondary
chemical shifts predicted by the (phi,psi)-surfaces of Spera and Bax. These
averages are conveniently calculated with a routine added to the X-PLOR
program (Brünger, 1993) by Kuszewski et al. (1995), and intended for
use of the secondary 13Cα and 13Cβ shifts during
structure calculation. We apply a chemical shift correction only if the
average deviation for a given protein exceeds by more than a factor of
three the expected random variation in this average [i.e., the standard
error of ca 1 ppm (Spera and Bax, 1991) divided by the square root
of the number of shifts used]. This manner of correcting the deposited
chemical shifts ensures that all secondary shifts are defined in the same
manner, and corresponds to subtraction of the random coil 13Cα
and 13Cβ shifts of
Spera and Bax (1991) and 13C' (Wishart
et al., 1995a) from experimentally determined shifts relative to internal
trimethylsilyl propionate (TSP). Note that TSP resonates upfield from the
IUPAC-recommended standard (Markley et al., 1998), dimethylsilapentane-5-sulfonic
acid or DSS, by an insignificant amount (0.12 ppm at pH 7) (Wishart et
al., 1995b). The same correction procedure must be used for all other new
proteins added to the database. Only a small fraction of the proteins required
the above correction procedure. For 15N,
the chemical shift reference standard is liquid ammonia at 25 °C,
and the need for application of a correction was evaluated by calculating
the average 15N chemical shifts for all
non-Gly, non-Ser, non-Thr residues in α-helical and β-strand regions
of the protein and comparing them with the database averages (119.47 ppm
for α-helices and 122.38 ppm for β-strands). Whenever the average of the
α-helix and β-strand 15N chemical shift
deviations (weighted according to the number of residues used for each
type of secondary structure) is larger than 1 ppm, a correction to the
chemical shifts needs to be applied. α-Lytic protease was the
only protein for which such a 15N chemical
shift adjustment (by -2.26 ppm) needed to be used. For 1H,
where historically chemical shift referencing has been much less of a problem,
no such corrections were applied.
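The two referencing checks described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the deviations are taken as observed minus predicted secondary shifts, and the returned offset is the amount that would have to be subtracted from the deposited shifts. It is not the X-PLOR routine mentioned in the text.

```python
import math

def mean_offset_if_significant(deviations, per_shift_error=1.0, factor=3.0):
    """13C check: deviations are observed-minus-predicted secondary shifts (ppm).

    The mean offset is reported only if it exceeds `factor` times the expected
    random variation of the mean, per_shift_error / sqrt(N); otherwise 0.0.
    """
    n = len(deviations)
    mean_offset = sum(deviations) / n
    expected_scatter = per_shift_error / math.sqrt(n)
    return mean_offset if abs(mean_offset) > factor * expected_scatter else 0.0

def n15_offset(helix_shifts, strand_shifts,
               helix_ref=119.47, strand_ref=122.38, limit=1.0):
    """15N check: residue-weighted mean deviation from the database averages.

    Pooling the per-residue deviations weights helix and strand by their
    residue counts. Returns 0.0 when the offset is within `limit` ppm.
    """
    deviations = [s - helix_ref for s in helix_shifts]
    deviations += [s - strand_ref for s in strand_shifts]
    offset = sum(deviations) / len(deviations)
    return offset if abs(offset) > limit else 0.0
```

With 100 shifts, for example, the expected variation of the mean is 0.1 ppm, so only an average deviation above 0.3 ppm would trigger a correction.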
To investigate whether the 13C'
chemical shift is strongly influenced by the hydrogen bond length, hydrogens
were added to the 1.1 Å crystal structure of basic pancreatic trypsin
inhibitor (Wlodawer et al., 1984) with the program X-PLOR (Brünger,
1993). For the 24 carbonyls involved in stable backbone-backbone hydrogen
bonds, no significant correlation was found between the lengths of the
backbone-backbone hydrogen bonds, calculated from this structure, and the
corresponding 13C' secondary shifts. This
result suggests that the 13C' secondary
shift is primarily a function of the backbone geometry, in agreement with
its previously reported correlation with secondary structure (Kricheldorf
and Muller, 1983; Wishart et al., 1991). Therefore, we decided to include
the 13C' shift information in the evaluation,
even though for several proteins no 13C'
shifts have been reported in the database.
Although the 15N
chemical shift is known to be influenced by hydrogen bonding (de Dios et
al., 1993), it is also influenced by backbone geometry and therefore is
included as an input parameter in the torsion angle prediction procedure.
However, as discussed below, optimization of the torsion angle prediction
program results in a relatively low weighting factor for this chemical
shift.
TALOS reads the experimental
protein chemical shift tables and converts them to secondary chemical shifts
before entering them in the database. In its current implementation, TALOS
evaluates the similarity in amino acid sequence and secondary shifts for
a string of three sequential amino acids relative to all triplets of sequential
residues contained in the database. Although we expect that further improvement
in performance might be attainable for string lengths longer than three,
the number of residues in the database is presently too small to yield
a sufficient sampling for such longer strings. For each query triplet of
consecutive residues, the similarity to a triplet with center-residue j
in the database is evaluated by computing a similarity factor, S
(i,j), given by:
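$$
S(i,j) = \sum_{n=-1}^{+1} \Big[\, k^{0}_{n}\,\Delta_{\mathrm{ResType}}^{2}
 + k^{1}_{n}\,(\Delta\delta C^{\alpha}_{i+n} - \Delta\delta C^{\alpha}_{j+n})^{2}
 + k^{2}_{n}\,(\Delta\delta N_{i+n} - \Delta\delta N_{j+n})^{2}
 + k^{3}_{n}\,(\Delta\delta C^{\beta}_{i+n} - \Delta\delta C^{\beta}_{j+n})^{2}
 + k^{4}_{n}\,(\Delta\delta C'_{i+n} - \Delta\delta C'_{j+n})^{2}
 + k^{5}_{n}\,(\Delta\delta H^{\alpha}_{i+n} - \Delta\delta H^{\alpha}_{j+n})^{2} \,\Big]
 \qquad (1)
$$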
and the value of S(i,j)
is evaluated for all triplets j in the database. Δδ denotes the
secondary shifts of the 13Cα, 13Cβ, 13C', 1Hα
and 15N nuclei. For Gly residues, 1Hα
shifts are calculated as the average of 1Hα2
and 1Hα3. Values for
the weighting factors, $k^{0}_{n}$ through $k^{5}_{n}$,
are optimized as described below and are given in Table 2; the residue-type
similarity matrix ascribes a number to how similar two types of amino acids
are and this 20 × 20 matrix is shown in Table 3. The composition of
this similarity matrix is largely based on empirical knowledge that, for
example, Gly frequently has a positive phi angle, Pro has a very restricted
range of phi angles, and Cβ-branched residues are
frequently found in beta-sheets. There has been some empirical adjustment
of the similarity matrix during the process of optimizing the performance
of the TALOS program, but results were not found to be particularly sensitive
to small changes (by ±1) in the Table 3 matrix elements. Using the
empirical k values of Table 2, and ΔResType
of Table 3, S(i,j) values typically range from 5 to 600.
For all database triplets, j, that
yield an S(i,j) value lower than an adjustable threshold (typically
~150), TALOS reports the corresponding X-ray crystal structure phi and
psi angles of residue j, together with the S(i,j)
value. The threshold is set sufficiently large to obtain
at least 10 matches for each residue i.
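The scoring and selection steps just described can be sketched as follows. This is a minimal illustration, not the TALOS implementation: the data layout, the function names, and the fallback to the 10 lowest-scoring entries when too few pass the threshold are assumptions, and the weights and residue-type function must be supplied by the caller because Tables 2 and 3 are not reproduced here.

```python
# Nuclei in the order of the shift terms in eq 1 (Ca, N, Cb, C', Ha).
NUCLEI = ("CA", "N", "CB", "C", "HA")

def similarity(query, candidate, k_res, k_shift, res_dissimilarity):
    """Evaluate eq 1 for two triplets.

    query, candidate: three (residue_type, shifts) pairs for n = -1, 0, +1,
    where shifts maps a nucleus name to its secondary shift in ppm.
    k_res[n] stands in for the residue-type weight and k_shift[n][nucleus]
    for the five shift weights (hypothetical containers for Table 2).
    res_dissimilarity(a, b) stands in for the Table 3 lookup.
    """
    total = 0.0
    for n in range(3):
        q_type, q_shifts = query[n]
        c_type, c_shifts = candidate[n]
        total += k_res[n] * res_dissimilarity(q_type, c_type) ** 2
        for nucleus in NUCLEI:
            diff = q_shifts[nucleus] - c_shifts[nucleus]
            total += k_shift[n][nucleus] * diff ** 2
    return total

def select_matches(scored, threshold=150.0, min_matches=10):
    """scored: list of (S, phi, psi) for all database triplets.

    Keep entries with S below the threshold; if fewer than min_matches pass,
    fall back to the min_matches entries with the lowest S.
    """
    ranked = sorted(scored, key=lambda entry: entry[0])
    passing = [entry for entry in ranked if entry[0] < threshold]
    return passing if len(passing) >= min_matches else ranked[:min_matches]
```

Identical triplets score 0, and lower S means greater similarity, which is why the list is ranked in ascending order.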
Optimization of the 15 chemical shift weighting
factors made use of a scheme which finds all triplets of residues in the
database for which the central residue has phi/psi angles within 15 degrees
of those of a query residue. We then calculate the average and the standard
deviation of the secondary chemical shifts for each of the 15 types of
chemical shifts (5 nuclei for residue i-1, i, and i+1)
over this ensemble of triplets. The rms value of all database secondary
chemical shifts of a given type of nucleus, divided by the standard deviation
derived in the above described manner, provides a measure for how useful
a given type of secondary chemical shift (e.g., ΔδN of residue i-1)
is at providing information on the phi/psi angles of residue i.
This ratio was calculated 183 times, each time using a different cutinase
residue as the query residue. The chemical shift weighting factors listed
in Table 2 are derived from the averages of these respective ratios, after
scaling to compensate for the intrinsically different widths of the secondary
shift distributions of the types of atoms involved (i.e., using the root-mean-square
(rms) values of the 15N, 1Hα, 13Cα, 13Cβ
and 13C' secondary chemical shift values
in the entire database).
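The ratio used to judge how informative a given shift type is can be written compactly. The sketch below is an assumption-laden illustration: the function name is hypothetical, the population standard deviation is used because the text does not say which estimator was applied, and the subsequent scaling into Table 2 weights is not reproduced.

```python
import math
import statistics

def informativeness(all_shifts, ensemble_shifts):
    """Ratio of the database-wide rms secondary shift to the standard deviation
    within the ensemble of triplets whose central phi/psi lie near the query's.

    A large ratio means the shift varies little among residues with similar
    backbone angles, so it carries more information about phi/psi.
    """
    rms_all = math.sqrt(sum(x * x for x in all_shifts) / len(all_shifts))
    spread = statistics.pstdev(ensemble_shifts)
    if spread == 0.0:
        raise ValueError("ensemble has zero spread; ratio is undefined")
    return rms_all / spread
```

In the procedure described above, such a ratio would be computed once per query residue and per shift type, and then averaged over all query residues.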
The relative weight of the residue type
homology versus secondary shifts in the S(i,j) formula (the $k^{0}_{n}$
factors in eq 1) was optimized empirically, by searching for $k^{0}_{n}$
factors that minimize the number of erroneous predictions, using all residues
present in the database for test purposes.
If a particular chemical shift is missing,
the corresponding secondary chemical shift difference between the query
and the corresponding database chemical shift is set to 1.5 times the rms
value of the corresponding secondary chemical shift (rms values are 4.56
ppm for 15N, 2.49 ppm for 13Cα,
0.51 ppm for 1Hα, 2.01 ppm for 13Cβ,
and 2.02 ppm for 13C'). This way of dealing
with incomplete assignments decreases the likelihood that database residues
with incomplete assignments contribute to the phi/psi output of TALOS,
but does not exclude them altogether.
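The treatment of missing shifts amounts to substituting a fixed penalty for the shift difference. A minimal sketch, assuming missing values are represented by `None` and using a hypothetical function name; the rms values are those quoted above.

```python
# rms secondary shifts (ppm) quoted in the text.
RMS_PPM = {"N": 4.56, "CA": 2.49, "HA": 0.51, "CB": 2.01, "C": 2.02}

def shift_difference(query_shift, database_shift, nucleus, penalty=1.5):
    """Secondary shift difference for one nucleus.

    If either value is missing (None), the difference is set to `penalty`
    times the rms secondary shift of that nucleus, so the entry is
    disfavoured in the similarity score but not excluded.
    """
    if query_shift is None or database_shift is None:
        return penalty * RMS_PPM[nucleus]
    return query_shift - database_shift
```

For a missing 15N shift, for instance, the substituted difference is 1.5 × 4.56 = 6.84 ppm.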
To date, the database used by TALOS contains
only 20 structures for which both a high-resolution X-ray structure and
nearly complete resonance assignments are available. The reason we felt
it is not warranted to include proteins for which a high-resolution NMR
structure but no crystal structure is available is that, as discussed below,
the agreement between the phi and psi angles of most NMR structures and
the output of TALOS is considerably lower than for the high-resolution
crystal structures in the database.
The TALOS output for the phi and psi backbone
angles of the center residue in each string consists of the average of
the corresponding angles in the 10 strings in the database with the highest
degree of similarity (cf. eq 1). In a first, fully automated but
very conservative mode of analysis, the program accepts only those predictions
for which at least nine out of ten matches fall in the same populated
(gray shaded) region of the Ramachandran map (Figure 2), and none of the
center residues in the 10 strings has a positive φ angle. If a single residue
falls well outside the Ramachandran region in which the remaining 9 residues
are located, its φ/ψ values are excluded from calculating the average and
rmsd. This procedure typically results in predictions for only about 40%
of the residues.
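The automated acceptance rule can be sketched as below. Several parts are assumptions made for illustration: `toy_region` is a crude stand-in for the shaded regions of Figure 2, which are not defined numerically in the text; any single match in a different region is treated as the excluded residue; and a circular mean is used so that angles near ±180° average sensibly, whereas the text only speaks of an average.

```python
import math
from collections import Counter

def wrap(angle_deg):
    """Map an angle difference into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def circular_mean(angles_deg):
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def angular_sd(angles_deg, mean_deg):
    devs = [wrap(a - mean_deg) for a in angles_deg]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

def toy_region(phi, psi):
    """Hypothetical region labels; not the actual Figure 2 boundaries."""
    if phi >= 0:
        return None
    return "alpha" if -120.0 <= psi <= 50.0 else "beta"

def automatic_prediction(matches, region_of=toy_region):
    """matches: (phi, psi) of the center residues of the 10 best triplets.

    Returns (phi_mean, psi_mean, phi_sd, psi_sd), or None if the
    conservative criteria are not met.
    """
    if any(phi > 0 for phi, _ in matches):
        return None
    regions = [region_of(phi, psi) for phi, psi in matches]
    region, count = Counter(regions).most_common(1)[0]
    if region is None or count < len(matches) - 1:
        return None
    kept = [m for m, r in zip(matches, regions) if r == region]
    phis = [phi for phi, _ in kept]
    psis = [psi for _, psi in kept]
    phi_mean, psi_mean = circular_mean(phis), circular_mean(psis)
    return phi_mean, psi_mean, angular_sd(phis, phi_mean), angular_sd(psis, psi_mean)
```

A residue that returns `None` here is not rejected outright; it is simply left for the interactive inspection step.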
A subsequent interactive inspection of
the results, using the graphical interface described below, permits additional
predictions to be made. For example, if several predictions fall just outside
the most populated region of the Ramachandran map, but generally cluster
well with the other phi/psi predictions, the prediction should be accepted.
In some cases, there is one center-residue in the ensemble of 10 most similar
triplets for which either φ or ψ deviates by more than 2 standard deviations
from the average value for that angle. Empirical testing indicates that
it is safe to remove (at most) one such triplet from the ensemble of 10
(TALOS then recalculates the new average phi and psi angles and their rmsd),
provided that the outlier does not have its φ angle in the 0° < φ < +150°
range, and the average S(i,j) value
is less than 80. When the TALOS output for a given query residue yields
a cluster where at least 9 residues have positive φ angles, this prediction
also should be accepted.
The standard deviations and the range of
(φ,ψ) values in the 10 (or 9) most similar database strings provide a measure
for the uncertainty in these averages. When this standard deviation exceeds
45°, the prediction must be deemed "ambiguous", and it is recommended that
the result of the prediction not be used without careful further inspection
of other data, such as the dαN(i-1,i)/dαN(i,i)
NOE intensity ratio (which provides information on the ψ angle), the 3J(HN,Hα)
coupling (φ angle), and 1J(Cα,Hα)
(primarily for identifying positive φ angles; Vuister et al., 1992, 1993).
Not including such cases where NOE or J coupling information is needed,
the above described protocol typically allows a definitive prediction of
the phi and psi angles to be made for about two thirds of the residues.
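The outlier-removal and ambiguity rules of the two preceding paragraphs can be sketched as follows. The function names are hypothetical, the circular statistics are an assumption (the text only refers to averages and standard deviations), and the choice to act only when exactly one triplet is flagged is a conservative reading of "at most one".

```python
import math

def wrap(angle_deg):
    """Map an angle difference into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def circular_mean(angles_deg):
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def angular_sd(angles_deg, mean_deg):
    devs = [wrap(a - mean_deg) for a in angles_deg]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

def removable_outlier(matches, n_sd=2.0, max_mean_score=80.0):
    """matches: (phi, psi, S) for the 10 most similar triplets.

    Returns the index of the single triplet that may be dropped, or None.
    A triplet is flagged if phi or psi deviates from the ensemble mean by
    more than n_sd standard deviations. Removal is allowed only if the mean
    S is below max_mean_score and the outlier's phi is not in (0, 150).
    """
    mean_score = sum(s for _, _, s in matches) / len(matches)
    if mean_score >= max_mean_score:
        return None
    flagged = set()
    for column in (0, 1):
        angles = [m[column] for m in matches]
        mean = circular_mean(angles)
        sd = angular_sd(angles, mean)
        for index, angle in enumerate(angles):
            if abs(wrap(angle - mean)) > n_sd * sd:
                flagged.add(index)
    if len(flagged) != 1:
        return None
    index = flagged.pop()
    phi = matches[index][0]
    return None if 0.0 < phi < 150.0 else index

def is_ambiguous(phi_sd, psi_sd, limit=45.0):
    """A prediction is ambiguous when either standard deviation exceeds 45 degrees."""
    return phi_sd > limit or psi_sd > limit
```

After a triplet is dropped, the averages and standard deviations would be recomputed from the remaining nine before the ambiguity test is applied.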
Because the number of proteins for which
complete NMR assignments and high resolution crystal structures are available
is still very limited, the TALOS database usually contains insufficient
entries for unambiguous identification of residues with positive φ angles.
However, testing indicates that if the center-residue of a query triplet
has a positive φ angle, this frequently results in a significant fraction
of center-residues which also have positive φ angles in the ten most similar
database triplets. These positive φ angle triplets typically yield the
lowest S(i,j) values, suggesting that the program will successfully
predict most of the positive φ angles once the database becomes sufficiently
large. For now, unambiguous identification of such positive φ angles in
most cases requires additional experimental data, such as a very small
1J(Cα,Hα) (<136 Hz) (Vuister et al., 1992, 1993), or the presence of an
exceptionally strong intraresidue HN-Hα
NOE.
A graphical interface for inspecting and
interactively updating the TALOS output is available. An example of its
use is shown in Figure 2 for the HIV protease. The interface consists of
three windows: the sequence display, the prediction display, and the Ramachandran
display.
The sequence display lists the residues
in the protein whose backbone angles are being predicted. The residues
are color-coded according to whether the overall prediction for a given
residue was designated as good, ambiguous, or bad. In the initial display,
before interactive analysis, residues are color-coded as green (prediction
accepted in automated mode) and gray (requires inspection). If the true
φ/ψ angles are known, residues for which a wrong prediction was accepted
can be classified as bad (red), which is convenient for testing purposes.
All residues for which TALOS has made predictions that meet the criteria
listed above are highlighted in green. Residues shaded in yellow are those
for which no firm prediction can be made, but which nevertheless may contain
useful information. For example, if for a given residue 5 out of the 10
triplets show a positive φ angle, this suggests that there is a high likelihood
that the center residue of the query triplet has a positive φ angle.
When a given residue is selected in the
sequence display (K20 in Figure 2), the φ, ψ, and S(i,j)
parameters are listed in the prediction display, together with the residue
numbers and the names of the proteins from which the triplets were taken.
The ten φ/ψ pairs are graphed in the Ramachandran display, which also shows
the most populated areas of the entire database, shaded in gray. If a reference
or trial structure for the query protein is available, its φ/ψ angles will
also be graphed on the Ramachandran display (blue square). By clicking
on an individual match in the Ramachandran display, it is possible to include
or remove this entry from the overall prediction, which is based on the
average and standard deviations of the selected matches.
The final results are summarized in an
ASCII text table which gives the average φ/ψ angles and their standard
deviations for each residue. Versions of the TALOS program are available
for most types of UNIX platforms.
Figure 3 plots the predicted phi and psi
angles of ubiquitin versus those of the high resolution crystal structure.
As can be seen from this plot, TALOS does considerably more than classifying
residues by their type of secondary structure, and there is a good correlation
between predicted and crystallographic torsion angles, even when considering
only the residues with a positive ψ angle, for example.
Figure 4 shows the predicted phi and psi
angles as a function of residue number, together with the corresponding
crystallographically determined angles. The error bars correspond to the
standard deviation from the average angle for the center-residue of the
10 (or 9) best fitting triplets in the database. No result is shown if
this standard deviation exceeds 45°, or if some (but fewer than 9) of the
center residues have a positive φ angle.
Tests of the accuracy of TALOS predictions
were made by eliminating each protein from the database and using the program
to predict its backbone angles (Table 4). We found that for about 2% of
the residues in the database (i.e., 3% of the predictions made) TALOS predicts
the wrong torsion angles. Some examples are:
1. Thr45 in cutinase: Predicted ψ = -4 ± 10°; X-ray ψ = 163°. Although the
B factor is not unusually high, 15N relaxation data
indicate this residue is located in the middle of a flexible loop which
differs in conformation relative to the crystal structure (Prompers et
al., 1997).
2. Asp159 of beta-hydroxydecanoyl
thiol ester dehydrase: Predicted φ = -57 ± 7°, ψ = -36 ± 10°;
X-ray φ = 56°, ψ = 52°.
3. Asp19 of staphylococcal
nuclease: Predicted φ = -90 ± 12°, ψ = 8 ± 11°; X-ray φ =
-156°, ψ = -166°.
For both Asp159 and Asp19
there is no doubt regarding the similarity in backbone angles in solution
and in the crystalline state, but TALOS fails to predict the unusual backbone
angles of these residues. The user therefore should be aware that a small
fraction of the TALOS predictions may be in error. However, as shown below,
for the vast majority of cases, the output of TALOS is highly accurate.
When listing the rms differences between the predicted φ/ψ angles and those
of the crystal structure, the small fraction of erroneous predictions is
not included.
For ubiquitin, TALOS yields 53 φ/ψ angle
predictions (76% of its database residues) and the rms differences between
the predicted φ/ψ angles and those of the crystal structure are 12°/9°.
Similarly, for cutinase φ/ψ predictions are made for 127 residues (69%,
including 5 bad predictions, but excluding the disordered N-terminal tail),
with rmsds of 12°/12° relative to the crystallographically determined φ
and ψ angles.
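The rms differences quoted here compare two sets of torsion angles, which requires care because angles are periodic. A minimal sketch of such a comparison follows; the function name and the wrapping of each difference into the -180° to +180° range are assumptions, since the text does not state how the differences were computed.

```python
import math

def angular_rmsd(predicted_deg, reference_deg):
    """Root-mean-square difference between two equally long lists of angles,
    with each difference wrapped into [-180, 180) so that, for example,
    +170 and -170 degrees count as 20 degrees apart rather than 340."""
    diffs = [(p - r + 180.0) % 360.0 - 180.0
             for p, r in zip(predicted_deg, reference_deg)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Without the wrapping step, a few residues with ψ near ±180° could dominate the reported rmsd.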
BPTI yielded the worst performance of all
proteins tested. Only 32 φ/ψ predictions (65%, 4 bad predictions) were
made, which agree to within rmsds of 16° and 17° with the 1.1 Å crystal
structure. Differences relative to the solution structure (Berndt et al.,
1992) are slightly larger (18°/19°). For the same set of phi and psi angles,
the rms differences between the average solution structure and crystal
structure are 14° and 12°, respectively.
For human thioredoxin the NMR data have
been derived for a mutant which differs from the sequence used for the
crystal structure. The φ angles predicted by TALOS are nevertheless in
very good agreement with those of the crystal structure (Supplementary
Material), with 80 (78%) φ/ψ predictions (rmsds of 15° and 12° from the
X-ray structure, respectively), including one erroneous prediction. For
reference, the rmsds relative to the solution structure for the same group
of phi and psi angles are 20° and 22°, respectively. The pairwise rmsd
between the crystal structure and solution structure angles is 16° (φ)
and 20° (ψ).
Three sets of calculations were performed:
(A) using only 273 NOEs, randomly taken from the total set of 2727 NOE
cross peaks, peak-picked from 3D and 4D NOESY spectra (J.L. Marquardt,
unpublished results); (B) additionally using TALOS φ/ψ constraints for
the 53 residues for which a (correct) prediction had been made; (C) as
B, but deliberately introducing two serious errors in the φ/ψ constraints
by interchanging the TALOS-derived angles of Ala46
(TALOS: φ = 54 ± 7°, ψ = 39 ± 9°; X-ray: φ = 48°, ψ = 46°) with
those of Arg54 (TALOS: φ = -102 ± 22°, ψ = 150 ± 17°;
X-ray: φ = -85°, ψ = 165°). Starting from a fully extended strand and using
an X-PLOR based simulated annealing protocol (Nilges et al., 1988), set
A yielded convergence for 9 out of 30 calculated structures. The backbone
rmsd (residues 2-70) from the average was 1.52 Å, and the backbone
rmsd displacement between the average of these NMR structures and the crystal
structure was 1.36 Å. For set B, φ and ψ constraints were included
as "harmonic-well" potentials with zero energy over the range φ(TALOS)
± SD and ψ(TALOS) ± SD, where SD is the
standard deviation in the set of 10 (or 9) residues from which φ(TALOS)
and ψ(TALOS) were derived. Outside the well, the energy
increased quadratically with a force constant of 200 kcal/rad². With 13
out of 30 calculations converging, the yield was roughly 45% higher than in the
absence of TALOS constraints. Moreover, the rmsd from the average was also
considerably lower (0.75 Å), as was the difference relative to the
X-ray structure (0.89 Å). For set C, which includes the erroneous
backbone constraints, convergence was worst (7 out of 30), but the rmsd
from the average (0.87 Å) and between the averaged NMR and crystal
structure (1.04 Å) were intermediate. The errors introduced in the
NMR structure by the wrong TALOS constraints were highly localized.
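The "harmonic-well" restraint described above is flat inside the allowed range and quadratic outside it. The sketch below illustrates that functional form only; the function name is hypothetical, the excess deviation is converted to radians to match the quoted force constant, and the exact X-PLOR expression may differ in detail.

```python
import math

def torsion_restraint_energy(angle_deg, target_deg, sd_deg, k=200.0):
    """Flat-bottom harmonic restraint on a torsion angle.

    Energy is zero while the angle lies within target +/- sd. Outside the
    well it rises as k * (excess deviation in radians)**2, with k in
    kcal/rad^2. The difference is wrapped so that periodicity is respected.
    """
    delta = (angle_deg - target_deg + 180.0) % 360.0 - 180.0
    excess_deg = max(0.0, abs(delta) - sd_deg)
    return k * math.radians(excess_deg) ** 2
```

For the Ala46 restraint of set B (φ = 54 ± 7°), any φ between 47° and 61° would incur no penalty under this form.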
Although preliminary and clearly incomplete,
the above results for ubiquitin are quite encouraging. They suggest that
a substantial improvement in quality of the structure can be obtained by
including the TALOS-derived φ/ψ-restraints, particularly when the number
of NOEs per residue is low. The introduction of two serious errors in the
TALOS-derived torsion angle restraints decreases the quality of the structure,
but it remains better than in the absence of the TALOS-derived constraints.
Nevertheless, it is recommended that the constraints are used with care,
keeping in mind that they may contain errors. Thus, if either a TALOS-
or NOE-constraint (or both) is violated consistently during structure calculations,
it is essential to recheck the quality of the constraint(s) involved. In
this respect, an erroneous TALOS-derived restraint is no different from
a wrongly assigned NOE connectivity.
At the outset of developing this approach,
we anticipated being able to obtain χ1 angle predictions
too. However, these χ1 results so far appear insufficiently
reliable for general use. Three possible reasons for this are that (1)
chemical shifts of the backbone nuclei are not sufficiently sensitive to
χ1, (2) in the crystal structures it is not possible
to reliably and routinely separate residues with a single χ1
conformation from those which undergo χ1 rotameric
averaging, and (3) there are practical difficulties in comparing χ1
angles for residues with different types of sidechains, i.e., a Cβ-branched
residue such as Thr with a non-branched residue. Although it may be feasible
to develop criteria which yield useful TALOS χ1 predictions,
it is expected that it will be difficult to make predictions that are more
reliable than those based on residue type and a residue's own backbone
angles, as implemented by Kuszewski et al. (1997).
Our results indicate that concerted use
of 15N, 13Cα, 1Hα, 13Cβ
and 13C' chemical shifts of triplets of adjacent
residues makes it possible to predict the backbone torsion angles for the majority
of residues in assigned proteins. When using the crystal structure as the
standard, the accuracy of the TALOS prediction appears to exceed that of
even some of the best solution structures calculated on the basis of NOEs
and J couplings. In principle, one could argue that, as the angles
in the database are all derived from crystal structures, one might expect
the TALOS output to be closer to the crystal structure than to the solution
structure. However, this argument does not hold. First, it would require
a systematic (as opposed to a random) difference between torsion angles
in crystal structures and in solution. Second, when comparing the TALOS
output for ubiquitin with a solution structure calculated by including
a large number of 13Cα-1Hα, 13Cα-13C', 1H-15N, 13C'-15N
and 13Cα-13Cβ
dipolar couplings (Tjandra and Bax, 1997; Marquardt et al., unpublished
results) the agreement of the TALOS-predicted angles with the solution
structure is actually better than with the crystal structure, with rmsds
of 10° (solution) and 12° (X-ray) for φ and 8° (solution) and 9° (X-ray)
for ψ. The rmsd between crystal structure and solution structure torsion
angles is 7° for both phi and psi.
The 3% fraction of TALOS predictions which
are found to be in disagreement with the crystal structure includes residues
which may adopt a different conformation in the solution and crystal structures
(e.g., Thr45 in cutinase, discussed above), although
most of these regions where differences occur are excluded by the B-factor
criterion (see Materials and Methods). For most proteins used in our database,
no high resolution solution structure is available, and it therefore was
not possible to exclude these residues from the database. A set of residues
in the database for which the solution backbone angles differ strongly
from those in the crystalline state does not increase the number of errors
when TALOS is applied to a new protein. Instead, if their chemical shifts
match those of the query triplet, they result in an outlier in the display
of Figure 2. The same is true if a small fraction of residues in the database
is wrongly assigned.
It also should be pointed out that a database
approach such as the one described here tends to predict torsion angles
that fall closer to the most commonly occupied regions of the Ramachandran
map than the true value. This is a direct result of the fact that TALOS
angles are derived from a set of triplets with the most similar chemical
shifts: First, if the true backbone angles of a given center-residue position
it somewhere on the edge of the most populated region of the Ramachandran
map, there statistically will be a larger number of "hits" inside than
outside the most populated region, simply because the density of residues
is higher in the most populated region. This effect is visible in Figure
3B, for example, where for residues with X-ray ψ angles in the -25° to
+25° range the predicted ψ angles are shifted in the direction of the α-helical
region of the Ramachandran map. Similarly, for residues with unusually
large ψ angles in the X-ray structure, the predicted values consistently
are shifted slightly towards the more populated region near ψ = 130°. Second,
in rare cases where residues are located far outside the populated region
of the Ramachandran map (such as Asp19 in Staphylococcal
nuclease), no other triplet with such unusual angles may be present in
the database. If TALOS finds a cluster of triplets which accidentally match
the shifts and residue types of the query triplet, it is likely that the
torsion angles in this cluster fall in the highly populated region of the
Ramachandran map. Both these types of problems will be alleviated when
the database becomes larger.
It is important to realize that the TALOS-derived
φ/ψ-values are empirical in nature. In a conservative approach, deviations
between these φ/ψ-values and those in structures calculated on the basis
of regular experimental restraints can be used for "trouble-shooting" purposes.
Alternatively, in cases where an insufficient number of regular experimental
constraints is available, preliminary results on ubiquitin suggest that
incorporation of the TALOS-derived φ/ψ-values can enhance structural quality
considerably. Collecting a large number of NOEs can be particularly difficult
in larger proteins, which require extensive deuteration. It is expected
that the use of TALOS-derived torsion angle restraints, when combined with
one-bond dipolar couplings measured in dilute liquid crystalline media
(Bax and Tjandra, 1997; Clore et al., 1998; Hansen et al., 1998; Bewley
et al., 1998; Wang et al., 1998), will make it possible to obtain reliable
backbone structures for such larger systems, even if only a limited number
of NOEs is available.
Archer, S.J., Vinson, V.K., Pollard T.D.
and Torchia, D.A. (1994) FEBS Lett., 337, 145-151.
Bax, A., Tjandra, N. (1997) J. Biomol.
NMR10, 289-292.
Beger, D.B. and Bolton, P.H. (1997) J.
Biomol. NMR, 10, 129-142.
Berndt, K.D., Guntert, P., Orbons, L.P.
and Wüthrich, K. (1992) J. Mol. Biol., 227, 757-775.
Betzel, C., Klupsch, S., Papendorf, G.,
Hastrup S., Branner, S. and Wilson, K.S. (1992) J. Mol. Biol., 223,
427-445.
Bewley, C.A., Gustafson, K.R., Boyd, M.R.,
Covell, D.G., Bax, A., Clore, G.M. and Gronenborn, A.M. (1998) Nature,
Struct. Biol. 5, 571-578.
Brünger, A.T. (1993) XPLOR Manual
Version 3.1, Yale University, New Haven, CT.
Celda , B., Biamonti, C., Arnau, M.J.,
Tejero, R. and Montelione, G.T. (1995) J. Biomol. NMR, 5,
161-172.
Chattopadhyaya, R., Meador, W.E., Means,
A.R. and Quiocho, F.A. (1992) J. Mol. Biol., 228, 1177-1192.
Clore, G.M., Bax, A., Driscoll, P.C., Wingfield,
P. and Gronenborn, A. (1990) Biochemistry, 29, 8172-8184.
Clore, G.M., Starich, M.R., Gronenborn,
A.M. (1998) J. Am. Chem. Soc. 120, 10571-10572.
Concha, N.O., Rasmussen, B.A., Bush, K.
and Herzberg, O. (1996) Structure, 4, 823-836.
Copie, V., Battles, J.A., Schwab, J.M.
and Torchia, D.A. (1996) J. Biomol. NMR, 7, 335-340.
Davis, J.H., Agard, D.A., Handel, T.M.
and Basus, V.J. (1997) J. Biomol. NMR, 10, 21-27.
de Dios, A.C. and Oldfield, E. (1993) J.
Am. Chem. Soc., 116, 5307-5314.
de Dios, A.C., Pearson, J.G. and Oldfield,
E. (1993) Science., 260, 1491-1495.
Delaglio, F., Grzesiek, S., Vuister, G.,
Zhu, G., Pfeifer, J. and Bax, A. (1995) J. Biomol. NMR, 6,
277-293.
Drakenberg, T., Hofman, T. and Chazin,
W.J. (1989)
Biochemistry,
28, 5946-5954.
Fedorov, A.A., Magnus, K.A., Graupe, M.H.,
Lattman, E.E., Pollard, T.D. and Almo, S.C. (1994) Proc. Natl. Acad.
Sci. U.S.A.,
30, 8636-8640.
Fogh, R.H., Schipper, D., Boelens, R. and
Kaptein R. (1995) J. Biomol. NMR, 5, 259-270.
Fujinaga, M., Delbaere, L.T.J., Brayer,
G.D. and James, M.N.G. (1985) J. Mol. Biol., 184, 479-502.
Gardner, K.H., Zhang, X., Gehring, K. and
Kay, L.E. (1998) J. Am. Chem. Soc., in press.
Gronenborn, A.M. and Clore, G.M. (1994)
J.
Biomol. NMR, 4, 455-458.
Gronwald, W., Boyko, R.F., Sönnichsen,
F.D., Wishart, D.S. and Sykes, B.D. (1997) J. Biomol. NMR 10,
165-179.
Hansen, P.E. (1991) Biochemistry,
30, 10457-10466.
Hansen, M.R., Rance, M., Pardi, A. (1998)
J.
Am. Chem. Soc. in press.
Ikura, M., Kay, L.E. and Bax, A. (1990)
Biochemistry,
29,
4659-4667.
Ikura, M., Kay, L.E., Krinks, M. and Bax,
A. (1991) Biochemistry, 30, 5498-5504.
Ke, H., Zydowsky, L.D., Liu, J., and Walsh,
C.T. (1991) Proc. Nat. Acad. Sci. USA 88, 9483-9487.
Kricheldorf, H.R. and Muller, D. (1983)
Macromolecules,
16,
615-623.
Kumar, V. and Kannan, K.K. (1994) J. Mol. Biol., 241, 226-232.
Kuntz, I.D., Kosen, P.A. and Craig, E.C.
(1991)
J. Am. Chem. Soc., 113, 1406-1408.
Kuszewski, J., Qin, J., Gronenborn A.M.
and Clore, G.M. (1995) J. Magn. Reson. B, 106, 92-96.
Kuszewski, J., Gronenborn A.M. and Clore,
G.M. (1997) J. Magn. Reson. 125, 171-177.
Lam, P.Y.S., Jadhav, P.K., Eyerman, C.J.,
Hodge, C.N., Ru, Y., Bacheler, L.T., Meek, J.L., Otto, M.J., Rayner, M.M.,
Wong, Y.N., Chang, C.-H., Weber, P.C., Jackson, D.A., Sharpe, T.R. and
Erickson-Viitanen, S. (1994) Science, 263, 380-384.
Leesong, M., Henderson, B.S., Gillig, J.R.,
Schwab, J.M. and Smith, J.L. (1996) Structure, 4, 253-256.
Loll, P.J. and Lattman, E.E. (1989) Proteins.
Struct., Funct., 5, 183-201.
Longhi, S., Czjzek, M., Lamzin, V., Nicolas,
A. and Cambillau, C. (1997) J. Mol. Biol., 268, 779-799.
Luginbühl P., Szyperski T. and Wüthrich,
K. (1995) J. Magn. Reson., 109, 229-233.
Markley, J.L., Bax, A., Arata, Y., Hilbers,
C.W., Kaptein, R., Sykes, B.D., Wright, P.E., Wüthrich, K. (1998)
J.
Biomol. NMR 12, 1-23.
Meador, W.E., Means, A.R. and Quiocho,
F.A. (1992)
Science,
257, 1251-1255.
Nilges, M., Gronenborn, A.M., Brünger,
A.T. & Clore, G.M. (1988) Protein Engineering 2, 27-38.
Ösapay K. and Case, D.A. (1994) J.
Biomol. NMR, 4, 215-230.
Ottiger, M., Zerbe, O., Güntert, P.
and Wüthrich, K. (1997) J. Mol. Biol., 272, 64-81.
Ousterhout, J.K., (1994) Tcl and the
Tk Toolkit, Addison-Wesley, Reading MA.
Pardi, A., Wagner, G. and Wüthrich
K. (1983)
Eur. J. Biochem., 137, 445-454.
Pastore A. and Saudek V. (1990) J. Magn.
Reson.,
90, 165-176.
Pearson J.G., Wang J., Markley J.L., Le
H. and Oldfield, E. (1995) J. Am. Chem. Soc., 117, 8823-8829.
Pelton, J.G., Torchia, D.A., Meadow, N.D.,
Wong, C. and Roseman, S. (1991) Biochemistry, 30, 10043-10057.
Prompers, J.J., Groenewegen, A., van Schaik,
R.C., Pepermans, H.A.M. and Hilbers, C.W. (1997) Protein Sci., 6,
2375-2384.
Qin, J., Clore, G.C. and Gronenborn, A.M.
(1996)
Biochemistry,
35, 7-13.
Redfield, C. and Robertson, J. (1991) Proceedings
of a NATO Advanced Research Workshop on Computational Aspects of the Study
of Biological Macromolecules By NMR, Plenum Press, New York NY.
Saito, H. (1986) Magn. Reson. Chem., 24,
835-852.
Scrofani, S.D.B., Wright, P.E. and Dyson,
J.H. (1998) J. Biomol. NMR, 12, 201-202.
Seavey, B.R., Farr, E.A., Westler, W.M.
and Markley, L. (1991) J. Biomol. NMR, 1, 217-236.
Sethson, I., Edlund, U., Holak, T.A., Ross,
A. and Johnson, B-H. (1996) J. Biomol. NMR, 8, 417-428.
Sharff, A.J., Rodseth, L.E. and Quiocho,
F.A. (1993) Biochemistry, 32, 10553-10559.
Spera S. and Bax A. (1991) J. Am. Chem.
Soc.,
113, 5491-5492.
Svensson, L.A., Thulin, E. and Forsen,
S. (1992)
J. Mol. Biol., 223, 601-606.
Veerapandian, B., Gilliland, G.L., Raag,
R., Svensson, L.A., Masui, Y. and Hirai, Y., Poulos, T.L. (1992) Proteins.
Struct., Funct., 12, 10-23.
Vijay-Kumar, S., Bugg, C.E. and Cook, W.J.
(1987)
J. Mol. Biol., 194, 531-544.
Vuister, G.W., Delaglio, F. , Bax, A. (1992)
J.
Am. Chem. Soc., 114, 9674-9675.
Vuister, G.W., Delaglio, F. , Bax, A. (1993)
J.
Biomol. NMR 3, 67-80.
Wang, A.C., Grzesiek, S., Tschudin, R.,
Lodi, P.J. and Bax, A. (1995) J. Biomol. NMR, 5, 376-382.
Wang, Y.-X., Marquardt, J.L., Wingfield,
P., Stahl, S.J., Lee-Huang, S., Torchia, D.A. and Bax, A. (1998) J.
Am. Chem. Soc. 120, 7385-7386.
Weichsel, A., Gasdaska, J.R., Powis, G.
and Montfort, W.R. (1996) Structure, 15, 735-751.
Williamson, M. (1990) Biopolymers,
29,
1423-1431.
Wishart, D.S. and Sykes, B.D. (1994) J.
Biomol. NMR, 4, 171-180.
Wishart, D.S., Sykes, B.D. and Richards,
F. M. (1991) J. Mol. Biol., 222, 311-333.
Wishart, D.S., Colin, G.B., Holm, A., Hodges,
R.S. and Sykes, B.D. (1995a) J. Biomol. NMR, 5, 67-81.
Wishart, D.S., Colin, G.B., Yao, J., Abildgaard,
F., Dyson, H.J., Oldfield, E., Markley, J.L. and Sykes, B.D. (1995b) J.
Biomol. NMR, 6, 135-140.
Wishart, D.S., Watson, M.S., Boyko, R.F.,
and Sykes, B.D. (1997) J. Biomol. NMR 10, 329-336.
Wlodawer, A., Walter, J., Huber, R. and
Sjolin, L. (1984) J. Mol. Biol., 198, 469-480.
Worthylake, D., Meadow, N.D., Roseman,
S., Liao, D.-I., Herzberg, O. and Remington, S.J. (1991) Proc. Nat.
Acad. Sci. USA, 88, 10382-10386.
Yamazaki, T., Hinck, A.P., Wang, Y.-X.,
Nicholson, L.K., Torchia, D.A., Wingfield, P.T., Stahl, S.J., Kaufman,
J.D., Chang, C.-H., Domaille, P.J. and Lam, P.Y.S. (1996) Protein Science,
5,
495-506.
Chemical shifts ref.
(*BioMagResBank no.) (*PDB code) (*3ezm)
n = -1, 0, 1), for weighting the relative
importance of a given chemical shift or residue type in determining the
similarity score, S(i,j) of eq 1.
          Res. Homology   15N     1Hα      13C'    13Cα    13Cβ
n = -1    0.74            0.16    14.66    1.15    0.72    0.76
n = 0     1.48            0.18    17.54    1.21    0.99    0.91
n = 1     0.74            0.20    15.25    1.04    0.72    0.70
Table 3. Residue similarity factors,
ΔResType, used by TALOS in eq 1.
Table 4. Summary of TALOS results when
applied to predicting backbone angles of proteins included in the database.
Listed are the number of "Good" predictions, and the percentage relative
to the total number of residues with acceptable B factors (Avail.), the
number of "Bad" predictions, and the number of residues for which no predictions
could be made (Ambig.), plus the total number of residues (All).
Name                    Good  (%)   Bad  (%)  Ambig.  (%)   Avail.  All
HIV-1 protease          65    67.0  1    1.0  31      32.0  97      99
Total:
predictions: 2910
Figure 1. Flow chart of the TALOS program.
Figure 2. Graphical display of TALOS output
for HIV protease. The lower right window shows the amino acid sequence,
with predictions for each residue designated as "good" (green), "ambiguous"
(yellow), or "bad" (red). The prediction data for the selected residue,
K20, are listed in the prediction display (top right) and graphed in the
Ramachandran display (left). The 10 individual matches from the database
are indicated as small green squares in the Ramachandran display, and for
reference purposes, the known phi/psi position from the HIV protease X-ray
structure (blue square) is also shown. Clicking on any of the squares highlights
the corresponding triplet in the prediction display.
Figure 3. Plots of the backbone angles
(A) phi, and (B) psi predicted by TALOS, versus those observed in the crystal
structure, for ubiquitin.
Figure 4. Predicted backbone angles (A)
phi, and (B) psi of ubiquitin. The length of the error bars represents
the standard deviation from the average of the dihedral angles of the 10
residues from the database having the highest chemical shift and sequence
similarity with the query residues. Triangles correspond to the angles
observed in the crystal structure.
Figure 5. (Supplementary Figure) Predicted
backbone angles (A) phi, and (B) psi for the reduced form of human thioredoxin.
The length of the error bars represents the standard deviation from the
average of the dihedral angles of the 10 residues from the database having
the highest chemical shift and sequence similarity with the query residues.
Triangles correspond to the angles observed in the crystal structure.
Web:
http://spin.niddk.nih.gov/bax
Keywords
Backbone angles, Chemical shift, Protein structure,
Homology of chemical shift, Sequence homology, TALOS, phi angle,
psi angle
Abstract
Chemical shifts of backbone atoms in proteins
are exquisitely sensitive to local conformation, and homologous proteins
show quite similar patterns of secondary chemical shifts. The inverse of
this relation is used to search a database for triplets of adjacent residues
with secondary chemical shifts and sequence similarity which provide the
best match to the query triplet of interest. The database contains 13Cα,
13Cβ, 13C', 1Hα and 15N chemical shifts for 20 proteins
for which a high resolution X-ray structure is available. The computer
program TALOS was developed to search this database for strings of residues
with chemical shift and residue type homology. The relative importance
of the weighting factors attached to the secondary chemical shifts of the
five types of resonances relative to that of sequence similarity was optimized
empirically. TALOS yields the 10 triplets which have the closest similarity
in secondary chemical shift and amino acid sequence to those of the query
sequence. If the central residues in these 10 triplets exhibit similar
phi and psi backbone angles, their averages can reliably be used as angular
restraints for the protein whose structure is being studied. Tests carried
out for proteins of known structure indicate that the root-mean-square
difference (rmsd) between the output of TALOS and the X-ray derived backbone
angles is about 15°. Approximately 3% of the predictions made by TALOS
are found to be in error.
Introduction
The strong dependence of isotropic chemical
shifts on protein structure has long been recognized. In particular, the
striking correlation between 1Hα chemical shift and secondary structure
has been studied extensively (Pastore and Saudek, 1990; Williamson, 1990;
Wishart et al., 1991; Ösapay and Case, 1994) and the 1HN shift was found
to be sensitive to both hydrogen bonding and secondary structure (Pardi
et al., 1983; Williamson, 1990; Wishart et al., 1991). The periodicity
of the HN shifts observed in many alpha-helical structures, in conjunction
with the well-established relation between HN chemical shift and hydrogen
bond length (Wagner et al., 1983), suggests that they also contain
information on helix bending (Kuntz et al., 1991). Similar correlations
of the backbone torsion angles phi and psi with the 1Hα and 1Hβ
chemical shifts have been identified, which appear particularly useful
for characterization of turns (Ösapay and Case, 1994).
Materials and Methods
A database was created which contains nearly
complete 13Cα, 13Cβ, 13C', 1Hα and 15N chemical shift assignments of 20
proteins (Table 1), together with the backbone torsion angles phi and psi,
derived from crystal structures solved at a resolution ≤ 2.2 Å (nearly
3,000 residues, 14,000 chemical shifts). The format is such that the database
can easily be extended by adding new structures for which at least four
of the five chemical shifts are available per residue, and for which the
structure is known accurately. The structural data follow the Brookhaven
Protein Databank (PDB) format and the chemical shifts are in the BioMagResBank
(Seavey et al., 1991) format. Residues with missing crystallographic coordinates
(e.g. residues 1-17 of cutinase and the amino- and carboxy-terminal residues)
as well as residues with multiple conformations in the X-ray structure
have been excluded. Residues with high temperature (B) factors for
the backbone atoms, exceeding 1.5 times the average B-factor for
that protein, were also excluded. This includes the vast majority of cases
where differences between crystal and solution structures previously have
been noted.
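The B-factor exclusion rule described above can be illustrated with a short Python sketch. It is not taken from the TALOS distribution: the function name `residues_to_keep`, the input layout (one mean backbone B factor per residue), and the example numbers are assumptions made for illustration only. Only the 1.5 ratio comes from the text.

```python
from statistics import mean

def residues_to_keep(backbone_b, cutoff_ratio=1.5):
    """Return residue numbers whose backbone B factor does not exceed
    cutoff_ratio times the average backbone B factor of the protein.

    backbone_b maps residue number -> mean backbone B factor (assumed layout).
    """
    limit = cutoff_ratio * mean(backbone_b.values())
    return [res for res, b in backbone_b.items() if b <= limit]

# Hypothetical B factors; residue 4 is far more mobile than the rest.
example_b = {1: 12.0, 2: 10.0, 3: 11.0, 4: 40.0, 5: 9.0}
kept = residues_to_keep(example_b)
print(kept)
```

With these made-up values the average is 16.4, so the cutoff is 24.6 and only residue 4 is dropped.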
Results and Discussion
The backbone torsion angle prediction package
TALOS (Torsion Angle Likelihood Obtained from Shifts and sequence similarity)
is written in the Tcl/Tk language (Ousterhout, 1994) and uses NMRWish,
a companion package to the NMRPipe processing and analysis system (Delaglio
et al., 1995). NMRWish is a version of the Tcl/Tk script interpreter "wish"
(Ousterhout, 1994) which has been customized to include a relational database
engine for manipulation of spectral information and molecular coordinates.
An outline of the prediction method used by TALOS is presented in Figure
1.
Use of TALOS output in structure calculation.
The dihedral constraints for the backbone
torsion angles obtained from TALOS are available immediately after completion
of the resonance assignment and therefore can be used at the very early
stages of structure calculation. It is, however, important to realize that
a small fraction of the TALOS predictions is likely to be in error. Preliminary
testing on the effect of inclusion of TALOS constraints in calculation
of a protein structure was carried out for ubiquitin.
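As an illustration of how the central-residue angles of the 10 best matches might be turned into a dihedral restraint, the Python sketch below averages them and rejects residues whose matches disagree. Several choices here are assumptions rather than the TALOS implementation: circular statistics are used so that angles near ±180° average sensibly, the function names are hypothetical, and the 30° consensus threshold `max_spread` is an arbitrary placeholder.

```python
import math

def circular_mean_deg(angles):
    """Mean direction of a set of angles in degrees, respecting the 360-degree wrap."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def angular_difference_deg(a, b):
    """Difference a - b folded into the interval [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def angular_spread_deg(angles):
    """Root-mean-square deviation of the angles from their circular mean."""
    center = circular_mean_deg(angles)
    squared = [angular_difference_deg(a, center) ** 2 for a in angles]
    return math.sqrt(sum(squared) / len(squared))

def consensus_restraint(phis, psis, max_spread=30.0):
    """Return ((phi, dphi), (psi, dpsi)), or None when the matches disagree.

    max_spread is a hypothetical threshold, not a TALOS parameter.
    """
    d_phi, d_psi = angular_spread_deg(phis), angular_spread_deg(psis)
    if max(d_phi, d_psi) > max_spread:
        return None
    return (circular_mean_deg(phis), d_phi), (circular_mean_deg(psis), d_psi)
```

The same `angular_difference_deg` helper can be used when comparing predicted angles with crystal-structure values, since a naive subtraction overstates the error for pairs such as 170° and -170°.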
Concluding Remarks
The approach described in this paper is the
first to combine both chemical shift and residue type information for predicting
the backbone torsion angles. Also, instead of using the chemical shift
information of only a single residue, it considers the chemical shifts
and residue types of a string (of length 3, in the present case) to obtain
this information. The weight of a particular secondary shift was adjusted
by considering the width of its distribution over a narrow range of backbone
torsion angles relative to the entire range of secondary chemical shifts
in the database. The relative importance of the chemical shifts versus
residue homology has been adjusted empirically to yield the most reliable
predictions for proteins of known structure. Remarkably, the weighting
factors for the center residue in the string of 3 residues in Table 2 are
only slightly higher than for its two flanking residues, indicating that
they are of comparable value when predicting a residue's phi/psi angles. The
contribution from the residue type homology to the similarity factor S
is rather modest, typically about 25%. Nevertheless, reliability
of TALOS predictions is considerably improved when including this residue
type homology.
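To make the role of the weighting factors concrete, the following Python sketch scores one query triplet against one database triplet. Eq 1 is not reproduced in this part of the text, so the functional form used here (a sum over the three positions of k-weighted squared differences in residue type and secondary shift) is an assumption, as are the function name, the data layout, the assignment of the Table 2 values to nuclei in header order, and the choice to skip shifts missing from either triplet. A lower score is taken to mean a closer match.

```python
# k factors transcribed from Table 2, indexed by position n within the triplet.
K = {
    -1: {"res": 0.74, "N": 0.16, "HA": 14.66, "C": 1.15, "CA": 0.72, "CB": 0.76},
     0: {"res": 1.48, "N": 0.18, "HA": 17.54, "C": 1.21, "CA": 0.99, "CB": 0.91},
     1: {"res": 0.74, "N": 0.20, "HA": 15.25, "C": 1.04, "CA": 0.72, "CB": 0.70},
}

def similarity(query, candidate, delta_res_type):
    """Assumed form of the similarity score for two residue triplets.

    query, candidate: dict n -> (one-letter residue, dict nucleus -> secondary shift)
    delta_res_type:   callable (res_a, res_b) -> residue similarity factor (Table 3)
    """
    score = 0.0
    for n in (-1, 0, 1):
        q_res, q_shifts = query[n]
        c_res, c_shifts = candidate[n]
        score += K[n]["res"] * delta_res_type(q_res, c_res) ** 2
        for nucleus, q_val in q_shifts.items():
            if nucleus in c_shifts:  # assumption: missing shifts are skipped
                score += K[n][nucleus] * (q_val - c_shifts[nucleus]) ** 2
    return score

# Toy stand-in for Table 3: 0 for identical residues, 1 otherwise.
def toy_delta(a, b):
    return 0 if a == b else 1

query = {n: ("A", {"CA": 1.0, "CB": -0.5}) for n in (-1, 0, 1)}
candidate = dict(query)
candidate[0] = ("A", {"CA": 2.0, "CB": -0.5})  # 1 ppm CA difference at the center
print(similarity(query, candidate, toy_delta))
```

Ranking all database triplets by this score and keeping the 10 lowest would correspond to the search step summarized in the text.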
Software availability
The software, installation instructions and
examples, are available upon request by electronic mail to delaglio@speck.niddk.nih.gov.
For further information see: http://spin.niddk.nih.gov/bax
Acknowledgements
We thank Sharon Archer, Vladimir Basus, Rolf
Boelens, Walter Chazin, Marius Clore, Bennett Farmer, Stephen Fesik, Kevin
Gardner, Poul Hansen, Mitsuhiko Ikura, Marcel Ottiger, Jeanine Prompers,
Sergio Scrofani, and Dennis Torchia for providing chemical shift assignments
included in the database, and John Marquardt, Marcel Ottiger, and Jin-Shan
Hu for useful discussions. Work by G. Cornilescu is in partial fulfillment
for the Ph.D. degree at the University of Maryland, College Park, MD.
References
Ando, I., Saito, H., Tabeta, R., Shoji, A.
and Ozaki, T. (1984) Macromolecules, 17, 457-461.
Tables
Table 1. Proteins contained in the database.
Also listed are references describing the chemical shifts, the X-ray structure,
the accession codes for data deposited in the BMRB and PDB databases, the
resolution at which the crystal structure was solved, and the types of
nuclei for which chemical shifts are available.
Table 2. Empirically optimized k factors, kmn (m: homology, Cα, N, Cβ, C', Hα;
Protein (chemical shifts ref., *BMRB no.); No. of residues; X-ray structure ref. (*PDB code); Resolution; Shifts
Alpha-lytic protease (Davis et al., 1997); 198; Fujinaga et al., 1985 (*2alp); 1.7 Å; Cα, Cβ, C', Hα, N
Basic pancreatic trypsin inhibitor (Hansen, P.E., 1991); 58; Wlodawer et al., 1984 (*5pti); 1.1 Å; Cα, Cβ, C', Hα, N
Calbindin (Drakenberg et al., 1989) (*390); 76; Svensson et al., 1992 (*4icb); 1.6 Å; Cα, Cβ, Hα, N
Calmodulin (Ikura and Bax, 1990) (*547); 148; Chattopadhyaya et al., 1992 (*1cll); 1.7 Å; Cα, Cβ, C', Hα, N
Calmodulin/M13 (Ikura et al., 1991) (*1634); 147; Meador et al., 1992 (*1cdl); 2.2 Å; Cα, Cβ, C', Hα, N
Cutinase (Prompers et al., 1997) (*4101); 214; Longhi et al., 1997 (*1cex); 1.0 Å; Cα, Cβ, C', Hα, N
Cyclophilin (Ottiger et al., 1997); 165; Ke et al., 1991 (*2cpl); 1.63 Å; Cα, Cβ, Hα, N
Cyanovirin-N (Bewley et al., 1998); 101; Yang et al., in press; 1.5 Å; Cα, Cβ, C', Hα, N
Dehydrase (Copie et al., 1996); 171; Leesong et al., 1996 (*1mka); 2.0 Å; Cα, Cβ, C', Hα, N
D-maltodextrin-binding protein (Gardner et al., 1998); 370; Sharff et al., 1993 (*1dmb); 1.8 Å; Cα, Cβ, C', Hα, N
HIV-1 protease (Yamazaki et al., 1996); 99; Lam et al., 1994; 1.8 Å; Cα, Cβ, C', Hα, N
Human carbonic anhydrase I (Sethson et al., 1996) (*4022); 260; Kumar and Kannan, 1994 (*1hcb); 1.6 Å; Cα, Cβ, C', Hα, N
Human thioredoxin in reduced form (Qin et al., 1996); 105; Weichsel et al., 1996 (*1ert); 1.7 Å; Cα, Cβ, Hα, N
III-glc (Pelton et al., 1991); 168; Worthylake et al., 1991 (*1f3g); 2.1 Å; Cα, Cβ, C', Hα, N
Interleukin-1β (Clore et al., 1990) (*1061); 153; Veerapandian et al., 1992 (*4i1b); 2.0 Å; Cα, Cβ, Hα, N
Metallo-β-lactamase (Scrofani et al., 1998) (*4102); 232; Concha et al., 1996 (*1znb); 1.85 Å; Cα, Cβ, C', Hα, N
Profilin (Archer et al., 1994); 125; Fedorov et al., 1994 (*1acf); 2.0 Å; Cα, Cβ, C', Hα, N
Serine protease PB 92 (Fogh et al., 1995); 269; Betzel et al., 1992 (*1svn); 1.4 Å; Cα, Cβ, C', Hα, N
Staph nuclease (D. A. Torchia, personal communication); 141; Loll and Lattman, 1989 (*1snc); 1.65 Å; Cα, Cβ, C', Hα, N
Ubiquitin (Wang et al., 1995); 76; Vijay-Kumar et al., 1987 (*1ubq); 1.8 Å; Cα, Cβ, C', Hα, N
Residue  A R D N C Q E G H I L K M F P S T W Y V
A        0 1 1 1 1 1 1 2 1 2 1 1 1 2 3 1 2 2 2 2
R        1 0 1 1 1 1 1 2 1 2 1 0 1 1 3 1 2 1 1 2
D        1 1 0 0 1 1 1 2 1 2 1 1 1 1 3 1 2 1 1 2
N        1 1 0 0 1 1 1 2 1 2 1 1 1 1 3 1 2 1 1 2
C        1 1 1 1 0 1 1 2 1 2 1 1 1 1 3 1 2 1 1 2
Q        1 1 1 1 1 0 1 2 1 2 1 1 1 1 3 1 2 1 1 2
E        1 1 1 1 1 1 0 2 1 2 2 2 1 1 3 1 2 1 1 2
G        2 2 2 2 2 2 2 0 3 3 3 3 3 3 3 3 3 3 3 3
H        1 1 1 1 1 1 1 3 0 2 1 2 2 1 3 2 2 1 1 2
I        2 2 2 2 2 2 2 3 2 0 1 2 2 2 3 2 1 2 2 0
L        1 1 1 1 1 1 2 3 1 1 0 1 1 1 3 2 2 1 1 2
K        1 0 1 1 1 1 2 3 2 2 1 0 1 2 3 1 2 2 2 2
M        1 1 1 1 1 1 1 3 2 2 1 1 0 2 3 1 2 2 2 2
F        2 1 1 1 1 1 1 3 1 2 1 2 2 0 3 2 2 0 0 1
P        3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3 3 3 3 3
S        1 1 1 1 1 1 1 3 2 2 2 1 1 2 3 0 1 2 2 2
T        2 2 2 2 2 2 2 3 2 1 2 2 2 2 3 1 0 1 1 1
W        2 1 1 1 1 1 1 3 1 2 1 2 2 0 3 2 1 0 0 1
Y        2 1 1 1 1 1 1 3 1 2 1 2 2 0 3 2 1 0 0 1
V        2 2 2 2 2 2 2 3 2 0 2 2 2 1 3 2 1 1 1 0
III-glc                 87    61.7  4    2.8  50      35.5  141     168
Alpha-lytic protease    101   54.6  3    1.6  81      43.8  185     198
BPTI                    32    58.2  4    7.3  19      34.5  55      58
Calbindin               48    72.7  0    0.0  18      27.3  66      75
Calmodulin              107   84.9  0    0.0  19      15.1  126     148
Calmodulin/M13          98    80.3  0    0.0  24      19.7  122     148
Cutinase                122   66.7  5    2.7  56      30.6  183     214
Cyanovirin-N            55    61.1  1    1.1  34      37.8  90      101
Cyclophilin             87    54.0  5    3.1  69      42.9  161     165
Dehydrase               91    62.7  3    2.1  51      35.2  145     171
HCA I                   149   60.3  7    2.8  91      36.9  247     260
Interleukin-1β          75    61.0  3    2.4  45      36.6  123     153
Lactamase               137   66.2  4    1.9  66      31.9  207     232
Serine protease PB 92   161   62.7  8    3.1  88      34.2  257     269
D-MBP                   217   62.0  5    1.4  128     36.6  350     370
Profilin                72    67.9  0    0.0  34      32.1  106     125
Staph nuclease          81    67.5  4    3.3  35      29.2  120     141
Human thioredoxin       79    76.7  1    1.0  23      22.3  103     105
Ubiquitin               53    75.7  0    0.0  17      24.3  70      76
correct: 1920 (65.3%)
incorrect: 58 (2.0%)