refmac XYZIN foo_cycle_i.brk HKLIN foo.mtz
PROTOUT protout.cycle_i PROTCOUNTS counts.cycle_i
XYZOUT foo_cycle_j.brk HKLOUT foo_cycle_j.mtz
[Keyworded input]
The REFMAC program can carry out rigid body, restrained or unrestrained refinement against Xray data, or idealisation of a macromolecular structure. It minimises the coordinate parameters to satisfy either a Maximum Likelihood or Least Squares residual. There are options to use different minimization methods. If the user wishes to invoke geometric restraints the program PROTIN which analyses the protein geometry and produces an output file containing restraints information must be run prior to running REFMAC. REFMAC also produces a MTZ output file containing weighted coefficients for SigmaA weighted mFo-DFcalc and 2mFo-DFcalc maps, where "missing data" have been restored.
[Program organisation and communication with it is illustrated in the EXAMPLES section.]
If you use REFMAC please refer to one of these papers!!!
Anything input on a line after an exclamation mark ("!" or "#") is ignored and lines can be continued by using a minus sign. The program only checks the first 4 characters of each keyword. The order of the cards is not important except that an END card must be last. Each keyword has various subsiduary keywords.
Principal keywords controlling Xray refinement:
LABIN, REFIne, SCALe, SIGMaa, WEIGht
All keywords for Xray refinement :
BINS/RANGE, BLIMit, CELL, DAMP, FREEflag, LABIn, LABOut, MODE, MONItor, NCYCLE, PHASE, REFIne, RIGIdbody, SCALe, SCPArt, SIGMaa, SYMMetry, WEIGht, END/GO
Keywords controlling geometric restraints:
BFAC/TEMP, CHIRal,
DISTance, NCSR/NONX,
PLANe, RBONd,
SPHEricity, TORSION,
VDWR, HOLD
(Their function is very similar to that in PROLSQ)
Optional keywords controlling the data harvesting functionality:
PNAME, DNAME, PRIVATE, USECWD, RSIZE, NOHARVEST
This card is essential for all refinement except idealisation of geometry. To some extent the course of the refinement is governed by the assignments given.
The following program labels can be assigned:
FP SIGFP FREE FPARTi PHIPi HLA HLB HLC HLD or PHIB FOM
Assignments for FP and SIGFP are always required.
The use of the free R flag is highly recommended. This is an important component in using maximum likelihood sensibly. If FREE is assigned, reflections which are flagged with nfree_exclud (default 0) are excluded from the gradient calculation, and therefore the agreement between them and Fc is not part of the refinement procedure.
REMARK 0: It is strongly recommended to run the uniqueify script on the first dataset as soon as possible, e.g. after TRUNCATE. One purpose of this script is to add a column of free-R flags, and it is important for the validity of the free-R approach that this is done before any model refinement. If you are continuing model refinement with a new data set, it is important to preserve the FreeR assignment used before. See FreeR assignment.
REMARK 1: Reflections flagged for free R flag calculation are omitted from any parameter refinement, and also from the scale and B-factor calculation. For ML refinement the default is to use them to estimate SigmaA. [See SCALe and SIGMA keywords.]
If you wish to add known FPART(s) to the structure factors, assign FPARTi and PHIPi. This option gives the program a lot of flexibility. Possible examples of it use are: 1) Adding in calculated hydrogen contributions. 2) Adding in anisotropic metal atoms. 3) Adding in contributions from unmodelled parts of a structure. Eg: A solvent continuum, Uninterpretable parts of an MIR map.
REMARK 2: See SCPART keyword: The FPARTi will be added to the FC without any further scaling unless this is set.
PHIB FOM An input phase and its figure of merit.
Or: HLA HLB HLC HLD The Hendrickson-Lattman coefficients describing an input phase. These can be obtained from the usual routes; MIR, MAD plus density modification.
REMARK 3: In our experience using DM phases gave better result than using MIR phases. The FOM weighting may need to be changed. See PHASe keyword
FP SIGFP (and FREE if available) plus the following additional columns are output. Labels can be assigned: FC PHIC FWT PHWT DELFWT PHDELWT FOM [PHCOMB]
By default, FP and SIGFP are scaled to match FC, using 1/K0. The overall B correction, which can be anisotropic, is applied to the FC terms. IF you have APPLY OBSERVED the inverse of this B factor is applied to FP.
FC PHIC are the structure factor contributions from the input atoms only, WITHOUT any FPARTi terms added.
FWT and DELFWT are similar to the terms output from SIGMAA:
PHCOMB is the combined phase to use with FP. It is output only if experimental phases were input.
If prior phase probabilty distribution is available then all output Fourier coefficients correspond to combined phases.
FOM = <m> - The "figure of merit" for this reflection . See Paper for definition ( Ref ??)
For the FWT and DELFWT terms, FP is scaled by 1/K0, FCalc= vector sum of {Dc FC + D1 FPART1 + D2 FPART2 + .. }, Di are the resolution dependent SIgmAA term, and m is the figure of merit of this reflection.. (PHDELWT and PHWT will be "combined" phases if HLA etc or PHIB FOM have been assigned.) Otherwise they equal either PHIC or PHIC+180.
Missing Data: For those reflections where the FP are missing mFo is set equal to dFc. Hence the terms become FWT=dFC and DELFWT=0.0 The m and D are based on the program's estimate of SigmaA.
Rebuilding into these 2mFo-DFcalc and mFo-DFcalc maps seems easier than using classic nFO-(n-1)FC and difference maps, consistent with the established technique for SigmaA style maps. One advantage here is that since the m and D values are based on the Free set of reflections they are less biased than the values obtained by the ccp4 version of SIGMAA after refinement.
For example:
REFI TYPE RESTrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic METH CDIR or REFI TYPE REST RESO 20 1.50 RESI MLKF BREF ISOT METH CDIR or REFI TYPE RIGID (all other definitions are defaults)
This keyword controls the type of refinement or idealization. Subkeywords:
In more detail, these subkeywords are:
Program will apply blurring as follows:
HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)
or
If PHASE and FOM are given: the program first generates the Hendrickson-Lattmann coefficients using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), HLB = Func(FOM)*SIN(DEGTOR*PHASE), HLC = HLD = 0. ie the Phase probability distribution is unimodal.
Fxray = SUM(Whkl*(|FO|-|FC|)**2)
Fxray = SUM(LLKcentric_hkl) + SUM(LLKacentric_hkl)
REMARK: If you are using IDEAlise it is better to use CGRAD option. When using RESTrained option CDIR will work much better. Maybe for NCS the CGMAT is more appropriate; it is the only one which considers off-diagonal terms for atom-atom interactions, and these can be large when the NCS restraints are tight. If the minimisation fails it may be worth trying another option.
Include all well measured data, not omitting the weak observations; it will be weighted appropriately. The low resolution data helps define the solvent shell. However if you have lost strong terms by some accident of data collection the scaling may not behave well.
The SCALE keyword has several different options.. See below for keywords for estimation of SIGMaA, triggered by SCALe MLSC. For example:
SCAL TYPE BULK LSSC ANIS RESO 10 2.1 SCAL TYPE SIMP LSSC EXPE SCALe LSSC FIXBulk BBULk 70.0 SBULk -0.75 ! Fix bulk solvent parameters for LSQ scaling
These keyword controls the type of scaling of Fo and Fc. Subkeywords:
In more detail, these subkeywords are:
KB = K0*exp(-B0*s^2) * (1- K1*exp(-B1*s^2))
The scale formulation is based on the Babinet principal and described by Dale Tronrud and others. Ref?? TNT manual. Better results can be obtained by more sophisticated modelling, but this is a simple compromise.
KB = K0*exp(-B0*s^2) (Simple Wilson scaling) ie: K1 = 0
This may be more appropriate if you are adding bulk solvent contributions as an FPARTi term or you have parameterised most of the expected waters.
The isotropic overall B factor B0, is replaced by:
B11*s1*s1+2.0*B12*s1*s2+2.0*B13*s1*s3+B22*s2*s2+2.0*B23*s2*s3+B33*s3*s3
i.e. [s] [B] [s]* where:
[s] is vector in reciprocal orthogonal system = [s1 s2 s3] = [ h k l ] [RFR] , [h k l] are the Miller indices; RFR is the fractionalisation matrix. [B11 B12 B13] [B] = [B21 B22 B23] ; B21 = B12, B31 = B32, B32 = B23. [B31 B32 B33]
Anisotropic scaling of data should ideally be done at the merging stage but often the distortion aligns with the crystal axes, and therefore cannot be detected ifrom symmetry equivalent reflections alone. Large improvements in Rfactors can result from this correction.
Defines resolution limit for scaling. If you are using the Wilson plot option with only one gaussian for scaling this keyword allows you to exclude the low resolution data which is affected by solvent. Advice: use 4.5A to the highest resolution.
NB: Before applying bulk solvent scaling and including all low resolution data Check your distribution of <F> looks sensible. This is the raw material for all overall scaling algorithms. A good way to check this is to look at a <Fsq> plot against resolution.
This should look something like this:
+ + + + + + + + + + + <10A 5A 4.5A ............
If the low resolution looks strange - it may mean your backstop was causing problems, intensities were saturated etc etc, and including such data may give crazy solvent scales. A sensible sort of value would be : Fc 1.0 0.0 Bulk Solvent -0.4 200.0
Program always first calculates log scaling assuming Bs = exp(-B1*s^2) This minimises: LFoKBFc = sum(ln(|FO|-ln(K)-ln(|FC|)+B1*S^2)^2 where the FC is simply derived from the input coordinates, without any addition of FPARTi terms. (This option does not require iterative cycles of scaling) and after this it scales KFoBFc =sum(K*|FO|-Bs*|FC|)^2 where the FC include FPARTi and Bulk solvent correction if requested.
NB-=======================================================================
NB-=======================================================================
NB-=======================================================================
We are not really sure how best to handle scaling. If you have problems
please get in touch. In our experience there have been no problems with
data sets with resolution 2.5A or higher, unless there was some obvious
flaw; huge ice rings or Is labelled as Fs or some such thing. But with
one unusual data set which died at 2.7A there has been a problem, which
we got round by tweaking parameters, but these cases should be automatically
checked.
NB-=======================================================================
NB-=======================================================================
NB-=======================================================================
NOTE: When doing ML refinement the scale factors are only used to calculate R values.
For example:
SCALe MLSC FIXBulk BVALue 100.0 SCVAlue -0.1
The SigmA estimate SA is generally fitted to the normalised Free reflections using a 4 parameter equation of an analogous form to the bulk scaling:
SA = SA0*exp(-T0*s^2) * (1- SA1*exp(-T1*s^2))
These keywords controls the estimation of SA. Subkeywords:
[Default EXPE MATR 0.5]
This keyword controls the weighting of the terms
This sub-keyword allows you not to use experimental sigmas of the observations for the Xray residual. The default action is to use them.
The remaining sub-keywords control the relative weighting of the X-ray and geometry terms in the residual.
Number of bins <nbins>. Default 20, maximum allowed 100. For least-square refinements it is useful only for monitoring statistics on resolution ranges. Used for normalisation to convert Fs to Es.
Or <range>. If the value after BINS/RANGE is less than 1.0 this is assumed to define the bin width in units of 4*sin**2/Lambda**2
[Default 2.0 1000.0]
Bmin and Bmax are minimum and maximum B-factors allowed. This keyword is
not recommended. The lower limit is required by the program, but is set
by default.
Defines parameters of the cell. If this keyword is specified then cell parameters from MTZ and coordinate files will be overridden. This keyword could be important when cell dimensions are suspect and the user wants to change them.
[Default 1.0 1.0 for resolution > 2.5A, 0.5 0.5 for
resolutions < 2.5]
<Pdamp> <Bdamp> scales shifts after each cycle of refinement. If
there is limited data, or you are using unrestrained refinement it is
sensible to scale down the shifts.
[Default 0]
The default value for exclusion for the FreeR is zero. However there is
an opportunity to reset this exclusion flag here. See the description of
the program FREERFLAG.
<subkeyword 1> can be one of:
The subsequent <subkeyword i> are (only used if output requested is MEDIum):
[Default 5]
This keyword defines number of cycles of refinement done before PROTIN
is run again.
(See above; this functionality can also be input on the REFI line.) If experimental phases are being used it may be necessary to reduce their assigned figure of merits,especially after some density modification calculations.
Program will apply blurring as follows:
HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)
or
If PHASE and FOM are given: the program first generates HLA and HLB using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), HLB = Func(FOM)*SIN(DEGTOR*PHASE), HLC = HLD = 0. ie the Phase probability distribution is unimodal.
Rigid body refinement in REFMAC is performed as following. Program first finds the center of mass of all domains. This is defined as (Tgx,Tgy,Tgz) where:
TgX = sum(Occ(i)*X(i))/Natom TgY = sum(Occ(i)*Y(i))/Natom summed over all atoms in the domain. TgZ = sum(Occ(i)*Z(i))/Natom
Both Euler and polar rotation angles and the coordinates of the center of mass are refined for each domain. (i.e. program finds Delta_Alpha, Delta_Beta, Delta_Gamma, and deltaTg).
REMARK: In the solution of linear equation to obtain the shifts the program uses singular value decomposition, thus avoiding problems with singularities such as that when Beta =~ 0.0 Only alpha+gamma can be determined.
Subkeywords:
RIGIDbody GROUP 1 FROM 10 A TO 100 A } A domain containing most of the A chain, and RIGIDbody GROUP 1 FROM 108 A TO 176 A } an embedded bit of a B chain, consisting of RIGIDbody GROUP 1 FROM 200 B TO 220 B } residues 200-220. .. RIGIDbody GROUP 2 FROM 10 B TO 100 B } for example the equivalent domain across a NCS axis RIGIDbody GROUP 2 FROM 108 B TO 176 B } RIGIDbody GROUP 2 FROM 200 A TO 220 A }
Note: For rigid body refinement you don't need to run protin or other supporting programs.
If NSCi are set, the FPARTnsci is scaled relative of the FC FC_tot = FC_calc(PHIcalc) +nsc1*FPART1(PHIP1) + nsc2*FPART2(PHI2) + ....
If NSCi set - that partial structure factor will be scaled
Defines space group symmetry. If MTZ file is defined then symmetry from MTZ will be used. This keyword is essential for idealisation.
The following key words are used to set the weighting of the restraints. We have attempted to make the default values agree with those used for PROLSQ.
[Default 1.0 0.02 0.04 0.05 0.05 0.02]
<WDSKAL> is a multiplier used in calculating the weights for the distance restraints. The weights are of the form: Weight = WDSKAL*(model distance - ideal distance) / SIGD**2
<SIGD1> is the a value associated with bonded distances (e.g. Calpha-C).
<SIGD2> is the a value associated with a non-bonded distance determining an angle to be restrained (e.g. N - C).
<SIGD3> is the a value for a planar 1-4 distance (e.g. O(i-1)-Calpha(i)).
<SIGD4> is the a value associated with a bond distance involving hydrogen (e.g. Calpha - Halpha).
<SIGD5> is reserved for special distance groups (as set up with KDWT = 5 in the dictionary of ideal atomic dis- tances).
[Default 1.0 0.03 0.02]
<WPSKAL> is a multiplier for the distance restraints defining planes. The weight used for the peptide planes is (WPSKAL / SIGPP)**2, and for the aromatics (WPSKAL / SIGPA)**2.
<SIGPP> is the sigma value associated with distances used to define peptide planes (e.g. 0.03).
<SIGPA> is the sigma value associated with distances used to define aromatic planes (e.g. 0.02 ).
[Default 1.0 0.15]
<WCSKAL> is used in weighting Chiral groups restraints. The weight used is (<WCSKAL> / <SIGC>)**2
<SIGC> is the value associated with Chiral groups restraints (e.g. 0.02 to 0.04).
[Default 1.0 2.0 3.0 2.0 3.0 ]
<WBSKAL> is the used in calculating the weight for the tem- perature factor restraint parameters based on the types of bonding in which the atoms are involved. (See DISTANCE keyword.)
[Default 2.0 ]
<sigsph> is used to restraint atomic anisotropic tensor to be spherical
[Default 1.0 ]
<sigrbond> is used to make minimum of projections of anisotropic tensors along bond for the bonded atoms.
[Default 1.0 0.05 0.5 5.0 0.5 2.0 10.0]
<WSSCAL> is used in weighting restraints involving non- crystallographic symmetry. The weight is given by (<WSSCAL> / <SIGS>)**2
<SIGSP1>, <SIGSP2>, <SIGSP3> are a values associated with non- crystallographic positional restraints and are for tight, medium and loose restraints respectively.
<SIGSB1>, <SIGSB2>, <SIGSB3> are a values associated with non- crystallographic thermal restraints and are for tight, medium and loose restraints respectively.
[Default 1.0 15.0 3.0 15.0 20.0]
<WTSCAL> is a multiplier used in calculating the weights for the torsional angle restraints. The weight is of the form Weight = (<WTSKAL> / <SIGT>)**2
<SIGT1> is the a value associated with a pre-specified angle (usually Phi and Psi of a regular secondary structure).
<SIGT2> is the a value associated with a planar angle (e.g. omega).
<SIGT3> is the a value associated with a staggered angle (e.g. Chi1).
<SIGT4> is the a value associated with an orthonormal angle (e.g. Chi2 of aromatics).
Starting values of say 20 degress are suggested for <SIGT1> to <SIGT4>.
[Default 1.0 1.0 -0.3,0.0]
<WVSKAL> is used in the weighting scheme for Van der Waals contacts. The weight used is (<WVSKAL> / <SIGV>)**2
<SIGV> is the sigma value associated with Van der Waals dis- tances.
<DINC(1)>, <DINC(2)>, <DINC(3)>, <DINC(4)>. The relevant matrix elements are only incremented if the distance between the non-bonded atoms is less than the sum of the Van der Waals radii. The values of DINC are used to change the ideal distance for each sort of contact.
<DINC(1)> is for atoms where relative position is determined by only one torsion angle: single torsion contact.
<DINC(2)> is for atoms where two or more torsion angles determine their relative positions: multiple torsion con- tact.
<DINC(3)> is for possible hydrogen bonds (contacts between any N and any O except Nmain - Nmain or Omain - Omain).
[Default 0.3 3.0 0.2]
Restraints to current values <PDEL> is a positional shifts magnitude restraint. This goes into the matrix as (a/<PDEL>)**2, (b/<PDEL>)**2, (c/<PDEL>)**2 where a, b, c are the unit cell parameters.
<BDEL> is a shifts magnitude restraint on individual thermal parameters and goes into the matrix as (1/<BDEL>)**2. It is not used if ITEMP = 0.
<QDEL> is the shifts magnitude restraint on variable occu- pancy factors (not used if NOCC=0). This goes into the matrix as (1/<QDEL>)**2.
[Also see keyword MONItor]
$HARVESTHOME/DepositFiles/<projectname>/ <datasetname>.refmac
The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.
End of keywords. Time to do work.
It MUST contain reflections from only one asymmetric unit. REFMAC resorts the hkl list, so sort order and choice of asymmetric unit are not important. It is very sensible to have followed the "complete" script given as Example 0. See more reasons for this in the description of HKLOUT.
It is EXTREMELY ADVISEABLE to assign FreeR_flags for each reflection if you are using the Maximum Likelihood option. The FreeR set is used to estimate comparatively unbiased SigmaA values.
IF FPARTi and PHIPi have been assigned, the calculated structure factor will equal the scaled FPARTi vector(s) plus the contribution calculated from the input coordinates.
PROTSCR,SCRAT1,REFSCR. These will be automatically deleted.
The printer output (logfile) is divided into a number of sections as described below. Not all the sections need appear on a given run of the program as their output depends on the options selected via the control data.
Scale and B factors calculated by log scaling = 0.4365 -7.855 log scaling is minimization of sum(ln(FO)-ln(FC)-log(scale)+sin^2(th/lm)*B) Number of reflection used for scaling= 13868 *************************************** Ls initialisation is o.k. **** Estimating Overall scales or Sigmaa values **** Maximum number of refinement cycles 10 Initial values of K-s and B-s 1 0.437 -7.855 2 -0.400 150.000 Final values of K-s and B-s after 10 cycles 1 0.469 -4.220 2 -0.400 150.000 Initial values of K-s and B-s 1 0.469 -4.220 0.000 0.000 -4.220 0.000 -4.220 2 -0.400 150.000 Final values of K-s and B-s after 10 cycles 1 0.472 -12.268 0.000 -3.527 5.880 0.000 -3.284 2 -0.400 150.000 Overall isotropic B = -3.3507
Various errors may be reported by the input-parsing routines and should be self-explanatory.
If a specific space group is requested and the required subroutines are not available then the program will stop with the error message:
SPACE GROUP UNAVAILABLE
If the limits of the program's capacity are exceeded then one of the following messages will be printed and the program will stop:
THE NUMBER OF VECTOR OR MATRIX ELEMENTS EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY THE NUMBER OF ATOMS OR DISTANCES EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY
It may help to read the ps version of Garib's talk at Chester ($CDOC/garib_talk.ps) or one of the references.
Terminology:
|Fo(h)| - observed amplitude of structure factor. |Fc(h)| - calculated amplitude of structure factor Io(h) - observed intensity of structure factor IO = |FO|**2 Ic(h) - calculated intensity of structure factor IC = |FC|**2 |Eo(h)| - normalised observed amplitude of structure factor. |Ec(h)| - normalised calculated amplitude of structure factor s : 2.0*sin(theta)/lambda D : <cos(2*pi*s^2*delta(r))> Luzzati 1952. sA : SigmaA = (ref - Read, Srinavasan) Theoretically,if EPS is the multiplicity for the reflection zone SigmaA = D*sqrt(SigmaP/SigmaN) where SigmaN = <Fo**2/EPS> and SigmaP = <Fc**2/EPS>. X(h) : 2|Eo|*|Ec|*sA/Var_D Var_D(h) : variance of distribution Var_D(h) : W*SigmaExpt(Eo) **2 + Epsilon * (1- sA**2) ) ; W=1 if SigmaExpt used, W=0 otherwise. Define I0(X) : 0 order first type modified Bessel function I1(X) : 1st order first type modified Bessel function cosh : hyberbolic cose (EXP(X)+EXP(-X))/2.0 sinh : hyberbolic sine (EXP(X)-EXP(-X))/2.0 tanh : hyberbolic tangent sinh(X)/cosh(X) m(h): figure of merit (m = I1(X)/I0(X) for acentric, tanh(X) for centric reflections) m(h) should equal <cos(AlphaTrue - AlphaCalc)> The program estimates SigmaA by minimising -logLikelihood calculated with normalised Eo and Ec. In REFMAC SigmaA can be estimated from the FreeR reflections and fitted by a two gaussian approximation similar to Tronrud's scaling. The figure of merit m which should equal <cos(AlphaTrue - AlphaCalc)> is calculated from Eo, Ec and SigmaA. where Eo = Fo/sqrt(EPS*SIGMAN) and Ec = Fc/sqrt(EPS*SIGMAN) Geometry: D_12_id : ideal value of bond distances D_12_mod: real value of bond distances D_13_id : ideal value of angle distances D_13_mod: real value of angle distances D_14_id : ideal value of distance between atoms connected by torsion angle D_14_mod: real value of this
Description of program
It minimises:
Ftot = Wxray *Fxray + Wgeom *Fgeom where Wgeom = 0 or 1 (Wgeom = 1 for restrained refinement) (Wgeom = 0 for unrestrained refinement) Wxray = 1/(SIGMA**2) where SIGMAis given on keyword REFI (Geometric Idealisation is equivalent to Wxray = 0) Fgeom is based on the geometric quantities refined in PROLSQ. ie: Fgeom = Wds*SUM((D_12_id-D_12_mod)**2/SIGD1**2) ===== +SUM (D_13_id-D_13_mod)**2/SIGD2**2) +SUM((D_14_id-D_14_mod)**2/SIGD3)**2 + terms for planarity, chirality etc etc... exactly as in PROTIN and PROLSQ.
Choosing of Fxray has two options so far; amplitude based least squares, and a loglikelihood residual:
LSQF defines amplitude based least-square residual. ==== SUM : sum over hkl FXRAY = SUM(Whkl*(|FO|-|FC|)**2) MLKF defines simplest version of -loglikelihood residual which comes ==== from Rice distribution for centric and acentric cases assuming there is no knowledge of the phases. Ref:?? Fxray = SUM(LLK_cen) + SUM(LLK_acen) where LLK_cen is -loglikelihood function for centric reflections LLK_acen is -loglikelihood function for acentric reflections LLK_cen = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(cosh(X)) + LN(Var_D)/2 LLK_acen = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(I0(X)) + LN(Var_D)
All this now uses Normalised structure factors.
Var_D = W*SigmaExpt(Eo) **2 + (1- Sum(sA(j)**2) )
where the sA(j) are the SigmaA for each independent partial structures, (W=1 if experimental sigmas used; 0 otherwise).
A two Gaussian fit to the <Eo-Ec>**2 is used to define SigmaA.
sA(j) = sA(j)0 exp(-B(j)0*s**2) ( 1-sAsolv * exp(-bsolv s**2) )
This gives a smooth curve against resolution. If no bulk solvent (ie a one Gaussian approximation) sAsolv = 0 .
The values of m and sA are best estimated using the FREE reflections. Here the sA which are a function of resolution are fitted to the FreeR reflections as a second order Gaussian. In the CCP4 version of SIGMAA the sA are deduced from the fit over resolution bins of <Eo**2> and <Ecalc**2>.
The SigmaA technique means that sA tends to 1.0 as the Rfactor falls even if this is being achieved by some unrealistic technique; loosening geometric restraints, adding too many waters etc.. Using the FreeR set helps avoid this problem ,but the number of reflections per bin were too few to give a smooth curve without interpolation. Fitting the gaussian seems a satisfactory compromise.)
The gradients are estimated from a modified difference map with coefficients {(m|FO| - D |FC|}, PHIcalc. At the end of each set of cycles an HKLOUT file is written containing h k l Sc*Fobs Fc PHIcalc FWT =(~2mFO-dFC) PHIWT DELFWt =(~ mFO-dFC) PHIDELWT Rebuilding into the FWT and DELFWT maps seems better than using classic 2Fo-Fc and Fo-Fc maps. The "missing Data" hkl terms for FWT are set = dFC , DELFWt = 0.0. This restoration of terms helps reduce systematic noise in the 2mFO-dFC maps. See example 0 for details on how to include missing data.
Scaling of Fobs to Fc now uses the Bulk solvent correction of the same type as TNT, Shelx etc. The scale is Kexp(-bs) ( 1-Ksolv exp(-bsolv))
This is excellent as long as your data is reasonable at low resolution; Dont include complete junk; eg reflections behind the Hamburg backstop. Ie - look at your xloggraph from hklplot, or some similar program which gives <F**2> v resolution. If it has one minima at ~ 5A, and a peak at 4.5A then falls away smoothly this is appropriate; if the shape is abnormal, worry about why!
The following examples are included to illustrate the various ways in which REFMAC can be run:
Completing the data to include all possible hkls. Should do this after data reduction, and certainly before using REFMAC. This is now done with the uniqueify script. Martyn - I think all this should be deleted?
#! /bin/sh # set -e # # Replace missing data with Missing Number Flags (in this case NaNs). # mtzmnf hklin $CEXAM/toxd/toxd_old.mtz hklout $CCP4_SCR/toxd_nan.mtz \ <<eof TITLE toxd data, testing LABI F1=FTOXD3 SIGF1=SIGFTOXD3 - D2=ANAU20 SIGD2=SIGANAU20 - F2=FAU20 SIGF2=SIGFAU20 - F3=FMM11 SIGF3=SIGFMM11 - F4=FI100 SIGF4=SIGFI100 END eof # # Case (1) # # Complete dataset and add free-R column. # Keep systematic absences with -s switch (though you probably wouldn't # want to do this). # uniqueify -s $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique.mtz # # Case (2) # # Complete dataset and add free-R column. # Increase the fraction of reflections tagged with each free-R # indicator above the default 0.05 (sensible for toxd which has # small dataset). # uniqueify -p 0.1 $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique2.mtz # # Case (3) # # First add free-R column to incomplete dataset. # (Silly thing to do - this is just to create a dataset with an existing # free-R column for illustrating use of uniqueify with -f switch.) # freerflag hklin $CCP4_SCR/toxd_nan.mtz hklout $CCP4_SCR/toxd_free.mtz << eof END eof # # Now complete with -f switch to indicate free-R column already present. # uniqueify -f FreeR_flag $CCP4_SCR/toxd_free.mtz $CCP4_SCR/toxd-unique3.mtz #
Restrained refinement with overall B-factor refinement. Method is sparse matrix method. PROTIN is run first.
#!/bin/csh -f # # Example of refinement by refmac # # # Set parameters # set CTEST=/y/programs/xtal/ccp4/examples cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb # uniqueify $CTEST/toxd/toxd.mtz set inmtz=toxd-unique.mtz start: set name = toxd set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # # Coomplete data and add freeR flag. You may not need this step # See complete_toxd.sh # # # Run protin to set up geometric restraints # PROTCOUNTS contains the number of distances, chiral centres,etc.. # PROTOUT contains atomic coordinates plus all possible restraint # pairings. # protin \ XYZIN $CCP4_SCR/${name}${last}.pdb \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ << eop !CHNNAME ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid CHNNAM ID B CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNTYP 1 NTER 1 GLN 3 CTER 59 GLY 2 CHNTYP 2 WAT PEPP 4 SYMM 19 VDWRadii 1 CA 7 3.8 VDWCUT 5 END eop if ($status) exit # # Refmac step. Refine # PROTSCR - an abbreviated version of PROTOUT # it contains atomic coordinates plus all restraint # pairings for atoms which actually are present. # refmac: refmac \ HKLIN $inmtz \ HKLOUT $CCP4_SCR/${name}${last}.mtz \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ PROTSCR $SCRATCH/counts.scr \ XYZIN $CCP4_SCR/${name}${last}.pdb \ XYZOUT $CCP4_SCR/${name}${curr}.pdb \ << eor LABIN FP=FTOXD3 SIGFP=SIGFTOXD3 FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=2FOFCWT PHWT=PH2FOFCWT - DELFWT=FOFCWT PHDELWT=PHFOFCWT REFI TYPE RESTrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic METH CGMAT WEIGHT MATRIX 0.35 !Scaling parameters SCAL TYPE BULK NCYC 2 MONI FEW BINS 20 end eor # if ($status) exit # make maps. # # Sigmaa style 2mfo-dfc map with restored data # fft: fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients labi F1=2FOFCWT PHI=PH2FOFCWT binmapout end eof if ($status) exit # # Sigmaa style mfo-dfc map # fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style mfo-Dfc map calculated with refmac coefficients labi F1=FOFCWT PHI=PHFOFCWT binmapout end eof if ($status) exit # @ last++ @ count++ end
Unrestrained refinement by maximum likelihood method.
#!/bin/csh -f # # Example of refinement by refmac # set CTEST=/y/programs/xtal/ccp4/examples cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb # uniqueify $CTEST/toxd/toxd.mtz set inmtz=toxd-unique.mtz start: set name = toxd set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # # Coomplete data and add freeR flag. You may not need this step # See complete_toxd.sh # # # # # # # Refmac step. Refine # refmac: refmac \ HKLIN $inmtz \ HKLOUT $CCP4_SCR/${name}${last}.mtz \ XYZIN $CCP4_SCR/${name}${last}.pdb \ XYZOUT $CCP4_SCR/${name}${curr}.pdb \ << eor LABIN FP=FTOXD3 SIGFP=SIGFTOXD3 FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=2FOFCWT PHWT=PH2FOFCWT - DELFWT=FOFCWT PHDELWT=PHFOFCWT REFI TYPE UNREstrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic #WEIGHT MATRIX 0.35 !Scaling parameters SCAL TYPE BULK NCYC 2 MONI FEW BINS 20 end eor # if ($status) exit # make maps. # # Sigmaa style 2mfo-dfc map with restored data # fft: fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients labi F1=2FOFCWT PHI=PH2FOFCWT binmapout end eof # if ($status) exit # Sigmaa style mfo-dfc map # fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style mfo-Dfc map calculated with refmac coefficients labi F1=FOFCWT PHI=PHFOFCWT binmapout end eof if ($status) exit
Idealization. Method of minimization is conjugate gradient method
#!/bin/csh -f # # Example of refinement by refmac # # # Set parameters # set CTEST=/y/programs/xtal/ccp4/examples set crdin=$CTEST/toxd/toxd.pdb set crdout=toxd_1.pdb # # # # Run protin to set up geometric restraints # PROTCOUNTS contains the number of distances, chiral centres,etc.. # PROTOUT contains atomic coordinates plus all possible restraint # pairings. # protin \ XYZIN $crdin \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ << 'END-protin ' !CHNNAME ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid CHNNAM ID B CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNTYP 1 NTER 1 GLN 3 CTER 59 GLY 2 CHNTYP 2 WAT PEPP 4 SYMM 19 VDWRadii 1 CA 7 3.8 VDWCUT 5 END 'END-protin ' if ($status) exit # # Refmac step. Refine # PROTSCR - an abbreviated version of PROTOUT # it contains atomic coordinates plus all restraint # pairings for atoms which actually are present. # refmac: refmac \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ PROTSCR $SCRATCH/counts.scr \ XYZIN $crdin \ XYZOUT $crdout \ << eor REFI TYPE IDEAlise REFI RESI MLKF REFI METH CGMAT NCYC 2 MONI FEW end eor if ($status) exit #
Restrained refinement with Partial contribution from hydrogens.
#!/bin/csh -f # start: set name = kak1_ set last = 1 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # Run protin to set RESTRAINT lists. # protin: protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/kak_protout.dat \ PROTCOUNTS $SCRATCH/kak_counts.dat \ << END-protin CHNNAME ID A CHNTYP 1 CHNNAME ID B CHNTYP 2 CHNNAME ID C CHNTYP 3 CHNNAME ID D CHNTYP 4 CHNNAME ID E CHNTYP 5 CHNTYP 1 NTER 1 ALA 3 CTER 517 HIS 2 CISPRO 1 283 CHNTYP 2 NTER 1 LYS 3 CTER 3 LYS 2 CHNTYP 3 WAT # You can call a CHAIN WAT as long as there is no connectivity.. CHNTYP 4 WAT CHNTYP 5 WAT SYMMETRY 19 END END-protin if ($status) exit # # Run hgen to generate hydrogen position # hgen: hgen \ XYZIN $SCRATCH/${name}${last}.pdb \ XYZOUT $SCRATCH/hyd_${name}${last}.pdb \ <<eof HYDROGENS SEPARATE eof if ($status) exit # # Run sfall to calculate structure factors from hydrogens # sfall: sfall \ HKLIN $SCRATCH/kak_freer_unobs.mtz \ HKLOUT $SCRATCH/kak_freer_unobs_hyd.mtz \ XYZIN $SCRATCH/hyd_${name}${last}.pdb \ << eof NOSCALE MODE SFCALC XYZIN HKLIN LABI FP=F SIGFP=SIGF FREE=FreeR_flag LABO FC=FC_hyd PHIC=PHIC_hyd !Refinement parameters BINS 20 END eof if ($status) exit # # Run refmac - refinement adding in hydrogen contributions # refmac: refmac \ HKLIN $SCRATCH/kak_freer_unobs_hyd.mtz \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${last}.mtz \ PROTOUT $SCRATCH/kak_protout.dat \ PROTCOUNTS $SCRATCH/kak_counts.dat \ REFSCR $SCRATCH/prot_scr.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop MODE HKRF LABI FP=F SIGFP=SIGF FREE=FreeR_flag FPART1=FC_hyd PHIP1=PHIC_hyd LABO FC=FC_p PHIC=PHIC_p FWT=2FOFCWT_p DELFWT=FOFCWT_p !Refinement parameters SCPART 1 MONI MANY PLAN 0.3 REFI TYPE REST REFI RESI MLKF RESO 20.0 1.2 REFI BREF ISOT METH CDIR WEIGHT MATRIX 0.5 !Scaling parameters SCAL TYPE BULK SCAL LSSC RESO 20 1.2 NCYC 5 BINS 20 END eop if ($status) exit # @ last++ @ count++ end
Restrained refinement with maximum likelihood method followed by ARPP to place waters
# #!/bin/csh -f # # A script which refined a structure # After refinement the REFMAC maps are calculated, and density # passed to ARPP to fit waters into. # # Step 1: protin and REFMAC. # # Step 2: Calculate ffts for 2mFodFc and mFodFc maps . # Step 3: Expand the maps as ARPP requires ( you need the "edges") # # Step 4: Run arpp to find the waters # #goto start # ########################################################################### ########################################################################### # Step 1: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # start: set name = 'cutm_' set last = 6 set cycles = 3 set count = 0 set data = 'cute_dnpp.free.mtz' # while ($count != $cycles) @ curr = $last + 1 # # protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ << eof TITL Cutinase monoclinic + DNPP CHNNAM ID A CHNTYP 1 CHNNAM ID B CHNTYP 1 CHNNAM ID C CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNNAM ID X CHNTYP 2 CHNNAM ID Y CHNTYP 2 CHNTYP 1 NTERM 3 GLY 3 CTERM 193 ARG 2 DISUL 2 17 94 156 163 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 200 1 NONX 3 CHNID W X Y NSPANS 1 1 200 2 SYMMETRY 4 LIST FEW END eof # if ($status) exit # refmac: refmac \ HKLIN $data \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ PROTSCR $SCRATCH/cut_counts.scr \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${curr}.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop # LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT !Refinement parameters REFI TYPE REST REFI RESI MLKF RESO 20.0 2.05 PLAN 1 0.04 0.015 REFI BREF ISOT METH CGMAT WEIGH MATRIX 0.8 TEMP 1.0 3.0 5.0 6.0 8.0 1.0 !Scaling parameters SCAL TYPE BULK LSSC ANIS RESO 20 2.05 NCYC 4 ! cycles round refinement NCYC times before redoing PROTIN MONI FEW BINS 20 END eop # if ($status) exit # ########################################################################### ########################################################################### #---- fft 2Fo-Fc map # # # Step 2: Calculate ffts for 2mFodFc and mFodFc maps # ( this facilitates the averaging) # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_32_p21.map << eof TITLE 2FO-1FC RESOLUTION 20.0 2.05 GRID 128 128 128 XYZLIM 0 127 0 63 0 127 BINMAPOUT LABIN F1=FWT PHI=PHWT END eof # # # if ($status) exit # # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_11_p21.map \ << eof TITLE 1FO-1FC RESOLUTION 20.0 2.05 GRID 128 128 128 XYZLIM 0 127 0 63 0 127 BINMAPOUT LABIN F1=DELFWT PHI=PHDELWT END eof # # # if ($status) exit 2: # ########################################################################### ########################################################################### # Step 3: Expand the maps to fill the edges for ARPP # mapmask \ MAPIN $SCRATCH/${name}${curr}_32_p21.map \ MAPOUT $SCRATCH/${name}${curr}_32_p21.ext \ << eof SYMM P21 XYZLIM 0 128 0 64 0 128 eof # # if ($status) exit # mapmask \ MAPIN $SCRATCH/${name}${curr}_11_p21.map \ MAPOUT $SCRATCH/${name}${curr}_11_p21.ext \ << eof SYMM P21 XYZLIM 0 128 0 64 0 128 eof # #exit # if ($status) exit # # ########################################################################### ########################################################################### # Step 4: Run arpp to find the waters # # arpp: # arpp \ xyzin $SCRATCH/${name}${curr}.pdb \ MAPIN1 $SCRATCH/${name}${curr}_32_p21.ext \ MAPIN2 $SCRATCH/${name}${curr}_11_p21.ext \ xyzout $SCRATCH/temp_arp.pdb << eof SYMMETRY P21 REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50 MERGE 1.8 FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO #FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3 FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5 REFINE WATERS END eof # # if ($status) exit # mv $SCRATCH/temp_arp.pdb $SCRATCH/${name}${curr}.pdb # /bin/rm *TMP*,*ARP* # @ last++ @ count++ end # exit # # echo " check log file error in one step " #
Restrained refinement with maximum likelihood method etc. An extremely complicated script incorporating averaging, arp, etc etc.
#!/bin/csh -f # # A script which refined a structure with 3 molecules in the # asymmetric unit; # After refinement the REFMAC maps are averaged, and density around one # molecule only is passed to ARPP to fit waters into. # This set of waters is expanded to include those around the other two molecules. then the process is cycled.. # Step 1: Find the rotations which convert the master molecule ( in this case C) # to overlap the other two ( here labelled A & B) # # Step 2: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # # Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1. # ( this facilitates the averaging) # Step 4: Extract atoms for chains C and Y from refmac output coordinates. # Use ncs mask to define a mask round Molecule C and its associated waters Y # # Step 5: Average 2mFodFc and mFodFc maps over this mask. # Output two maps - wrkout only covers the masked region. # mapout is a map expanded to the volume # given on the MAPIN limits ( usually P1) # It contains the averaged density in the # volume under the mask, and its non # crystallographic equivalents, and the # unmodified density everywhere else. # # Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded # where there is no density. ( ARPP requires a full cell) # # Step 7: Run arpp in P1 to find the common waters which should show up # in the averaged density. # # Step 8: Rebuild a coordinate set containing molecules A B C plus # the new ARPP waters, plus those generated by applying the # ncs symmetry to these. # ( Use grep and pdbset) goto start ########################################################################### ########################################################################### # Get rotations to average C coordinates over A and B # I assume these wont change during cycles... # Step 1: Find the rotations which convert the master molecule ( in this case C) # to overlap the other two ( here labelled A & B) # lsqkab \ workcd cutm_0.pdb \ refrcd cutm_0.pdb \ << eof FIT WRES 1 to 193 WCHAIN C MATCH RRES 1 to 193 RCHAIN A OUTPUT RMS END eof if ($status) exit # lsqkab \ workcd cutm_0.pdb \ refrcd cutm_0.pdb \ << eof FIT WRES 1 to 193 WCHAIN C MATCH RRES 1 to 193 RCHAIN B OUTPUT RMS END eof if ($status) exit # ########################################################################### ########################################################################### # Check the transformations are correct. They are! cutm_0_A.pdb is the same # as the A chain cordinates in cutm_0.pdb.. pdbset: pdbset \ xyzin $SCRATCH/cutm_0_C.pdb \ xyzout $SCRATCH/cutm_0_A.pdb \ << eof ROTATE EULER 9.70152 -119.69961 7.80144 SHIFT 29.55211 8.54860 19.71765 CHAIN W eof if ($status) exit # pdbset \ xyzin $SCRATCH/cutm_0_C.pdb \ xyzout $SCRATCH/cutm_0_B.pdb \ << eof ROTATE EULER 175.90645 -120.30214 172.70070 SHIFT -1.62819 4.70139 35.11710 CHAIN X eof if ($status) exit # exit # ########################################################################### ########################################################################### # Step 2: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # start: set name = 'cutm_' set last = 6 set cycles = 3 set count = 0 set data = 'cute_dnpp.free.mtz' # while ($count != $cycles) @ curr = $last + 1 # # protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ << eof TITL Cutinase monoclinic + DNPP CHNNAM ID A CHNTYP 1 CHNNAM ID B CHNTYP 1 CHNNAM ID C CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNNAM ID X CHNTYP 2 CHNNAM ID Y CHNTYP 2 CHNTYP 1 NTERM 3 GLY 3 CTERM 193 ARG 2 DISUL 2 17 94 156 163 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 200 1 NONX 3 CHNID W X Y NSPANS 1 1 200 2 SYMMETRY 4 LIST FEW END eof # if ($status) exit # refmac: refmac \ HKLIN $data \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ PROTSCR $SCRATCH/cut_counts.scr \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${curr}.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop # LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT !Refinement parameters REFI TYPE REST REFI RESI MLKF RESO 20.0 2.05 PLAN 1 0.04 0.015 REFI BREF ISOT METH CGMAT WEIGH MATRIX 0.5 # (default) TEMP 1.0 3.0 5.0 6.0 8.0 1.0 !Scaling parameters SCAL TYPE BULK LSSC ANIS RESO 20 2.05 NCYC 4 ! cycles round refinement NCYC times before redoing PROTIN MONI FEW BINS 20 END eop # if ($status) exit # ########################################################################### ########################################################################### #---- fft 2Fo-Fc map in P1 cell ( needed for averaging) # # # Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1. # ( this facilitates the averaging) # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_32_p1.map << eof TITLE 2FO-1FC FFTSYMMETRY P1 GRID 128 128 128 XYZLIM 0 127 0 127 0 127 BINMAPOUT LABIN F1=FWT PHI=PHWT END eof # # # if ($status) exit # # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_11_p1.map \ << eof TITLE 1FO-1FC FFTSYMMETRY P1 GRID 128 128 128 XYZLIM 0 127 0 127 0 127 BINMAPOUT LABIN F1=DELFWT PHI=PHDELWT END eof # # # if ($status) exit 2: ########################################################################### ########################################################################### # # Remove A B W and X chains from REFMAC output coordinates grep -i -v " b " $SCRATCH/${name}${curr}.pdb > $SCRATCH/junk1.pdb grep -i -v " a " $SCRATCH/junk1.pdb > $SCRATCH/junk2.pdb grep -i -v " w " $SCRATCH/junk2.pdb > $SCRATCH/junk3.pdb grep -i -v " x " $SCRATCH/junk3.pdb > $SCRATCH/${name}${curr}_CY.pdb # # make a mask from the CY atoms only # Step 4: Use ncs mask to define a mask round Molecule C # and its associated Waters Y # rot0: ncsmask \ xyzin $SCRATCH/${name}${curr}_CY.pdb \ mskout $SCRATCH/${name}${curr}_CY.msk \ <<eof RADIUS 4 SYMM P1 GRID 128 128 128 eof # # if ($status) exit # ########################################################################### ########################################################################### # # Step 5: Average 2mFodFc and mFodFc maps over this mask. # Output two maps - wrkout only covers the masked region. # mapout is a map expanded to the volume # given on the MAPIN limits ( usually P1) # It contains the averaged density in the # volume under the mask, and its non # crystallographic equivalents, and the # unmodified density everywhere else. # # rot: # wrkout map - Averaged map within the mask only - # not expanded. One molecule # mapout map - Averaged map expanded by symm ops to # cover the extent of mapin- three ncs *nsym mols. maprot \ MAPIN $SCRATCH/${name}${curr}_32_p1.map \ mskin $SCRATCH/${name}${curr}_CY.msk \ wrkout $SCRATCH/${name}${curr}_32_CY.map \ mapout $SCRATCH/${name}${curr}_32_all.map \ <<eof MODE BOTH SCALE 0.3333 AVERAGE 3 # ROTATE EULER 0 0 0 TRANSLATE 0 0 0 ROTATE EULER 9.70152 -119.69961 7.80144 TRANSLAT 29.55211 8.54860 19.71765 ROTATE EULER 175.90645 -120.30214 172.70070 TRANSLAT -1.62819 4.70139 35.11710 END eof # if ($status) exit # Now the difference map maprot \ MAPIN $SCRATCH/${name}${curr}_11_p1.map \ mskin $SCRATCH/${name}${curr}_CY.msk \ wrkout $SCRATCH/${name}${curr}_11_CY.map \ mapout $SCRATCH/${name}${curr}_11_all.map \ <<eof MODE BOTH SCALE 0.3333 AVERAGE 3 # ROTATE EULER 0 0 0 TRANSLATE 0 0 0 ROTATE EULER 9.70152 -119.69961 7.80144 TRANSLAT 29.55211 8.54860 19.71765 ROTATE EULER 175.90645 -120.30214 172.70070 TRANSLAT -1.62819 4.70139 35.11710 END eof # # if ($status) exit # ########################################################################### ########################################################################### # Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded # where there is no density. ( ARPP requires a full cell) # # Pad - work map to fill P1 cell, move it to 0-1,0-1,0-1 # ready for arpp # mapmask \ MAPIN $SCRATCH/${name}${curr}_32_CY.map \ MAPOUT $SCRATCH/${name}${curr}_32_CY_p1.map \ << eof PAD 0.0 SYMM P1 XYZLIM 0 128 0 128 0 128 eof # # if ($status) exit # mapmask \ MAPIN $SCRATCH/${name}${curr}_11_CY.map \ MAPOUT $SCRATCH/${name}${curr}_11_CY_p1.map \ << eof PAD 0.0 SYMM P1 XYZLIM 0 128 0 128 0 128 eof # #exit # if ($status) exit # # ########################################################################### ########################################################################### # Step 7: Run arpp to find the common waters which should show up # in the averaged density. # # arpp: # arpp \ xyzin $SCRATCH/${name}${curr}_CY.pdb \ MAPIN1 $SCRATCH/${name}${curr}_32_CY_p1.map \ MAPIN2 $SCRATCH/${name}${curr}_11_CY_p1.map \ xyzout $SCRATCH/temp_arp.pdb << eof SYMMETRY 1 REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50 MERGE 1.8 FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO #FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3 FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5 REFINE WATERS END eof # # if ($status) exit # ########################################################################### ########################################################################### # Step 8: Rebuild a coordinate set containing molecules A B C plus # the new ARPP waters, plus those generated by applying the # ncs symmetry to these. # grep -i wat $SCRATCH/temp_arp.pdb > $SCRATCH/temp_arp_Y.pdb # coordinates from refmac - # # output CHAINS A B C of the output coordinates from refmac to # $SCRATCH/${name}${curr}_nowat.pdb # grep -i -v wat $SCRATCH/${name}${curr}.pdb > $SCRATCH/${name}${curr}_nowat.pdb # Expand the Y waters to lie near the A molecule; call them W ... pdbset \ xyzin $SCRATCH/temp_arp_Y.pdb \ xyzout $SCRATCH/temp_arp_W.pdb \ << eof ROTATE EULER 9.70152 -119.69961 7.80144 SHIFT 29.55211 8.54860 19.71765 CHAIN W eof # if ($status) exit # Expand the Y waters to lie near the B molecule; call them X ... pdbset \ xyzin $SCRATCH/temp_arp_Y.pdb \ xyzout $SCRATCH/temp_arp_X.pdb \ << eof ROTATE EULER 175.90645 -120.30214 172.70070 SHIFT -1.62819 4.70139 35.11710 CHAIN X eof # if ($status) exit # # # Put them all back together.. cat $SCRATCH/${name}${curr}_nowat.pdb $SCRATCH/temp_arp_W.pdb $SCRATCH/temp_arp_X.pdb $SCRATCH/temp_arp_Y.pdb > $SCRATCH/temp_arp.pdb mv $SCRATCH/temp_arp.pdb $SCRATCH/${name}${curr}.pdb # #/bin/rm *TMP* # @ last++ @ count++ end # exit #
Restrained refinement with maximum likelihood method etc. 3A data requires fixing of protein Bfactor and BULK scales
#!/bin/csh -f # set verbose set name = piitrig- set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) @ curr = $last + 1 # #goto refmac protin: protin XYZIN $SCRATCH/$name$last.pdb \ PROTOUT $SCRATCH/${name}_protout.dat \ PROTCOUNTS $SCRATCH/${name}_counts.dat \ << 'END-protin' TITLE trigonal pII at 3.0A SYMMETRY 154 CHNNAME ID A CHNTYP 1 CHNNAME ID B CHNTYP 1 CHNNAME ID C CHNTYP 1 CHNNAME ID W CHNTYP 2 CHNTYP 1 NTER 1 MET 3 CTER 95 ASP 2 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 95 1 LIST FEW PEPP 5 END 'END-protin' if ($status) exit # refmac: refmac \ HKLIN pii_154free.mtz \ HKLOUT $SCRATCH/$name$last.mtz \ PROTOUT $SCRATCH/${name}_protout.dat \ PROTCOUNTS $SCRATCH/${name}_counts.dat \ PROTSCR $SCRATCH/${name}_counts.scr \ XYZIN $SCRATCH/$name$last.pdb \ XYZOUT $SCRATCH/$name$curr.pdb \ << 'END-sfrk' LABIN FP=FP SIGFP=SIGFP FREE=FreeR_flag !Refinement parameters REFI TYPE RESTrained RESOLUTION 20 3.0 REFI RESI MLKF REFI BREF ISOTropic METH CDIR WEIGHT MATRIX 0.1 DAMP 0.5 0.5 !Scales shifts to avoid large shifts !Scaling parameters SCAL TYPE BULK LSSC FIXBulk SCBUlk -0.75 BBULk 70 ! Fixes bulk solvent parameters SCAL LSSC ANISO APPLy OBSErved !Scales aniso B and applies to observed SCAL BAVERAGE 30.0 ! Keeps average B of molecule constatnt BFAC 1 2.0 2.5 3.0 4.5 NCYC 4 MONI FEW BINS 20 LABO FC=FC PHIC=PHIC FWT=2FOFCWT DELFWT=FOFCWT 'END-sfrk' if ($status) exit # @ last++ @ count++ end
Example of rigid body refinement in refmac. Ordinary case with several domains. Side chains excluded from refinement
#!/bin/csh -f # ################################################################### ##################################################################### set name = cytc_refmac_ set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz set last = 0 set cycles = 1 set count = 0 while ($count != $cycles) @ curr = $last + 1 refmac \ HKLIN $inmtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag ! !Refinement parameters ! REFI TYPE RIGID RESI MLKF RESO 15 2.0 ! Maximum likelihood refinement. Resolution limit 15 to 2.0 A !Scaling parameters ! SCALe TYPE BULK LSSC ANIS FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling ! !Rigid body parameters ! RIGIdbody NCYCle 10 ! Number of cycles for rigid body refi RIGIdbody GROUp 1 FROM 2 A TO 32 A EXCLU SCHAIns ! Define domains. Exclude side chains RIGIdbody GROUp 2 FROM 38 A TO 55 A EXCLU SCHAIns RIGIdbody GROUp 3 FROM 76 A TO 99 A EXCLU SCHAIns RIGIdbody GROUp 4 FROM 101 A TO 126 A EXCLU SCHAIns RIGIDbody GROUp 5 FROM 56 A to 75 A EXCLU SCHAins RIGIDbody PRINt EULErangles MATRix ! Print Euler angles and Matrices MONI MANY END eop if ($status) exit # @ last++ @ count++ end
Same problem but now using experimental phases.
#!/bin/csh -f set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz set name = cytc_refmac_ set last = 0 set cycles = 1 set count = 0 while ($count != $cycles) @ curr = $last + 1 refmac \ HKLIN $inmtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag - HLA=HLA_pt2 HLB=HLB_pt2 HLC=HLC_pt2 HLD=HLD_pt2 LABO FC=FC_DP5 PHIC=PHIC_1 FWT=2FOFCWT DELFWT=FOFCWT ! REFI PHASE BBLUrring 30.0 SCBLurring 0.9 ! " Blur" (ie Scale down) FOMs REFI PHASE ! use phased Fo Fc differences for sigmaA estimation ! REFI TYPE RIGI RESI MLKF RESO 15 2.0 ! Maximum likelihood refinement ! !Scaling parameters SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling ! SCAL MLSC WORK ! Use working reflections for sigmaA estimation. ! Useful at the very early stages of refinement SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling ! !Rigid body parameters ! RIGIdbody NCYCle 10 ! Number of cycles for rigid body refi RIGIdbody GROUp 1 FROM 2 A TO 32 A EXCLU SCHAIns ! Define domains. Exclude side chains RIGIdbody GROUp 2 FROM 38 A TO 55 A EXCLU SCHAIns RIGIdbody GROUp 3 FROM 76 A TO 99 A EXCLU SCHAIns RIGIdbody GROUp 4 FROM 101 A TO 126 A EXCLU SCHAIns RIGIDbody GROUp 5 FROM 56 A to 75 A EXCLU SCHAins RIGIDbody PRINt EULErangles MATRix ! Print Euler angles and Matrices MONI FEW END eop if ($status) exit # @ last++ @ count++ end
Example of using experimental phase information . Very bad model ( RMS error 2A)
#!/bin/csh -f # #!/bin/csh -f #set verbose set name = tnc_test_ set last = 0 set cycles = 100 set count = 0 while ($count != $cycles) @ curr = $last + 1 #goto refmac # protin \ XYZIN ${name}${last}.pdb \ << END-protin CHNNAM ID 5 CHNTYP 1 CHNTYP 1 NTERM 2 SER 3 CTERM 162 GLY 2 SYMMETRY 154 LIST FEW TITLE TNC very bad model END END-protin if ($status) exit refi: date refmac: refmac \ HKLIN tnc_test.mtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnati SIGFP=SIGFnati FREE=FreeR_flag - HLA=HLA_t HLB=HLBt HLC=HLC_t HLD=HLD_t !HL coefficients. It tells program !use phased refinement #PHIB=PHIDM_t FOM=FOMDM_t ! Option if there are no HL coefficients. # LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT !Labels for output file !Refinement parameters REFI TYPE REST RESO 12 2.8 !restrained refinement. resolution for refinement REFI RESI MLKF ! Use maximum likelihood residual for refinement REFI BREF OVER ! Refine overall B-values REFI METH CGMAT ! Sparse matrix for minimisaion WEIG MATR 0.15 ! Weight geometry and x-ray according to diagonal terms of second ! derivative matrix DAMP 0.5 0.5 ! Scales down shifts to avoid large shifts. Sometimes this value ! should be decreased. Especially at early stages of refi REFI PHASed BBLUrring 20.0 SCBLurring 0.7 ! You may need to play with blurring ! factors if you think the phase ! weighting is overestimated. !Scaling parameters SCAL TYPE BULK LSSC ANISO !Aniso scaling. Gaussian bulk solvent ! correction SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters - often ! ill determined for resolutions < 2.8A SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.50 ! Fix second gaussian parameters for ! SigmaA estimaton NCYC 5 ! Number of refinement cycles MONI FEW ! Give just overall statistics BINS 20 ! Number of resolution bins END eop if ($status) exit #exit # @ last++ @ count++ end
Example of refinement of individual anisotropic B values. Hydrogens must be added prior to refinement.
#!/bin/csh -f #set verbose # ########################################################################### # You have to edit labi, reso, name, data, spgr and protin # # # # data - mtz file # # name - name of coordinate file # # reso - resolution of data or resoloution limits you want to refine # # labi - input file labels # # spgr - space group of crystal # # protin - program which makes list of restraints # # # #######Following things should be optimised according to data used # # # # WEIG MATR <number> weighting x-ray and geometry. At low resolution# # it should be small (sometimes 0.1) at high # # resolution high (sometimes 6.0 or even higher) # # SPHEricity restraints on sphericity of atoms. Larger # # less spherical # # # # NB: with current version only CDIR (conjugate direction) method of # # minimisation is active # # # ########################################################################### set name = 'p1lys_' set last = 0 set cycles = 3 set count = 0 set data = 'p1lys_val-unique' set reso = '100 0.92' set spgr = '1' set labi = 'FP=FO SIGFP=SIGFO FREE=FreeR_flag' while ($count != $cycles) @ curr = $last + 1 #goto refmac # # # Generate hudrogens and calculate contribution of them. ################################################################### hgen: hgen \ XYZIN ${name}${last}.pdb \ XYZOUT ${name}_hydr.pdb \ << eof HYDROgens SEPArate ! Hydrogens in seperate file eof # # # Calculate contribution from hydrogens. NOSCALE option should # be used to preserve absolute scale of structure factros of # hydrogens ################################################################### sfall: sfall \ HKLIN ${data}.mtz \ HKLOUT ${data}_hydr.mtz \ XYZIN ${name}_hydr.pdb \ << eof NOSCALE MODE SFCALC XYZIN HKLIN LABI $labi LABO FC=FC_hyd PHIC=PHIC_hyd !Refinement parameters RESO $reso BINS 20 END eof # # Run protin to generate list of restraints ################################################################### protin \ XYZIN ${name}${last}.pdb \ << eof TITL PAV2 2.01A data RXIS at 120K CHNNAME ID A CHNTYP 1 ROFFSET 0 CHNNAME ID W CHNTYP 2 ROFFSET 0 CHNTYP 1 NTER 1 LYS 3 CTER 129 LEU 2 CHNTYP 2 WAT LIST FEW #[few]/some (for >20A)/all PEPP 5 #no. of atoms restrained to be in a plane [5] SYMMETRY $spgr LIST FEW END eof refi: date refmac: # # Run refmac to refine structure. It is refinement with # anisotropic B values ################################################################### refmac \ HKLIN ${data}_hydr.mtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI $labi FPART1=FC_hyd PHIP1=PHIC_hyd LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT !Labels for output file !Refinement parameters REFI TYPE REST RESO $reso !restrained refinement. resolution for refinement REFI RESI MLKF ! Use maximum likelihood residual for refinement REFI BREF ANISotropic ! Refine individual anisotropic B-values REFI METH CDIR ! Conjugate direction method. WEIG MATR 4.0 ! Weight geometry and x-ray according to diagonal ! terms of second derivative matrix. SPHERicity 5.0 ! Restraints on sphericity of atoms. Default is 2.0 ! but is not good !Scaling parameters SCAL TYPE BULK LSSC ANISO ! Aniso scaling to remove overall crystallographic ! mode. Gaussian bulk solvent correction BLIM 1.0 150.0 ! Limit of B values. For anisotropic refinement it ! is limit of eignevalues of anisotropic B tensor NCYC 5 ! Number of refinement cycles MONI FEW ! Give just overall statistics BINS 20 ! Number of resolution bins END eop if ($status) exit #exit # @ last++ @ count++ end
Garib Murshudov, York University.
This document has been prepared by: Garib Murshudov, Eleanor Dodson, Maria Turkenburg.