EMAN Reconstruction - Phase 3

In this step we seek to refine our preliminary model to generate a final, (hopefully) high resolution, structure.

Now, time for a quick discussion of some new features in EMAN. Previously these instructions would give you a simple refine command to run and pretty much leave things at that. The current version of EMAN contains a number of new options which can be used to push the resolution limits of your reconstruction slightly further than the single canonical refine command. However, it might take a little experimentation to achieve optimal results. These options are discussed towards the end of this page. They are not included in the default refine command presented below. Also, if your job requires a lot of CPU time, consider running just a few refinement cycles with these basic options, then moving on to the options presented in phase 4 (which may slow the refinement down, and aren't all that useful in early stages of refinement).

1. Run the 'refine' command.

$1c

At this point you only need 2 files in your working directory (start.hed/img are considered one file). If you'd like you can remove the others, but it's not really necessary. One exception: if you used a 1D structure factor file when fitting your CTF curves (which hopefully you did), you also need the structure factor file in this directory. start.hed/img contains the raw particle data (preprocessed with ctfit), and threed.0a.mrc contains the preliminary 3d model you generated in the last step. The refine command is fairly complicated and has a lot of options. For now, we'll start with a basic refine command which should work well to get you started. Generally it's a good idea to run this command from a screen window, or submit it as a background job with nohup and '&' (see man page). You can monitor the progress of the run with eman, by looking at the command history. refine will run numerous other eman programs, much like a script, while it executes. Each of these individual programs will appear in the program history. The end of one refinement iteration is marked by the completion of a proc3d command. If you have a linux cluster or several workstations to use for parallel processing, please read the EMAN manual for how to set this up properly.

Here's the refine command to run for your particle: $1n

At this point you only need 2 files in your working directory. If you'd like you can remove the others, but it's not really necessary. start.hed/img contains the raw particle data (filtered earlier), and threed.0a.mrc contains the preliminary 3d model you generated in the last step. The refine command is fairly complicated and has a lot of options. For now, we'll start with a basic refine command which should work well in most cases. Generally it's a good idea to run this command from a screen window, or submit it as a background job with nohup and '&' (see man page). You can monitor the progress of the run with eman, by looking at the command history. refine will run numerous other eman programs, much like a script, while it executes. Each of these individual programs will appear in the program history. The end of one refinement iteration is marked by the completion of a proc3d command.

Here's the refine command to run for your particle: $2

The mask= value should accurately reflect the maximum radius (in pixels) of your particle. In previous versions a value of -1 indicated 1/2 the box size, but now you must specify the actual radius to use. Obviously it is better for this number to be too large rather than too small.
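As a rough illustration of picking a mask= value, the following Python sketch converts a particle's largest dimension into a pixel radius and checks it against the box size. The function name, the padding margin, and the example numbers are all hypothetical; none of this is part of EMAN.

```python
import math

def mask_radius_pixels(max_diameter_angstrom, angstrom_per_pixel, box_size, margin=2):
    """Return a mask radius in pixels for a particle of the given maximum diameter.

    margin is a hypothetical safety pad, so the result errs on the large side.
    """
    radius = math.ceil(max_diameter_angstrom / (2.0 * angstrom_per_pixel)) + margin
    if radius > box_size // 2:
        raise ValueError("mask radius %d exceeds half the box size (%d)"
                         % (radius, box_size // 2))
    return radius

# Hypothetical particle: 200 A across, sampled at 2 A/pixel in a 128-pixel box.
print(mask_radius_pixels(200.0, 2.0, 128))
```

The number printed would be the value to pass as mask=.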

Look at the refine documentation for info on all of the parameters, but there are a few options you should be aware of:

The 'refine' option will increase the precision of the 2D alignments during classification. This can have a substantial impact for final high-resolution refinements, and in some cases can correct problems with misclassification. There is some speed penalty, but aside from this, this option can only help, i.e., aside from speed, there is never a reason NOT to use it.
The 'phasecls, fscls and dfilt' options control the algorithm used for similarity comparisons during classification and particle pruning. This is quite important. The default routine used by EMAN if none of these options is specified is sensitive to resolution mismatches between the 3D model and the raw particles. This can lead to deterministic misclassification of particles. You are strongly encouraged to use one of these 3 options to prevent this (only one of these 3 may be used at a time). dfilt seems to produce the best overall results, but can only be used with ctfcw=. phasecls and fscls can be used at any time. phasecls uses the mean phase error as a criterion, meaning Fourier amplitudes are ignored. fscls uses Fourier Ring Correlation as a criterion, so amplitude information is included to an extent. In theory fscls should be better than phasecls, but your mileage may vary.
Note that ctfcw= performs much better than ctfc=, but requires the additional work of creating a structure factor file. This is STRONGLY recommended. See the FAQ for details.
The 'setsf=,[]' option allows you to impose the 1D structure factor file on your 3D model after each iteration (if you have one). If you are really pushing for a final, optimal structure, this option may be useful. Clearly this can only be used in conjunction with 'ctfcw='. While EMAN's CTF correction routine is quite good, there are limits to how well it can perform. This option basically performs a final 'perfect' 1D filter on your 3D model to make sure the 1D structure factor is accurate. If you are using an estimated structure factor rather than a solution scattering result, this option may or may not be a good idea. The first value specifies the resolution of the low-pass filter applied to the structure factor before applying it to the model. This should be set to the final convergence resolution. You may also perform an optional Gaussian high-pass filter. Naturally this option replaces the Wiener filter that EMAN applies with the ctfcw option. Note that if this option is used too early in a reconstruction, or when the resolution of the reconstruction is substantially nonisotropic, this option may induce significant artifacts. It should always be used with some caution.
The 'sep=2' option will cause each particle to be included in the 2 best matching classes. This can help to handle the problem of uncertainty in particle classification. Since nearly 1/2 of the data will be thrown out during alignment/averaging, this shouldn't pose any real problem. If you disapprove of this, remove 'sep=2' and increase classkeep to ~1.
The 'tree=2' or 'tree=3' options are used to help classification run faster. See the refine man page for details. They are generally used only when 'ang=' is quite small.
The '3dit' and '3dit2' options are deprecated, and should no longer be used.
The single number following refine tells the command how many refinement iterations to run. For each iteration, a classes.n.hed/img, a threed.n.mrc, a threed.na.mrc, and sometimes an x.n.mrc file will be generated (as well as a bunch of temporary files which are overwritten each iteration). If you kill the refine job while it's running, you can restart it roughly where it left off just by running the same command again. The number of iterations is the TOTAL number of iterations to run, not the number of additional iterations. That is, if you run refine 5 ... for 3 iterations, then kill it and run refine 5 ... again, it will only do 2 more iterations. If you run a couple of iterations, and decide to change some of the parameters, it's ok to kill the job, then run it again with the new parameters. At most, you will lose 1/2 an iteration of processing time. If you want to restart the refinement from scratch, you MUST delete all of the classes.*.hed/img files AND all of the threed.*.mrc files (EXCEPT threed.0a.mrc).
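The restart rules for the iteration count can be sketched in Python. This is a minimal helper written for illustration, not something shipped with EMAN: the function names are made up, and it only lists files by default so you can inspect them before deleting anything.

```python
from pathlib import Path

KEEP = "threed.0a.mrc"  # preliminary model from the previous phase; never delete

def additional_iterations(total, completed):
    """refine N means N iterations in total, so a restart only runs the remainder."""
    return max(total - completed, 0)

def files_to_clear(workdir="."):
    """List the per-iteration outputs that must go before restarting from scratch."""
    workdir = Path(workdir)
    found = set()
    for pattern in ("classes.*.hed", "classes.*.img", "threed.*.mrc"):
        found.update(workdir.glob(pattern))
    # threed.*.mrc also matches the preliminary model, so filter it out explicitly.
    return sorted(p for p in found if p.name != KEEP)

if __name__ == "__main__":
    print(additional_iterations(5, 3))  # refine 5 killed after 3 iterations
    for path in files_to_clear("."):
        print("would remove:", path)  # call path.unlink() here to actually delete
```

Listing first and deleting second is a deliberate choice here, since the glob for threed.*.mrc would otherwise take the preliminary model with it.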

Next we need to discuss some critical issues with respect to refinement. The 'standard' similarity measure used by EMAN to compare noisy particles to projections is an optimized real-space variance. This approach has one significant problem. If the model used to generate projections has a higher or lower resolution than the particles themselves, there is a possibility that the particles will be biased slightly away from their true orientations. This effect is most noticeable for cylindrical particles, such as GroEL. If you have a large number of side views of the particle, you may find that their orientation actually 'drifts' to an orientation a little away from the side view. This normally will be a subtle effect that won't be visible unless you are refining to very high resolution. However, clearly it isn't desirable. There are several alternate similarity measures that can be used that aren't susceptible to this problem. My current recommendation for a final high resolution refinement would be to use 'refine', 'setsf=' and 'dfilt', but this may be a highly biased opinion. You may have to experiment to find which works best for any given particle (these are all options to the refine command):

phasecls - This option will use SNR weighted phase residual as a similarity measure. Since it ignores amplitudes, it isn't susceptible to any resolution/filtration dependencies. However, since it makes no real use of amplitudes, it doesn't fully use the available information.
fscls - This is a novel use of the Fourier shell correlation function (actually Fourier ring correlation in this case) used normally as a resolution measure. In a sense, this is just a refinement of the phase residual option. It calculates a correlation coefficient for each Fourier ring, then produces a SNR weighted average. In this way, it makes use of both phases and amplitudes, but since each shell is independently normalized, it is not susceptible to differences in resolution between the 2 models as real-space methods are. Since this is a novel approach with no history (that I am aware of) in the field, this option should be considered experimental. It has, however, behaved well in tests on real data.
dfilt - This option is really a modifier for the default similarity measure (optimized variance). It calculates the 1D structure factor of each individual particle, then applies it to each projection before alignment/comparison. While this may seem like an odd thing to do, it has shown very promising results in initial testing. In fact, so far, this option has produced the best results for GroEL. There are a number of possible future improvements to this method, such as a noise correction to make it more of a true 'matched filter'. We will try to characterize this method better in future versions, but even now it is worth a try.
refine - Normally the EMAN 2D alignment is accurate only to +- 1 pixel. This option makes the refinement accurate to a small fraction of a pixel. This can have a significant impact on classification when the noise level is high, but there is a significant speed penalty. For initial low resolution refinements it is probably a waste, but when going for that final high resolution refinement it should be used. This option may be combined with any of the above criteria.
slow - As the name implies, this performs a very slow and accurate 2D alignment. It should be combined with the refine option above. It is not completely clear if this option always produces better results than the refine option alone, but it is certainly slower.
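The resolution-mismatch argument behind these options can be illustrated with a toy 1D analogue in Python. This is only a sketch under stated assumptions: it is not EMAN's code, the function names are made up, the 'rings' are pairs of 1D frequencies rather than rings in a 2D transform, and the SNR weighting described for fscls is left out (a plain average is used instead). The 'model' is the same signal as the 'particle' with its higher frequencies damped, mimicking a model at lower resolution than the data.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]

def ring_correlation(fa, fb, k):
    """Normalized correlation over frequencies +k and -k, the 1D stand-in for a ring."""
    n = len(fa)
    ring = {k % n, (n - k) % n}
    num = sum((fa[i] * fb[i].conjugate()).real for i in ring)
    den = math.sqrt(sum(abs(fa[i]) ** 2 for i in ring) * sum(abs(fb[i]) ** 2 for i in ring))
    return num / den if den > 0 else 0.0

def mean_squared_difference(a, b):
    """Real-space measure: sensitive to any amplitude mismatch."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def synthesize(n, components, damping=lambda k: 1.0):
    """Sum of cosines; damping(k) scales the amplitude at frequency k."""
    return [sum(damping(k) * amp * math.cos(2 * math.pi * k * t / n + phase)
                for k, amp, phase in components)
            for t in range(n)]

N = 32
COMPONENTS = [(1, 1.0, 0.3), (2, 0.5, 1.1), (3, 0.25, 2.0)]  # (frequency, amplitude, phase)
particle = synthesize(N, COMPONENTS)
model = synthesize(N, COMPONENTS, damping=lambda k: math.exp(-k * k / 4.0))

fp, fm = dft(particle), dft(model)
rings = [ring_correlation(fp, fm, k) for k, _, _ in COMPONENTS]
print("real-space mean squared difference:", mean_squared_difference(particle, model))
print("per-ring correlations:", rings)
print("unweighted average:", sum(rings) / len(rings))
```

Although the two signals are perfectly aligned, the real-space difference is nonzero because of the filtration mismatch alone, while every per-ring correlation stays at 1 (up to rounding) because each ring is normalized independently.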

There are also a number of additional refine options, some of which are still somewhat experimental. Once you have a good feel for doing basic refinements, you can experiment with these other parameters.

ctfcw= If you had a structure factor file when determining CTF parameters, you should use the ctfcw option instead of the ctfc option. Rather than a resolution, you specify the filespec containing the structure factor. This will perform a true Wiener filter on the data using the accurate contrast estimate possible when the structure factor is known. This will produce much better results than ctfc=.
shrink= This option will perform classification and alignment on scaled down versions of the particles/projections. This can produce a dramatic speedup in refinement, and, in cases of high resolution refinement (A/pix < ~3), it can actually improve classification accuracy. Generally 'shrink=2' will be adequate. You can experiment with larger numbers, but we make no guarantees.
tree=<2 or 3> This option will speed up a refinement, but doesn't actually improve the results. This option will perform classification in 2 steps. First it will determine the Euler angles roughly, then it will 'fine-tune' them. This can produce up to a factor of 10 speed increase in classification. One warning, however, not all particles are well-suited to this sort of speed-up. In some cases, it will cause some incorrect classifications. '3' will increase the speed more than 2, but should only be used for small angular increments (ang=).
amask=,, This option can eliminate virtually all artifacts outside the 3D model, although there can be some risk of chopping off any very low density features that might exist near the periphery of the model. This option uses the 'automask2' option in proc3d to generate a custom mask around the 3D model. This also enables projection specific masking of class-averages. This option can dramatically improve the convergence and resolution of a model, and can prevent the build-up of noise based iterative artifacts. Note that to use this option, you MUST also use the 'xfiles=' option. We strongly encourage the use of this option in most cases.
usefilt This is a new option in refine. Normally, the raw particles are used for classification and alignment. However, for optimal results, these tasks should actually be performed on filtered images (Wiener filters are one good choice). This option will assume you have generated a file called 'start.filt.img' which contains filtered versions of the images in 'start.hed'. You may use any filter you wish, but 'proc2d start.hed start.filt.hed wiener= hp=3' seems to work well. Note that this type of filter requires a structure factor file as discussed above.
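The two-step idea behind tree= (rough orientation first, then fine-tuning) can be sketched generically in Python. This illustrates coarse-to-fine search in general and is not EMAN's actual classification code: references are reduced to a 1D index instead of Euler angles, and the function name, score function, and stride are hypothetical.

```python
def coarse_to_fine(score, n_refs, stride):
    """Find the reference index with the lowest score in two passes.

    score(i) returns a dissimilarity for reference i (lower is better).
    Returns (best_index, number_of_score_evaluations).
    """
    coarse = range(0, n_refs, stride)  # pass 1: every stride-th reference
    seed = min(coarse, key=score)
    # pass 2: only the references around the best coarse match
    fine = range(max(seed - stride + 1, 0), min(seed + stride, n_refs))
    best = min(fine, key=score)
    return best, len(coarse) + len(fine)

# Smooth toy score with its minimum at reference 37.
best, evaluations = coarse_to_fine(lambda i: abs(i - 37), n_refs=100, stride=10)
print(best, evaluations)  # an exhaustive search would need 100 evaluations
```

With a score that is less smooth than this toy one, the coarse pass can pick the wrong neighbourhood, and the fine pass then cannot recover. That is the kind of incorrect classification the tree= warning refers to.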