Every other spring, the NCMI hosts a workshop on single particle reconstruction and post-processing (at least we have for a number of years through March 2007). This workshop covers all of the basics of performing a CTF-corrected single particle reconstruction and analyzing the results, and historically has been free of charge (though you must cover your own travel expenses). Check the NCMI webserver for more details on the next available workshop.
If you're in a hurry, and the next workshop isn't imminent, just contact sludtke@bcm.edu with your questions (if that fails, you can try stevel@alumni.caltech.edu). Simple questions can be answered by email or phone, and other arrangements can be made in specific cases where real training is required.
This is a very complicated issue, and depends a lot on your relative skill in using the different packages as well as your understanding of some very complicated issues. The first issue is what your friend meant by 'better resolution'. People in CryoEM often confuse resolvability with resolution. These are two VERY different terms. In single particle processing, resolution typically refers to the results of a test where the data is split into even and odd halves, then reconstructions are compared using the Fourier shell correlation curve. Fundamentally this is measuring the signal to noise ratio in your model. That is, it measures at what resolution your model begins to look noisy. Resolvability is a measure of how close together two blobs can get in your model and still be discerned as two distinct objects. The trick here is that you can apply an arbitrary low pass filter (blurring) to your structure, thus making the resolvability much worse, while having absolutely no impact on the measured resolution of the structure.
Sometimes someone looks at their final structure and sees all of these beautiful 'high resolution' features, then looks at the structure from another reconstruction package, sees some of these features blurred out, and concludes that one package did a better job reconstructing the model than the other. This is simply not true. Different packages handle CTF amplitude correction and B-factor correction in very different ways. It may well be that if you take the 'blurrier' structure and apply a small inverse B-factor correction, the structure that originally looked 'blurrier' may actually be better.
The next issue is initial model/noise bias. Different reconstruction strategies are susceptible to this problem to different degrees. EMAN incorporates an iterative class-averaging procedure designed specifically to make it almost completely insensitive to this problem. This option is embodied in the classiter= option in refine. If you run your final high-resolution refinement with classiter=8, your reconstruction will end up significantly blurrier than it could be. On the other hand, if you refine with classiter=0, which is equivalent to what most other single particle reconstruction programs do, you can end up with a substantially exaggerated resolution (and incorrect features in your model that you may be tempted to interpret). In general, in EMAN, we suggest starting out in the early refinement rounds with classiter= in the 5-8 range to eliminate model bias and improve convergence rate. Then when you are doing your final refinement, drop this value to 3 (the smallest permissible value other than the special case 0, which disables the routine). While this will not produce the absolute highest possible resolution, it will be very close, and largely prevent any model bias from creeping in. It may also be safe at some stage to run a few iterations with classiter=0 to produce the highest possible resolution structure, but if you do so, you must run a range of additional tests to demonstrate the reliability of your model.
Another issue relates to angular sampling. In EMAN, and other programs which generate reference projections of your 3-D model for classification, resolution can be strongly affected by angular sampling. Perversely, using a coarser angular sampling, which results in rotationally 'blurring' your structure, will actually produce a better measured resolution using standard methods. Why? When you coarsen your angular sampling, you are averaging more particles together in specific orientations, thus reducing the noise levels in those orientations. Ideally we would use very fine angular sampling in conjunction with maximum likelihood methods to optimally weight the data contributions at different angles. Lacking that, however, it is possible to exaggerate your resolution (or underestimate it in the opposite case) through bad choices of angular spacing.
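To get a feel for this tradeoff, here is a rough back-of-the-envelope calculation (not an EMAN routine; the helper name and the coverage model are illustrative assumptions only):

```python
import math

def approx_projections(ang_step_deg: float, sym_order: int = 1) -> int:
    """Very rough count of reference projections for a given angular step.
    Treats each projection as covering about step^2 steradians of a
    hemisphere, divided by the symmetry order. Order-of-magnitude only."""
    step = math.radians(ang_step_deg)
    return max(1, round(2 * math.pi / (step ** 2 * sym_order)))

# halving the angular step roughly quadruples the number of projections,
# so each class-average gets about a quarter as many particles
print(approx_projections(9), approx_projections(4.5))
```

Fewer particles per class-average means noisier averages, which is why finer sampling can paradoxically hurt the measured resolution.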
One final issue is how resolution is measured. Ideally people would split their raw data in two halves, then run a full refinement on each set, thus testing both for noise levels as well as model bias. However, this is not what most people do. Typically people go to the final refinement step (3-D reconstruction) and do 2 reconstructions, one from even numbered particles and one from odd numbered particles. However, in this case, all of the Euler angles are fixed, and were generated based on the refinement performed with the full data set. The eman eotest program does something between these two extremes. While it does make use of the particle classification (2 Euler angles determined) from the final refinement iteration, it reruns the iterative 2-D class averaging process to reduce model-bias problems in the resolution estimate. If classiter is set high, again, a worse resolution estimate will result. Even if classiter is set to 0 in eotest, if it was not also 0 in the final iteration of refine, a slightly worse resolution estimate may be produced.
So, the final answer is, it depends on how you use each software package. You need to understand the terms you are using. In the end, the real question is how detailed are the features I can reliably interpret in my map. This may give you a different answer than comparing 'resolutions'.
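For the curious, the even/odd resolution test described above boils down to computing a Fourier shell correlation between the two half-data-set maps. Here is a minimal numpy sketch of that calculation, written from the textbook definition rather than taken from EMAN's source (the function and variable names are our own):

```python
import numpy as np

def fsc(vol_a, vol_b, n_shells=32):
    """Fourier shell correlation between two same-sized cubic volumes."""
    fa = np.fft.fftshift(np.fft.fftn(vol_a))
    fb = np.fft.fftshift(np.fft.fftn(vol_b))
    n = vol_a.shape[0]
    # distance of every voxel from the Fourier-space origin
    grid = np.indices(vol_a.shape) - n // 2
    r = np.sqrt((grid ** 2).sum(axis=0))
    # assign each voxel to a resolution shell
    shell = np.minimum((r / (n // 2) * n_shells).astype(int), n_shells - 1)
    curve = np.zeros(n_shells)
    for s in range(n_shells):
        m = shell == s
        num = np.abs((fa[m] * np.conj(fb[m])).sum())
        den = np.sqrt((np.abs(fa[m]) ** 2).sum() * (np.abs(fb[m]) ** 2).sum())
        curve[s] = num / den if den > 0 else 0.0
    return curve
```

Note that this curve measures only signal-to-noise consistency between the two maps, which is exactly why a low-pass filtered (blurrier) map can score the same 'resolution' as an unfiltered one.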
This is known to occur on a few clusters, but not in a predictable fashion. It seems to be related to a bug in older versions of OpenSSH, or perhaps in the network kernel itself. If (while refine is locked up) you run 'netstat' on the node where the refine command was run, and see a number of ssh processes in the CLOSE_WAIT state, this is the problem you are having. Unfortunately there isn't a universal fix, other than to run a newer version of Linux on your cluster. Upgrading to a newer version of OpenSSH (4.2 or higher) may solve the problem. If you want a stopgap solution, you can simply run 'killall ssh' on the node where refine is running, and the job will start running again. However, you may be forced to do this once in every iteration if the problem continues to recur.
We now offer partial support for OSX (current OSX version only). Still, to compile yourself, you need to know what you're doing if you want to make it work on that platform. Basically:
use 'fink' to install required libraries:
Qt3 (X11, not the native version)
GSL
FFTW2 (not 3)
HDF5
cmake
Then just follow the normal unix install instructions:
cd eman
ccmake .
press 'c'
set various options
press 'c' then 'g'
make
make install
There are several possible causes for this. The first possibility is that your particles in start.hed and your initial 3D model in threed.0a.mrc aren't the same size. Do an 'iminfo start.hed' and 'iminfo threed.0a.mrc' and make sure they're the same size in pixels.
Aside from that, by far the most likely problem is that you used the 'ctfc=' or 'ctfcw=' options in the 'refine' command, but didn't properly prepare the input particles with 'ctfit', 'fitctf' and/or 'applyctf'. The answer here is RTFM (read the _ manual). If you want to see if your particles are prepared for CTF correction, run the 'eman' file browser and look at the start.img file. In the text box below the image, you should see something like:
'!--2.38 252 1.22 0.15 0 6.84 2.54 1.43 400 4.1 2.7'
Your numbers will be different, of course, but there should be a row of numbers beginning with '!-'. If you don't see this, then your images haven't been properly phase flipped.
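As an aside, if you ever want to check for these parameters in a script rather than in the file browser, the comment string can be tested and split programmatically. This sketch assumes only what is stated above, namely that a CTF-prepared image carries a comment beginning with '!-' followed by a row of numbers (the helper names are hypothetical, and the meaning of the individual fields is not spelled out here):

```python
def has_ctf_params(comment: str) -> bool:
    # a CTF-prepared image comment begins with the '!-' marker
    return comment.startswith("!-")

def parse_ctf_params(comment: str):
    # strip the two-character '!-' marker, then split the numbers
    return [float(x) for x in comment[2:].split()]
```

For the example comment above, parse_ctf_params() returns 11 numbers, the first being -2.38.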
EMAN supports a lot of different formats, and it does so transparently. That is, in general, any EMAN program can read any image in a wide variety of formats without you having to do anything special. EMAN currently supports reading SPIDER, IMAGIC, MRC, HDF5, Gatan DM2, Gatan DM3, PIF and ICOS formats. TIFF images are now natively supported using libtiff, so you should be able to read 16-bit TIFFs directly. Most generic image formats like TIFF, GIF, PGM, BMP, PNG, etc. are also supported if you have the ImageMagick package installed on your machine. Because Gatan constantly changes things, we cannot guarantee that DM3 file reading will be perfect.
For image writing, EMAN supports most of the above formats as well. However, most EMAN programs default to IMAGIC format (for 2D) and MRC format (for 3D). To convert to a different format, use 'proc2d' for 2D images and 'proc3d' for 3D images. EMAN will detect most filename extensions automatically, or you can force several of the output formats with a program option.
Some of you may also be aware of the 'byte ordering' issue. Different machines (SGI vs Intel, for example) store their numbers in the opposite byte-order. Often this means files generated on one machine will be unreadable on machines using the opposite convention. However, EMAN handles this problem as well. Any supported image can be read regardless of byte-order. When writing images, EMAN uses the native byte order of the machine the software is being run on.
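A common heuristic for detecting byte order is simple: interpret a known header field both ways and keep the interpretation that gives a plausible value. This stand-alone sketch is illustrative only, not EMAN's actual code:

```python
import struct

def read_int_either_endian(raw: bytes, max_reasonable: int = 65536) -> int:
    """Interpret 4 bytes as a little-endian int, falling back to
    big-endian when the result is implausibly large. This kind of
    plausibility check on a header field (e.g. an image dimension)
    is a common way to detect the byte order of a file."""
    little = struct.unpack("<i", raw)[0]
    if 0 < little <= max_reasonable:
        return little
    return struct.unpack(">i", raw)[0]
```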
This can be done in EMAN, though it doesn't use rotational power spectra. Real-space approaches are more accurate, though proper centering is critical. Past attempts at the rotational power spectrum approach (on several test cases) showed it to be unreliable and imprecise.
First, center the particles:
cenalignint particles.hed maxshift=<pixels>
(warning: this can use a lot of memory. You should have 3x as much RAM as the size of the file you operate on. If not, use the frac= option)
- or -
proc2d particles.hed centered.hed <center | acfcenter>
One of those three should do a decent job centering your particles (they do not need to be in the same orientation).
Then take the centered data and run :
startcsym centered.hed <# top view particles to keep> sym=<trial symmetry>
While this is also designed to look for side views, it will find top views (with the corresponding symmetry) very nicely.
So, pick a trial symmetry and run startcsym. Then look at the first image in classes.hed and the first image in sym.hed. The first image in classes.hed is an unsymmetrized particle with the strongest specified symmetry. The first image in sym.hed is a symmetrized version. If the two look the same and have a visible symmetry, you've probably got the right answer. Repeat for all possible symmetries. The answer will usually stand out very clearly, and can be presented in publication by showing the 2 images side-by-side for each trial symmetry. Note that there are some known situations (detached virus portal complexes, for example) where a single data set may contain particles with multiple symmetries.
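If you want to script the 'repeat for all possible symmetries' step, something like the following will generate the command lines to run (the helper and the default symmetry list are illustrative only; adjust them for your particle):

```python
def startcsym_cmds(stack="centered.hed", keep=100,
                   symmetries=("c2", "c3", "c4", "c5", "c6", "c7", "d2")):
    # one startcsym run per trial symmetry, as described above
    return [f"startcsym {stack} {keep} sym={s}" for s in symmetries]

for cmd in startcsym_cmds():
    print(cmd)
```

After running each command, compare the first image of classes.hed and sym.hed as described above before moving on to the next trial symmetry.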
Also see the related question below.
The documentation really needs to address this, but doesn't. There are two reasons for this, though. First, it is really difficult to describe this adequately in text. Second, you need to have a sound understanding of the mathematics being used in CTF correction to use this method properly and avoid doing bad things to your structure (without realizing it).
One other note. Many people (myself included) have suggested generating a structure factor curve computationally from a PDB structure of a similar protein. As it turns out, this is a very difficult thing to do, largely because solvent effects have a profound effect on the overall shape of this curve. Current software (2003) used by the solution scattering community can accurately predict peak locations, etc., but it doesn't have the correct overall shape, and should not be used for EM work. Perhaps this situation will improve in the future.
Still, there is a way to get the necessary curve. It isn't perfect, but it's adequate in most cases, and has been used for several published structures. The basic idea is to use several sets of particles from images at different defocuses. You then simultaneously fit the CTF of these data sets such that the CTF curve is a reasonable fit, and simultaneously the predicted structure factor for all of the curves matches pretty well at low resolution. This process must be done manually using ctfit, but once you have a result, you should be able to do most of your fitting with the automated program 'fitctf'.
The optimal way to approach this problem is to have some sort of solution scattering curve on hand. This curve is simply used for scaling the data and getting some general idea of a reasonable B-factor and amplitude contrast. It will not impose its features on the final structure factor. It is also not strictly necessary; it is possible to proceed without one. The 'groel.sm' curve (native GroEL structure factor) is probably adequate for most cases. Then do the following:
Good questions! To find out how many particles were included in the class-averages, type (for example) 'iminfo classes.4.hed all'. The last number on each line is the number of particles included in that class-average. At the end a total number of particles included in the classes file will be shown.
Now this is where it gets tricky. If all of the class-averages were used in the reconstruction, you'd be done. However, some class-averages may get excluded (depending on the value you select for hard=). However, they are not necessarily the same class averages that are excluded from the original make3d reconstruction. make3d will output how many original particles were included in the final reconstruction as part of its output on the screen. This is probably the best answer you'll get, but it isn't stored anywhere. Generally when I talk about the number of particles used in a reconstruction, I'm referring to the 'iminfo classes.4.hed all' method.
The next part is a little trickier. A complete record of particles excluded from the class-averages is kept (along with classification information) for all iterations, in 'particle.log'. This file has a variety of different information in it, depending on the first character of the line. Lines starting with 'X' indicate excluded particle numbers. If going through this file is too much of a pain, you can rerun the 'classalignall' command with the 'badfile' option. This will create a set of files containing the excluded images for whatever options you provide to classesbymra.
Note that the particle.log file can also be used to recreate the 'cls' files from any particular iteration using the 'clsregen' command.
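If you'd rather pull the excluded particle numbers out of particle.log with a quick script, something like this works. It assumes the particle number is the first whitespace-separated field after the leading 'X'; check your own file's layout before trusting it:

```python
def excluded_particles(path="particle.log"):
    """Collect particle numbers from lines starting with 'X'.
    Assumes each such line looks like 'X <particle number> ...';
    verify against your own particle.log before relying on this."""
    nums = []
    with open(path) as f:
        for line in f:
            if line.startswith("X"):
                nums.append(int(line.split()[1]))
    return nums
```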
Not exactly. The curve from your data (the power spectrum) is equal to noise + ctf * envelope * structure factor. The curve you're fitting with is just noise + ctf * envelope. The structure factor is missing, and is very important. This problem can be tackled three ways, described in the documentation. If you're in the situation where you don't have x-ray data (which you presumably don't), the best results are generally achieved via the following process:
1) You will need several data sets at different defocuses.
2) Read the first data set into ctfit.
3) Set the 'amp' to zero, then use the 4 noise sliders to fit the background by passing through the zeros of the CTF and matching the high-resolution end of the curve (where the zeros are no longer visible).
4) Increase the amplitude and adjust the defocus and envelope function as best you can. You should be able to determine the defocus quite accurately. The envelope function is somewhat arbitrary.
5) Read in the second data set, and repeat this process for it.
6) Bring up a second plot window, and set it to display the structure factor. This will show the structure factor for each displayed data set, calculated from the data. This is not a very accurate calculation, but it's generally good enough.
7) Now, without spoiling the fitting you've just done, adjust all of the parameters of the 2 data sets such that the structure factor curves match as well as you can. Don't worry too much about the divergence at high frequency. Work on getting a match out to the first or second zero, then just try to get the general trend at high frequency to be the same.
8) Continue to add the other data sets in the same way.
9) When you've got them all fitted satisfactorily, use the 'phase correct' option as described in the instructions.
Gosh, I hope so! Seriously, there are a few issues to be aware of. First, one factor often of concern is the fact that EMAN generally keeps the entire set of particles in a single image file stack. A really big reconstruction might cause this file to become bigger than 2 gigabytes, which is a problem for some parts of certain operating systems and a lot of other software packages. While EMAN does the proper things to support files larger than 2G, sometimes the underlying system still won't allow it. However, there is a good workaround. EMAN supports a special file format called the LST format. This format is basically a text file containing a list of images in other files. For example, if you have too much data to put into a single start.hed/img file, do the following:
You now have 2 files (start.hed/img) which appear to EMAN to be equivalent to a big IMAGIC file, without any of the normal 2G limitations. Note, however, that you cannot add new images to this file with proc2d. It isn't really an image file.
There are other issues. Some are of concern when there are a lot of pixels in each image and others are of concern when there are a lot of images. In the first case, memory on the computer is the biggest problem. For example, if you were trying to reconstruct a 512x512x512 volume, each volume dataset requires 512 MB. Several programs require enough memory for 2 or 3 3D models. So any machine used to process this dataset would need to have at least 2 GB of RAM. There are too many issues involved to cover all possibilities here. In general, I'd say yes, EMAN can handle really big problems. If you run into problems, email me, and I'll try to help you resolve them.
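The memory arithmetic above is easy to generalize. A small helper (ours, not EMAN's) for estimating the RAM needed by a volume:

```python
def volume_memory_mb(box_size: int, bytes_per_voxel: int = 4, copies: int = 1) -> float:
    """Memory (in MB) needed to hold `copies` single-precision float
    volumes of box_size^3 voxels."""
    return copies * box_size ** 3 * bytes_per_voxel / 2 ** 20
```

volume_memory_mb(512) gives the 512 MB quoted above; with copies=3 you get the roughly 1.5 GB that pushes a machine toward needing 2 GB of RAM.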
The answer depends on the source images. EMAN reads most EM file formats directly. If you have each particle in a separate image file, for example, img001.img, img002.img, etc., then the following command would do it (zsh, similar for other shells):
foreach i (img*.img)
proc2d $i start.hed
end
-or- (bash)
for i in img*.img ; do
proc2d $i start.hed
done
It doesn't matter what file format the source images are in. Any EMAN program transparently reads any supported image type (byte order doesn't matter either). If the images are already in a Spider stack file called, for example, part.spi, the following would do it:
proc2d part.spi part.hed
All of the 2D EMAN commands currently write IMAGIC files by default. 3D commands write MRC format by default. There are options in proc2d and proc3d for writing to other file formats.
Tricky. There is a possibility that the answer is "you can't". In most cases, however, it's possible to get a pretty accurate answer. In cases where the symmetry of the particle is unknown, the ability to distinguish between different symmetries is proportional to the overall contrast in the image. In cryo-EM there is always a tradeoff between contrast and resolution, so the best thing to do if you're trying to determine symmetry is exactly the opposite of what you'd do for high resolution. That is, take some micrographs in negative stain, or in ice, fairly far from focus at low voltage. This will provide the best overall contrast for an attempt to determine symmetry.
Once you've collected high-contrast data, there are a number of techniques to try to determine symmetry. For particles with a suspected Cn or Dn symmetry, startcsym is a good starting point. By running it several times with each possible symmetry you can see how well each one fits the data. Frequently comparing the symmetrized model in sym.hed with the class-averages in classes.hed will give the first indications of the true symmetry.
The next step is to try to refine each of the possible initial models and see if they 'fall apart' during refinement. This should resolve the symmetry question IF you have sufficient contrast, and IF your particles are in fairly random orientations. If the contrast is too low, or there is a strongly preferred orientation, however, an accurate answer may not be possible.
If the first technique fails, there are other possibilities, like using multivariate statistical analysis on an aligned set of raw particles. These issues are too complicated for discussion in this FAQ.
No, the docs don't really explain all of the text output at this point. I can tell you what the numbers are, but I don't think it's going to help you very much. While you may be able to judge the quality of an individual particle when compared to a good model using the quality factor, they really won't tell you what you're trying to find out. There are just too many variables involved.
If you're anxious that things are going too slowly, the best approach is to increase the angular step for the first couple of refinement iterations. For an asymmetric model you could go as high as 15 or 18 degrees for the first round or two. That should be enough to tell you if the model is reasonable. You may also consider downsampling the data with, for example:
proc2d start.hed newdir/start.hed shrink=2
As we tried to impress in the documentation, asymmetric models can be very tricky. It depends on their overall shape. If, for example, you have something 'L' shaped, then getting a good starting model shouldn't be difficult at all. However, if you have something that's basically round with a few lumps, it may actually be impossible to generate an unambiguous, accurate starting model. It is actually possible to have a set of random projections which can produce several DIFFERENT models, all of which are consistent with the data at some resolution.
StartAny uses c1startup, so no, there's no difference. The routine it uses isn't all that great. For 'easy' models it will work pretty well, but in tough cases, it may just come up with something completely wrong. In these cases, there are really only two good solutions in EMAN right now:
1) if your model may have a pseudosymmetry, ie - it's vaguely cylindrical in shape or something, you can often use the startcsym routine and get something that's good enough to start.
2) Final resort. Use tomography. If you're comfortable with it, then you might actually start here. Anyway, the idea is simple enough: take a tilt series (you'll probably have to use stain or glucose for this). EMAN has a few experimental programs for generating a 3D model from the tilt series and aligning/averaging several such 3D models to generate a starting model. Even this approach isn't perfect (at least the simple implementation EMAN uses isn't), but we have used it with some success on a few projects.
To answer your question anyway: the output from classesbymra looks like:
0 -> 256 (506.86)
1 -> 296 (508.74)
2 -> 278 (502.86)
3 -> 273 (504.82)
The first line is saying that particle 0 (the first one in start.hed) looked the most like projection number 256. The quality factor was 506.86. The interpretation of the quality factors can depend on the shape of your model and the box size, etc.
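If you want to tabulate these assignments in a script, the lines are easy to parse. This regex is written against the sample output shown above; it is our own sketch, not an EMAN utility:

```python
import re

def parse_classesbymra_line(line: str):
    """Parse a 'particle -> projection (quality)' line from
    classesbymra output; returns None for non-matching lines."""
    m = re.match(r"\s*(\d+)\s*->\s*(\d+)\s*\(([\d.]+)\)", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))
```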
This feature is not well documented. The best approach is:
1) generate a set of projections 'proj.hed'
2) 'ctfit proj.hed'
3) On the 'Process' menu select 'Simulate' -> 'RT CTF Sim'
this will open a file dialog, select proj.hed again
4) A window with a picture of your first projection will appear; as you modify the CTF parameters, the appearance of this image will be updated continuously.
5) Set the desired parameters. A reasonable set for a decent FEG scope is:
200 kev, 1 mm Cs
Noise : 0, 1, 6, 3
Amp Cont: 10
Envelope: 7.5
Defocus: whatever you want, usu 1-4 microns
6) This will set the basic parameters. The one parameter you cannot define a 'typical' value for is 'Amp', since it depends on the amplitude of your projections (usually you'd start out with a normalized 3D model). Basically you adjust 'Amp' to achieve the desired signal to noise ratio visually. Once you've applied the CTF to the entire set of projections, you can read the simulated set back into ctfit again and redetermine the SNR.
7) Select 'process' -> 'simulate' -> 'apply ctf 2d'. This will not overwrite the input file; it will create a new file with the CTF parameters determined.
Note: if you want to do this on many files, you can use 'applyctf' once
you have a good set of parameters.
Here's the script. (help(EMAN.EMData) and help(EMAN.Euler) will list the available methods.)

# This reads a text file with a space-separated Euler triplet
# on each line and generates projections
import math
from EMAN import *

infile=open("eulers.txt","r")
lines=infile.readlines()
infile.close()
# Ok, this next line is not all that transparent, there
# are other ways to do this, but it is a useful construct:
# converts the set of input lines into a list of tuples (degrees -> radians)
eulers=map(lambda x:tuple(map(lambda y:float(y)*math.pi/180.0,x.split())),lines)
e=Euler()
# read the volume data
data=EMData()
data.readImage("model.mrc",-1)
for euler in eulers:
    e.setByType(euler,Euler.MRC)
    # -4 is the best real-space projection mode
    out=data.project3d(e.alt(),e.az(),e.phi(),-4)
    out.writeImage("proj.hed",-1) # file type determined by extension
Each subdirectory must contain a 'threed.0a.mrc' file, the starting model for each subpopulation. In most cases these starting models don't need to look at all like the final structure, and in fact, when you refine a structure from something like a Gaussian blob, this can produce the most convincing results (note that this does not always work). The sole absolute requirement here is that the models cannot be numerically identical. ie - if you start with a Gaussian blob for a starting model, you must add a small amount of perturbative noise to each.
It is also quite acceptable, especially for a first effort, to use 'good' models from single model refinement as starting models for the multi model refinement. Typically this would be done by taking the last n iterative models from a normal refine and using them as starting models. ie - if you ran refine for 10 cycles, use threed.7a.mrc as threed.0a.mrc in subdirectory 0, threed.8a.mrc as threed.0a.mrc in directory 1, etc.
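For the Gaussian-blob route, the perturbation step can be as simple as this numpy sketch (illustrative only; the blob width and noise level here are arbitrary choices, and EMAN's own tools can also generate and modify models):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64  # box size in voxels
# spherically symmetric Gaussian blob as a generic starting model
grid = np.indices((n, n, n)) - n // 2
r2 = (grid ** 2).sum(axis=0)
blob = np.exp(-r2 / (2 * (n / 8) ** 2))
# each subpopulation gets the same blob plus a small, different random
# perturbation, so the starting models are not numerically identical
models = [blob + 0.01 * rng.standard_normal(blob.shape) for _ in range(4)]
```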
If you don't have any experience with 'refine', I would strongly suggest learning how to do single model refinements first, then moving on to multirefine.
The overall process:
mkdir r0 r1 r2 r3
cd 0
foreach i (cls*lst)
proc2d $i ../r0/start.hed first=1
end
refine .... (normal refinement)
cd ../1
foreach i (cls*lst)
proc2d $i ../r1/start.hed first=1
end
etc.
This is largely answered in the question above relating to resolution of reconstructions. If you are not using optimal options with the refine command, you may, indeed, not get optimal results. The basic options provided by the EMAN tutorial do not give the entire picture. For example, if you continue to use classiter=8 for your high resolution work, you will never get the highest possible resolution. Similarly, if you don't use the refine option to the refine command, you will end up slightly blurring your model in most cases. The bottom line is, to get optimal results, you must specify the optimal options.
There are some caveats in this reply. The fundamental refinement strategy in EMAN (1.x) is not really conducive to providing these numbers in a way that they can be reasonably compared to values from another program. While it is possible to produce them, they will not be as accurate as they effectively are during a refinement loop. CTF correction and parameters are taken into account very heavily in a normal EMAN refinement. That is, we have a CTF corrected 3D model, projections are made, then the projections are modified by the CTF before comparison to the raw particles. Unless you pull these parameters out of the middle of a refinement loop, the numbers you get using a straight uncorrected comparison will be less accurate. In addition, one substantial advantage of EMAN refinement is the 2D iterative class-averaging procedure, which further refines the 2D particle orientations. These numbers would not be available.
If you are running a full EMAN refinement and you wait until it converges, then getting these values is straightforward. Untar the cls.*.tar file from the final iteration. These cls files are the particles associated with each projection (ie - that gives you 2 of the Euler angles). Each line in this file contains a particle number from start.hed, then at the end of the line is:
similarity, rotation (radians), dx, dy
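So each line can be unpacked with a few lines of Python, assuming the layout just described (particle number first, then similarity, rotation, dx, dy as the last four fields). This is our sketch, not an EMAN utility:

```python
def parse_cls_line(line: str):
    """Unpack one cls-file particle line: particle number first,
    then similarity, rotation (radians), dx, dy as the last four fields."""
    fields = line.split()
    ptcl = int(fields[0])
    similarity, rotation, dx, dy = map(float, fields[-4:])
    return ptcl, similarity, rotation, dx, dy
```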