Dear Colleagues,

My previous writeup was a general discussion of capabilities, which I thought had been requested, but I see that Cho went more into the specifics of addressing the grand challenge. I have therefore written up this information as well.

SUMMARY: We expect to be able to compute the bunch-to-bunch wakefield on current hardware, explicitly with an FDTD approach, within six months of project commencement, through the implementation of two new capabilities along with the associated verification. We would also be able to handle systems with errors.

DETAILS

I have just confirmed that we can run a 0.43-billion-cell simulation on our 96-core cluster at Tech-X. This is sufficient to resolve a single cavity (1.287 m x 0.223 m x 0.223 m) to 0.53 mm, just resolving the air gap, with a grid of 2430 x 421 x 421. In this configuration VORPAL took 2.5 s per time step (0.58 us per cell update, which is slow, probably because this run was dipping into virtual memory).

Khabiboulline (FNAL) showed results in May in which he modeled the air gap effectively on a coarser grid by widening (relaxing) the gap and compensating by filling it with dielectric. This causes no reduction of the stable time step relative to the vacuum case. The method could be used to move to a grid resolution of the order of 1 mm or more. (The formteil thickness is 8 mm, so 2 mm might be the limit.) One simple reading of this compensation is sketched at the end of this note.

In any case, going to only 1 mm gives a factor of 8 in system size, allowing a full 8-cavity cryomodule to be modeled with 0.43 B cells. Such a grid would be 9720 x 210 x 210. With the perfect-dispersion methods recently developed, this would require a time step of 1 mm / c = 3.3 ps, which implies 300 time steps per ns. With the worst-case estimate of a 6x increase in computational cost for the perfect-dispersion algorithms, this gives 4500 s = 75 min per ns for the entire cryomodule. (The actual cost is expected to be not quite so bad, as cache performance improves with more operations per time step.) The cryomodule is 9720 cells long, so a beam transit takes 9720 time steps (32 ns); another metric is therefore 11 computational hours for a beam to propagate through the entire cryomodule. (The arithmetic behind these figures is collected in a short script appended below.)

For a microbunch spacing of 300 ns, we can then follow two bunches and compute the wakefield of the entire cryomodule, basically by brute force but fully self-consistently, in 120 hours (5 days) on fewer than 100 processors. A 1000-processor cluster brings this back down to 12 hours. There would be 11 x 9720, or roughly 100,000, time steps in this calculation.

To obtain accuracy estimates, we propose to run the simulation at several comparable resolutions. We expect Richardson extrapolation to give us both the error and a higher-order result, as we have shown in the recent SciDAC proceedings; a minimal sketch of the extrapolation step is also appended below.

To make this computation work, we believe we need to implement and verify the following:

  - a dielectric model in the gap
  - the perfect-dispersion model

which we would expect to have completed within six months of project commencement. We would also investigate implementation of the capacitive gap model mentioned earlier. This calculation could be repeated with any set of errors desired. We could bring the compute time down further by restoring our load balancing; that is another several months of work.
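APPENDED SKETCHES

I have not reproduced Khabiboulline's prescription here; one simple reading of "relaxing the gap and filling it with dielectric" is to widen the gap to the coarse cell size and choose the relative permittivity that preserves the gap capacitance per unit area. The short Python sketch below is only that reading, with illustrative (hypothetical) gap widths, not the actual formteil geometry.

    # Illustrative only: one reading of the gap compensation, namely preserving
    # the parallel-plate capacitance per unit area, C/A = eps0 * eps_r / d.
    # Keeping C/A fixed while the gap widens from d_true to d_grid gives
    # eps_r = d_grid / d_true.  Since eps_r > 1 only lowers the local wave
    # speed, it does not tighten the Courant limit on the time step.

    def gap_permittivity(d_true_mm, d_grid_mm):
        return d_grid_mm / d_true_mm        # eps_r that preserves C per unit area

    # e.g. a 0.5 mm air gap represented by a single 1 mm cell (hypothetical numbers)
    print(gap_permittivity(0.5, 1.0))       # -> 2.0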
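Since the coarse-grid estimate above is simple arithmetic, here is a short Python sketch collecting it in one place. All inputs are the figures quoted in the text (1 mm cells, 2.5 s per step, the 6x worst-case dispersion penalty); nothing here is a new measurement.

    # Back-of-the-envelope check of the 8-cavity cryomodule estimate above.
    c = 2.998e8                     # speed of light, m/s

    nx, ny, nz = 9720, 210, 210     # coarsened grid for the full cryomodule
    cells = nx * ny * nz            # ~0.43 billion cells

    dx = 1.0e-3                     # cell size, rounded to 1 mm as in the text
    dt = dx / c                     # perfect-dispersion step, ~3.3 ps
    steps_per_ns = 1.0e-9 / dt      # ~300 time steps per ns

    sec_per_step = 2.5              # measured per-step time on the 96-core cluster
    penalty = 6.0                   # worst-case perfect-dispersion slowdown
    sec_per_ns = steps_per_ns * sec_per_step * penalty    # ~4500 s = ~75 min

    transit_ns = nx * dt / 1.0e-9   # beam transit of the cryomodule, ~32 ns
    total_steps = (300.0 + transit_ns) * steps_per_ns     # two bunches, ~1e5 steps

    print(f"{cells/1e9:.2f} B cells, dt = {dt*1e12:.1f} ps, "
          f"{steps_per_ns:.0f} steps/ns, {sec_per_ns/60:.0f} min/ns, "
          f"transit {transit_ns:.0f} ns, ~{total_steps:.0f} steps total")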
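For the accuracy estimate, the two-level Richardson step we have in mind is the standard one; the convergence order and the sample numbers below are purely illustrative, not specific to the VORPAL scheme.

    # Standard two-level Richardson extrapolation: given results at cell sizes
    # h and h/ratio from a scheme of formal order p, combine them to cancel the
    # leading error term and to estimate the error of the fine-grid result.
    def richardson(q_coarse, q_fine, ratio=2.0, order=2):
        r = ratio ** order
        q_extrap = (r * q_fine - q_coarse) / (r - 1.0)   # higher-order result
        err_fine = abs(q_fine - q_coarse) / (r - 1.0)    # error estimate at h/ratio
        return q_extrap, err_fine

    # Made-up wake amplitudes at 2 mm and 1 mm resolution, order taken as 2:
    print(richardson(1.043, 1.011))                      # -> (~1.000, ~0.011)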