Second Code Improvement Completed
Numerical Simulations For Active Tectonic Processes: Increasing Interoperability And Performance
JPL Task Order: 10650
GeoFEST Documentation Only
Milestone G - Code Improvement
Due date: 6/30/2004
First code improvement (functional enhancement and speedup). Documented source code made publicly available via the Web.
· PARK on 1024 CPU machine with 400,000 elements, 50,000 time steps in 5 times the time of the baseline code
· GeoFEST (assuming availability of an 880 processor machine): 16M elements, 1000 time steps in the same time as the baseline code, using the PYRAMID AMR libraries
· Virtual California with N = 700 segments for 10,000 time steps in 1 hour or less; MPI parallel implementation, running on an M-processor machine with 2 GB of memory per CPU; speedup of approximately M/2 on up to 256 processors; investigation of fast multipole method for this code
· PARK on 256 CPU machine with 150,000 elements, 5,000 time steps in the same time as the baseline case
· GeoFEST links to PYRAMID and runs on a parallel machine; produce a plot of scaled speedup that will show that we are maintaining efficiency as the number of processors and problem size increase. Assuming availability of a 64 CPU Beowulf: 1,250,000 elements, 1000 time steps, in the same time as the baseline case.
Jet Propulsion Laboratory
Mail Stop 183-335
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-4737
Michele Judd:
Jet Propulsion Laboratory
Mail Stop 183-335
4800 Oak Grove Drive
Pasadena, CA 91109-8099
Jet Propulsion Laboratory
Mail Stop 238-600
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-6790
Community Grid Computing Laboratory
Indiana University
501 N. Morton, Suite 224
Bloomington, IN 47404-3730
gcf@indiana.edu
812-856-7977
Brown University
Box 1846
Providence, RI 02912-1846
Terry_Tullis@Brown.edu
401-863-3829
Jet Propulsion Laboratory
Mail Stop 183-335
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-4737
Jet Propulsion Laboratory
Mail Stop 238-600
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-6790
Community Grid Computing Laboratory
Indiana University
501 N. Morton, Suite 224
Bloomington, IN 47404-3730
812-856-1212
Community Grid Computing Laboratory
Indiana University
501 N. Morton, Suite 224
Bloomington, IN 47404-3730
812-856-7977
Professor
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0781
213-740-4504
Center for Computational Science and Engineering
U. C. Davis
Davis, CA 95616
530-752-6416
Center for Computational Science and Engineering
U. C. Davis
Davis, CA 95616
Jet Propulsion Laboratory
Mail Stop 126-347
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-6920
Michele Judd:
Jet Propulsion Laboratory
Mail Stop 183-335
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-4994
Anne Chen:
University of Southern California
Mail Code 0781
3651 Trousdale Parkway
Los Angeles, CA 90089-0742
213-740-7285
Michele Judd:
Jet Propulsion Laboratory
Mail Stop 183-335
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-354-4994
Community Grid Computing Lab
Indiana University
501 N. Morton, Suite 224
Bloomington, IN 47404-3730
812-856-1212
University of California, Irvine
Environmental Analysis and Design
Irvine, CA 92697-7070
949-824-5491
Jet Propulsion Laboratory
Mail Stop 126-347
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-393-5353
University of California, Irvine
Environmental Analysis and Design
Irvine, CA 92697-7070
Jet Propulsion Laboratory
Mail Stop 300-233
4800 Oak Grove Drive
Pasadena, CA 91109-8099
Margaret.T.Glasscoe@jpl.nasa.gov
818-393-4834
AD HOC Team Member
Jet Propulsion Laboratory
Mail Stop 169-315
4800 Oak Grove Drive
Pasadena, CA 91109-8099
818-393-3920
354-4350
This milestone report documents Milestone F of the QuakeSim project (Numerical Simulations For Active Tectonic Processes: Increasing Interoperability And Performance) for NASA's Earth Science Technology Office, Computational Technology Program. The text of the milestone appears on Page 1, and requires demonstration of substantially larger problems than Milestone E (7/30/02; see Table 1) for the now-parallel earthquake codes PARK and GeoFEST. Milestone F also requires demonstration of parallel scaling.
In the next section we describe the large problems that demonstrate code improvement for the PARK and GeoFEST codes. Then we give details for the PARK code (code description, algorithm, numerical method, documentation, scaling analysis, and scientific and computational significance), and for the GeoFEST code (covering the same topics).
Code | Machine, Wallclock Time | Processors | Date | Elements | Time Steps
GeoFEST Milestone E | Solaris workstation (JPL), 13.7 hours | 1 | July 30, 2002 | 55,369 | 1,000
GeoFEST Milestone F | Thunderhead (GSFC), 2.8 hours | 64 | September 1, 2003 | 1,400,198 | 1,000
GeoFEST Milestone G | Cosmos (JPL), 12.0 hours (under the 13.7 allowable) | 490 (well under the 880 allowable) | March 26, 2005 | 16,021,034 | 1,000
Table 1: Computer runs demonstrating baseline (Milestone E), Milestone F, and Milestone G performance enhancements for the GeoFEST application.
Table 1 summarizes the results for the improved code demonstration runs. Additional runs demonstrating parallel scaling are detailed in Section 4.
The nature of the code improvements we describe is that problems of much larger size than the baseline case are solved in the same time, by use of advanced computing technology (multipole methods for PARK, domain partitioning for GeoFEST, and efficient MPI parallel coding for both PARK and GeoFEST).
For the PARK code, the problem demonstrating the improvement over baseline has the geometry shown in Figure 1. The boundary conditions (appropriate for the geographic setting at Parkfield, CA) are the same as for the baseline case, but the mesh density is increased from 15,000 to 150,000 rectangular elements, and the problem duration is extended from 500 to 5,000 time steps. Traditional methods lead to work scaling with the square of the number of elements, and linearly with time steps. Our improvements are due to efficient parallel implementation and use of a multipole solution technique that scales much better than quadratically in element count.
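As a rough illustration of the savings (our arithmetic; the two methods' constant factors are omitted), compare per-time-step interaction counts for the N = 150,000 element run:

\[ N^2 = 2.25 \times 10^{10} \qquad \text{versus} \qquad N \log_2 N \approx 150{,}000 \times 17.2 \approx 2.6 \times 10^{6}, \]

a reduction of roughly four orders of magnitude in pairwise evaluations per time step.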
Figure 1. Views of grid geometry and distribution of
constitutive properties for PARK at three different scales. A. shows the overall geometry. The
numbers are distances in km, both vertically and horizontally from the origin,
which is at the earth’s surface at Middle Mountain, directly above the
hypocenter of actual Parkfield earthquakes. Boundary conditions of 35 mm/yr
slip rate are applied on the light brown areas, while the blue area is locked
with zero slip rate. B. Closer view
of the grid geometry. The large nearly solid red area is a region with square
elements ~65 m on a side. Within it is an area at about 10 km deep and 3 km
horizontally that appears solid red at this magnification, with ~22 m squares
and ~7.4 m squares. C. Same area as
in (B) showing contours of the
constitutive parameter a-b without
the grid. Instability can occur in areas colored brown, the darkness being
proportional to the tendency to be unstable. D. Magnification of the finest area, showing the ~7.4 m elements
and part of the area with ~22 m elements. As in (C) the color shows the intensity of a-b.
For GeoFEST, the problem geometry is shown in Figure 2. Elastic and viscoelastic deformation is simulated for 1.4 million finite elements, for a duration of 1000 time steps. The significant increase is in the number of elements, which was about 55,000 for the baseline case (also running 1000 time steps). Note that the baseline case was modeled on the 1994 Northridge earthquake, using a single fault of 300 square km in a domain of 240 x 240 x 100 km. The improved demonstration case is based on the 1992 Landers event, using three closely arranged faults within an 865 square km area in a domain of 1000 x 1000 x 60 km.
Figure 2a: Finite element mesh LandersGap64 for the GeoFEST Milestone F code improvement problem. Colors indicate partitioning among processors (limited to 16 partitions in this image for clarity; 64 processors were actually used). Partitions cluster near the domain center due to the high mesh density used near the faults. Note that the vertical scale differs from the horizontal scales.
Figure 2b: Fault
segments and surface nodes for LandersGap64 mesh, center region.
The top-level web site for the QuakeSim task is at http://quakesim.jpl.nasa.gov. Source code for the GeoFEST code may be found at http://quakesim.jpl.nasa.gov/download.html. Files required for the GeoFEST baseline cases (Milestone E) may be found at http://quakesim.jpl.nasa.gov/milestones.html.
PARK is a model for unstable slip on a single earthquake fault. Because it aims to capture the instability, it is designed to represent the slip on a fault at many scales, and to capture the developing seismic slip details over an extraordinary range of time scales (sub-seconds to decades). Its simulation of the evolution of fault rupture is the most realistic of the tools in QuakeSim for that scale, demonstrating the multi-scale approach of this project. When transformed into an efficient parallel simulation, it will be a powerful tool of choice for researchers seeking to determine the nature and detectability of earthquake warning signals such as surface strains and patterns of microseismicity.
In a typical application the PARK code computes the history of slip, slip velocity, and stress on a vertical strike-slip fault, using state-of-the-art rate and state frictional constitutive laws on the fault; the current application is for a specific geographic setting at Parkfield, California. The boundary conditions are those appropriate for Parkfield, and the distribution of constitutive properties on the fault zone is as realistic as our ability to characterize the subsurface properties of the fault there allows. The methods developed in solving this problem can be generalized to other geologic settings in which the fault geometry and the boundary conditions are not so simple and multiple faults are involved.
The fault is represented by a rectangular grid, with highly variable mesh density (see Figure 1). The main program is a boundary element program that determines the stress on every element of the fault surface due to slip on every other element, using a Green's function approach. The fault constitutive law is used to determine what the slip velocity will be for that stress. That velocity multiplied by the time step gives the slip used to calculate the stress in the next time increment. This involves the forward time integration of coupled ordinary differential equations.
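This update can be summarized in a short sketch. It is an illustration only, not the PARK source (which is Fortran 90 and, as described below, uses an adaptive Runge-Kutta integrator rather than the fixed explicit step shown); the greens array and the slip_velocity callback are hypothetical names.

    /* Illustrative only: one explicit update of the boundary-element model.
       greens[i][j] is the stress at element i per unit slip on element j. */
    #include <stddef.h>

    void explicit_step(size_t n, const double *const *greens,
                       double *slip, double *stress, double dt,
                       double (*slip_velocity)(double stress, size_t i))
    {
        /* 1. Stress on every element from slip on every element
              (direct O(N^2) sum; the multipole library replaces this). */
        for (size_t i = 0; i < n; i++) {
            double s = 0.0;
            for (size_t j = 0; j < n; j++)
                s += greens[i][j] * slip[j];
            stress[i] = s;
        }
        /* 2. Constitutive (rate-and-state) law gives the slip velocity for
              that stress; velocity times the time step gives the new slip. */
        for (size_t i = 0; i < n; i++)
            slip[i] += slip_velocity(stress[i], i) * dt;
    }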
Numerically, this is the first earthquake simulation code to seek and achieve enhanced scalability and speed by employing a multipole technique (documented below). The multipole experience gained here will also be transferable to the Virtual California code and other boundary element simulations. The power of massively parallel computing is required for this problem in order to support the many small slip patch elements needed to cover the nucleation scale that initiates the instability.
The integration is done with a fifth-order Runge-Kutta[1] scheme with adaptive step-size control. Because the time steps range over ten orders of magnitude, depending on whether the fault is slipping very slowly in the interseismic period or very fast during an earthquake, the adaptive step-size control is an essential element in the solution.
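A minimal sketch of the step-size controller idea follows (ours, with hypothetical names; the actual code uses fifth-order Runge-Kutta routines adapted from Numerical Recipes, as noted in the documentation section below). An embedded pair yields a local error estimate err, and the controller shrinks or grows the step h to hold err near the tolerance:

    #include <math.h>

    /* Return the next trial step size for a fifth-order method, given the
       estimated local error of the step just taken and the tolerance. */
    double next_step(double h, double err, double tol)
    {
        const double safety = 0.9;                   /* stay below the estimate  */
        double scale = safety * pow(tol / err, 0.2); /* ~(tol/err)^(1/5)         */
        if (scale < 0.1) scale = 0.1;                /* limit shrinkage per step */
        if (scale > 5.0) scale = 5.0;                /* limit growth per step    */
        return h * scale;
    }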
The main program calls a variety of subroutines, and the subroutine that calculates the derivatives used in the forward time integration itself calls a Fast Multipole library that is suitable for such Green's function problems. The multipole approach allows a number of computations to scale as N log N rather than N^2, as would otherwise be the case. The particular Fast Multipole approach used allows determination of the degree of grouping of the remote cells based on an analytical approximation to the Green's function. To reduce computation time, it also renumbers the elements so that those that are near in space are also near in memory.
The flow of the program and the functions of its main routines are described in the file Code_Description.doc, posted at http://quakesim.jpl.nasa.gov/milestones and also http://quakesim.jpl.nasa.gov/documentation. The main program and most of its subroutines are written in Fortran 90. These programs were converted to use MPI to run in parallel for the First Code Improvement Milestone.
In brief, the main program reads inputs and sets up the problem, carries out a time-stepping loop that integrates the slip and stress field forward in time, and writes a summary and closes files at the end. The dominant task is a function "DERIVS," which uses the fault constitutive relationship to determine slip velocity based on local stress, includes effects of shear wave speed and radiation damping, and uses the multipole library to sum the stress contributions of the slip on all fault elements to determine the local stress. Within a single time step the objective is to integrate the history using variable-step Runge-Kutta based on the stress-velocity interactions. In each time step "DERIVS" is called a total of six times, once initially by the main program and five more times by the integrator, resulting in a fifth-order Runge-Kutta integration. "DERIVS" relies on the multipole code: it calls "sumtree" to gather all the contributions to local stress, and then determines the local stress and slip temporal derivatives from that. Finally it calls "forgettree" to clear the multipole structure.
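The call structure of "DERIVS" can be sketched as follows. This is a hypothetical C rendering for illustration (the real routine is Fortran 90); sumtree and forgettree are the multipole-library entry points named above, and slip_velocity stands for the rate-and-state friction evaluation with radiation damping.

    /* External routines assumed from the text; signatures are illustrative. */
    extern void sumtree(int n, const double *slip, double *stress);
    extern void forgettree(void);
    extern double slip_velocity(double stress, int i);

    /* Called six times per time step (once by the main program, five
       times by the fifth-order Runge-Kutta integrator). */
    void derivs(int n, const double *slip, double *stress, double *dslip_dt)
    {
        sumtree(n, slip, stress);   /* multipole sum of all slip contributions */
        for (int i = 0; i < n; i++)
            dslip_dt[i] = slip_velocity(stress[i], i);
        forgettree();               /* clear the multipole structure */
    }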
In the parallel version, a single processor reads the geometry and partitions it among the remaining processors, sending a portion of the geometry and the local properties to each one. Within the loop over time steps, "DERIVS" uses the multipole functions as in the sequential code, except that "sumtree" is designed for parallel use and has internal MPI calls.
The work in the problem is dominated by the function computing the stress interactions between fault elements. This interaction is made fast and efficient by using a proven parallel Fast Multipole library; parallel decomposition was added to portions of the remaining code in order to make good use of this parallel library. We link the Fast Multipole library of Salmon and Warren (Salmon and Warren, 1997; Warren and Salmon, 1997), which is written in parallel using MPI.
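The distribution step described above can be illustrated with a small MPI sketch (ours, not the PARK source; the real code distributes geometry and several constitutive properties per element, while this sketch scatters a single value per element):

    #include <mpi.h>
    #include <stdlib.h>

    /* Rank 0 holds all_props for n_total elements; every rank receives
       its contiguous block in *my_props (length *my_n). */
    void distribute(int n_total, const double *all_props,
                    double **my_props, int *my_n)
    {
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *counts = malloc(size * sizeof(int));
        int *displs = malloc(size * sizeof(int));
        for (int r = 0, off = 0; r < size; r++) {
            counts[r] = n_total / size + (r < n_total % size ? 1 : 0);
            displs[r] = off;
            off += counts[r];
        }
        *my_n = counts[rank];
        *my_props = malloc(*my_n * sizeof(double));
        MPI_Scatterv(all_props, counts, displs, MPI_DOUBLE,
                     *my_props, *my_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        free(counts);
        free(displs);
    }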
Code and documentation can be found in two places: a web site (http://www.servogrid.org/slide/GEM/PARK/) and on turing and chapman, two of the SGI Origin 3000s at NASA Ames. The web site is the only public location. The machine turing is the front end to the machine chapman, on which the First Code Improvement Milestone was run. For the purposes of NASA's verifying that the First Code Improvement run is as described in the Milestone_Certification_Data file, and repeating the run if desired, it may be easier to use the copies of the documentation, files, etc. that are located on turing or chapman, since no decompression, untarring, or compilation is needed there. In addition, because two copyrighted (but easily and inexpensively available) subroutines are used that cannot be posted on the public web site, it is also easier to verify the behavior on the NASA Ames machines, where these subroutines are available in the src-bin subdirectory of the /u/tullis/1st_Code_Improv_Milestone directory. Verification can also be done by the public or by NASA officials by using materials taken from the web site and by purchasing the subroutines if one does not already have them. The directory structure on turing, on chapman, and at the web site is the same, to make it easier to compare each to the others.
Files needed to run the first improvement problem are described below. All of the necessary material that describes the First Code Improvement Milestone, with instructions that allow one to duplicate it, can be found within the appropriately named subdirectories under the 1st_Code_Improv_Milestone directory.
Included in the "in" and "out" directories are all the materials from the First Code Improvement Milestone run with 150,000 elements and 256 processors for 5,000 time steps. For code testing purposes on one's own system it is useful to set the number of time steps in the prk.dat.150003 file to a smaller number than 5,000 for the initial run; even 2 would be reasonable for the first run.
The materials in these directories include:
Milestone_Certification_Data.txt - a file that gives the time required for the First Code Improvement run and describes various parameters of the run.
README-setting_up_input_files.txt - a file that tells one how to understand the input files, including an explanation of how the elements are created from the input files.
README-Compile.txt - a file that tells how to create both the multipole library and the PARK fault files using the appropriate Makefiles.
in - a directory that contains the input files that were used in the First Code Improvement run.
out - a directory that contains the output files that were generated in the First Code Improvement run.
src-bin - a directory that contains the PARK and related fault application files used in the First Code Improvement run. The versions of this directory on turing and chapman also have the object files and the executable binary file (named park).
Downloads - a directory that contains two unix-compressed tar files, PARK_Package_1st_Improv.tar.Z and PARK_Package_1st_Improv_NR.tar.Z, that allow one to generate the files needed for the First Code Improvement run. The file PARK_Package_1st_Improv_NR.tar.Z contains all the programs that are needed, including two copyrighted subroutines from the Numerical Recipes book. This version is only on turing, not on the public web site. It can be copied by NASA officials for verification of the program behavior, and saves one making the additions needed for the other version, PARK_Package_1st_Improv.tar.Z, which does not have the two Numerical Recipes subroutines. See either the README-src-bin.txt file in the src-bin directory or the header of the park.f file to learn what needs to be done to create these Numerical Recipes subroutines. Except for the presence or absence of these two subroutines, both of these tar files will create the multipole library, the source files for the PARK fault application, and the input files for the First Code Improvement Milestone run.
The tar files were created on turing in the following way. The libsw.a and mpmy_seq.o files were moved from the t17-7/Objfiles/IRIX64 directory to a temporary location. The object files and the executable file (park) were moved from the src-bin directory to another temporary location. For creating PARK_Package_1st_Improv.tar.Z, the numrec.f file containing the two Numerical Recipes subroutines was also moved to a temporary location, whereas for creating PARK_Package_1st_Improv_NR.tar.Z this file was not removed. Then, from the home directory, the following commands were issued to produce the versions without and with the Numerical Recipes subroutines:
tar -cvf PARK_Package_1st_Improv.tar 1st_Code_Improv_Milestone
or
tar -cvf PARK_Package_1st_Improv_NR.tar 1st_Code_Improv_Milestone
Then the .tar files were compressed by issuing
compress PARK_Package_1st_Improv.tar
and
compress PARK_Package_1st_Improv_NR.tar
to produce the PARK_Package_1st_Improv.tar.Z and PARK_Package_1st_Improv_NR.tar.Z files.
In the directory scaling, found in the same 1st_Code_Improv_Milestone directory, there are a number of files that show how the job scales with both the number of elements and the number of processors. Thirty-six scaling runs were done, involving all combinations of the number of elements (712, 5,292, 15,000, and 150,000) and the number of processors (1, 2, 4, 8, 16, 32, 64, 128, and 256) (Table 2, Figure 3). All these scaling runs were run for 100 time steps, whereas the full 150,000 element, 256 processor First Code Improvement Milestone run was done for 5,000 time steps. In the scaling directory is a data table giving the walltime for all 36 scaling runs, as well as five plots showing the dependence of walltime, speedup, efficiency, and overhead on the number of elements and the number of processors.
Table 2: Problem sizes and run times of PARK on chapman using several different numbers of processors. For this scaling study, the milestone problem was reduced to 100 time steps (rather than 5,000). These numbers form the basis for all the plots shown in Figure 3.
Figure 3 (a, b, c): Scaling behavior for PARK on chapman (Table 2), presented in several ways: walltime, speedup, efficiency, and overhead as functions of problem size and number of processors.
The scaling data show that not much speedup is gained by going from one to two processors. Further work to understand this behavior could gain us nearly a factor of two in efficiency in future runs. For the largest job (150,000 elements), the efficiency and overhead are nearly constant from 2 to 8 processors. Efficiency falls off between 8 and 16 processors and is constant from 16 to 32 processors. For 64, 128, and 256 processors, efficiency falls off gradually. This falloff is presumably due to an insufficient number of elements per processor (2,343, 1,171, and 585, respectively, as the plots show). The effect is seen even more dramatically for the jobs with smaller numbers of elements because, as the number of processors increases, the number of elements per processor gets so small that a large amount of time is spent communicating between processors. The falloff between 8 and 16 processors on the 150,000-element problem suggests that the optimum number of elements per processor may be about 20,000.
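For reference, the plotted quantities follow the standard definitions (our statement of them; the report does not spell them out). With T(p) the walltime on p processors,

\[ S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}, \qquad \mathrm{overhead}(p) = \frac{1}{E(p)} - 1, \]

so ideal scaling corresponds to S(p) = p, E(p) = 1, and zero overhead.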
Achieving the First Code Improvement Milestone is significant because it opens the way to run problems of significant size, now that the earthquake code as well as the Fast Multipole library run in parallel under MPI. For the first time it presents to the scientific community fast parallel codes that allow creating simulations of the entire earthquake cycle on a fault in a 3D model, using the most accurate description of fault friction (rate and state friction) and the quasi-dynamic radiation damping approximation to full elastodynamics. We now have the potential for greatly increasing the number of elements that can be included in the model over what could be done in the past.
Enough elements can now be used that it is possible to represent a reasonably sized fault with elements small enough to properly represent the behavior of a continuum. Larger numbers of elements also allow the occurrence in the simulation of earthquakes with a large range of sizes. It will now be possible to study in the simulations in what situations small earthquakes occur in isolation and in what situations they may cascade or grow into larger ones. It is currently not understood what causes small earthquakes to grow into large ones or to stop as small events. Hence, these new simulations should be key to understanding earthquake rupture processes.
This could help gain an understanding of whether patterns of microseismicity might be used to help predict earthquakes. The attainment of this milestone not only represents an advance in our computational ability to simulate earthquakes, but will allow us to understand the earthquake process better by creating simulated data sets that can be compared with data on real earthquakes. The attainment of the next milestone (Second Code Improvement) will involve increasing the efficiency of the code in other ways, now that the parallel implementation has been achieved; this will allow even larger and more realistic simulations to be run.
Computationally, we note this is the first simulation of an earthquake fault using the fast multipole technique, via a library originally developed for astrophysical, gravitationally interacting bodies. As shown in Figure 4, this technique leads to enormous savings over traditional full-interaction methods. Note that the multipole method results in an improvement in scaling on a single processor, and that performance is further increased by employing additional processors (for sufficiently large problems).
Figure 4: PARK: Scaling of time required per time step using the traditional N^2 full-interaction method, the multipole method on one processor, and the parallel improvement found with additional processors (up to 256).
Chapman is described at http://www.nas.nasa.gov/About/Profile/resources.html. Particulars relevant to this run (current as of November 2003) are: Chapman, an SGI Origin 3000, is currently the only 1,024-processor single-image, shared-memory system in existence, with one operating system and a single address space. Chapman will become a major component of the IPG (Information Power Grid), and is currently being used to demonstrate that applications can scale to 1,024 processors on this machine. The system has 128 gigabytes of main memory and 2 terabytes of FC RAID disk storage.
Salmon, John K., and Michael S. Warren. Parallel out-of-core methods for N-body simulation. In Michael Heath, Virginia Torczon, et al., editors, Eighth SIAM Conference on Parallel Processing for Scientific Computing, SIAM, 1997.
Warren, Michael S., John K. Salmon, Donald J. Becker, M. Patrick Goda, Thomas Sterling, and Gregoire S. Winckelmans. Pentium Pro Inside: I. A treecode at 430 Gflops on ASCI Red; II. Price/performance of $50/Mflop on Loki and Hyglac. In Supercomputing '97, Los Alamos, 1997, IEEE Comp. Soc.
GeoFEST simulates stress
evolution, fault slip and plastic/elastic processes in realistic materials. The
products of such simulations are synthetic observable time-dependent surface
deformation on scales from days to decades. Scientific applications of the code
include the modeling of static and transient co- and post-seismic Earth
deformation, Earth response to glacial, atmospheric and hydrological loading,
and other scenarios involving the bulk deformation of geologic media.
Diverse types of synthetic
observations will enable a wide range of data assimilation and inversion
techniques for ferreting out subsurface structure and stress history. In the short term, such a tool allows
rigorous comparisons of competing models for interseismic stress evolution, and
the sequential GeoFEST system is being used for this at JPL and UC Davis. Parallel implementation is required to go
from local, single-event models to regional models that cover many earthquake events
and cycles.
GeoFEST
uses stress-displacement finite elements to model stress and flow in a
realistic model of the Earth's crust and upper mantle in a complex region such
as the Los Angeles Basin. The model includes stress and strain due to the
elastic response to an earthquake event in the region of the slipping fault,
the time-dependent viscoelastic relaxation, and the net effects from a series
of earthquakes. The physical domain may be two or three dimensional and may
contain heterogeneous materials and an arbitrary network of faults. The physics models
supported by the code include isotropic linear elasticity and both Newtonian
and power-law viscoelasticity via implicit/explicit quasi-static time stepping.
In addition to triangular, quadrilateral, tetrahedral and hexahedral continuum
elements, the program supports split-node faulting, body forces and surface
tractions.
The GeoFEST problem geometry for Milestone G is shown in GeoFEST Figure 1 below. Elastic and viscoelastic deformation is simulated for 16 million finite elements, for a duration of 1000 time steps. The significant increase is in the number of elements, which was about 55,000 for the baseline case and 1.4 million for the Milestone F code improvement (both of which also ran 1000 time steps). Note that the baseline case was modeled on the 1994 Northridge earthquake, using a single fault of 300 square km in a domain of 240 x 240 x 100 km. The Milestone F demonstration case is based on the 1992 Landers event, using three closely arranged faults within an 865 square km area in a domain of 1000 x 1000 x 60 km. The Milestone G demonstration is still based on the 1992 Landers event, but now has higher mesh density near the faults and 200 km depth, to reduce boundary effects for the viscoelastic relaxation.
GeoFEST Figure 1: Finite element mesh LandersDeep for the GeoFEST Milestone G code improvement problem. Colors indicate partitioning among processors (limited to 16 partitions in this image for clarity, although 490 processors were actually used). Partitions cluster near the domain center due to the high mesh density used near the faults.
GeoFEST Figure 2: Top view (X-Y surface plane) of the LandersDeep mesh of GeoFEST Figure 1, zoomed in on the central portion to provide detail on the region of active, slipping faults. Since the number of elements per partition is roughly equal, smaller regions represent areas of highest element density, where strain energy is largest near the faults. Every partition shown in the image, independent of color, is uniquely owned by a specific processor; for example, each of the four red partitions belongs to a distinct processor.
GeoFEST Figure 3: Detail of top view (X-Y surface plane) of the 16 million element LandersDeep mesh created by strain-energy solution-adaptive refinement using PYRAMID. The close approach of two neighboring faults is shown. The previous code improvement (Milestone F) used the GuiVISCO sequential mesh generator to produce a ~1.4 million element mesh; the input mesh for Milestone G, also created with that tool, was limited to nearly 10 million elements. The mesh shown above required use of the parallel adaptive approach.
GeoFEST Figure 4: Top view of vertical displacement of the elastic response of the region to the Landers earthquake, based on the Milestone G simulation using the LandersDeep mesh.
GeoFEST Figure 5: GeoFEST simulated surface displacement from the coseismic Landers model, displayed as InSAR fringes.
GeoFEST Figure 6: GeoFEST simulated postseismic surface displacement from the Landers model after 500 years of viscoelastic relaxation (at the end of the GeoFEST Milestone G case of Table 1). The color scale of the InSAR fringes is that of GeoFEST Figure 5.
GeoFEST reads in a
tetrahedral mesh and information on boundary conditions, faulting events and
variations in time. The equilibrium conditions are computed based on
solution of the elastostatic equations through the finite element method. For
the Milestone G computation, a metric of strain energy per element is used to
indicate where the mesh needs to be refined. PYRAMID produces a refined mesh
(partitioned among the processors), which becomes the basis for a new
equilibrium elastic solution and the subsequent simulation. Then
viscoelastic evolution of the stress field is computed based on an implicit
technique applied on a series of time steps. User-indicated parts of the
displacement and stress fields are collected to a single ordered file at
requested time steps.
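The Milestone G processing sequence just described can be sketched as follows. This is an illustrative outline only; the function names are hypothetical and do not reflect GeoFEST's actual API.

    typedef struct Mesh Mesh;                /* opaque mesh handle (illustrative) */
    extern Mesh *read_mesh_and_bcs(const char *file);
    extern void elastic_solve(Mesh *m);      /* FEM equilibrium via DPCG      */
    extern void mark_by_strain_energy(Mesh *m);
    extern Mesh *pyramid_refine(Mesh *m);    /* parallel AMR + repartitioning */
    extern void viscoelastic_step(Mesh *m);  /* implicit time step via DPCG   */
    extern void write_fields(Mesh *m, int step);

    void simulate(int n_steps)
    {
        Mesh *m = read_mesh_and_bcs("input.dat");
        elastic_solve(m);                    /* initial equilibrium           */
        mark_by_strain_energy(m);            /* flag elements for refinement  */
        m = pyramid_refine(m);
        elastic_solve(m);                    /* re-solve on the refined mesh  */
        for (int step = 0; step < n_steps; step++) {
            viscoelastic_step(m);
            write_fields(m, step);           /* merged output at chosen steps */
        }
    }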
Details are found in the GeoFEST User's Guide, which is posted at http://www.openchannelsoftware.org/projects/GeoFEST and http://quakesim.jpl.nasa.gov/GeoFEST_User_Guide.pdf.
In brief, the elastostatic solution is computed using standard elastostatic finite elements, as found in (e.g.) Hughes (2000). The sparse positive definite system of equations is solved by the diagonally preconditioned conjugate gradient (DPCG) method.
For the viscoelastic time steps we form a modified stiffness matrix and right-hand-side terms according to the method of Hughes and Taylor (1978), which again results in a positive definite system. Each time-step solution is found by the same DPCG function.
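A serial sketch of the DPCG algorithm follows. This is a textbook formulation, not the GeoFEST source; matvec stands for the sparse stiffness-matrix product, and in the parallel code the matrix-vector product and the dot products are the distributed operations discussed below.

    #include <math.h>
    #include <stdlib.h>

    /* Solve A x = b with Jacobi (diagonal) preconditioning.
       matvec(x, y) computes y = A*x; diag holds the diagonal of A. */
    void dpcg(int n, void (*matvec)(const double *x, double *y),
              const double *diag, const double *b, double *x,
              double tol, int max_it)
    {
        double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
        double *p = malloc(n * sizeof *p), *q = malloc(n * sizeof *q);

        matvec(x, q);                                    /* r = b - A x */
        double rz = 0.0;
        for (int i = 0; i < n; i++) {
            r[i] = b[i] - q[i];
            z[i] = r[i] / diag[i];                       /* precondition */
            p[i] = z[i];
            rz += r[i] * z[i];
        }
        for (int it = 0; it < max_it; it++) {
            matvec(p, q);
            double pq = 0.0;
            for (int i = 0; i < n; i++) pq += p[i] * q[i];
            double alpha = rz / pq;                      /* step length */
            double rr = 0.0;
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * q[i];
                rr += r[i] * r[i];
            }
            if (sqrt(rr) < tol) break;                   /* converged */
            double rz_new = 0.0;
            for (int i = 0; i < n; i++) {
                z[i] = r[i] / diag[i];
                rz_new += r[i] * z[i];
            }
            double beta = rz_new / rz;                   /* direction update */
            rz = rz_new;
            for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
        }
        free(r); free(z); free(p); free(q);
    }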
Slip on faults is accommodated by a split-node technique (Melosh and Raefsky, 1981) that modifies the right-hand side of the matrix system at nodes local to the fault that slips.
We have linked GeoFEST to the PYRAMID library (http://www.openchannelsoftware.org/projects/PYRAMID) and rely on PYRAMID functions for parallel mesh refinement, as well as parallel domain partitioning and communication between nodes. The initial mesh (constructed according to the User's Guide) and a derived connectivity file (produced from the initial mesh by the application gfmeshparse, available with the GeoFEST software) are read by GeoFEST, and mesh structures are passed to PYRAMID functions. PYRAMID is used for partition information, resulting in each processor having a compact segment of the finite element domain for its formation of the local part of the stiffness matrix and solution. PYRAMID is also used for combining these local solution estimates and conjugate gradient vectors at each iteration within each time step, using the "globalize" function that combines the local information and ensures each processor has valid data. When used with mesh refinement enabled, GeoFEST uses a strain energy per element threshold to mark the elements requiring refinement. All node- and element-based mesh properties (including split nodes) are written to PYRAMID attributes for preservation through the refinement process. PYRAMID refines the mesh and passes the mapped attributes back to GeoFEST, which updates its structures to represent the problem on the new mesh. The new mesh is used to solve the problem again at the current time step, and is the basis for the subsequent simulation steps. When the output displacement and stress fields are required, a final merging within PYRAMID allows GeoFEST to write a single result file with every node and element represented correctly and uniquely.
Performance is dominated
by the elastic and viscoelastic iterative matrix solution. The scaling of
this parallel iterative solution is explained in terms of component operations
in the Milestone F report. PYRAMID refinement does not significantly affect scaling
when done once, early in the simulation, as it is in the Milestone G
computation. Overall scaling is shown
in Figure 7 (the GeoFEST scaling plot).
For Milestone F, linking with PYRAMID resulted in one change in processing, and some changes in the code. There is now a preprocessing step before GeoFEST is run on parallel machines: a command-line invocation of the application gfmeshparse, which derives a full connectivity description from the GeoFEST input file for PYRAMID's use. Note that in the GeoFEST-4.3P download, this application is called "meshgen."
Figure 5 shows how time is spent on four processors
(stacked vertically) during the Conjugate Gradient iterations. Two iterations are shown, with blank (black)
areas indicating the domination of numerical calculation. Red shows the “WAITALL” state that results
when some processors finish local operations earlier than others and the
matrix-vector product is combined across processors that have adjacent
partition information (arrows show communication among these partition-adjacent
processors). Violet indicates the “ALLREDUCE”
function of parallel communication that is required to globally combine parts
of a vector dot product.
Beyond the interest in seeing the fingerprint of a
parallel conjugate gradient operation, this image shows the dominance of
computation over communication time (black >> colors), and the acceptable
(but not perfect) load balance of the partitioned work.
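The ALLREDUCE phase corresponds to a distributed dot product, sketched below (an illustration, not the GeoFEST source; it assumes each degree of freedom is owned by exactly one processor so that the partial sums add correctly):

    #include <mpi.h>

    /* Each processor sums over its own partition, then all partial sums
       are combined so every processor holds the full dot product. */
    double dist_dot(int n_local, const double *a, const double *b)
    {
        double local = 0.0, global = 0.0;
        for (int i = 0; i < n_local; i++)
            local += a[i] * b[i];
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return global;
    }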
Figure 5: GeoFEST: Fine-detail image (captured with the Jumpshot performance library) of the portion of a GeoFEST run that represents most of the computer time, indicating good parallel performance. Four processors are represented along the vertical axis, and time is represented along the horizontal axis (a 50 millisecond time clip is shown). Two iterations of the Conjugate Gradient algorithm, of the thousands of iterations making up this simulation, are shown. Three features indicate this algorithm will scale to very large problems: 1) the computational load (thin horizontal turquoise line on black background) is about the same for each processor; 2) the time spent in synchronization and communication (red and violet) is a small fraction of the total; the white arrows indicate the inter-processor communication paths among processors where such communication occurs only as needed (this explains why some processors spend more time in synchronization than others even though the fraction of time is small); and 3) (not visible in this plot) the fraction of time spent in communication does not grow when the problem size grows proportionally to the number of processors used.
Directions for running the Milestone G test case are below.
GeoFEST processing for the benchmark case naturally divides into three phases:
· Input, including reading the mesh file and creating the mesh partition data for each processor;
· Elastic solution and mesh refinement, where the initial equilibrium state is computed;
· Viscoelastic evolution, where a sequence of time steps follows the relaxation of shear stresses in viscous material.
Figure 6 shows where time is spent for the Milestone G case (490 processors, 16 million elements, 1,000 time steps)[2]. The preponderance of time is taken up by the viscoelastic time-step portion of the code. It is clear that unless many more processors are used (or many fewer time steps of simulation), the scaling of the viscoelastic steps determines the time one may expect for a simulation to complete.
Code | Input Time (seconds) | Elastic Time (seconds) | Viscoelastic Time (seconds)
GeoFEST Milestone F | 186.2 | 3.5 | 7,247.6
GeoFEST Milestone G | 1,273.8 | 169.7 (Elastic/Refine/Elastic) | 41,838.0
Table 2: The Milestone G case has a proportionally larger input time. This confirms our original design concept: without adaptive mesh refinement, we would eventually be dominated by input time, because input is sequential. We could have started with a mesh like the 1.4 million element Milestone F mesh and reduced our input time to about the same (186 s); that would increase our Elastic/Refine/Elastic time (to reach 16 M elements), but probably not by more than a few hundred seconds. Such tradeoffs will be evaluated in future work.
A single Viscoelastic time step consists of construction of a stiffness matrix (and right-hand side), followed by solution of this matrix system by the parallel conjugate gradient technique (many iterative steps). So the Viscoelastic segment of Figure 6 represents 1,000 time steps, each consisting of roughly 200 iterations of the sparse solver. One iterative step consists chiefly of a sparse distributed matrix-vector multiply and three distributed vector dot products (plus some scalar-vector operations that may be neglected).
We show in Table 3 the time for completing the time-step portion for several short (5 time step) runs of varying size, on Thunderhead. Problem sizes were constructed to approximate a constant number of elements on each processor for 4, 16, and 64 processors. Figure 7 shows the speed of solution for these cases; "work" is taken to be the operations involved in the iterative steps. This plot shows that communication overhead is not growing faster than computational time as we scale the problem size with the number of processors. The result is excellent scaling: problems with more than 4,000 elements per processor see negligible parallel overhead, and even smaller problems show substantial speedup with many processors.
Figure
7 demonstrates that the GeoFEST scaling (on the dominating time-stepping loop)
is excellent. Sufficiently large
problems (such as the 16M element case) have trivial parallel overhead on 490
processors.
The operations required for construction of the stiffness matrix are small. If we represent the number of elements as Nelts and the number of solver iterations as Niters, stiffness construction requires roughly 3,500 x Nelts = ~5 billion operations for Nelts = 1.4 million, compared to the roughly 300 x Niters x Nelts = ~80 billion operations required for a Niters = ~200 solution.
GeoFEST Figure 7: Scaling of work (on linear scales) in the GeoFEST time-step function with number of processors, for three sizes of problems (on the Thunderhead cluster computer, GSFC) plus two very large problems (on the Cosmos cluster computer, NASA JPL). Blue indicates ideal scaling (from 4 processors). Expressing work as operations per unit wallclock time, scaled by processor speed, allows comparison of sizes (and platforms) in a single plot.
The Thunderhead problems likewise scale well: sufficiently large problems (353,000 and especially 1.4 million elements) have trivial parallel overhead on 64 processors.
More complete scaling analysis considers the time for problem set-up, elastic solution, and writing results to files. Benchmark-type problems with 1,000 time steps are dominated by the scaling behavior shown above, as for the example problem (1.4 million elements, 64 processors).
GeoFEST has been demonstrated using adaptive mesh refinement based on an element-wise strain energy metric, which concentrates solution resources in the parts of the domain where high resolution is required. Because approximate solutions determine the final mesh structure, this allows higher-accuracy solutions for larger domain sizes and greater faulting complexity for any available computer resource. This is a substantial and necessary step in allowing simulations of surface deformation in active tectonic areas, in that it allows accurate solutions on regional scales (such as Southern California), well beyond the scales of single fault segments used in past studies.
The baseline problem was modeled after the Northridge earthquake (single thrust fault). Analysis of individual earthquake events with attention to their geographic settings is a long-term, important area of simulation and research. But the ability to solve domains with millions of elements implies we can simulate regions with multiple faults, such as the Los Angeles basin. Interactions among slipping faults, and possible emergent structures from these nonlinear interactions, appear to be the next advance in forecasting earthquake risk. This is a new and promising area for simulation, data comparison, and testing of concepts. Many kinds of simulation codes are beginning to be employed for this new kind of work across the earthquake community. But a finite element code has close to the greatest degree of flexibility in including the effects of realistic structures in the Earth in a heterogeneous domain. So we expect this improved GeoFEST will have unique value in validating these other simulations and determining when other multiple-fault models leave out too many material effects.
Computationally, we have demonstrated efficient parallel Conjugate Gradient solutions for 3-D faulted-system finite elements, and linked our method with the PYRAMID library. The parallel Conjugate Gradient is no surprise, as it has been demonstrated in other domains of physics, but a freely available source code for faulted domains will be helpful to the US simulation effort. Linking with PYRAMID is a convenient way to handle issues of partitioning and communication. More important, it is a first step to using the PYRAMID functions for parallel adaptive mesh refinement.
Adaptive mesh refinement is essential to attaining high-quality solutions to problems with widely varying stress intensities in three dimensions. Parallel mesh refinement has the additional advantage of solving problems of a size commensurate with the memory space of massively parallel computers, without handling the associated mesh files with solid meshing programs, which are nearly all written for sequential machines. A domain can be described and meshed at a relatively low mesh density, imported to the parallel system, and then key locations in the mesh can be refined to the degree needed, expanding the number of elements by large factors (possibly hundreds or more).
The GeoFEST
code and milestone input file are available on the Goddard Space Flight Center
machine Thunderhead. Building the code on Thunderhead is accomplished by following these steps (a consolidated script follows this list):
Note: use the Intel compiler by typing "mpi_env -p intel".
· Get the code from ~nortonc/Milestone/GeoFEST.tgz
· "tar xvzf GeoFEST.tgz" in your home directory ~/
· "tcsh" to use tcsh
· cd ~/GeoFEST/Pyramid-3D/Pyramid-3D/ParMetis/ and type "make"
· cd ~/GeoFEST/Pyramid-3D/ and type "make -f Intel"
· cd ~/GeoFEST/geofest/ and type "make -f Intel"
· Type "exit" to leave tcsh
This results in an executable program called GeoFEST in
~/GeoFEST/geofest.
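For convenience, the sequence above can be captured in a single script. The following is a minimal sketch (not part of the distributed code) that assumes the paths and makefile names given above:

#!/bin/tcsh -e
# Sketch of the Thunderhead build sequence described above; -e stops
# at the first failure. Assumes GeoFEST.tgz has already been copied
# from ~nortonc/Milestone/ to the home directory.
# Note: if mpi_env only affects the login shell, run it before this script.
mpi_env -p intel                                 # select the Intel compiler
cd ~ ; tar xvzf GeoFEST.tgz                      # unpack into ~/GeoFEST
cd ~/GeoFEST/Pyramid-3D/Pyramid-3D/ParMetis ; make
cd ~/GeoFEST/Pyramid-3D ; make -f Intel
cd ~/GeoFEST/geofest ; make -f Intel             # executable: ./GeoFEST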
Follow these steps to run the code on Thunderhead.
· Start the LACE task manager with "ltmsuper"
· Move the executable to your run directory:
cp ~/GeoFEST/geofest/GeoFEST ~/rhome/GeoFEST/geofest/GeoFEST
· Set up input (input.dat and input.dat.jpl; cases exist for 4, 16 and 64 processors):
cd ~/rhome/GeoFEST/geofest/
ln -s /data1/nortonc/Quakesim/GeoFEST/geofest/Visuals/LandersGr4.dat input.dat
ln -s /data1/nortonc/Quakesim/GeoFEST/geofest/Visuals/LandersGap4.dat.jpl input.dat.jpl
The 16-processor case files are LandersGr16.dat and LandersGap16.dat.jpl.
The 64-processor case files are LandersGr64.dat and LandersGap64.dat.jpl.
· Request processors and run the code:
cd ~/GeoFEST/geofest/
ltmbegin -n 4 -m 120
ltmpi -p 1 GeoFEST input.dat /data1/<your loginid>/<output directory>
OR
ltmpi -p 1 GeoFEST input.dat /data1/<your loginid>/<output directory> >& /data1/<your loginid>/OUT.log
Note: -n means number of processors
-m means number of minutes requested
-p 1 means use one CPU per node (rather than 2 CPUs per node, which is the default)
For the 16-processor case use -n 16 -m 300
For the 64-processor case use -n 64 -m 360
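Putting the pieces together, a complete 4-processor session looks like the following sketch. The LACE task manager commands behave as described above; <your loginid> and <output directory> are placeholders to fill in, and the output directory must already exist:

ltmsuper                                  # start the LACE task manager
cp ~/GeoFEST/geofest/GeoFEST ~/rhome/GeoFEST/geofest/GeoFEST
cd ~/rhome/GeoFEST/geofest/
ln -s /data1/nortonc/Quakesim/GeoFEST/geofest/Visuals/LandersGr4.dat input.dat
ln -s /data1/nortonc/Quakesim/GeoFEST/geofest/Visuals/LandersGap4.dat.jpl input.dat.jpl
cd ~/GeoFEST/geofest/
ltmbegin -n 4 -m 120                      # request 4 CPUs for 120 minutes
ltmpi -p 1 GeoFEST input.dat /data1/<your loginid>/<output directory> >& /data1/<your loginid>/OUT.log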
Follow these steps to interpret the results.
When the code runs, some
welcome and diagnostic messages will be printed to the screen.
Visualization and the
conjugate gradient history files will be written in /data1/<your
loginid>/<output directory>.
For scalability timings, capture the output (e.g., with "grep -i pyramid OUT.log") that looks like this (the bracketed items are explanatory; the data will appear on the screen and also in OUT.log if you chose to save it as suggested above):
PYRAMID Current Time: xxxx.xx [A. Loading Data Start]
PYRAMID Current Time: xxxx.xx [B. Elastic Soln Start]
PYRAMID Current Time: xxxx.xx [C. Time Stepping Start]
PYRAMID Current Time: xxxx.xx [D. Program Termination]
The total runtime will be in step D.
In order to examine scalability, the key number is the average time per iteration in the time-stepping stage. Since the problem size grows roughly in proportion to the number of processors (4, 16, 64) in scaled fashion, this number should be roughly constant across these problem sizes. Compute the average time per iteration as (D-C)/X, where X is the total number of iterations for each problem size:
4 PE case: X = 83,256
16 PE case: X = 132,429
64 PE case: X = 195,914
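The arithmetic can be scripted if desired. A minimal tcsh/awk sketch, with placeholder timestamps (read the actual C and D values from the grep output above):

# Hypothetical values for illustration only; substitute the timestamps
# printed at [C. Time Stepping Start] and [D. Program Termination].
set C = 1000.00
set D = 5000.00
set X = 83256        # iteration count for the 4 PE case
echo $C $D $X | awk '{ printf "average time per iteration = %g sec\n", ($2 - $1)/$3 }'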
As an aside, the baseline case ran in about 13 hours for a problem about the same size as the 4 PE case above, so for scalability relative to the baseline case, the 4 PE case would have to run in less than 3 hours. Our experience has been that the 4 PE case can run in 1 hour (dual-CPU mode) or less (single-CPU mode).
Thunderhead is described at http://newton.gsfc.nasa.gov/thunderhead/index.htm. Particulars relevant to this run (current as of November 2003) are: Thunderhead is a 512-processor commodity cluster located at NASA/Goddard Space Flight Center. It features 512 2.4 GHz Pentium 4 Xeons, 256 GB DDR memory, 20 TB disk space and a 2.2 Gbps Myrinet fiber interconnect.
This section has two parts. The first describes how to run the milestone case on a Unix/Linux-based system with the Intel compilers and MPI, using the LSF queuing system. The second part contains details demonstrating that the milestone case was run successfully on such a system at JPL, the Intel Pentium Xeon-based cluster called Cosmos.
Part I
Building the code on Cosmos is accomplished by
following these steps:
· Download GeoFEST and PYRAMID from the Open Channel Foundation: http://openchannelfoundation.org/
· Download ParMetis from the University of Minnesota: http://www-users.cs.umn.edu/~karypis/metis/parmetis/index.html/
· "tar xvzf GeoFEST.tgz" in your home directory ~/
· Load the Intel 7.x compiler and Myrinet modules:
module purge
module load latest_intel71
module load mpich-gm-intel71
· Install ParMetis in ~/GeoFEST/Pyramid/Pyramid/ParMetis/
· cd ~/GeoFEST/Pyramid/ and type "make -f Makefile.Intel"
· cd ~/GeoFEST/geofest/ and type "make -f Makefile.Parallel"
This results in an executable program called
GeoFEST in ~/GeoFEST/geofest.
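As with Thunderhead, the sequence can be scripted. A minimal sketch, assuming ParMetis has already been installed in ~/GeoFEST/Pyramid/Pyramid/ParMetis/ and that the environment-modules initialization is available to scripts:

#!/bin/sh
# Sketch of the Cosmos build sequence described above.
module purge
module load latest_intel71          # Intel 7.x compilers
module load mpich-gm-intel71        # MPICH over Myrinet GM
cd ~/GeoFEST/Pyramid  && make -f Makefile.Intel
cd ~/GeoFEST/geofest  && make -f Makefile.Parallel
# The executable is left in ~/GeoFEST/geofest/GeoFEST.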
Follow these steps to run the code on Cosmos.
· Set up input (input.dat and input.dat.jpl for the milestone meshes):
ln -s /pathtofile/LandersDeep.dat input.dat
ln -s /pathtofile/LandersDeep.dat.jpl input.dat.jpl
· If these files are not present on Cosmos, they may be retrieved from http://www.servogrid.org/slide/GEM/SERVO/GeoFESTBenchmarks/ and expanded with the unzip utility.
· Submit an LSF batch job (an example submission command follows the script):
#!/bin/sh
#BSUB -x -n 490 -R "span[ptile=1]" -W 13:00
#BSUB -a mpich_gm
#BSUB -q long
#BSUB -J GeoFEST
#BSUB -o /tmp/QSMG.outfile -e /tmp/QSMG.errfile
date
# NOTE: LSF starts in the current working directory by default.
cd ~/GeoFEST/geofest
mpirun.lsf ./GeoFEST input.dat ~/
Note:
-x -R "span[ptile=1]" means use one CPU per node
-n means number of processors requested
-W hh:mm means wall clock time requested in hours:minutes
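Assuming the script above is saved under a name such as geofest.lsf (a name chosen here only for illustration), it is submitted and monitored with the standard LSF commands:

bsub < geofest.lsf     # submit; the #BSUB directives are read from the script
bjobs                  # check job status in the queue
# stdout/stderr land in the files named by the -o and -e directives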
Follow these steps to interpret the results.
· When the code runs, some welcome and diagnostic messages will be printed to the screen.
· Visualization and the conjugate gradient history files will be written in ~/
· For scalability timings, capture the output (e.g., with "grep -i pyramid QSMG.outfile") that looks like this (the bracketed items are explanatory; the data will appear on the screen and also in QSMG.outfile if you chose to save it as suggested above):
PYRAMID Current Time: xxxx.xx [A. Loading Data Start]
PYRAMID Current Time: xxxx.xx [B. Elastic Soln Start]
PYRAMID Current Time: xxxx.xx [C. Time Stepping Start]
PYRAMID Current Time: xxxx.xx [D. Program Termination]
The total runtime will be in step D.
Cosmos is described at:
http://sc.jpl.nasa.gov/hardware/dell1750. Particulars relevant to this run are:
Cosmos is a 1024-processor cluster located at the NASA Jet Propulsion Laboratory. It features 512 dual-CPU 3.2 GHz Pentium 4 Xeons, 2 TB memory, 48 TB disk space and a 2+2 Gbps (i.e., bidirectional) Myrinet fiber interconnection network.
Part II
The following are excerpts from the run script that
demonstrate that the milestone was successfully met.
Transcript from LSF log file:
Sender: LSF System
<lsfadm@n030>
Job <#!/bin/sh;#BSUB -x
-n 490 -R "span[ptile=1]" -W 24:00;#BSUB -a mpich_gm;#BSUB -q
test;#BSUB -o /scratch00/nortonc/Milestone/QSM.outfile -e
/scratch00/nortonc/Milestone/QSM.errfile
# my default stdout, stderr files; date;# NOTE: LS> was submitted
from host <cosmos> by user <nortonc>.
Started at Fri Mar 25 18:06:19 2005
Results reported at Sat Mar 26 06:08:31 2005
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
#!/bin/sh
#BSUB -x -n 490 -R
"span[ptile=1]" -W 24:00
#BSUB -a mpich_gm
#BSUB -q test
#BSUB -o /scratch00/nortonc/Milestone/QSM.outfile
-e /scratch00/nortonc/Milestone/QSM.errfile
# my default stdout, stderr files
date
# NOTE: LSF starts in the
current working directory by default.
cd
/home/nortonc/GeoFEST-4.5/geofest-MG
mpirun.lsf ./GeoFEST
input.dat /scratch00/nortonc/Milestone
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time: 21053728.00 sec.
Max Memory: 257131 MB
Max Swap: 361327 MB
Max Processes: 490
Max Threads: 490
The output (if any)
follows:
Fri Mar 25 18:06:20 PST
2005
PYRAMID Unstructured AMR
Library
DEMO Version 1.15, 3/25/2005 at 18 hrs 6 min 44 sec
Initialized:
490 processors using double precision reals
************************************************
Welcome to the Geophysical
Finite Element Simulation Tool (GeoFEST 4.5) (./GeoFEST)
CPU time = 0.520000
Loading Data At: PYRAMID Current Time: 1.58786773681641D-004
Opening output file
/scratch00/nortonc/Milestone/LandersDeep.out ...
landers geometry, two layers, three faults, deeper
box
500 years, 8 print times
Number of nodes = 1902576
No surface tractions to process
. . .
Split nodes all processed
. . .
Generation of element
group #1 complete.
Doing shape functions...
Element generation
complete. Setting up linear algebra...
Equation setup completed.
Time data generation complete...
Starting Elastic Soln
At: PYRAMID Current Time: 1273.76971983910
Starting elastic
solution...
Initializing globalize
comm buffers:
Calling SOLVER
Elastic solution complete.
Starting Save Attributes.
save_attributes complete.
Starting Adaptive Refinement.
Adaptive mesh refinement complete.
Starting Update Phase.
Number of nodes = 2840427
Global b.c. map
generated...
Decomposed node and
equation totals:
Global number of elements = 16021034
Split nodes all processed
. . .
Generation of element
group #1 complete.
Starting Cleanup...
Starting Final Free...
Starting elastic
solution...
Initializing globalize
comm buffers:
Calling SOLVER
Elastic solution complete.
CPU time = 585.400000
Starting Time_Step Soln
At: PYRAMID Current Time: 1443.54365301132
Starting time-stepping
algorithm...
Performing scheduled
output at time = 2.000000 ...
Output step finished.
Performing scheduled
output at time = 10.000000 ...
Output step finished.
Performing scheduled
output at time = 50.000000 ...
Output step finished.
Performing scheduled
output at time = 100.000000 ...
Output step finished.
Performing scheduled
output at time = 200.000000 ...
Output step finished.
Performing scheduled
output at time = 300.000000 ...
Output step finished.
Performing scheduled
output at time = 400.000000 ...
Output step finished.
Performing scheduled output at time = 500.000000
...
Output step finished.
Normal program termination.
Finishing Time_Step Soln At: PYRAMID Current Time: 43281.3791708946
Transcript from cghist (conjugate gradient history)
file:
time=0 converged in 393
iterations norm reduced 9.88859e-08
starting=558122 ,
ending=0.0551904
time=0 converged in 429
iterations norm reduced 9.92227e-08
starting=710706 ,
ending=0.0705182
time=0.5 converged in 394
iterations norm reduced 9.87508e-08
starting=710648 ,
ending=0.070177
time=1 converged in 918
iterations norm reduced 9.8854e-08
starting=0.357578 ,
ending=3.5348e-08
time=1.5 converged in 799
iterations norm reduced 9.94214e-08
starting=0.149331 ,
ending=1.48467e-08
…
time=498 converged in 943
iterations norm reduced 9.94877e-08
starting=0.00420323 ,
ending=4.18169e-10
time=498.5 converged in
948 iterations norm reduced 9.91846e-08
starting=0.000210996 ,
ending=2.09276e-11
time=499 converged in 941
iterations norm reduced 9.92404e-08
starting=2.78625e-05 ,
ending=2.76508e-12
time=499.5 converged in
918 iterations norm reduced 9.76287e-08
starting=3.38485e-06 ,
ending=3.30459e-13
time=500 converged in 834
iterations norm reduced 9.84679e-08
starting=4.34575e-07 ,
ending=4.27916e-14
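If a summary of the solver behavior is wanted, the iteration counts can be extracted from the history file with a short filter. A minimal sketch, assuming the file is named cghist and holds one "time=... converged in N iterations ..." record per line as excerpted above:

# Print time step vs. CG iteration count, then the overall average.
awk '/converged/ { split($1, a, "="); print a[2], $4; sum += $4; n++ }
     END { if (n > 0) printf "mean iterations per solve = %.1f\n", sum/n }' cghist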
Scientific Significance
Achieving the First Code Improvement Milestone is significant because it opens the way to running problems of significant size, now that the earthquake code as well as the Fast Multipole library run in parallel under MPI.
For the first time it presents to the scientific community fast parallel
codes that allow creating simulations of the entire earthquake cycle in a 3D
model that uses the most accurate description of fault friction, rate and state
friction, and the quasi-dynamic radiation damping approximation to full
elastodynamics. We now have the potential for greatly increasing the number of
elements that can be included in the model over what could be done in the past.
This means that enough elements can now be used that it is possible to represent a reasonably sized fault with elements small enough to properly represent the behavior of a continuum. Larger numbers of elements also allow earthquakes with a large range of sizes to occur in the simulation. This means that it will be possible to study in the simulations the situations in which small earthquakes occur in isolation and those in which they may cascade or grow into larger ones. This could help gain an understanding of whether
patterns of microseismicity might be used to help predict earthquakes. The attainment of this milestone not only represents an advance in our computational ability to simulate earthquakes; it will also allow us to understand the earthquake process better by creating data sets that can be compared with data on real earthquakes. The attainment of the next milestone (Second Code Improvement) will involve increasing the efficiency of the code in other ways, now that the parallel implementation has been achieved, and this will allow even larger and more realistic simulations to be run.
Computational Significance
Demonstration of
performance goals:
Chapman is described at http://www.nas.nasa.gov/About/Profile/resources.html. Particulars relevant to this run (current as of November 2003) are: "Chapman, an SGI
Origin 3000, is currently the only 1,024-processor single-image, shared-memory
system in existence, with one operating system and a single address space.
Chapman will become a major component of the IPG, and is currently being used
to demonstrate that applications can scale to 1024 processors on this machine.
The system has 128 gigabytes of main memory, 2 terabytes of FC Raid disk
storage."
Thunderhead is described at http://newton.gsfc.nasa.gov/thunderhead/index.htm. Particulars relevant to this run (current as of November 2003) are: "Thunderhead is a 512-processor Commodity Cluster located at NASA/Goddard Space Flight Center. It features 512 2.4 GHz Pentium 4 Xeons, 256 GB DDR memory, 20 TB disk space and a 2.2 Gbps Myrinet fiber interconnect."
GeoFEST References
[1] Thomas J. R. Hughes and Robert Taylor, “Unconditionally Stable Algorithms For Quasi-Static Elasto-Plastic Finite Element Analysis,” Computers & Structures, Vol. 8, pp. 169-173, 1978.
[2] Thomas J. R. Hughes, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Dover Publications, Inc., Mineola, New York, 2000.
[1] Runge-Kutta is a method for forward integration of differential equations. It involves calculating derivatives of the function at the current time and at several fractions of potential time steps in the future, appropriately weighting these derivatives to estimate the best derivative value to use, and determining the value of the function at the new time by multiplying that best derivative by the appropriate time step. The fifth-order Runge-Kutta method compares estimates made using two different time steps and, based on this comparison, determines whether a smaller or larger time step should be used for the next step. This allows for adaptive time-stepping, which is extremely important in problems such as this, where the time steps can vary by as much as ten orders of magnitude depending on whether interseismic or coseismic behavior is involved.
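The step-size control described here can be stated compactly. As a generic sketch (the standard controller for a method of local order $p$, not a formula taken from the PARK source), comparing the two estimates gives an error measure and a new step size:

$$\Delta = \left| y_{\text{two half steps}} - y_{\text{one full step}} \right|, \qquad h_{\text{new}} = h \left( \frac{\Delta_{\text{tol}}}{\Delta} \right)^{1/(p+1)},$$

so the step shrinks when $\Delta > \Delta_{\text{tol}}$ and grows otherwise; in practice $h_{\text{new}}$ is multiplied by a safety factor slightly less than one.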
[2] For 8 of the 1,000 time steps, a snapshot of full displacement data was captured to a file (as was done in the baseline case for Milestone E).