
1994 Abstracts of MCS Reports and Preprints


Reports

C. Bischof, P. Khademi, and T. Knauff, "ADIFOR Working Note No. 11: ADIFOR Strategies Related to POINTER Usage in MM5," ANL/MCS-TM-187 (March 1994). POINTERs are nonstandard Fortran statements that cannot be processed by ADIFOR. The authors were interested in generating derivative code for MM5, a mesoscale model code that uses POINTERs extensively and in a particular structured manner. This paper reports on POINTERs and their role in MM5 and, for their particular usage in MM5, describes a three-step code transformation scheme, consisting of pre-ADIFOR, ADIFOR, and post-ADIFOR transformations, that results in the generation of correct derivative code for MM5.
C. Bischof, S. Huss-Lederman, E. Jacobson, X. Sun, and A. Tsao, "The Impact of HPF Data Layout on the Design of Efficient and Maintainable Parallel Linear Algebra Libraries," ANL/MCS-TM-184 (March 1994). In this document, we are concerned with the effects of data layouts for nonsquare processor meshes on the implementation of common dense linear algebra kernels such as matrix-matrix multiplication, LU factorization, or eigenvalue solvers. In particular, we address ease of programming and tunability of the resulting software. We introduce a generalization of the torus wrap data layout that decouples the "local" and "global" views of the data layout. As a result, it allows for intuitive programming of linear algebra algorithms and for tuning of the algorithm to a particular mesh aspect ratio or machine characteristics. This layout is as simple as the proposed HPF layout but, in our opinion, enhances ease of programming as well as ease of performance tuning. We emphasize that we do not advocate that all users need be concerned with these issues. We do, however, believe that, for the foreseeable future, "assembler coding" (as message-passing code is likely to be viewed from an HPF programmer's perspective) will be needed to deliver high performance for computationally intensive kernels. As a result, we believe that the adoption of this approach would accelerate not only the generation of efficient linear algebra software libraries but also the adoption of HPF itself. We point out, however, that the adoption of this new layout would necessitate that an HPF compiler ensure that data objects are operated on in a consistent fashion across subroutine and function calls.
I. Foster, C. Kesselman, R. Olson, and S. Tuecke, "Nexus: An Interoperability Layer for Parallel and Distributed Computer Systems," ANL/MCS-TM-189 (May 1994). Nexus is a set of services that can be used to implement various task-parallel languages, data-parallel languages, and message-passing libraries. Nexus is designed to permit the efficient, portable implementation of individual parallel programming systems and the interoperability of programs developed with different tools. Nexus supports lightweight threading and active message technology, allowing integration of message passing and threads.
W. Gropp and E. Lusk, "Users Guide for the ANL IBM SPx," ANL/MCS-TM-199 (December 1994). This guide presents the features of the IBM SPx installed in the Mathematics and Computer Science Division at Argonne National Laboratory. The guide describes the available hardware and software, access policies, and hints for using the system productively.
W. Gropp, E. Lusk, and S. C. Pieper, "Users Guide for the ANL IBM SP1," ANL/MCS-TM-198 (October 1994). This guide presents the features of the IBM SP1 installed in the Mathematics and Computer Science Division at Argonne National Laboratory. The guide describes the available hardware and software, access policies, and hints for using the system productively.
D. Levine, "A Parallel Genetic Algorithm for the Set Partitioning Problem," ANL-94/23 (May 1994). This dissertation reports on efforts to develop a parallel genetic algorithm and apply it to the solution of the set partitioning problem--a difficult combinatorial optimization problem used by many airlines as a mathematical model for flight crew scheduling. The author developed a distributed steady-state genetic algorithm in conjunction with a specialized local search heuristic for solving the set partitioning problem. The algorithm is based on an island model in which multiple independent subpopulations each run a steady-state genetic algorithm, and fit strings occasionally migrate between the subpopulations. Tests on 40 real-world set partitioning problems were carried out on up to 128 nodes of an IBM SP1 parallel computer. 78 pages.
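The island-model scheme described above can be shown in miniature. The toy sketch below runs a steady-state GA with occasional migration of fit strings on the simple OneMax problem (maximize the number of 1-bits); it is purely illustrative, not the dissertation's algorithm, which targets set partitioning and adds a specialized local-search heuristic.

```python
# Toy island-model steady-state GA on OneMax. Illustrative sketch only;
# problem, operators, and parameters are assumptions, not the paper's.
import random

def steady_state_step(pop, fitness, rng):
    """Replace the worst individual with a mutated crossover child, if better."""
    a, b = rng.sample(pop, 2)                 # pick two parents
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]                 # one-point crossover
    i = rng.randrange(len(child))
    child[i] ^= 1                             # point mutation
    worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
    if fitness(child) > fitness(pop[worst]):
        pop[worst] = child

def island_ga(n_islands=4, pop_size=10, n_bits=20, steps=300, seed=1):
    rng = random.Random(seed)
    fitness = sum                             # OneMax: count of 1-bits
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for t in range(steps):
        for pop in islands:
            steady_state_step(pop, fitness, rng)
        if t % 50 == 49:                      # occasionally migrate a fit string
            for i, pop in enumerate(islands):
                best = max(pop, key=fitness)
                nxt = islands[(i + 1) % n_islands]
                nxt.append(list(best))
                nxt.sort(key=fitness, reverse=True)
                nxt.pop()                     # keep population size fixed
    return max((max(pop, key=fitness) for pop in islands), key=fitness)
```

Because replacement happens only when the child strictly improves on the worst individual, the best string in each subpopulation never degrades, and migration spreads good strings between islands.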
A. Mauer, "A Collection of Tools in Support of Automatic Differentiation," ANL/MCS-TM-185 (February 1994). This document contains a collection of notes about tools that ANL researchers have found useful in work on automatic differentiation.
W. W. McCune, "OTTER 3.0 Reference Manual and Guide," ANL-94/6 (January 1994). OTTER (Organized Techniques for Theorem-proving and Effective Research) is a resolution-style theorem-proving program for first-order logic with equality. OTTER includes the inference rules binary resolution, hyperresolution, UR-resolution, and binary paramodulation. Some of its other abilities and features are conversion from first-order formulas to clauses, forward and back subsumption, factoring, weighting, answer literals, term ordering, forward and back demodulation, evaluable functions and predicates, and Knuth-Bendix completion. OTTER is coded in C, is free, and is portable to many different kinds of computer.
W. McCune, "A Davis-Putnam Program and Its Application to Finite First-Order Model Search: Quasigroup Existence Problems," ANL/MCS-TM-194 (September 1994). This document describes the implementation and use of a Davis-Putnam procedure for the propositional satisfiability problem. It also describes code that takes statements in first-order logic with equality and a domain size n, then searches for models of size n. The first-order model-searching code transforms the statements into a set of propositional clauses such that the first-order statements have a model of size n if and only if the propositional clauses are satisfiable. The propositional set is then given to the Davis-Putnam code; any propositional models that are found can be translated to models of the first-order statements. The first-order model-searching program accepts statements only in a flattened relational clause form without function symbols. Additional code was written to take input statements in the language of OTTER 3.0 and produce the flattened relational form. The program was successfully applied to several open questions on the existence of orthogonal quasigroups.
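The Davis-Putnam-style core on which such a search rests can be sketched as follows; this is an illustrative miniature, not McCune's implementation. Clauses are sets of nonzero integers, with -v denoting the negation of variable v.

```python
# Minimal DPLL-style satisfiability sketch (unit propagation + branching).
# Illustrative only; not the program described in the abstract.

def unit_propagate(clauses):
    """Repeatedly assign variables forced by unit clauses."""
    assignment = []
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            return clauses, assignment
        lit = units[0]
        assignment.append(lit)
        new = []
        for c in clauses:
            if lit in c:
                continue                      # clause satisfied; drop it
            if -lit in c:
                c = c - {-lit}                # literal falsified; shrink clause
            new.append(c)
        clauses = new

def dpll(clauses):
    """Return a satisfying list of literals, or None if unsatisfiable."""
    clauses, assignment = unit_propagate(clauses)
    if any(len(c) == 0 for c in clauses):
        return None                           # empty clause: conflict
    if not clauses:
        return assignment                     # every clause satisfied
    lit = next(iter(clauses[0]))              # branch on some literal
    for choice in (lit, -lit):
        sub = dpll([frozenset([choice])] + clauses)
        if sub is not None:
            return assignment + sub
    return None
```

In the first-order model search, each ground atom over the size-n domain becomes one propositional variable, and any model found by the propositional procedure translates back to a first-order model.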
J. G. Michalakes and R. S. Nanjundiah, "Computational Load in Model Physics of the Parallel NCAR Community Climate Model," ANL/MCS-TM-186 (November 1994). Maintaining a balance of computational load over processors is a crucial issue in parallel computing. For efficient parallel implementation, complex codes such as climate models need to be analyzed for load imbalances. In the present study we focus on the load imbalances in the physics portion of the community climate model's (CCM2) distributed-memory parallel implementation on the Intel Touchstone DELTA computer. We note that the major source of load imbalance is the diurnal variation in the computation of solar radiation. Convective weather patterns also cause some load imbalance. Land-ocean contrast is seen to have little effect on computational load in the present version of the model.
G. W. Pieper, synthesizer, "The QED Workshop," ANL/MCS-TM-191 (July 1994). On May 18-20, 1994, Argonne National Laboratory hosted the QED Workshop. The purpose of the workshop was to assemble a group of researchers to consider whether it is desirable and feasible to build a proof-checked encyclopedia of mathematics, with an associated facility for theorem proving and proof checking. Among the projects represented were Coq, Eves, HOL, ILF, Imps, MathPert, Mizar, NQTHM, NuPrl, OTTER, Proof Pad, Qu-Prolog, and RRL. Discussions focused primarily on political, sociological, practical, and aesthetic questions such as, Why do it? Who are the customers? How can we get mathematicians interested? What sort of interfaces are desirable?
K. D. Rayl and T. Gaasterland, "Overview of Selected Molecular Biological Databases," ANL/MCS-TM-200 (November 1994). This paper presents an overview of the purpose, contents, and design of a subset of the currently available biological databases, with an emphasis on protein databases. Databases included in this summary are 3D_ALI, Berlin RNA databank, Blocks, DSSP, EMBL Nucleotide Database, EMP, ENZYME, FSSP, GDB, GenBank, HSSP, LiMB, PDB, PIR, PKCDD, ProSite, and SWISS-PROT. The goal is to provide a starting point for researchers who wish to take advantage of the myriad available databases. Rather than providing a complete explanation of each database, we present its content and form by explaining the details of typical entries. Pointers to more complete "user guides" are included, along with general information on where to search for a new database.
B. Smith, "Extensible PDE Solvers Package Users Manual," ANL-94/40. This manual describes the design and use of the Extensible PDE Solvers package for the solution of elliptic PDEs. Currently, the package supports the solution of elliptic PDEs using either finite elements or finite differences in two or three dimensions on either structured or unstructured grids. The package is easily extended to new discretizations or classes of PDEs. Several classical direct and iterative methods, as well as several multigrid variants, may be used to solve the resulting linear systems.

Preprints

M. T. Jones and P. E. Plassmann, "Scalable Iterative Solution of Sparse Linear Systems," Parallel Computing 20 (1994), pp. 753-773. Also preprint MCS-P277. This paper compares the performance of the preconditioned conjugate gradient method using coloring orderings with a number of standard orderings on matrices arising from finite element models. Because optimal colorings for these systems may not be known a priori, the authors use a graph coloring heuristic to obtain consistent colorings. Based on lower bounds obtained from the local structure of these systems, the colorings determined by the heuristic are nearly optimal. The increase in parallelism afforded by the coloring-based orderings more than offsets any increase in the number of iterations required for the convergence of the conjugate gradient algorithm. Results from the Intel iPSC/860 are given.
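A consistent coloring of the kind the heuristic produces can be illustrated, in miniature, by a greedy sweep over the matrix adjacency graph: unknowns sharing a color have no direct coupling and can therefore be updated in parallel. This is an assumed textbook sketch, not the authors' heuristic.

```python
# Greedy graph coloring sketch: give each vertex the smallest color not used
# by its already-colored neighbors. Illustrative only.

def greedy_coloring(adj):
    """adj: dict vertex -> iterable of neighbors. Returns dict vertex -> color."""
    color = {}
    for v in adj:
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:                      # smallest color absent among neighbors
            c += 1
        color[v] = c
    return color
```

For a sparse matrix, the vertices are the unknowns and edges connect unknowns that share a nonzero; each color class then forms an independent set that a parallel triangular solve or relaxation can process simultaneously.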
J. N. Lyness and R. Cools, "A Survey of Numerical Cubature over Triangles," Proceedings of Symposia in Applied Mathematics, Vol. 48, 1994, AMS. Also preprint MCS-P410. This survey collects theoretical results in the area of numerical cubature over triangles and is a vehicle for a current bibliography. The paper first treats the theory relating to regular integrands, and then the corresponding theory for singular integrands, with emphasis on the full corner singularity. Within these two sections, the authors treat approaches based on transforming the triangle into a square, formulas based on polynomial moment fitting, and extrapolation techniques. Within each category, key theoretical results are cited without proof and are related to other results and references. The survey is theoretical and does not include recent work in adaptive and automatic integration.
J. N. Lyness, "Finite-Part Integrals and The Euler-Maclaurin Expansion," International Series of Numerical Mathematics, Vol. 119, 1994, Birkhauser.  Also preprint MCS-P411. The context of this note is the discretization error made by the m-panel trapezoidal rule when the integrand has an algebraic singularity at one end, say x = 0, of the finite integration interval. In the absence of a singularity, this error is described by the classical Euler-Maclaurin summation formula, which is an asymptotic expansion in inverse integer powers of m. When an integrable singularity (x**alpha with alpha > -1) is present, Navot's generalization is valid. This introduces negative fractional powers of m into the expansion. Ninham has generalized this result to noninteger alpha satisfying alpha less than -1. In this note, we extend these results to all alpha by providing the nontrivial extension to negative integer alpha. This expansion differs from the previous expansions by the introduction of a term log m.
C. Bischof, X. Sun, and B. Lang, "Parallel Tridiagonalization through Two-Step Band Reduction," MCS-P412-0194. The authors present a two-step variant of the successive band reduction paradigm for the tridiagonalization of symmetric matrices. They reduce a full matrix first to narrow-banded form and then to tridiagonal form. The first step allows easy exploitation of block orthogonal transformations. In the second step, they use a blocked version of a banded matrix tridiagonalization algorithm by Lang. They are able to express the update of the orthogonal transformation matrix in terms of block transformations. This expression leads to an algorithm that is almost entirely based on BLAS-3 kernels and has greatly improved data movement and communication characteristics. Results on the Delta and SP1 are given.
C. Bischof, S. Huss-Lederman, X. Sun, A. Tsao, and T. Turnbull, "Parallel Performance of a Symmetric Eigensolver based on the Invariant Subspace Decomposition Approach," MCS-P413-0194. This paper discusses work on a complete eigensolver based on the Invariant Subspace Decomposition Algorithm for dense symmetric matrices. The authors describe a recently developed acceleration technique that substantially reduces the overall work required by this algorithm. They also review the algorithmic highlights of a distributed-memory implementation of this approach, including a fast matrix-matrix multiplication algorithm, a new approach to parallel band reduction and tridiagonalization, and a harness for coordinating the divide-and-conquer parallelism in the problem. Performance results are presented for the dominant kernel as well as for the overall implementation on the Delta and Paragon.
C. H. Bischof, G. J. Whiffen, C. A. Shoemaker, A. Carle, and A. A. Ross, "Application of Automatic Differentiation to Groundwater Transport Models," MCS-P414-0194. Automatic differentiation (AD) is a technique for generating efficient and reliable derivative codes from computer programs with a minimum of human effort. Derivatives of model output with respect to input are obtained exactly. No intrinsic limits to program length or complexity exist for this procedure. Calculation of derivatives of complex numerical models is required in systems optimization, parameter identification, and systems identification. We report on our experiences with the ADIFOR (Automatic Differentiation in Fortran) tool on a two-dimensional groundwater flow and contaminant transport finite-element model, ISOQUAD, and a three-dimensional contaminant transport finite-element model, TLS3D. Derivative values and computational times for the automatic differentiation procedure are compared with values obtained from the divided-differences and handwritten analytic approaches. We found that the derivative codes generated by ADIFOR provided accurate derivatives and ran significantly faster than divided-differences approximations, typically in a tenth of the CPU time required for the imprecise divided-differences method for both codes. We also comment on the benefit of automatic differentiation technology with respect to accelerating the transfer of general techniques developed for using water resource computer models, such as optimal design, sensitivity analysis, and inverse modeling problems, to field problems.
L. Freitag, M. Jones, and P. Plassmann, "New Advances in the Modeling of High-Temperature Superconductors," MCS-P415-0294. This paper presents a new discrete formulation that maintains discrete invariance and is suitable for use on nonorthogonal meshes. The formulation, unlike its predecessors, allows one to easily use adaptive mesh refinement to concentrate grid points where error contributions are large (near vortex centers). In this way the total number of grid points required to accurately capture vortex configurations is reduced. The authors have developed scalable libraries for adaptive mesh refinement and partitioning on two-dimensional triangular grids. They also present a new geometric partitioning algorithm that strives to minimize communication cost by ensuring good partition aspect ratios. The paper includes computational results showing the efficiency of these adaptive techniques for a superconductor application.
J. M. Boyle, "Automatic, Self-adaptive Control of Unfold-Fold Transformations," MCS-P416-0294. I describe an automated approach to partial evaluation based on transformations for elementary simplifications and on unfolding and folding. The approach emphasizes program algebra and relies on canonical forms and distributive laws to expose instances to which the elementary simplifications apply. This approach to partial evaluation has been applied to a number of practical examples of moderate complexity, including eliminating a data structure from a partial-differential-equation solver.
K. D. Rayl, T. Gaasterland, and R. Overbeek, "Automating the Determination of 3D Protein Structure," MCS-P417-0294. The creation of an automated method for determining 3D protein structure would be invaluable to the field of biology and presents an interesting challenge to computer science. Unfortunately, given the present level of protein knowledge, a completely automated solution method is not feasible; therefore, the Argonne group has decided to integrate existing databases and theories to create a software system that assists X-ray crystallographers in specifying a particular protein structure.
S. Wright and Y. Zhang, "A Superquadratic Infeasible-Interior-Point Method for Linear Complementarity Problems," MCS-P418-0294. This paper presents a modification of a path-following infeasible-interior-point algorithm described by Wright. The new algorithm improves each new iterate by reusing the coefficient matrix factors from the latest step. The authors show that the modified algorithm has similar theoretical global convergence properties to the earlier algorithm, while its asymptotic convergence rate can be made superquadratic by an appropriate parameter choice.
C. H. Bischof, L. L. Green, K. J. Haigler, and T. L. Knauff, Jr., "Parallel Calculation of Sensitivity Derivatives for Aircraft Design Using Automatic Differentiation," MCS-P419-0294, Part 1 & Part 2. Sensitivity derivative (SD) calculation via automatic differentiation typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SDs via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a CRAY Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SDs are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60, with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the CRAY Y-MP implementation is much faster. As the number of design variables grows, however, the SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
C. Bischof and A. Griewank, "Computational Differentiation and Multidisciplinary Design," MCS-P420-0294. Multidisciplinary design optimization (MDO) by means of formal sensitivity analysis requires that each single-discipline analysis code supply not only the output functions for the (usually constrained) optimization process and other discipline analysis inputs, but also the derivatives of all of these output functions with respect to its input variables. Computational differentiation techniques and automatic differentiation tools enable MDO by providing accurate and efficient derivatives of computer programs with little human effort. This paper discusses the principles behind automatic differentiation and gives an overview of AD tools and how they can be used for sparse Jacobians and for exploiting parallelism. The authors show how AD applied to iterative solvers delivers the mathematically desired derivatives. They then show how derivatives that can be feasibly obtained by computational differentiation techniques can lead to improved solution schemes for nonlinear coupled systems and multidisciplinary design optimization.
M. T. Jones and P. E. Plassmann, ``Parallel Algorithms for Adaptive Mesh Refinement,'' SIAM J. on Scientific Computing (to appear). Also Preprint MCS-P421-0494. Computational methods based on the use of adaptively constructed nonuniform meshes reduce the amount of computation and storage necessary to perform many scientific calculations. The adaptive construction of such nonuniform meshes is an important part of these methods. In this paper, we present a parallel algorithm for adaptive mesh refinement that is suitable for implementation on distributed-memory parallel computers. Experimental results obtained on the Intel DELTA are presented to demonstrate that, for scientific computations involving the finite element method, the algorithm exhibits scalable performance and has a small run time in comparison with other aspects of the scientific computations examined. It is also shown that the algorithm has a fast expected running time under the P-RAM computation model.
I. Foster and M. Xu, "Libraries for Parallel Paradigm Integration," MCS-P422-0394. Most existing programming languages and tools are designed to support either task parallelism or data parallelism. Yet many applications can benefit from a combination of both. This paper presents a small set of extensions to Fortran, called Fortran M, that provides a framework on which portable, reusable libraries can be built to implement programming paradigms. Here the authors illustrate the approach by showing how it is used to implement libraries for file I/O, message passing, and data-parallel computation.
J. M. Restrepo and G. K. Leaf, "Inner Product Computations Using Periodized Daubechies Wavelets," International Journal for Numerical Methods in Engineering 40 (1997), pp. 3557-3578. Also Preprint MCS-P423-0394 (revised 1997). Inner products of wavelets and their derivatives are commonly known as connection coefficients. The numerical calculation of inner products of periodized Daubechies wavelets and their derivatives is reviewed, with the aim of providing potential users of the publicly available numerical scheme with details of its operation. The numerical scheme for the calculation of connection coefficients is evaluated in the context of approximating differential operators, information that is useful in the solution of partial differential equations using wavelet-Galerkin techniques. Specific details of the periodization of inner products in the solution of differential equations are included in this presentation.
I. Foster and J. Michalakes, "MPMM: A Massively Parallel Mesoscale Model," MCS-P424-0494. In this paper we report on a project at Argonne National Laboratory to parallelize the Penn State/NCAR Mesoscale Model version 5 using a fine-grain decomposition dynamically mapped and managed under PCN.
M. Henderson, B. Nickless, and R. Stevens, "A Scalable High-Performance I/O System," MCS-P425-0494. A significant weakness of many existing parallel supercomputers is their lack of high-performance parallel I/O. This weakness has prevented, in many cases, the full exploitation of the true potential of MPP systems. As part of a joint project with IBM, we have designed a parallel I/O system for an IBM SP system that can provide sustained I/O rates of greater than 160 MB/s from collections of compute nodes to archival disk and peak transfer rates that should exceed 400 MB/s from compute nodes to I/O servers. This testbed system will be used for a number of projects. First, it will provide a high-performance experimental I/O system for traditional computational science applications; second, it will be used as an I/O software and development environment for new parallel I/O algorithms and operating systems support; and third, it will be used as the foundation for a number of new projects designed to develop enabling technology for the National Information Infrastructure. This report describes the system under development at Argonne National Laboratory, provides some preliminary performance results, and outlines future experiments and directions.
I. T. Foster and P. H. Worley, "Parallel Algorithms for the Spectral Transform Method," SIAM J. Sci. Comput. 18, no. 3 (May 1997), pp. 806-837. Also Preprint MCS-P426-0494. The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.
L. Wos, "The Problem of Strategy and Hyperresolution," MCS-P427-0494. The problem proposed for research asks one to find a strategy that can be coupled with the inference rule hyperresolution to control the behavior of an automated reasoning program as effectively as does paramodulation.
L. Wos, "The Problem of Hyperparamodulation," Journal of Automated Reasoning 12 (1994), pp. 265-271. Also preprint MCS-P428. The problem for research asks one to establish criteria for allowing certain---but not all---new clauses to become nuclei when using the inference rule hyperparamodulation or hyperresolution.
J. J. More', "Global Methods for Nonlinear Complementarity Problems," MCS-P429-0494. Global methods for nonlinear complementarity problems either formulate the problem as a system of nonsmooth nonlinear equations or use continuation to trace a path defined by a smooth system of nonlinear equations. This paper formulates the nonlinear complementarity problem as a bound-constrained nonlinear least squares problem. Algorithms based on this formulation are applicable to general nonlinear complementarity problems and can be started from any nonnegative starting point. Each iteration requires the solution of systems of linear equations. Convergence to a solution of the nonlinear complementarity problem is guaranteed under reasonable regularity assumptions. The convergence rate is Q-linear, Q-superlinear, or Q-quadratic, depending on the tolerances used to solve the subproblems.
L. Wos, "The Field of Automated Reasoning," MCS-P430-0494. This article is the introduction to a special issue on automated reasoning, to appear in Computers and Mathematics with Applications in late 1994.
X. Sun and C. Bischof, "A Basis-Kernel Representation of Orthogonal Matrices," SIAM J. Matrix Anal. Appl. 16, no. 4 (October 1995), pp. 1184-1196. Also Preprint MCS-P431-0594. In this paper we introduce a new representation of orthogonal matrices. We show that any orthogonal matrix can be represented in the form Q = I - YSY^T, which we call the basis-kernel representation of Q. We show that the kernel S can be chosen to be triangular and show how the familiar representation of an orthogonal matrix as a product of Householder matrices can be directly derived from a representation with triangular kernel. We also show that there exists an, in some sense, minimal orthogonal transformation for solving the block elimination problem. We explore how the basis Y determines the subspaces that Q acts on in a nontrivial fashion, and how S determines the way Q acts on this subspace. We derive a canonical representation that explicitly shows how Q partitions R^n into three invariant subspaces in which it acts as the identity, a reflector, and a rotator, respectively. We also derive a generalized Cayley representation for arbitrary orthogonal matrices, which illuminates the degrees of freedom we have in choosing orthogonal matrices acting on a predetermined subspace.
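The familiar Householder reflector is the simplest instance of this representation: with basis Y = v and 1x1 kernel S = 2/(v^T v), the form Q = I - YSY^T reduces to I - 2vv^T/(v^T v). The sketch below (plain Python, illustrative only) builds such a reflector and checks its orthogonality.

```python
# Householder reflector as a basis-kernel representation Q = I - Y S Y^T
# with Y = v and scalar kernel S = 2/(v^T v). Illustrative sketch only.

def householder(v):
    """Return H = I - 2 v v^T / (v^T v) as a list-of-lists matrix."""
    n = len(v)
    s = 2.0 / sum(x * x for x in v)           # the 1x1 kernel S
    return [[(1.0 if i == j else 0.0) - s * v[i] * v[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Naive dense matrix product, for checking orthogonality."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

Since H is symmetric, H^T H = H^2, which should be the identity; applying H to v itself yields -v, exhibiting the reflector acting nontrivially only on the one-dimensional subspace spanned by the basis.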
I. T. Foster and D. W. Walker, "Paradigms and Strategies for Scientific Computing on Distributed-Memory Concurrent Computers," MCS-P432-0594. This paper examines recent advances in parallel languages and abstractions that have the potential for improving the programmability and maintainability of large-scale parallel scientific applications running on high-performance architectures and networks. This paper focuses on a Fortran M implementation of a particle-in-cell plasma simulation application and discusses issues in the optimization of the code. The use of two other methodologies for parallelizing the PIC application is also considered. The first is based on the shared-object abstraction as embodied in the Orca language. The second is the Split-C language. In Fortran M, Orca, and Split-C, the ability of the programmer to control the granularity of communication is important in designing an efficient implementation.
I. Foster, M. Xu, B. Avalani, and A. Choudhary, "A Compilation System That Integrates High Performance Fortran and Fortran M," MCS-P433-0594. This paper describes a system under development to explore the integration of task parallelism and data parallelism. The system builds on previous work on task-parallel and data-parallel Fortran compilers to provide an environment in which the task-parallel language Fortran M can be used to coordinate data-parallel High Performance Fortran. An image-processing problem is used to illustrate issues that arise when building an integrated system of this sort.
S. K. Park, K. K. Droegemeier, C. Bischof, and T. Knauff, "Sensitivity Analysis of Numerically Simulated Convective Storms Using Direct and Adjoint Methods," MCS-P434-0594. This paper describes an alternative, essentially automated method (ADIFOR) for obtaining the solution gradient that bypasses the adjoint altogether yet provides even more information about the gradient. The paper assesses the utility of ADIFOR relative to the brute-force approach applied to a 1-D moist cloud model. Also evaluated is the validity of the tangent linear approximation in the context of deep convection.
A. Carle, L. L. Green, C. H. Bischof, and P. A. Newman, "Applications of Automatic Differentiation in CFD," MCS-P435-0594. This paper discusses the use of automatic differentiation as a means of extending iterative CFD codes with sensitivity derivatives. In particular, the ADIFOR automatic differentiator has been applied to the 3D, thin-layer Navier-Stokes multigrid flow solver TLNS3D coupled with the WTCO wing grid generator. Results of a sequence of efforts in which TLNS3D has been successfully augmented to compute a variety of sensitivities are presented. It is shown that sensitivity derivatives can be obtained accurately and efficiently using ADIFOR, although significant advances are necessary for the efficiency of ADIFOR-generated derivative code to become truly competitive with hand-differentiated code.
W. R. McKinney, J. M. Restrepo, and J. L. Bona, "Unstable Solitary-Wave Solutions of the Generalized Benjamin-Bona-Mahony Equation," MCS-P436-0594. The evolution of solitary waves of the gBBM equation is investigated computationally. The experiments confirm previously derived theoretical stability estimates and, more importantly, yield insights into their behavior. For example, highly energetic unstable solitary waves, when perturbed, are shown to evolve into several stable solitary waves.
J. N. Lyness and I. H. Sloan, "Cubature Rules of Prescribed Merit," SIAM J. Numer. Anal. 34, no. 2 (April 1997), pp. 586-602. Also Preprint MCS-P437-0594. This paper introduces a criterion for the evaluation of multidimensional quadrature, or cubature, rules for the hypercube: this is the "merit" of a rule, which is closely related to its trigonometric degree and which reduces to the Zaremba figure of merit in the case of a lattice rule. The authors derive a family of rules Q_k^s having dimension s and merit 2^k. These rules seem to be competitive with lattice rules with respect to the merit that can be achieved with a given number of abscissas.
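For background, a rank-1 lattice rule (the family the merit criterion generalizes) averages the integrand over the points {j z / n}, j = 0, ..., n - 1, for a fixed integer generator vector z. A minimal sketch, with an example generator z = (1, 5) and n = 8 chosen purely for illustration and not taken from the paper:

```python
import math

def lattice_rule(f, z, n):
    """Rank-1 lattice rule on [0,1)^s: average f over the points {j*z/n mod 1}."""
    return sum(f(tuple((j * zi / n) % 1.0 for zi in z)) for j in range(n)) / n

# The rule is exact for constants, and it integrates low-frequency
# trigonometric terms exactly when h.z is not a multiple of n.
c = lattice_rule(lambda x: 1.0, (1, 5), 8)
t = lattice_rule(lambda x: math.cos(2 * math.pi * x[0]), (1, 5), 8)
```

A rule's trigonometric degree (and hence its merit) measures how large a set of such frequencies h it integrates exactly.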
P. H. Worley and I. T. Foster, "Parallel Spectral Transform Shallow Water Model: A Runtime-Tunable Parallel Benchmark Code," MCS-P438-0594. This paper describes a code, PSTSWM, in which algorithmic options were embedded, allowing it to be tuned for a particular machine without requiring code modifications. PSTSWM was developed for evaluating parallel algorithms for the spectral transform method in atmospheric circulation models. Many levels of runtime-selectable algorithmic options are supported. The paper discusses these options and the evaluation methodology. Also provided are empirical results from a number of parallel machines, indicating the importance of tuning for each platform before making a comparison.
I. T. Foster and B. R. Toonen, "Load-Balancing Algorithms for Climate Models," MCS-P439-0594. Implementations of climate models on scalable parallel computer systems can suffer from load imbalances resulting from temporal and spatial variations in the amount of computation required for physical parameterizations such as solar radiation and convective adjustment. The authors have developed specialized techniques for correcting such imbalances. These techniques are incorporated in a general-purpose, programmable load-balancing library that allows the mapping of computation to processors to be specified as a series of maps generated by a programmer-supplied load-balancing module. The communication required to move from one map to another is performed automatically by the library, without programmer intervention. The authors describe this work, discuss specific load-balancing algorithms they have developed for PCCM2, and present experimental results that demonstrate the effectiveness of these algorithms on parallel computers.
I. Foster, "Task Parallelism and High-Performance Languages," IEEE Parallel and Distributed Technology, 2(3) (1994), pp. 39-48. Also Preprint MCS-P440-0594. This paper examines and illustrates the considerations that motivate the use of task parallelism. Also described is one particular approach to task parallelism in Fortran, namely, the Fortran M extensions. Finally, the author contrasts Fortran M with other proposed approaches and discusses the implications of this work for task parallelism and high-performance languages.
G. J. Whiffen, C. A. Shoemaker, C. H. Bischof, A. A. Ross, and A. Carle, "Application of Automatic Differentiation to Groundwater Transport Models," MCS-P441-0594. Automatic differentiation is a technique for generating efficient and reliable derivative codes from computer programs with minimal human effort. Derivatives of model output with respect to input are obtained exactly. No intrinsic limits to program length or complexity exist for this procedure. Calculation of derivatives of complex numerical models is required in system optimization, parameter identification, and systems identification. We report on our experiences with the ADIFOR (Automatic Differentiation of Fortran) tool on a two-dimensional groundwater flow and contaminant transport finite-element model, ISOQUAD, and a three-dimensional contaminant transport finite-element model, TLS3D. Derivative values and computational times for the automatic differentiation procedure are compared with values obtained from the divided differences and handwritten analytic approaches. The automatic differentiation tool ADIFOR produced derivative codes that calculated exact derivatives typically in almost an order of magnitude less CPU time than is required by the imprecise divided differences method for both the two- and three-dimensional codes. We also comment on the benefit of automatic differentiation technology with respect to accelerating the transfer of general techniques developed for using water resource computer models (such as optimal design, sensitivity analysis, and inverse modeling problems) to field applications.
Z. Wu, "The Effective Energy Transformation Scheme as a Special Continuation Approach to Global Optimization with Application to Molecular Conformation," MCS-P442-0694. This paper discusses a generalization of the function transformation scheme for global energy minimization applied to the molecular conformation problem. A mathematical theory for the method as a special continuation approach to global optimization is established. We show that the method can transform a nonlinear objective function into a class of gradually deformed, but smoother or easier functions. An optimization procedure can then be applied to the new functions successively, to trace their solutions back to the original function. Two types of transformation are defined: isotropic and anisotropic. We show that both transformations can be applied to a large class of nonlinear partially separable functions, including energy functions for molecular conformation. Methods to compute the transformation for these functions are given.
T. F. Coleman and Z. Wu, "Parallel Continuation-Based Global Optimization for Molecular Conformation and Protein Folding," MCS-P443-0694. This paper presents recent work on parallel algorithms and software for solving the global minimization problem for molecular conformation, especially protein folding. To avoid directly minimizing a difficult function, the authors use a special integral transformation to transform the function into a class of gradually deformed, but smoother functions. An optimization procedure is then applied to the new functions successively, to trace their solutions back to the original function. Mathematical theory for the method is established. Different levels of parallelism are exploited for the implementation of the algorithms on massively parallel architectures.
L. Freitag, M. Jones, and P. Plassmann, "New Techniques for Parallel Simulation of High-Temperature Superconductors," MCS-P444-0594. The authors discuss several new techniques used for the simulation of high-temperature superconductors on parallel computers. They introduce an innovative methodology to study the effects of temperature fluctuations on the vortex lattice configuration of these materials. They find that the use of uniform orthogonal meshes results in several limitations. To address these, they consider nonorthogonal meshes and describe a new discrete formulation that solves the difficult problem of maintaining gauge invariance on nonorthogonal meshes. With this discretization, adaptive refinement strategies are used to concentrate grid points where error contributions are large (in this case, near vortex cores). They describe the algorithm used for the parallel implementation of this refinement strategy, and they present computational results on the Intel DELTA.
M. T. Jones and P. E. Plassmann, "Parallel Algorithms for the Adaptive Refinement and Partitioning of Unstructured Meshes," MCS-P445-0594. The efficient solution of many large-scale scientific calculations depends on adaptive mesh strategies. This paper presents new parallel algorithms to solve two significant problems that arise in this context: the generation of the adaptive mesh and the mesh partitioning. The crux of the refinement algorithm presented here is the identification of independent sets of elements that can be refined in parallel. Runtime results on the DELTA are given, demonstrating that the algorithms exhibit scalable performance and have run times that are small in comparison with other aspects of the computation.
S. Wright, "Stability of Augmented System Factorizations in Interior-Point Methods," MCS-P446-0694. Some implementations of interior-point algorithms obtain their search directions by solving symmetric indefinite systems of linear equations. The conditioning of the coefficient matrices in these so-called augmented systems deteriorates on later iterations, as some of the diagonal elements grow without bound. Despite this apparent difficulty, the steps produced by standard factorization procedures are often accurate enough to allow the interior-point method to converge to high accuracy. When the underlying linear program is nondegenerate, we show that convergence to arbitrarily high accuracy occurs, at a rate that closely approximates the theory. We also explain and demonstrate what happens when the linear program is degenerate, where convergence to acceptable accuracy (but not arbitrarily high accuracy) is usually obtained.
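For concreteness (in one common interior-point notation, not necessarily the paper's), for a standard-form linear program min c^T x subject to Ax = b, x >= 0, with dual slacks s, the augmented system has the symmetric indefinite form

```latex
\begin{bmatrix} -X^{-1}S & A^{T} \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix}
=
\begin{bmatrix} r_{d} \\ r_{p} \end{bmatrix},
\qquad X = \operatorname{diag}(x), \quad S = \operatorname{diag}(s).
```

As the iterates converge, complementarity drives each product x_i s_i toward zero with one factor bounded away from zero, so the diagonal entries s_i / x_i split into very large and very small values; this is the unbounded growth of diagonal elements referred to in the abstract.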
R. C. Brown and M. K. Kwong, "On the Best Constant for the Inequality," MCS-P447-0694. This paper identifies the best constant for a specific integro-differential inequality. The approach consists of reducing the problem to various equivalent inequalities on a finite interval and determining necessary conditions on the extremals. The best constant is shown to satisfy an algebraic equation that is solved exactly with the help of MAPLE. The best constants for several similar inequalities are also determined.
J. M. Restrepo and G. K. Leaf, "Wavelet-Galerkin Discretization of Hyperbolic Equations," J. of Comp. Phys. 122 (1995), pp. 118-128. Also Preprint MCS-P448-0694. The relative merits of the wavelet-Galerkin solution of hyperbolic partial differential equations, typical of geophysical problems, are quantitatively and qualitatively compared with traditional finite difference and Fourier-pseudo-spectral methods. The wavelet-Galerkin solution presented here is found to be a viable alternative to the two conventional techniques.
M. K. Kwong and P. T. P. Tang, "W-Matrices, Nonorthogonal Multiresolution Analysis, and Finite Signals of Arbitrary Length," MCS-P449-0794. This paper proposes a new class of discrete transforms. The transforms treat the endpoints of a signal in a different manner from that of conventional techniques. This new approach enables one to efficiently handle signals of any length; thus, one is not restricted to work with signal or image sizes that are multiples of a power of 2. Moreover, the method does not lengthen the output signal and does not require an additional bookkeeping vector.
C. Bischof and X. Sun, "On Orthogonal Block Elimination," MCS-P450-0794. This paper considers a block elimination problem using a basis kernel representation as the main tool. The approach presented holds great promise for more efficient strategies to compute sparse orthogonal factorizations. Moreover, since the canonical basis and kernel can be computed faster than the compact WY accumulation procedure, and in a much more block-oriented fashion, the approach should be advantageous in cache-based systems or parallel processors.
A. Bouaricha, "STENMIN: A Software Package for Large, Sparse Unconstrained Optimization Using Tensor Methods," MCS-P451-0794. This paper describes a new package for minimizing an unconstrained nonlinear function where the Hessian is large and sparse. The software allows the user to select between a tensor method and a standard method based on a quadratic model. The package uses an entirely new way of minimizing the tensor model that makes it suitable for solving large, sparse optimization problems efficiently. Test results indicate that in general the tensor method is significantly more efficient and more reliable than the standard Newton method for solving large, sparse unconstrained optimization problems.
A. Bouaricha, "Tensor Methods for Large, Sparse Unconstrained Optimization," MCS-P452-0794. This paper extends traditional tensor methods to large, sparse, unconstrained optimization problems. The extension requires an entirely new way of solving the tensor model that makes the methods suitable for solving large, sparse optimization problems efficiently. Test results are presented for sets of problems where the Hessian at the minimizer is nonsingular and where it is singular. The results show that tensor methods are significantly more efficient and more reliable than standard methods based on Newton's method.
B. Avalani, A. Choudhary, I. Foster, R. Krishnaiyer, and M. Xu, "A Data Transfer Library for Communicating Data-Parallel Tasks," MCS-P453-0794. This paper describes a communicating data-parallel tasks (CDT) library being developed for constructing applications of this sort. The paper outlines the techniques used to implement the library, as well as a range of data transfer strategies and several algorithms based on these strategies. Performance results for several algorithms are presented. The CDT library is being used as a compiler target for an HPF compiler augmented with Fortran M extensions.
Z. Wu, "A Subgradient Algorithm for Nonlinear Integer Programming," MCS-P454-0794. This paper describes a subgradient approach to nonlinear integer programming and, in particular, nonlinear 0-1 integer programming. In this approach, the objective function for a nonlinear integer program is considered as a nonsmooth function over the integer points. The subgradient and the supporting plane for the function are defined, and a necessary and sufficient condition for the optimal solution is established, based on the theory of nonsmooth analysis. A new algorithm, called the subgradient algorithm, is developed that searches for a solution iteratively among the integer points. A solution is found when the optimality condition is satisfied or an iterate is repeated. In either case, the algorithm terminates in finite steps. The theory and the algorithm are presented, as well as test results for a small set of problems.
C. Zhu, "Solving Large-Scale Minimax Problems with the Primal-Dual Steepest Descent Algorithm," MCS-P455-0794. This paper shows that the primal-dual steepest descent algorithm developed by Zhu and Rockafellar for large-scale extended linear-quadratic programming can be used in solving constrained minimax problems related to a general saddle function. It is proved that the algorithm converges linearly from the very beginning of the iteration if the related saddle function is strongly convex-concave uniformly and the cross elements between the convex part and the concave part of the variables in its Hessian are bounded on the feasible region. Better bounds for the asymptotic rates of convergence are also obtained.
C. Zhu, "On the Primal-Dual Steepest Descent Algorithm for Extended Linear-Quadratic Programming," SIAM J. on Optimization 5, no. 1 (February 1995), pp. 114-128. Also preprint MCS-P456-0794. The aim of this paper is twofold. First, it proposes new variants for the primal-dual steepest descent algorithm as one in the family of primal-dual projected gradient algorithms developed by Zhu and Rockafellar for large-scale extended linear-quadratic programming. Second, it provides a proof of new linear convergence results for all these variants of the algorithm, including the original version as a special case, without the additional assumptions used by Zhu and Rockafellar. For the variants with the second update scheme, a much sharper estimation for the rate of convergence is obtained as a result of the new primal-dual feedback pattern.
C. Zhu, "Asymptotic Convergence Analysis of the Forward-Backward Splitting Algorithm," MCS-P457-0794. The asymptotic convergence of the forward-backward splitting algorithm is analyzed, and two special cases are discussed. New results are presented for the case in which h = 0; the results complement those of Rockafellar and Luque on the proximal point algorithm.
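As a generic sketch of the algorithm being analyzed (the standard method for minimizing f + h with f smooth and h having an inexpensive proximal map; the toy instance below is ours, not the paper's), each iteration takes a forward gradient step on f followed by a backward proximal step on h. When h = 0 the proximal step is the identity and the iteration reduces to plain gradient descent, the special case mentioned above:

```python
import numpy as np

def forward_backward(grad_f, prox_h, x0, step, iters=100):
    """Minimize f + h: forward (gradient) step on f, backward (prox) step on h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox_h(x - step * grad_f(x), step)
    return x

# Toy instance: f(x) = 0.5*||x - b||^2, h(x) = lam*||x||_1,
# whose minimizer is the soft-thresholding of b.
b, lam = np.array([3.0, -0.5, 1.0]), 1.0
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
x = forward_backward(lambda x: x - b,
                     lambda v, t: soft(v, t * lam),
                     np.zeros(3), step=1.0)
```

With step size 1 on this instance the iteration reaches the minimizer soft(b, lam) = (2, 0, 0) in a single step and stays there.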
D. Levine, "A Parallel Genetic Algorithm for the Set Partitioning Problem," MCS-P458-0894. This paper describes a parallel genetic algorithm developed for the solution of the set partitioning problem--a difficult combinatorial optimization problem used by many airlines as a mathematical model for flight crew scheduling. The genetic algorithm is based on an island model in which multiple independent subpopulations each run a steady-state genetic algorithm and fit strings occasionally migrate between the subpopulations. Tests on 40 real-world problems were carried out on up to 128 nodes of an IBM SP parallel computer. The performance improved as additional subpopulations were added. In two cases, high-quality integer feasible solutions were found for problems with 36,699 and 43,749 integer variables.
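The island model can be sketched generically as follows (a toy OneMax bit-string instance with made-up parameters, purely illustrative and unrelated to the paper's crew-scheduling GA): each island runs a steady-state GA that replaces its worst string, and each island's best string periodically migrates around a ring of islands.

```python
import random

def island_ga(fitness, n_bits, n_islands=4, pop_size=10, gens=200,
              migrate_every=25, seed=0):
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(n_bits)]
             for _ in range(pop_size)] for _ in range(n_islands)]

    def step(pop):
        # steady-state GA: tournament-select two parents, crossover,
        # mutate, and replace the current worst string
        parent = lambda: max(rng.sample(pop, 3), key=fitness)
        a, b = parent(), parent()
        cut = rng.randrange(1, n_bits)
        child = a[:cut] + b[cut:]
        if rng.random() < 0.2:
            child[rng.randrange(n_bits)] ^= 1
        worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
        pop[worst] = child

    for g in range(1, gens + 1):
        for pop in pops:
            step(pop)
        if g % migrate_every == 0:
            # ring migration: each island's best replaces a neighbor's worst
            bests = [max(p, key=fitness)[:] for p in pops]
            for i, p in enumerate(pops):
                worst = min(range(pop_size), key=lambda k: fitness(p[k]))
                p[worst] = bests[i - 1]
    return max((s for p in pops for s in p), key=fitness)

# OneMax: maximize the number of 1 bits in a 20-bit string
best = island_ga(sum, 20)
```

Because replacement always evicts the worst string, the best fitness per island is monotone nondecreasing, which is what makes the occasional migration of fit strings effective.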
L. Kettunen, K. Forsman, D. Levine, and W. Gropp, "Solutions of TEAM Problems 13 and 20 Using a Volume Integral Formulation," MCS-P459-0894. This paper presents a brief discussion of h-type volume integral formulations implemented in the CFUNET/CORAL code. CFUNET/CORAL is a general-purpose code for 2D and 3D magnetostatics. Solutions of TEAM problem no. 13 are computed by using both a sequential and a parallel version of the code.
L. Kettunen, K. Forsman, D. Levine, and W. Gropp, "Volume Integral Equation in Nonlinear 3D Magnetostatics," Int. J. Numer. Methods Eng. 38 (1995), pp. 2655-2675. Also Preprint MCS-P460-0894. This paper discusses volume integral formulations in three-dimensional nonlinear magnetostatics. Integral formulations are examined in connection with Whitney's elements in order to find new approaches. A numerical algorithm based on an h-formulation is introduced. Results of demanding application problems are shown. In addition, the parallelized version of the code is discussed, along with numerical results illustrating the advantages of combining integral formulations with concurrent computing.
L. Wos, "The Problem of Hyperparamodulation and Nuclei," MCS-P461-0894. This paper discusses an open research problem in automated reasoning. Specifically, one is asked to establish criteria for allowing certain--but not all--new clauses to become nuclei when using the inference rule hyperparamodulation or hyperresolution.
M. K. Kwong, "MATLAB Implementation of W-Matrix Multiresolution Analyses," MCS-P462-0894. This paper presents a MATLAB toolbox on multiresolution analysis based on the W-transform introduced by Kwong and Tang. The toolbox contains basic commands to perform forward and inverse transforms on finite 1D and 2D signals of arbitrary length, to perform multiresolution analysis of given signals to a specified number of levels, to visualize the wavelet decomposition, and to do compression. Examples of numerical experiments are also discussed.
A. Bouaricha and R. B. Schnabel, "TENSOLVE: A Software Package for Solving Systems of Nonlinear Equations and Nonlinear Least Squares Problems Using Tensor Methods," MCS-P463-0894. This paper describes a modular software package for solving systems of nonlinear equations and nonlinear least squares problems, using a new class of methods called tensor methods. It is intended for small to medium-sized problems, say with up to 100 equations and unknowns, in cases where it is reasonable to calculate the Jacobian matrix or approximate it by finite differences at each iteration. The software allows the user to select between a tensor method and a standard method based on a linear model. Moreover, the software provides two different global strategies, a line search and a 2D trust region approach. Test results indicate that, in general, tensor methods are significantly more efficient and robust than standard methods on small and medium-sized problems in terms of iterations and function evaluations.
M. T. Jones and P. E. Plassmann, "Software for the Generalized Eigenproblem on Distributed-Memory Architectures," Proc. of the Cornelius Lanczos International Centenary Conference, R. P. M. Chu, D. Brown, and D. Ellison, eds., SIAM Publications, 1994, pp. 322-325. Also Preprint MCS-P464-0894. The generalized eigenproblem is of significant importance in several fields. Generalized eigenproblems can be very large, with matrices of order greater than one million for problems arising from three-dimensional finite element models. To solve such problems, we are proposing a flexible software system for parallel distributed-memory architectures. This software is based on the Lanczos algorithms with a shift-and-invert transformation. This paper briefly describes the prototype version of the software, presents computational results, and indicates the status of the project.
M. T. Jones and P. E. Plassmann, "Computational Results for Parallel Unstructured Mesh Computations," Computing Systems in Engineering 5, no. 4-6 (1994), pp. 297-309. Also preprint MCS-P466-0894. The majority of finite element models in structural engineering are composed of unstructured meshes. These unstructured meshes are often very large and require significant computational resources; hence they are excellent candidates for massively parallel computation. Parallel solution of the sparse matrices that arise from such meshes has been studied heavily, and many good algorithms have been developed. Unfortunately, many of the other aspects of parallel unstructured mesh computation have gone largely ignored. We present a set of algorithms that allow the entire unstructured mesh computation process to execute in parallel---including adaptive mesh refinement, equation reordering, mesh partitioning, and sparse linear system solution. We briefly describe these algorithms and state results regarding their running time and performance. We then give results from the 512-processor Intel DELTA for a large-scale structural analysis problem. These results demonstrate that the new algorithms are scalable and efficient. The algorithms are able to achieve up to 2.2 gigaflops for this unstructured mesh problem.
M. K. Kwong, "On the One-Dimensional Ginzburg-Landau BVPs," Differential and Integral Equations 8, no. 6 (July 1995), pp. 1395-1405. Also preprint MCS-P467. We study the one-dimensional system of Ginzburg-Landau equations that models a thin film of superconductor subjected to a tangential magnetic field. We prove that the bifurcation curve for the symmetric problem is the graph of a continuous function of the supremum of the order parameter. We also prove the existence of a critical magnetic field. In general, there is more than one positive solution to the symmetric boundary value problem. Our numerical experiments have shown cases with three solutions. It is still an open question whether only one of these corresponds to the physical solution that minimizes the Gibbs free energy. We establish uniqueness for a related boundary value problem.
T. R. Canfield and P. B. Dobrin, "Mechanics of Blood Vessels," MCS-P468-0894. This is a chapter to be published in the Chemical Rubber Corp. Handbook on biomechanics. The chapter is concerned with the mechanical behavior of blood vessels under static loading conditions and the methods required to analyze this behavior.
W. Gropp and E. Lusk, "Scalable Unix Tools on Parallel Processors," Proc. 1994 Scalable High Performance Computing Conference, IEEE Computer Society Press, 1994, pp. 55-62. Also preprint MCS-P469-0894. The introduction of parallel processors that run a separate copy of Unix on each processor has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g., ls) and processes (e.g., ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text based. We also discuss a way to use these tools with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
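The flavor of such tools can be sketched with a generic fan-out runner (the host names are hypothetical, and a real tool would prefix each command with something like ssh or rsh plus the host; here we merely echo locally so the sketch runs anywhere):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def parallel_run(hosts, argv):
    """Run argv 'on' each host concurrently and collect the output.

    A real implementation would invoke e.g. ["ssh", host] + argv;
    echoing locally keeps this sketch self-contained.
    """
    def run_on(host):
        out = subprocess.run(["echo", host] + argv,
                             capture_output=True, text=True)
        return host, out.stdout.strip()
    with ThreadPoolExecutor(max_workers=len(hosts)) as ex:
        return dict(ex.map(run_on, hosts))

results = parallel_run(["node0", "node1", "node2"], ["uptime"])
```

Collecting per-host output into a single dictionary is what makes the text-based result easy to summarize or feed to a GUI front end.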
L. Kettunen, K. Forsman, D. Levine, and W. Gropp, "Solutions of TEAM Problem #13 Using Integral Equations in a Sequential and Parallel Computing Environment," MCS-P470-0994. Solutions for TEAM benchmark problems 13 and 20, obtained with an h-type volume integral formulation, are presented. Results computed with an increasing number of unknowns are shown in order to study the convergence of the numerical calculations. Some theoretical questions and aspects of parallelism are also highlighted.
C. Bischof, "Automatic Differentiation, Tangent Linear Models, and (Pseudo)Adjoints," MCS-P472-1094. This paper provides a brief introduction to automatic differentiation and relates it to the tangent linear model and adjoint approaches commonly used in meteorology. After a brief review of the forward and reverse mode of automatic differentiation, the ADIFOR automatic differentiation tool is introduced, and initial results of a sensitivity-enhanced version of the MM5 PSU/NCAR mesoscale weather model are presented. We also present a novel approach to the computation of gradients that uses a reverse mode approach at the time loop level and a forward mode approach at every time step. The resulting "pseudoadjoint" shares the characteristic of an adjoint code that the ratio of gradient to function evaluation does not depend on the number of independent variables. In contrast to a true adjoint approach, however, the nonlinearity of the model plays no role in the complexity of the derivative code.
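The forward mode reviewed above can be illustrated with a textbook dual-number sketch (illustrative only; ADIFOR itself is a source-to-source Fortran transformer, not an operator-overloading tool): each value carries a derivative component that is propagated by the chain rule.

```python
class Dual:
    """Forward-mode AD value: carries f(x) and f'(x) through the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def _lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual._lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = Dual._lift(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Derivative of f at x, obtained by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + 3*x) at x = 2 is 2*2 + 3 = 7
d = derivative(lambda x: x * x + 3 * x, 2.0)
```

The seed value 1.0 plays the role of the tangent direction; seeding several directions at once yields the tangent linear model discussed in the abstract.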
A. Bouaricha and R. B. Schnabel, "Tensor Methods for Large Sparse Systems of Nonlinear Equations," MCS-P473-1094. This paper introduces tensor methods for solving large sparse systems of nonlinear equations. Tensor methods for nonlinear equations were developed in the context of solving small to medium-sized dense problems. They base each iteration on a quadratic model of the nonlinear equations, where the second-order term is selected so that the model requires no more derivative or function information per iteration than standard linear model-based methods, and hardly more storage or arithmetic operations per iteration. Computational experiments on small to medium-sized problems have shown tensor methods to be considerably more efficient than standard Newton-based methods, with a particularly large advantage on singular problems. This paper considers the extension of this approach to solve large sparse problems. The key issue that must be considered is how to make efficient use of sparsity in forming and solving the tensor model problem at each iteration. Accomplishing this turns out to require an entirely new way of solving the tensor model that successfully exploits the sparsity of the Jacobian, whether the Jacobian is nonsingular or singular. We develop such an approach and, based upon it, an efficient tensor method for solving large sparse systems of nonlinear equations. Test results indicate that this tensor method is significantly more efficient and robust than an efficient sparse Newton-based method, in terms of iterations, function evaluations, and execution time.
P. Ciarlet, Jr., F. Lamour, and B. F. Smith, "On the Influence of Partitioning Schemes on the Efficiency of Overlapping Domain Decomposition Methods," MCS-P474-1094 (November 1994). One-level overlapping Schwarz domain decomposition preconditioners can be viewed as a generalization of block Jacobi preconditioning. The effect of the number of blocks and the amount of overlapping between blocks on the convergence rate is well understood. This paper considers the related issue of the effect of the scheme used to partition the matrix into blocks on the convergence rate of the preconditioned iterative method. Numerical results for Laplace and linear elasticity problems in two and three dimensions are presented. The tentative conclusion is that using overlap tends to decrease the differences between the rates of convergence for different partitioning schemes.
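In matrix terms, a one-level additive Schwarz preconditioner applies a local solve on each (possibly overlapping) index block produced by the partitioner and sums the results; with non-overlapping blocks this is exactly block Jacobi. A dense toy sketch (illustrative only, not the paper's code):

```python
import numpy as np

def schwarz_apply(A, blocks, r):
    """One-level additive Schwarz: z = sum_i R_i^T A_i^{-1} R_i r,
    where each R_i restricts to one (possibly overlapping) index block."""
    z = np.zeros_like(r, dtype=float)
    for idx in blocks:
        z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return z

# Sanity check: with non-overlapping blocks that match a block-diagonal A,
# the preconditioner is the exact inverse.
A = np.zeros((4, 4))
A[:2, :2] = [[2.0, 1.0], [1.0, 2.0]]
A[2:, 2:] = [[3.0, 1.0], [1.0, 3.0]]
r = np.array([1.0, 2.0, 3.0, 4.0])
z = schwarz_apply(A, [[0, 1], [2, 3]], r)
```

The partitioning scheme studied in the paper determines which indices land in each block, and hence how much coupling the local solves ignore.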
K. Forsman, W. Gropp, L. Kettunen, and D. Levine, "Computational Electromagnetics and Parallel Dense Matrix Computations," MCS-P475-1094. We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.
W. D. Gropp, H. G. Kaper, G. K. Leaf, D. M. Levine, M. Palumbo, and V. M. Vinokur, "Numerical Simulation of Vortex Dynamics in Type-II Superconductors," J. of Computational Physics 123 (1996), pp. 254-266. Also Preprint MCS-P476-1094. This article describes the results of several numerical simulations of vortex dynamics in type-II superconductors. The underlying mathematical model is the time-dependent Ginzburg-Landau model. The simulations concern vortex penetration in the presence of twin boundaries, interface patterns between regions of opposite vortex orientation, and magnetic-flux entry patterns in superconducting samples.
J. L. Tilson, "Massively Parallel Self-Consistent-Field Calculations," Preprint MCS-P477-1194. The advent of supercomputers with many computational nodes, each with its own independent memory, makes possible extremely fast computations. Our work, as part of the U.S. High Performance Computing and Communications Program (HPCCP), is focused on the development of electronic structure techniques for the solution of Grand Challenge-size molecules containing hundreds of atoms. Our efforts have resulted in a fully scalable Direct-SCF program that is portable and efficient. This code, named NWCHEM, is built around a distributed-data model. This distributed data is managed by a software package called Global Arrays developed within the HPCCP. We present performance results for Direct-SCF calculations of interest to the consortium.
L. Wos, "Searching for Circles of Pure Proofs," Journal of Automated Reasoning 15, pp. 279-315, 1995. Also Preprint MCS-P479-1094. When given a set of properties or conditions (say, three) that are claimed to be equivalent, the claim can be verified by supplying what we call a circle of proofs. In the case in point, one proves the second property or condition from the first, the third from the second, and the first from the third. If the proof that 1 implies 2 does not rely on 3, then we say that the proof is pure with respect to 3, or simply say the proof is pure. If one can order the three properties or conditions in such a way that one can find a circle of three pure proofs---technically, each proof pure with respect to the condition that is neither the hypothesis nor the conclusion---then we say that a circle of pure proofs has been found. Here we study the specific question of the existence of a circle of pure proofs for the thirteen shortest single axioms for equivalential calculus, subject to the requirement that condensed detachment be used as the rule of inference. For an indication of the difficulty of answering the question, we note that a single application of condensed detachment to the (shortest single) axiom known as P4 (also known as UM) with itself yields the (shortest single) axiom P5 (also known as XGF), and two applications of condensed detachment beginning with P5 as hypothesis yields P4. Therefore, except for P5, one cannot find a pure proof of any of the twelve shortest single axioms when using P4 as hypothesis or axiom, for the first application of condensed detachment must focus on two copies of P4, which results in the deduction of P5, forcing P5 to be present in all proofs that use P4 as the only axiom. Further, the close proximity, in the proof sense, of P4 when P5 is used as the only axiom threatens to make impossible the discovery of a circle of pure proofs for the entire set of thirteen shortest single axioms.
Perhaps more important than our study of pure proofs, and of a more general nature, we also present the methodology used to answer the cited specific question, a methodology that relies on various strategies and features offered by W. McCune's automated reasoning program OTTER. The strategies and features of OTTER we discuss here offer researchers the needed power to answer deep questions and solve difficult problems. We close this article with some challenges and some topics for research and with a sample input file and some proofs for study.
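As a toy illustration of the circle-of-proofs structure described in the abstract above (not equivalential calculus, and not the paper's method): to show that three conditions P1, P2, P3 are equivalent, it suffices to establish P1 implies P2, P2 implies P3, and P3 implies P1, rather than all six implications. The conditions below are hypothetical number-theoretic examples, and each "proof" is merely an exhaustive check over a finite test domain.

```python
# Three conditions on an integer n, each equivalent to "n is divisible by 6".
def p1(n): return n % 6 == 0
def p2(n): return n % 2 == 0 and n % 3 == 0
def p3(n): return n % 3 == 0 and (n // 3) % 2 == 0

# A circle of three implications: P1 => P2, P2 => P3, P3 => P1.
circle = [(p1, p2), (p2, p3), (p3, p1)]
for hyp, concl in circle:
    assert all(concl(n) for n in range(-120, 121) if hyp(n))
print("circle verified: P1, P2, P3 agree on the test domain")
```

The circle reduces the verification burden from n*(n-1) implications to n; the purity question studied in the paper adds the further constraint that each link must avoid appealing to the third condition.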
V. E. Taylor, R. L. Stevens, and K. E. Arnold, "Parallel Molecular Dynamics: Communication Requirements for Massively Parallel Machines," MCS-P480-1194. Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems, from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms, and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program, parallelized with an interaction decomposition method, executing on the Intel Touchstone Delta. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
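The requirements quoted in the abstract above translate into concrete per-message costs under the usual startup-plus-bandwidth model. The following back-of-the-envelope check uses only the numbers stated in the abstract; the 1 KB message size is an illustrative assumption, not a figure from the paper.

```python
# Translate the stated requirements into per-message cost
# (linear model: cost = startup + size / bandwidth).
clock_hz = 400e6                       # 400 MHz processors
startup_cycles = 30                    # required startup <= 30 clock cycles
startup_s = startup_cycles / clock_hz  # 30 / 400e6 s = 75 ns
bandwidth = 2.3e9                      # required bandwidth, bytes/s

print(f"startup latency: {startup_s * 1e9:.0f} ns")

# Cost of a hypothetical 1 KB message under this model:
msg_bytes = 1024
cost = startup_s + msg_bytes / bandwidth
print(f"1 KB message: {cost * 1e6:.2f} us")
```

At these parameters the fixed startup (75 ns) and the transfer time of a 1 KB payload (about 445 ns) are the same order of magnitude, which is why both the latency and bandwidth requirements must be met simultaneously.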
C. Bischof, A. Carle, P. Khademi, and A. Mauer, "The ADIFOR 2.0 System for the Automatic Differentiation of Fortran 77 Programs," Preprint MCS-P481-1194. Automatic differentiation is a technique for augmenting computer programs with statements for the computation of derivatives based on the chain rule of differential calculus. The ADIFOR 2.0 system provides automatic differentiation of Fortran 77 programs for first-order derivatives. The ADIFOR 2.0 system consists of three main components: the ADIFOR 2.0 preprocessor, the ADIntrinsics Fortran 77 exception-handling system, and the SparsLinC library. The combination of these tools provides the ability to deal with arbitrary Fortran 77 syntax, to handle codes containing single- and double-precision real- or complex-valued data, to fully support and easily customize the translation of Fortran 77 intrinsics, and to transparently exploit sparsity in derivative computations. ADIFOR 2.0 has been successfully applied to a 60,000-line code, which we believe to be a new record in automatic differentiation.
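The chain-rule mechanics that the abstract above refers to can be sketched with forward-mode automatic differentiation via dual numbers, in which each value carries its derivative along with it. This is a minimal illustration of the principle only; ADIFOR itself works by source-to-source transformation of Fortran 77, not by operator overloading.

```python
# Forward-mode AD: a Dual carries (value, derivative) and propagates
# derivatives through arithmetic by the sum and product rules.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)  # product rule
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

x = Dual(2.0, 1.0)   # seed dx/dx = 1
y = f(x)
print(y.val, y.dot)  # 17.0 14.0
```

The program text of `f` is unchanged; only the type of its argument differs, which mirrors how AD tools augment an existing code rather than requiring a hand-derived formula for the derivative.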
A. Bouaricha, "Tensor-Krylov Methods for Large Nonlinear Equations," Computational Optimization and Applications 5 (1996), pp. 207-232. Also Preprint MCS-P482-1194. In this paper, we describe tensor methods for large systems of nonlinear equations based on Krylov subspace techniques for approximately solving the linear systems that are required in each tensor iteration. We refer to a method in this class as a tensor-Krylov algorithm. We describe comparative testing for a tensor-Krylov implementation versus an analogous implementation based on a Newton-Krylov method. The test results show that tensor-Krylov methods are much more efficient and robust than Newton-Krylov methods on hard nonlinear equations problems.
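Both method classes in the abstract above build on the basic Newton iteration for nonlinear systems, differing in how (and how accurately) the linear system at each step is solved. The sketch below shows the underlying Newton iteration on a hypothetical 2x2 system, with the linear solve done exactly by Cramer's rule; Krylov variants replace that solve with an approximate iterative one, and tensor methods augment the local model with second-order terms.

```python
# Plain Newton iteration for a 2x2 nonlinear system F(x) = 0.
def newton2(F, J, x, iters=20, tol=1e-12):
    for _ in range(iters):
        f1, f2 = F(x)
        if abs(f1) + abs(f2) < tol:
            break
        (a, b), (c, d) = J(x)          # 2x2 Jacobian at x
        det = a * d - b * c
        # Solve J s = -F exactly (Cramer's rule) and take the full step.
        s1 = (-f1 * d + f2 * b) / det
        s2 = (f1 * c - f2 * a) / det
        x = (x[0] + s1, x[1] + s2)
    return x

# Example: intersect the circle x^2 + y^2 = 4 with the line y = x.
F = lambda x: (x[0] ** 2 + x[1] ** 2 - 4, x[1] - x[0])
J = lambda x: ((2 * x[0], 2 * x[1]), (-1.0, 1.0))
root = newton2(F, J, (1.0, 0.5))
print(root)   # converges to (sqrt(2), sqrt(2))
```

For the large systems targeted by the paper, forming and factoring the Jacobian is impractical, which is exactly the gap the Krylov-subspace inner solves are meant to fill.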
J. Michalakes, T. Canfield, R. Nanjundiah, S. Hammond, and G. Grell, "Parallel Implementation, Validation, and Performance of MM5," MCS-P483-1194. We describe a parallel implementation of the nonhydrostatic version of the Penn State/NCAR Mesoscale Model, MM5, that includes nesting capabilities. This version of the model can run on many different massively parallel computers (including a cluster of workstations). The model has been implemented and run on the IBM SP and Intel multiprocessors using a columnwise decomposition that supports irregularly shaped allocations of the problem to processors. This strategy will facilitate dynamic load balancing for improved parallel efficiency and promotes a modular design that simplifies the nesting problem. All data communication for finite differencing, inter-domain exchange of data, and I/O is encapsulated within a parallel library, RSL. Hence, there are no sends or receives in the parallel model itself. The library is generalizable to other, similar finite difference approximation codes. The code is validated by comparing the rate of growth in error between the sequential and parallel models with the error growth rate when the sequential model input is perturbed to simulate floating-point rounding error. A series of runs on increasing numbers of parallel processors demonstrates that the parallel implementation is efficient and scalable to large numbers of processors.
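The columnwise decomposition mentioned in the abstract above assigns each (i, j) grid column, with its full vertical extent, to a processor, and the per-processor allocation need not be a regular rectangle. The sketch below is an illustrative greedy load balancer under that idea; it is not the RSL library, and the column weights are hypothetical.

```python
# Greedy columnwise decomposition: assign grid columns to processors
# by estimated cost, heaviest columns first, each to the currently
# least-loaded processor. Allocations may be irregularly shaped.
def decompose(nx, ny, weights, nprocs):
    loads = [0.0] * nprocs
    owner = {}
    cols = sorted(((i, j) for i in range(nx) for j in range(ny)),
                  key=lambda c: -weights[c])
    for c in cols:
        p = min(range(nprocs), key=loads.__getitem__)  # least-loaded processor
        owner[c] = p
        loads[p] += weights[c]
    return owner, loads

# Hypothetical 4x4 grid where two columns (say, with active physics)
# cost three times as much as the rest:
w = {(i, j): 3.0 if (i, j) in {(0, 0), (3, 3)} else 1.0
     for i in range(4) for j in range(4)}
owner, loads = decompose(4, 4, w, nprocs=4)
print(loads)   # roughly balanced per-processor work
```

Because ownership is per column rather than per rectangular block, columns can be reassigned as costs shift during a run, which is what makes dynamic load balancing straightforward under this decomposition.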
Christian H. Bischof, Alan Carle, Peyvand M. Khademi, and Gordon Pusch, "Automatic Differentiation: Obtaining Fast and Reliable Derivatives---Fast," MCS-P484-1194. In this paper, we introduce automatic differentiation as a method for computing derivatives of large computer codes. After a brief discussion of methods of differentiating codes, we review automatic differentiation and introduce the ADIFOR automatic differentiation tool. We highlight some applications of ADIFOR to large industrial and scientific codes, and discuss the effectiveness and performance of our approach. Finally, we discuss sparsity in automatic differentiation and introduce the SparsLinC library.
F. Jarre and S. Wright, "On the Role of the Objective Function in Barrier Methods," MCS-P485-1294.
To simplify the analysis of interior-point methods, one commonly formulates the problem so that the objective function is linear, by introducing a single extra variable if necessary. Here we show that a linear objective function makes the Newton direction for a barrier function a useful search direction if the current iterate is sufficiently close to the central path. Hence, there are two advantages to using a linear objective and staying close to the central path. First, the Newton direction (which coincides with the affine scaling direction on the central path) gives a very accurate approximation to the direction to the minimum. Second, a long step along the Newton direction is possible without violating the inequality constraints.
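A one-dimensional sketch makes the linear-objective case above concrete: minimizing c*x subject to x > 0 via the barrier function f(x) = c*x - mu*log(x), whose minimizer x* = mu/c lies on the central path. This is an illustration of the Newton step on a barrier function only, not the paper's formulation.

```python
# Newton's method on the 1D barrier function f(x) = c*x - mu*log(x).
def barrier_newton(c, mu, x, iters=10):
    for _ in range(iters):
        grad = c - mu / x      # f'(x)
        hess = mu / (x * x)    # f''(x) > 0, so the step is well defined
        x = x - grad / hess    # full Newton step
        assert x > 0, "step left the feasible region"
    return x

c, mu = 2.0, 0.5
x = barrier_newton(c, mu, x=0.3)   # start near the central path point mu/c = 0.25
print(x, mu / c)                   # iterate converges to 0.25
```

Started close enough to the central path, the full Newton step stays strictly feasible at every iteration and converges quadratically to the barrier minimizer, illustrating both advantages claimed in the abstract.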
K. W. Dritz, "Random Number Generation in Ada 9X," MCS-P486-1194.
This paper presents an overview of the random number generation features of Ada 9X.
Last updated on January 23, 2004