Standard Reference Datasets



Project Proposal

Tools for Evaluating Mathematical and Statistical Software

Ronald F. Boisvert
Eric S. Lagergren
William F. Guthrie

NIST Information Technology Laboratory


Summary

This is a joint project of the NIST Applied and Computational Mathematics Division and Statistical Engineering Division. Its purpose is to improve the reliability of mathematical and statistical software products by providing reference data and computational results that enable objective evaluation of algorithms and software by developers and users.

Technical Strategy

Effective methodologies for testing and evaluating mathematical and statistical algorithms are essential to the development of robust, reliable software. Unfortunately, this area has received little attention from the research community, leaving developers and users without tools to aid in product validation. One universally applied method for testing numerical software is to exercise it on a battery of representative problems. Typically such problems are generated randomly, which ensures that a large number of test cases can be applied easily. Unfortunately, this is rarely sufficient for serious numerical software testing. Errors, or more commonly numerical difficulties, tend to occur for highly structured problems or for those near the boundaries of applicability of the underlying algorithm. Random problem generation rarely samples these parts of the domain, so testing must also be done on problems that exhibit such special behaviors. These problems are quite difficult to produce, and as a result the generation and use of structured problem sets has occurred only on an ad hoc basis. Such data sets nevertheless have a wide variety of uses.
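
The point about random versus structured problems can be illustrated with a small sketch (not part of the project's test corpora; the matrix size, RNG seed, and use of the Hilbert matrix are illustrative choices). A randomly generated matrix is almost always well conditioned, so a solver exercised only on random inputs never faces real numerical difficulty, while a classic structured problem such as the Hilbert matrix is severely ill conditioned even at modest size.

```python
# Illustrative sketch: why random test problems rarely expose numerical
# difficulty, while structured problems do. Sizes and seed are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Typical randomly generated test matrix: almost always well conditioned.
random_cond = np.linalg.cond(rng.standard_normal((n, n)))

# Structured problem: the n-by-n Hilbert matrix, H[i, j] = 1 / (i + j + 1),
# a well-known ill-conditioned case.
i, j = np.indices((n, n))
hilbert_cond = np.linalg.cond(1.0 / (i + j + 1))

print(f"random  {n}x{n} condition number: {random_cond:.3e}")
print(f"Hilbert {n}x{n} condition number: {hilbert_cond:.3e}")
```

At n = 8 the Hilbert matrix has a condition number on the order of 10^10, many orders of magnitude beyond what random generation typically produces, which is why curated structured problem sets are needed alongside random batteries.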

Unfortunately, these collections are often lost when the underlying technology is adopted by the commercial sector, leaving software developers and users without the tools to judge the capability of their products. A goal of this project is to identify, preserve, and make available such test corpora for use by numerical software researchers, developers and users.

Expectations

The best approach to developing and disseminating test data for mathematical and statistical software depends on the particular problem domain and target audience. As a result, we will select focus areas that characterize different domains in order to learn more about their differing requirements. Within these areas we intend to develop expertise, perform relevant research, and provide highly visible services via modern communications media such as the World Wide Web. The following focus areas have been selected.

Collaborations

This project parallels other NIST programs in many respects. Like the calibration and Standard Reference Materials programs, it will improve measurement assurance by ensuring software reliability. Batteries of tests for software also provide a baseline for product comparison, as standard physical tests do in other domains. NIST is uniquely positioned for such tasks because of its reputation for technical expertise and objectivity.

NIST is well-known as a producer of mathematical and statistical reference data (e.g., AMS 55). For example, Wolfram Research used tabulated data from AMS 55 to verify the special functions in Mathematica. Further indication that this is an important emerging area comes from IFIP WG 2.5 (International Federation for Information Processing, Working Group 2.5, Numerical Software), which has identified The Quality of Numerical Software: Assessment and Enhancement as the theme for its next working conference; ACMD is an invited participant. SED has been asked directly by industry (e.g., DuPont) to take leadership in providing such tools in the area of statistical software.
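
The kind of validation described above, checking a library's special functions against independently tabulated reference values, can be sketched as follows. This is a hedged illustration, not the procedure Wolfram Research used: a handful of Gamma-function values with exact closed forms stand in for a reference table such as AMS 55, and the tolerance is an arbitrary illustrative choice rather than a certified acceptance criterion.

```python
# Illustrative sketch of validating a special-function implementation
# against a table of independently known reference values.
import math

# Reference values: Gamma(x) at points with exact closed forms,
# standing in for a published table such as AMS 55.
reference = {
    0.5: math.sqrt(math.pi),        # Gamma(1/2) = sqrt(pi)
    1.0: 1.0,                       # Gamma(1)   = 0! = 1
    1.5: math.sqrt(math.pi) / 2.0,  # Gamma(3/2) = sqrt(pi)/2
    5.0: 24.0,                      # Gamma(5)   = 4! = 24
}

def worst_relative_error():
    """Worst relative error of math.gamma over the reference table."""
    return max(
        abs(math.gamma(x) - ref) / abs(ref)
        for x, ref in reference.items()
    )

worst_error = worst_relative_error()
print(f"worst relative error vs. reference table: {worst_error:.2e}")
```

A real reference data set would cover many more points, including arguments near the boundaries of the algorithm's applicability, and would document the provenance and accuracy of each tabulated value.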

ACMD and SED also have a long tradition in the development of mathematical and statistical algorithms and software. As a result, the issues involved in producing numerical software are well known to us. Other research institutions with interests in numerical software testing include Boeing Computer Services, George Mason University, Los Alamos National Lab, Oak Ridge National Lab, and Rutherford Appleton Lab. We have had positive feedback from each of these.

Participants

Participants will include: R.F. Boisvert, J. Filliben, L. Gill, W. Guthrie, E. Lagergren, D. Lozier, H.-K. Liu, R. Pozo, K. Remington, J. Rogers, M. Vangel, N.-F. Zhang.
