U.S. DEPARTMENT OF COMMERCE
Technology Administration
National Institute of Standards and Technology
Computer Systems Laboratory
Gaithersburg, MD 20899
March 1993
This document provides guidance on software error analysis. Software error analysis includes error detection, analysis, and resolution. The error detection techniques considered in the study are those used in software development, software quality assurance, and software verification, validation, and testing activities; they are the techniques frequently cited in the technical literature and in software engineering standards, together with some that represent new approaches to error detection. The study includes statistical process control techniques and describes their use as a software quality assurance technique for both product and process improvement. Finally, the report describes several software reliability models.
KEYWORDS: Data Collection; Error Detection; Error Removal; High Integrity Software; Metrics; Software Error Analysis; Software Quality Assurance; Software Verification and Validation; Statistical Process Control
EXECUTIVE SUMMARY
This document provides guidance on software error analysis. Error analysis
includes the activities of detecting errors, of recording errors singly and across
projects, and of analyzing single errors and error data collectively. The
purpose of error analysis is to provide assurance of the quality of high
integrity software.
The software industry is still young, without sufficient knowledge and
adequate standards to guarantee fault-free software. Although research
continues to identify better processes for error prevention, under
current practices errors are likely to be introduced into the software
at some point during development and maintenance. Hence there is a need
for error analysis, to aid in detecting, analyzing, and removing the
errors.
The main purpose of this study is to provide the software engineering
community with current information regarding error analysis, to assist
in detecting, analyzing, and removing errors in software.
Section 2 discusses how error detection and analysis techniques can be used to
improve the quality of software. Section 3 provides a global description of the
principal detection techniques used in each software lifecycle phase and cost
benefits for selected categories of these techniques. Section 4 provides
guidance on collecting individual error data and removing single errors.
Section 5 describes techniques for the collection and analysis of sets of error
data, including statistical process control techniques and software reliability
models. Section 6 provides a summary and recommendations based on this
study of error analysis, and Section 7 provides a list of references. Appendix
A contains detailed descriptions of common error detection techniques.
Appendix B contains the results of a study of standards for high
integrity software to determine the extent of coverage of error analysis
techniques.
The error detection techniques and statistical techniques described in this
report are a representative sampling of the most widely used techniques and
those most frequently referenced in standards, guidelines and technical
literature. This report also describes the more common software reliability
estimation models, most of which are described in the American Institute of
Aeronautics and Astronautics (AIAA) draft handbook for software reliability
[AIAA]. Inclusion of any technique in this report does not indicate
endorsement by the National Institute of Standards and Technology (NIST).
Definitions of the following key terms used in this report are based on those in
[IEEEGLOSS], [JURAN], [FLORAC], [SQE], [SHOOMAN], and [NIST204].
However, this report does not attempt to differentiate between "defect,"
"error," and "fault," since use of these terms within the software community
varies (even among standards addressing these terms). Rather, this report
uses those terms in a way which is consistent with the definitions given below,
and with other references from which information was extracted.
anomaly. Any condition which departs from the expected. This
expectation can come from documentation (e.g., requirements specifications,
design documents, user documents) or from perceptions or experiences.
Note: An anomaly is not necessarily a problem in the software, but a
deviation from the expected, so that errors, defects, faults, and failures are
considered anomalies.
computed measure. A measure that is calculated from primitive
measures.
defect. Any state of unfitness for use, or nonconformance to
specification.
error. (1) The difference between a computed, observed, or
measured value and the true, specified, or theoretically correct value or
condition. (2) An incorrect step, process, or data definition. Often called a
bug. (3) An incorrect result. (4) A human action that produces an
incorrect result.
Note: One distinction assigns definition (1) to error, definition (2) to
fault, definition (3) to failure, and definition (4) to
mistake.
error analysis. The use of techniques to detect errors, to
estimate/predict the number of errors, and to analyze error data both singly
and collectively.
failure. Discrepancy between the external results of a program's
operation and the software product requirements. A software failure is
evidence of the existence of a fault in the software.
fault. An incorrect step, process, or data definition in a computer
program. See also: error.
high integrity software. Software that must and can be trusted to
work dependably in some critical function, and whose failure to do so may
have catastrophic results, such as serious injury, loss of life or property,
business failure or breach of security. Examples: nuclear safety systems,
medical devices, electronic banking, air traffic control, automated
manufacturing, and military systems.
measure. The numerical value obtained by either direct or indirect
measurement; may also be the input, output, or value of a metric.
metric. The definition, algorithm or mathematical function used to
make a quantitative assessment of product or process.
primitive measure. A measure obtained by direct observation, often
through a simple count (e.g., number of errors in a module).
primitive metric. A metric whose value is directly measurable or
countable.
problem. Often used interchangeably with anomaly,
although problem has a more negative connotation, and implies that
an error, fault, failure or defect does exist.
process. Any specific combination of machines, tools, methods,
materials and/or people employed to attain specific qualities in a product or
service.
reliability (of software). The probability that a given software system
operates for some time period, without system failure due to a software fault,
on the machine for which it was designed, given that it is used within design
limits.
statistical process control. The application of statistical techniques
for measuring, analyzing, and controlling the variation in processes.
Software error analysis includes the techniques used to locate, analyze, and
estimate errors and data relating to errors. It includes the use of error
detection techniques, analysis of single errors, data collection, metrics,
statistical process control techniques, error prediction models, and reliability
models.
Error detection techniques are the techniques of software development,
software quality assurance (SQA), and software verification, validation,
and testing that are used to locate anomalies in software products. Once
an anomaly is detected, analysis
is performed to determine if the anomaly is an actual error, and if so, to
identify precisely the nature and cause of the error so that it can be properly
resolved. Often, emphasis is placed only on resolving the single error.
However, the single error could be representative of other similar errors
which originated from the same incorrect assumptions, or it could indicate the
presence of serious problems in the development process. Correcting only the
single error and not addressing underlying problems may cause further
complications later in the lifecycle.
Thorough error analysis includes the collection of error data, which enables
the use of metrics and statistical process control (SPC) techniques. Metrics
are used to assess a product or process directly, while SPC techniques are
used to locate major problems in the development process and product by
observing trends. Error data can be collected over the entire project and
stored in an organizational database, for use with the current project or
future projects. As an example, SPC techniques may reveal that a large
number of errors are related to design, and after further investigation, it is
discovered that many designers are making similar errors. It may then be
concluded that the design methodology is inappropriate for the particular
application, or that designers have not been adequately trained. Proper
adjustments can then be made to the development process, which are
beneficial not only to the current project, but to future projects.
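As a simple illustration (not drawn from this report), the sketch below applies one common SPC technique, the c-chart, to hypothetical error counts from successive design inspections; a count outside the three-sigma control limits signals that the process, not merely a single work product, deserves investigation.

    import math

    def c_chart_limits(counts):
        """Center line and three-sigma control limits for a c-chart."""
        c_bar = sum(counts) / len(counts)
        sigma = math.sqrt(c_bar)
        return c_bar, max(0.0, c_bar - 3 * sigma), c_bar + 3 * sigma

    # Hypothetical error counts from ten successive design inspections.
    counts = [4, 7, 5, 3, 6, 5, 19, 4, 6, 5]
    center, lcl, ucl = c_chart_limits(counts)
    for i, c in enumerate(counts, start=1):
        flag = "  <-- out of control" if not (lcl <= c <= ucl) else ""
        print(f"Inspection {i:2d}: {c:3d} errors{flag}")
    print(f"center = {center:.1f}, LCL = {lcl:.1f}, UCL = {ucl:.1f}")

Here the seventh inspection's count of 19 falls above the upper control limit, prompting the kind of follow-up investigation described above.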
The collection of error data also supports the use of reliability models
to estimate the probability that a system will operate without failures
in a specified environment for a given amount of time.
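As a simple illustration (not taken from this report), the following sketch assumes failures arrive at a constant rate, estimates that rate from hypothetical inter-failure times recorded during test, and computes the probability of failure-free operation over a mission time. The reliability models described in section 5 are considerably more sophisticated, accounting, for example, for the removal of faults over time.

    import math

    def estimate_failure_rate(interfailure_times):
        """Maximum-likelihood failure rate under an exponential model."""
        return len(interfailure_times) / sum(interfailure_times)

    def reliability(failure_rate, mission_time):
        """Probability of failure-free operation for mission_time."""
        return math.exp(-failure_rate * mission_time)

    # Hypothetical inter-failure times (hours) observed during system test.
    times = [12.0, 30.5, 44.0, 61.2, 98.7]
    lam = estimate_failure_rate(times)
    print(f"Estimated failure rate: {lam:.4f} failures/hour")
    print(f"R(24 h) = {reliability(lam, 24.0):.3f}")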
The error data collected by a vendor may be useful to auditors. Auditors
could request that vendors submit error data, but with the understanding that
confidentiality will be maintained and that recriminations will not be made.
Data collected from vendors could be used by the auditors to establish a
database, providing a baseline for comparison when performing evaluations
of high integrity software. Data from past projects would provide guidance to
auditors on what to look for, by identifying common types of errors, or other
features related to errors. For example, it could be determined whether the
error rates of the project under evaluation are within acceptable bounds,
compared with those of past projects.
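As a simple illustration (not drawn from this report), the sketch below compares a project's error rate against the mean and spread of rates from past projects; the historical rates and the two-standard-deviation bound are hypothetical choices, not values prescribed by this report.

    import statistics

    def within_baseline(past_rates, new_rate, k=2.0):
        """True if new_rate lies within k standard deviations of the mean."""
        mean = statistics.mean(past_rates)
        sd = statistics.stdev(past_rates)
        return abs(new_rate - mean) <= k * sd

    # Hypothetical errors-per-KLOC rates from previously audited projects.
    history = [3.1, 4.0, 2.7, 3.6, 3.3, 4.2]
    print(within_baseline(history, 3.8))   # True: within acceptable bounds
    print(within_baseline(history, 7.5))   # False: warrants investigation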
Ideally, software development processes should be so advanced that no errors
will enter a software system during development. Current practices can only
help to reduce the number of errors, not prevent all errors. However, even if
the best practices were available, it would be risky to assume that no errors
enter a system, especially if it is a system requiring high integrity.
The use of error analysis allows for early error detection and correction.
When an error made early in the lifecycle goes undetected, problems and costs
can accrue rapidly. An incorrectly stated requirement may lead to incorrect
assumptions in the design, which in turn cause subsequent errors in the code.
It may be difficult to catch all errors during testing, since exhaustive testing,
which is testing of the software under all circumstances with all possible input
sets, is not possible [MYERS]. Therefore, even a critical error may remain
undetected and be delivered along with the final product. This undetected
error may subsequently cause a system failure, which results in costs not only
to fix the error, but also for the system failure itself (e.g., plant shutdown, loss
of life).
Sometimes the cost of fixing an error may lead to a decision not to fix it.
This is particularly true if the error is found late in the lifecycle. For example,
when an error has caused a failure during system test and the location of the
error is found to be in the requirements or design, correcting that error can be
expensive. Sometimes the error is allowed to remain and the fix deferred until
the next version of the software. Persons responsible for these decisions may
justify them simply on the basis of cost or on an analysis which shows that the
error, even when exposed, will not cause a critical failure. Decision makers
must have confidence in the analyses used to identify the impact of the error,
especially for software used in high integrity systems.
A strategy for avoiding the high costs of fixing errors late in the lifecycle is to
prevent the situation from occurring altogether, by detecting and correcting
errors as early as possible. Studies have shown that it is much more expensive
to correct software requirements deficiencies late in the development effort
than it is to have correct requirements from the beginning [STSC]. In fact,
the cost to correct a defect found late in the lifecycle may be more than
one hundred times the cost to detect and correct the problem when the defect
was introduced [DEMMY]. In addition to the lower cost of fixing individual errors,
another cost benefit of performing error analysis early in development is that
the error propagation rate will be lower, resulting in fewer errors to correct in
later phases. Thus, while error analysis at all phases is important, there is no
better time, in terms of cost benefit, to conduct error analysis than during the
software requirements phase.
Planning for error analysis should be part of the process of planning the
software system, along with system hazard analysis and software
criticality analysis.
The results of hazard analysis and criticality analysis can be used to build an
effective error analysis strategy. They aid in choosing the most appropriate
techniques to detect errors during the lifecycle (see sec. 3). They also aid in
the planning of the error removal process (i.e., the removal of individual
errors, as described in sec. 4). Lastly, they aid in the selection of metrics,
statistical process control techniques, and software reliability estimation
techniques, which are described in section 5. Error analysis efforts and
resources can be concentrated in critical program areas. Error analysis
techniques should be chosen according to which type of errors they are best at
locating. The selection of techniques should take into account the error
profile and the characteristics of the development methodology. No project
can afford to apply every technique, and no technique guarantees that every
error will be caught. Instead, the most appropriate combination of techniques
should be chosen to enable detection of as many errors as possible in the
earlier phases.
Software development and maintenance involves many processes resulting in
a variety of products collectively essential to the operational software. These
products include the statement of the software requirements, software design
descriptions, code (source, object), test documentation, user manuals, project
plans, documentation of software quality assurance activities, installation
manuals, and maintenance manuals. These products will probably contain at
least some errors. The techniques described in this section can help to detect
these errors. While not all products are necessarily delivered to the customer
or provided to a regulatory agency for review, the customer or regulatory
agency should have assurance that the products contain no errors, contain no
more than an agreed upon level of estimated errors, or contain no errors of a
certain type.
This section of the report identifies classes of error detection techniques,
provides brief descriptions of these techniques for each phase of the lifecycle,
and discusses the benefits for certain categories of these techniques. Detailed
descriptions of selected techniques appear in Appendix A. Detailed checklists
provided in [NISTIR] identify typical problems that error detection
techniques may uncover.
Error detection techniques may be performed by any organization responsible
for developing and assuring the quality of the product. In this report, the
term "developer" is used to refer to developers, maintainers, software quality
assurance personnel, independent software verification and validation
personnel, or others who perform error detection techniques.
Error detection techniques generally fall into three main categories of analytic
activities: static analysis, dynamic analysis, and formal analysis. Static
analysis is "the analysis of requirements, design, code, or other items either
manually or automatically, without executing the subject of the analysis to
determine its lexical and syntactic properties as opposed to its behavioral
properties" [CLARK]. This type of technique is used to examine items at all
phases of development. Examples of static analysis techniques include
inspections, reviews, code reading, algorithm analysis, and tracing. Other
examples include graphical techniques such as control flow analysis, and finite
state machines, which are often used with automated tools. Traditionally,
static analysis techniques are applied to the software requirements, design,
and code, but they may also be applied to test documentation, particularly test
cases, to verify traceability to the software requirements and adequacy with
respect to test requirements [WALLACE].
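As a small illustration of automated static analysis (not drawn from this report), the sketch below examines source text without executing it, counting decision points in each function as a rough indicator of complexity; the sample source, the set of node types counted, and the threshold are all hypothetical choices.

    import ast

    # Node types counted as decision points (a rough, illustrative choice).
    DECISION_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

    def flag_complex_functions(source, threshold):
        """Return (name, count) for functions exceeding the threshold."""
        flagged = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                count = sum(isinstance(n, DECISION_NODES)
                            for n in ast.walk(node))
                if count > threshold:
                    flagged.append((node.name, count))
        return flagged

    # Hypothetical source text; it is analyzed, never executed.
    SAMPLE = """
    def f(x):
        if x > 0:
            for i in range(x):
                if i % 2:
                    x += i
        return x
    """
    for name, count in flag_complex_functions(SAMPLE, threshold=2):
        print(f"{name}: {count} decision points -- review recommended")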
Dynamic analysis techniques involve the execution of a product and analysis
of its response to sets of input data to determine its validity and to detect
errors. The behavioral properties of the program are also observed. The
most common type of dynamic analysis technique is testing. Testing of
software is usually conducted on individual components (e.g., subroutines,
modules) as they are developed, on software subsystems when they are
integrated with one another or with other system components, and on the
complete system. Another type of testing is acceptance testing, often
conducted at the customer's site, but before the product is accepted by the
customer. Other examples of dynamic analyses include simulation, sizing and
timing analysis, and prototyping, which may be applied throughout the
lifecycle.
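As a small illustration of dynamic analysis (not drawn from this report), the sketch below executes a hypothetical unit under test against selected input sets, including a boundary value, and compares its observed behavior with the expected results.

    import unittest

    def scale_reading(raw, gain, offset=0.0):
        """Hypothetical unit under test: convert a raw sensor reading."""
        if gain <= 0:
            raise ValueError("gain must be positive")
        return raw * gain + offset

    class ScaleReadingTest(unittest.TestCase):
        def test_nominal_input(self):
            self.assertAlmostEqual(scale_reading(10.0, 2.0, 1.0), 21.0)

        def test_boundary_gain_rejected(self):
            # Boundary-value test: a gain of zero must be rejected.
            with self.assertRaises(ValueError):
                scale_reading(10.0, 0.0)

    if __name__ == "__main__":
        unittest.main()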
Formal methods involve rigorous mathematical techniques to specify or
analyze the software requirements specification, design, or code. Formal
methods can be used as an error detection technique. One method is to write
the software requirements in a formal specification language (e.g., VDM, Z),
and then verify the requirements using a formal verification (analysis)
technique, such as proof of correctness. Another method is to use a formal
requirements specification language and then execute the specification with an
automated tool. This animation of the specification provides the opportunity
to examine the potential behavior of a system without completely developing a
system first.
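As a small illustration of proof of correctness (not drawn from this report), a program fragment can be annotated with a precondition and a postcondition, and the resulting assertion proved to hold on every execution path:

    \{\, x = X \,\} \quad \mathbf{if}\ x < 0\ \mathbf{then}\ x := -x \quad \{\, x = |X| \,\}

The proof considers both paths: if x < 0, the assignment yields x = -X = |X|; otherwise x is unchanged and x = X = |X|. Automated verification tools can carry out such proofs for much larger specifications.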
Criteria for selection of techniques for this report include the amount of information available on them, their citation in standards and guidelines, and their recent appearance in research articles and technical conferences. Other techniques exist but are not included in this report. Table 3-1 provides a mapping of the error detection techniques described in Appendix A to software lifecycle phases. In this table, the headings R, D, I, T, IC, and OM represent the requirements, design, implementation, test, installation and checkout, and operation and maintenance phases, respectively. The techniques and metrics described in this report are applicable to the products and processes of these phases, regardless of the lifecycle model actually implemented (e.g., waterfall, spiral). Table B-2 in Appendix B lists which high integrity standards cite these error detection techniques.