The EFFective Manager Tool

Dolores Wallace, Mark Zimmerman

DISCLAIMER: Certain trade names and company products are mentioned in the text. In no case does such mention imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products are necessarily the best available for the purpose.



Fault and Failure Management

While we use the definitions of the terms error, fault, and failure in IEEE Std. 610.12 [1], for simplicity the term fault, when used alone in this paper, implies either fault or failure. The EFF tool asks during what activity a fault is found; those found during unit, integration, or system test are assumed to be failures.

To be competitive, companies need to get their products out the door. To remain competitive, they need to learn from the current project and apply those lessons if possible during the current project and certainly to the next one. We will examine how collecting and analyzing data on faults and failures provides support to these objectives. We recognize that many steps and data from other processes are needed for overall process improvement; we are concerned here only with the collection and analysis of fault data.

The assignment to get the product out should translate to getting a quality product built and delivered on time and within budget. We will not belabor the definition of quality but will assume the project manager has guidelines for the acceptability of the product(1). Regardless of the definition of quality, the project manager must know the status of the project's faults and failures and see that they are resolved.

The status of a fault may be either open or resolved, and the fault or failure may have a priority for resolution and a person assigned to its resolution. Resolution of a fault or failure may result in any of these states:

correction,

deferral of correction until another version or a later time,

correction resulting from a correction to another problem,

dismissal, that is, upon examination, no correction needed.
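The attributes just described can be pictured as a simple record. The following sketch is illustrative only; the field and state names are assumptions, not the EFF tool's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record mirroring the attributes described above;
# the EFF tool's actual schema may differ.
@dataclass
class FaultRecord:
    number: int                      # record number
    discovered: date                 # date the fault was found
    priority: int = 3                # 1 = urgent; larger numbers = lower priority
    assigned_to: Optional[str] = None
    status: str = "open"             # "open" or one of the resolved states
    resolved: Optional[date] = None  # date of resolution, if any

# The four resolution states listed above, under assumed names.
RESOLVED_STATES = {
    "corrected",
    "deferred",                # correction deferred to another version or later time
    "corrected-by-other-fix",  # corrected as a result of another correction
    "dismissed",               # upon examination, no correction needed
}

def is_open(fault: FaultRecord) -> bool:
    """A fault is open until it reaches one of the resolved states."""
    return fault.status not in RESOLVED_STATES
```

A record starts open and moves to exactly one resolved state, which is why status queries can partition the fault population cleanly.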

Typical questions that the project manager may ask:

How many faults with highest priority are open?

What is the average number of days to resolve a fault?

How many faults/failures were discovered in a specific month (e.g., June 1997)?

How many faults/failures were resolved in a specific month (e.g., June 1997)?

How many faults/failures discovered in a specific month were also resolved in that month (e.g., June 1997)?

These very general questions comprise a subset of many whose answers provide in-depth information about the project. For this subset, the data needed to answer them are basic: the date each fault is found, whether it is assigned to someone for resolution, the date of its resolution, and its priority for resolution. The answers assist project management in keeping a tracking file that provides a simple count of faults yet to be resolved, along with their priorities.
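Over records holding just those basic data, the questions above reduce to counts and averages. The following sketch assumes faults are kept as dictionaries with illustrative field names; it is not the EFF tool's implementation.

```python
from datetime import date

def open_urgent_count(faults):
    """How many priority-1 (urgent) faults are still open?"""
    return sum(1 for f in faults if f["status"] == "open" and f["priority"] == 1)

def average_days_to_resolve(faults):
    """Average days between discovery and resolution, over resolved faults."""
    resolved = [f for f in faults if f.get("resolved")]
    if not resolved:
        return 0.0
    return sum((f["resolved"] - f["discovered"]).days for f in resolved) / len(resolved)

def discovered_in(faults, year, month):
    """How many faults were discovered in a given month (e.g., June 1997)?"""
    return sum(1 for f in faults
               if f["discovered"].year == year and f["discovered"].month == month)
```

Each question in the list maps to one such pass over the tracking file, which is why collecting the four basic data items is enough to answer all of them.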

However, this simple subset may not be sufficiently helpful in many cases. Other information, such as the type of an unresolved fault or the activity in which the fault was discovered, may enable decisions about methods for developing the software and activities to discover faults. A large number of faults relative to project size, wild schedule swings due to the discovery of many faults and the resulting rework, and many faults not revealing themselves until they become failures in system test are among the many signals that a project needs some correction, but the manager may not even know these circumstances exist without having collected the data in the first place and being able to query the data in the second. In a situation like this, the manager needs more information about the faults and failures. We will not belabor what should have been done, e.g., possible use of inspections, or better examination of the impact of a fault if not corrected. Nor are we providing a summary of the decisions a manager would make, for every project is different(2). Rather, we are providing an opportunity to collect, monitor, and analyze your project data to get the current project out the door. By keeping track of fault attributes, the amount of effort to correct a fault or failure, the number of days to achieve the correction, and other features, one can effectively manage progress.

When the company collects and keeps the data from a project, that company will have valuable information for the next version of the product. The data provide lessons learned about methods for developing the software and for finding faults. Profiles can be generated to indicate the frequency of faults found during specific activities, or the frequency of types of faults being found and when they are usually found. By looking at profiles of the previous version or a similar product, project managers may be able to fend off some problems via better staff training, improved checklists that address the most frequent faults (and guidance on how to use the checklists), and more accurate test scheduling.
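A profile of the kind just described is essentially a frequency count over one fault attribute. A minimal sketch, with assumed field and activity names:

```python
from collections import Counter

def activity_profile(faults):
    """Frequency of faults by the activity in which each was discovered."""
    return Counter(f["activity_discovered"] for f in faults)

# Illustrative data, not the tool's SAMPLE set.
faults = [
    {"activity_discovered": "design"},
    {"activity_discovered": "code"},
    {"activity_discovered": "code"},
    {"activity_discovered": "system test"},
]
profile = activity_profile(faults)
# e.g., profile["code"] == 2
```

The same one-line count over a "fault type" field yields the type-frequency profile; comparing such profiles across versions is what lets a manager target checklists and training at the most frequent faults.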

The EFF tool is an easy-to-use tool that requests data about the characteristics of each fault and then allows queries regarding the characteristics of those data. Section 3 describes how to ask questions with the EFF tool based on the data collected with the Collection component.

The EFF Tool Analysis Component

The EFF tool collects and manages fault and failure data gathered during software development or maintenance processes. The Web-based tool has a simple format for recording data about faults and failures discovered and resolved during those processes. The tool provides an analysis capability to sort and count information about the faults and failures according to various characteristics, and to monitor the status of open and resolved faults. The profiles derived from the analysis capability give project managers and staff visibility into priorities for resolving faults/failures, the status of the project as a whole, the effectiveness of development or testing methods, training needs, technology needs, and other issues serving to improve current and future projects.

The user can locate information on a specific collection of faults and failures by successively narrowing characteristics about the collection. The queries are basically in two parts:

1) select faults/failures by record number groups or by time periods, and

2) select specific features of a fault/failure.

Then, the record numbers and status are displayed for that group of records. Additional category information may be displayed as selected.
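The two-part narrowing described above can be sketched as two successive filters. The function and field names below are assumptions for illustration, not the tool's interface.

```python
from datetime import date

def select_records(faults, number_range=None, discovered_between=None):
    """Part 1: narrow by record number group or by time period."""
    result = faults
    if number_range:
        lo, hi = number_range
        result = [f for f in result if lo <= f["number"] <= hi]
    if discovered_between:
        start, end = discovered_between
        result = [f for f in result if start <= f["discovered"] <= end]
    return result

def select_attributes(faults, **criteria):
    """Part 2: keep records whose attribute values fall in the allowed sets,
    e.g. select_attributes(fs, status={"open"}, priority={1})."""
    return [f for f in faults
            if all(f.get(k) in allowed for k, allowed in criteria.items())]
```

Running part 1 and then part 2 over the fault file yields the group of records whose numbers and status the tool displays.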

EXAMPLES Analyze Data Menu:

Example 1:

How many faults with the highest priority are open?

[Attributes] on Analyze Data Menu

[*] Status

Mark Open.

[Return to Attribute Menu]

[*] Priority

Mark 1-urgent

[Return to Attribute Menu]

[Return to Analyze Data Menu]

[Execute Query]

The last line of the display tells how many faults/failures are still open at the highest priority.

Example 2:

What is the average number of days to resolve a fault?

[Computations] on Analyze Data Menu

Mark box at bottom next to Show the average days open for each status

[Return to Analyze Data Menu]

[Attributes]

[*] Status

Mark the 4 resolved statuses

[Return to Attribute Menu]

[Return to Analyze Data Menu]

[Execute Query]

The lower table indicates the Average Number of days.
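Example 2's computation can be sketched as follows. The field names, and the assumption that an open fault's days are counted up to today, are illustrative and may differ from the EFF tool's actual behavior.

```python
from collections import defaultdict
from datetime import date

def average_days_open_by_status(faults, today=None):
    """Average days open, grouped by status. Days open run from discovery
    to resolution, or to today for faults still open (an assumption)."""
    today = today or date.today()
    totals = defaultdict(lambda: [0, 0])  # status -> [sum of days, record count]
    for f in faults:
        end = f.get("resolved") or today
        days = (end - f["discovered"]).days
        totals[f["status"]][0] += days
        totals[f["status"]][1] += 1
    return {status: total / n for status, (total, n) in totals.items()}
```

Restricting the input to records in the four resolved statuses, as the example's attribute selection does, gives the average resolution time shown in the lower table.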

Example 3:

How many faults/failures were discovered in June 1997?

[Computations] on Analyze Data Menu

Set Discovery Date After to 05/31/1997

Set Discovery Date Before to 07/01/1997

[Return to Analyze Data Menu]

[Execute Query]

The last line of the display indicates how many faults/failures were discovered between those dates.

Example 4:

How many faults/failures were resolved in June 1997?

[Computations] on Analyze Data Menu

Set Date Resolved After to 05/31/1997

Set Date Resolved Before to 07/01/1997

[Return to Analyze Data Menu]

[Execute Query]

The last line of the display indicates how many faults/failures were resolved between those dates.

Example 5:

How many faults/failures were discovered in June 1997 and resolved in June 1997?

[Computations] on Analyze Data Menu

Set Discovery Date After to 05/31/1997

Set Discovery Date Before to 07/01/1997

Set Date Resolved After to 05/31/1997

Set Date Resolved Before to 07/01/1997

[Return to Analyze Data Menu]

[Execute Query]

The last line of the display indicates how many faults/failures were both discovered and resolved between those dates.

Example 6:

Of the faults/failures supplied as SAMPLE with the tool, which were discovered during requirements definition and design but resolved during any of the other activities? Of these faults/failures, show the Status, Detector, and Resolver, and the average days open for each status.

[Computations] on Analyze Data Menu

Set Fault Numbers Less Than 11.

Mark box beside Show the average days open for each status.

[Return to Analyze Data Menu]

[Attributes]

[*] Activity when Discovered.

Mark requirements definition and design

[Return to Attribute Menu]

[*] Activity when Resolved

Mark all but Not Supplied, requirements definition, and design

[Return to Attribute Menu]

Mark Detector and Resolver.

[Return to Analyze Data Menu]

[Execute Query]

Results from this query should be displayed on the screen.





For further information, contact Dolores Wallace, National Institute of Standards and Technology, NIST North Bldg 820, Rm 517, Gaithersburg, MD 20899; Phone (301) 975-3340; Fax (301) 926-3696; or email Dolores.Wallace@Nist.Gov. For a descriptive paper, see http://hissa.nist.gov/eff/qweff.html/

1. Note that standard profiles of faults and failures for specific application domains would assist in defining acceptability, and in improving on the "standard" to beat the competition.

2. For example, in one case an inappropriate design tool for the application domain may have led to timing faults. In another, the lack of sufficient traceability data led to some faults not being assigned a higher priority. In yet another, time was running out, inspections were bypassed, and unit test was hopelessly bogged down. You answer: what can be done in these situations? On this project? On the next project?