QUALITY CHARACTERISTICS AND METRICS FOR REUSABLE SOFTWARE (Preliminary Report)

List of causes: distinct input conditions

List of effects: distinct output conditions or system transformations

A_existing = number of ambiguities remaining

A_tot = total number of ambiguities identified

The requirements are analyzed and broken down into specific causes and effects. Each cause and each effect is assigned a unique node identifier. A Boolean graph is constructed by connecting the cause and effect nodes according to the semantic content of the requirements. Ambiguities are defined as any cause with no effect, any effect with no cause, and any combination of causes and effects that is inconsistent with the requirements or impossible to achieve.

Calculation of Cause/Effect Completeness:

$$CE = 100\% \times \left(1 - \frac{A_{existing}}{A_{tot}}\right)$$

When no ambiguities exist, and all causes and effects are represented, then the CE measure is 100%. See [IEEE982.2] for use of this metric.
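The bookkeeping can be sketched in Python, assuming the cause/effect graph is recorded as a list of (cause, effect) links and that CE takes the IEEE 982.1 form CE = 100% x (1 - A_existing / A_tot). Only the first two kinds of ambiguity (causes with no effect, effects with no cause) can be found mechanically this way; inconsistent or impossible combinations still require the analyst. The function names are illustrative, not from the report.

```python
def unlinked_nodes(causes, effects, links):
    """Return (causes with no effect, effects with no cause).

    links: iterable of (cause_id, effect_id) edges in the cause/effect graph.
    """
    links = list(links)  # allow generators; we iterate twice
    linked_causes = {cause for cause, _ in links}
    linked_effects = {effect for _, effect in links}
    return sorted(set(causes) - linked_causes), sorted(set(effects) - linked_effects)


def cause_effect_completeness(a_existing, a_tot):
    """CE in percent, assuming CE = 100 * (1 - A_existing / A_tot)."""
    if a_tot == 0:
        return 100.0  # assumption: no ambiguities were ever identified
    if not 0 <= a_existing <= a_tot:
        raise ValueError("need 0 <= A_existing <= A_tot")
    return 100.0 * (1 - a_existing / a_tot)


print(unlinked_nodes(["C1", "C2", "C3"], ["E1", "E2"], [("C1", "E1"), ("C3", "E1")]))
# (['C2'], ['E2'])
```

Each unlinked node found this way counts toward both A_tot and, until resolved, A_existing.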

3.1.2 Correctness

number of problem reports per phase, priority, category, or cause [NIST500-209]

number of reported problems per time period [NIST500-209]

number of open real problems per time period [NIST500-209]

number of closed real problems per time period [NIST500-209]

number of unevaluated problem reports [NIST500-209]

age of open real problem reports [NIST500-209]

age of unevaluated problem reports [NIST500-209]

age of real closed problem reports [NIST500-209]

rate of error discovery [NIST500-209]

3.1.3 Reliability

RELY - Required Software Reliability [IEEE982.1]

RELY can be used to provide a reliability rating for a product by examining the processes used to develop it. In the early planning phases of a project, RELY can be used to weigh the trade-offs between cost and degree of reliability. RELY provides five ratings: very low, low, nominal, high, and very high. See [IEEE982.2] for a table of reliability factors for requirements and product design, detailed design, code and unit test, and integration and test.

3.2 Quality Metrics for Requirements Documentation

Because requirements documentation is usually written in human-readable form, the metrics defined for requirements typically require manual analysis. The data needed for many of the metrics must be gathered by hand (e.g., by counting requirements). Many of the metrics are subjective, and it is up to the analyst to decide what a "good" value is. For example, the analyst must determine the acceptable readability level of the documents.

Many of the metrics for quality characteristics in this section can be used for other products. The readability metrics may be applied to preliminary and software system design documents, for example.

3.2.1 Completeness

completeness metric [IEEE982.1] [AFSCP800-14]

This metric uses eighteen primitives (number of functions not satisfactorily defined, number of functions, number of functions not used, etc.). Ten derivatives are defined (ratio of functions satisfactorily defined to functions defined, ratio of defined functions used to defined functions, etc.). The completeness measure (CM) is calculated as the weighted sum of the ten derivatives (see [IEEE982.2]):

$$CM = \sum_{i=1}^{10} w_i D_i$$

where D_i is the ith derivative and w_i is its weight.

requirements traceability [IEEE982.1]

Aids in identifying requirements that are missing from, or in addition to, the original requirements. R1 is the number of requirements met by the architecture. R2 is the number of original requirements. The traceability measure (TM) is computed as:

$$TM = \frac{R_1}{R_2} \times 100\%$$
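A minimal sketch of the traceability check, assuming requirements are tracked by ID and that only original requirements count toward R1 (so TM never exceeds 100%). The same set differences also list the missing and additional requirements the metric is meant to expose. Names and IDs are hypothetical.

```python
def traceability(original, met):
    """Return (TM percent, missing requirement IDs, additional requirement IDs).

    original: IDs of the original requirements (R2 = len(original)).
    met: IDs of requirements the architecture satisfies.
    """
    original, met = set(original), set(met)
    if not original:
        raise ValueError("no original requirements")
    r1 = len(original & met)  # assumption: extra requirements do not raise R1
    tm = 100.0 * r1 / len(original)
    return tm, sorted(original - met), sorted(met - original)


tm, missing, extra = traceability({"REQ-1", "REQ-2", "REQ-3", "REQ-4"}, {"REQ-1", "REQ-2", "REQ-5"})
print(tm, missing, extra)  # 50.0 ['REQ-3', 'REQ-4'] ['REQ-5']
```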

deviation between planned number of System/Segment Design Document (SSDD) software requirements to be documented in the Software Requirements Specification (SRS) and actual number of SSDD software requirements completely documented in the SRS [AFP800-48]

Document Relationships [NSWC2]

Document relationship is defined as "the structure among and within the individual documents of a set of documentation, which ties the documents together into a single, unified set." There are two categories: Decompositional and Referential.

Decompositional relationship is the natural, hierarchical ranking of a set of documentation. Documents at a lower level in the hierarchy give more detail on a given subject. For completeness, there should be a definite or obvious hierarchical ranking among the documents.

Referential relationships are the links between documents, i.e., a specific cited reference to another document or to another section of the same document. Appropriate references to other documents (or within the same document) should be maintained.

3.2.2 Correctness

number of discrepancies as a result of each review [STEP]

number of conflicting requirements [IEEE982.1]

requirements compliance [IEEE982.1]

System Verification Diagrams (SVDs) are used to detect inconsistencies, incompleteness, and misinterpretations in the requirements. Requirement errors detected using the SVDs are:

N1 = Number of errors due to inconsistencies
N2 = Number of errors due to incompleteness
N3 = Number of errors due to misinterpretation

requirement errors reported / total number of requirements [GPALS2]

requirement errors corrected / total number of requirements [GPALS2]

number of requirements faults and structural design faults detected during detailed design [NIST500-209]

3.2.3 Generality

size of the application domain [STARS]

Reverse engineering performed on the requirements documentation is useful in identifying common design and architectural features of existing systems. Reusable requirements specifications must define the boundaries of the problem space, as well as a set of variability descriptions. Requirements documentation can then be assessed for application as subdomains of other domain areas.

3.2.4 Understandability

size metrics
- number of requirements [NIST500-209]
this is often expressed as the number of shalls
- number of document pages [AMI] [NIST500-209]
- number of document words [NIST500-209]
- number of functions [NIST500-209]

readability metrics

- number of grammatically incorrect statements

- number of misspellings

- readability indices such as Flesch-Kincaid, Gunning's Fog Index [MURRAY]

These readability formulas are based on sentence length and polysyllable frequency. Gunning's Fog Index gives the grade level required to read the document. For example, an index of 12 means that 12 years of education are required to understand the document.

- physical readability [NSWC2]

Four indicators of physical readability are given: format appropriateness, adequacy of print, format consistency, and module appropriateness.

Format appropriateness assesses the suitability of the presentation style or layout. For example, numeric data should be presented in a table or graph.

Adequacy of print includes quality of reproduction, font sizes, paper quality, and font styles. Assessment can be done by sampling document pages.

Format consistency means that all tables should have the same format and all indexes should have the same format. The same rule applies to tables of contents, chapters, sections, etc. Again, sampling is used to provide measurements.

Module appropriateness measures the suitability of the physical division into chapters, sections, paragraphs, tables, figures, and so forth. The assessment is whether the document is physically divided in a manner consistent with the logical division of the material.
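The two readability indices can be sketched as follows, using their widely published coefficients: Fog = 0.4 x (words per sentence + percentage of words with three or more syllables), and Flesch-Kincaid grade level = 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. The syllable counter is a crude vowel-group heuristic and the "complex word" rule ignores the usual Fog exclusions (proper nouns, common suffixes), so treat results as approximate; all function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (a heuristic, not a dictionary)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int, int]:
    """Return (words, sentences, syllables, complex_words) for a passage."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # simplified Fog rule
    return len(words), sentences, sum(syllables), complex_words


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    return 0.4 * (words / sentences + 100.0 * complex_words / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


w, s, syl, cx = text_counts("Requirements shall be traceable. Each module shall be documented.")
print(round(gunning_fog(w, s, cx), 1), round(flesch_kincaid_grade(w, s, syl), 1))
```

Sampling a few pages per document, as the text suggests for physical readability, keeps the manual effort bounded.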

complexity

- function points [NIST500-209] [SQE] [SYMONS]

Function point analysis was developed by Allan Albrecht at IBM in the 1970s and further refined by Albrecht and others. As a measure of size, function points are language independent (as opposed to lines of code) and can be applied to requirements specifications. Applying them to requirements allows size and complexity to be estimated earlier in the life cycle.

The first part of calculating function points is determining information processing size, measured in unadjusted function points (UFPs). The application is divided into five types of components: inputs, outputs, enquiries (combinations of inputs and outputs that give immediate results), interfaces to other applications, and logical internal files. Each component is classified as simple, average, or complex, and is assigned a simple number of weighting points. The UFP count for the system is the sum of the individual UFPs.

Second, a technical complexity adjustment (TCA) is determined by estimating the degree of influence of general application characteristics, such as data communications, performance, on-line update, etc. TCA is calculated: TCA = 0.65 + 0.01 X DI

where DI is the total degree of influence.

Finally, the function point (FP) count is computed as: FP = UFP x TCA
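The three steps can be sketched in Python. The report does not list the weighting points, so the table below uses the widely published IFPUG weights (an assumption), and the degrees of influence are assumed to be per-characteristic ratings of 0 to 5 whose sum is DI. Component-type names are hypothetical labels for the five types in the text.

```python
# Weights (simple, average, complex) per component type. ASSUMPTION: these are the
# widely published IFPUG values; the report itself does not list them.
WEIGHTS = {
    "input": (3, 4, 6),
    "output": (4, 5, 7),
    "enquiry": (3, 4, 6),
    "internal_file": (7, 10, 15),
    "interface": (5, 7, 10),
}
LEVELS = ("simple", "average", "complex")


def unadjusted_fp(components):
    """components: iterable of (component_type, level) pairs, one per counted component."""
    return sum(WEIGHTS[kind][LEVELS.index(level)] for kind, level in components)


def technical_complexity_adjustment(degrees_of_influence):
    """TCA = 0.65 + 0.01 * DI, where DI sums the per-characteristic ratings (each 0-5)."""
    if any(not 0 <= d <= 5 for d in degrees_of_influence):
        raise ValueError("each degree of influence must be between 0 and 5")
    return 0.65 + 0.01 * sum(degrees_of_influence)


def function_points(components, degrees_of_influence):
    """FP = UFP x TCA."""
    return unadjusted_fp(components) * technical_complexity_adjustment(degrees_of_influence)


system = [("input", "simple"), ("input", "simple"), ("output", "average"), ("internal_file", "complex")]
print(unadjusted_fp(system))  # 26
```

Pass `degrees_of_influence` as a list, since it is read twice.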

- Mk II Information Processing Size [SYMONS]

This metric evaluates the system as a collection of logical transactions, each consisting of input, process, and output components. A logical transaction is defined as a unique input/process/output combination triggered by a unique event or by a need to retrieve information. Mk II Information Processing Size also uses unadjusted function points (UFPs), calculated with the formula:
UFPs = WI X (no. of input data element-types) + WE X (no. of entity-types referenced) + WO X (no. of output data element-types)

WI = weighting factor for input data element-types

WE = weighting factor for entity-types

WO = weighting factor for output data element-types

The weighting factors can be determined by calibration.

Next, the technical complexity adjustment (TCA) is calculated: TCA = 0.65 + C X DI

DI = the total degree of influence

C = coefficient obtained by calibration

Finally, the Mk II function point count is calculated as UFP x TCA.
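A sketch of the Mk II count, assuming each logical transaction is recorded as a tuple (input data element-types, entity-types referenced, output data element-types) and that the totals in the UFP formula are sums over all transactions. Because the text says the weights and C come from calibration, they are parameters here; the values in the usage line (0.58, 1.66, 0.26) are the industry-average Mk II weights commonly quoted by Symons, and C = 0.005 with DI = 40 is only illustrative.

```python
def mk2_unadjusted_fp(transactions, wi, we, wo):
    """transactions: list of (input DETs, entity-types referenced, output DETs),
    one tuple per logical transaction."""
    n_in = sum(t[0] for t in transactions)
    n_ent = sum(t[1] for t in transactions)
    n_out = sum(t[2] for t in transactions)
    return wi * n_in + we * n_ent + wo * n_out


def mk2_function_points(transactions, wi, we, wo, c, di):
    """Mk II FP = UFP x TCA, with TCA = 0.65 + C x DI."""
    return mk2_unadjusted_fp(transactions, wi, we, wo) * (0.65 + c * di)


txns = [(5, 2, 10), (3, 1, 4)]  # hypothetical logical transactions
print(mk2_function_points(txns, wi=0.58, we=1.66, wo=0.26, c=0.005, di=40))
```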

3.3 Quality Metrics for Design Documentation

Software design is often divided into three activities: functional allocation, software system design, and unit design. These activities occur in three chronological phases: preliminary design, detailed design, and unit design [CARD].

The first phase is preliminary design. Designers collect related requirements into functional groups and identify dependencies among functions. The preliminary design may be represented by data flow diagrams, high-level structure charts, or a simple list of requirements by subsystems [CARD]. The choice of metrics depends on the representation used. Requirements traceability can be applied to any representation to verify that requirements are being met. Data flow complexity can be applied to data flow diagrams to evaluate the understandability of the diagrams.

Next comes detailed design, where the overall architecture of the software system is defined. This step allocates data and functions to individual units or design parts. Internal interfaces must also be specified at this stage [CARD]. A structure chart is commonly used to represent the system design. Some of the metrics applied to this type of design are data or information flow complexity, external (De) complexity, fan-in/fan-out per module, graph-theoretic complexity, and readability metrics.

The final design phase is unit design. In this phase, algorithms and data structures are defined. Application and implementation specific information is added to the design. The design itself is often represented as pseudo-code and module prologues [CARD]. Nearly all of the metrics specified below can be applied to unit design documents. Some metrics specific to unit designs are internal (Di) complexity, document lines of code, and number of states per parameter.

The unit design phase gives the first indication of the reusability of individual modules. For example, the complexity of a module can be determined from its design, before any coding is done. Many of the same metrics that are typically applied to code can be applied to unit designs. Modularity can often be assessed at the unit design level. See [CARD] for references to studies on heuristics for achieving modularity: small modules, limited data coupling, medium span of control, and singleness of purpose. Modularity is a good indicator of the reusability of an individual software module.

3.3.1 Completeness

requirements traceability [IEEE982.1] (Applied to all designs)

For design, this metric indicates the percent of requirements that have been documented in the design. See Quality Metrics for Requirements Documentation (section 3.2) for a description of calculating this metric. See [IEEE982.2] for a complete description of this metric.

deviation between planned number of Software Requirements Specification (SRS) requirements to be documented as Computer Software Components (CSC) into the Software Design Document (SDD) and actual number of SRS requirements completely documented as CSCs in the SDD [AFP800-48] (Applied to system design)

3.3.2 Correctness

defect density [IEEE982.1] [NIST500-209] (Applied to unit design)

For design, the defect density is calculated after each design inspection of new development or large block modifications. A structured design language is assumed. See [IEEE982.2] for a description of how to use this metric.

Di = total number of defects found during ith design inspection

I = total number of inspections to date

KSLOD = number of source lines of design statements in thousands

Calculation of Defect Density:

$$DD = \frac{\sum_{i=1}^{I} D_i}{KSLOD}$$

Design defect density may also be measured in terms of design defects/KSLOC [SPC]. Each defect from later phases that is determined to be a design defect is counted.
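Because the density is recomputed after each inspection, a running history is more useful than a single number. A minimal sketch, assuming DD = (sum of D_i over inspections to date) / size and that the size (KSLOD here, KSLOC for code) stays constant across inspections; the function name is hypothetical.

```python
from itertools import accumulate


def defect_density_history(defects_per_inspection, ksl):
    """Defect density after each inspection: running sum of D_i divided by size.

    ksl is the size in thousands of source lines (KSLOD for design, KSLOC for code);
    holding it constant across inspections is an assumption.
    """
    if ksl <= 0:
        raise ValueError("size must be positive")
    return [total / ksl for total in accumulate(defects_per_inspection)]


print(defect_density_history([4, 2, 6], 1.5))
```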

number of structural (architectural) design faults detected during detailed design [NIST500-209] (Applied during unit design)

number of design faults associated with each module [NIST500-209] (Applied to system and unit design)

number of integration test cases planned/executed involving each module [NIST500-209] (Applied to system and unit design)

number of black box test cases planned/executed per module [NIST500-209] (Applied to system and unit design)

number of design errors reported / total number of units [GPALS2] (Applied to all)

number of design errors corrected / total number of units [GPALS2] (Applied to all)

3.3.3 Generality

size of application domain [STARS] (Applied to all designs)

Reverse engineering can be done on system designs to identify data structures and data management patterns to aid in separating the functionality into two categories: functionality that supports general domain concepts and functionality that achieves a computer-based solution.

Reverse engineering can be done on unit designs to identify the modularization, relationships among structural elements, declare/set/use patterns for variables, control flow within structural elements, and scoping information. The information extracted can be used to identify solution concepts and their interrelationships. Requirement choices can also be connected with design choices and with their effects on performance, timing, sizing, and functionality.

3.3.4 Efficiency

The percent metrics that follow can be applied to system and unit designs if the target capacities for CPU and I/O usage are known. Likewise, if the target random access memory (RAM) and storage capacities are known, then percent usage calculations may be made. These estimates may then be used to assess the efficiency of the design and its reusability in new environments. However, comparisons made between environments must take into account processor speeds, I/O devices, and operating system effects.

target CPU usage as percent of capacity [STEP]

target I/O usage as percent of capacity [STEP]

target upper bound storage usage [STEP]

percent actual of target upper bound storage usage [STEP]

target upper bound RAM usage [STEP]

percent actual of target upper bound RAM usage [STEP]

3.3.5 Modularity

cohesion metric [NIST500-209] [SQE] [CARD] (Applied to unit design)

Cohesion, or module strength, refers to the relationship among the elements of a module. The cohesion value for a module is assigned using a standard rating chart, which can be found in [SQE]. The best cohesion level is functional, and the worst is coincidental.

A high-strength (functional) module performs one function. A low-strength (coincidental) module includes multiple unrelated functions. Studies referenced in [CARD] find that higher strength modules tend to have lower fault rates, and tend to cost less to develop. Also, the modularity is greater for higher strength modules.

The Software Measurement Guidebook [SPC] gives a strength metric proposed by Cruickshank and Gaffney. Strength is a measure of the degree of cohesiveness of the elements of a software module.

Calculation of Strength:

X = reciprocal of the number of assignment statements in the module

Y = number of unique function outputs divided by number of unique function inputs

coupling [NIST500-209] [SQE] [CARD] (Applied to system and unit designs)

Coupling is a measure of the degree to which modules share data. Module coupling is rated using a standard rating chart, which can be found in [SQE]. Data coupling is the best type of coupling, while content coupling is the worst.

Data coupling is the sharing of data via parameter lists, while common coupling is the sharing of data via global (common) areas. Earlier recommendations stated that common coupling should be avoided. However, later studies have shown that the distribution of error rates does not depend on the coupling mechanism [CARD]. Nevertheless, in terms of modularity and a module's independence from external factors, a lower coupling value is better.

A coupling metric given in [SPC] and originally proposed by Cruickshank and Gaffney is calculated as follows:

Calculation of Coupling:

Mj = sum of the number of input and output items shared between components i & j

Zi = average number of input and output items shared over m components with component i

n = number of components in the software product

3.3.6 Portability

number of features that are language-specific (Applied to unit design)

number of features that are operating system specific (Applied to unit design)

Features that are operating system, hardware, or interface specific should be isolated in modules that can then be changed for other platforms.

3.3.7 Reliability

cumulative failure profile [IEEE982.1]

For design, a graph is made of the cumulative failures in the software resulting from design deficiencies. The curve of the graph can be used to predict the reliability of the design. (Applied to unit design)

3.3.8 Understandability

Design Structure Metric [IEEE982.1]

Used to determine the simplicity of the detailed design of a software program. The values determined for the primitives can be used to identify problem areas within the software design. See [IEEE982.2] for information on using this metric. (Applied to system and unit designs)

P1 = total number of modules in the program

P2 = number of modules dependent on the input or output

P3 = number of modules dependent on prior processing (state)

P4 = number of database elements

P5 = number of non-unique database elements

P6 = number of database segments (partition of the state)

P7 = number of modules not single entrance/single exit

D1 = design organized top-down (Boolean)

D2 = module dependence (P2/P1)

D3 = module dependent on prior processing (P3/P1)

D4 = database size (P5/P4)

D5 = database compartmentalization (P6/P4)

D6 = module single entrance/single exit (P7/P1)

Calculation of Design Structure Measure:

$$DSM = \sum_{i=1}^{6} W_i D_i$$

where Wi is the weight given to the ith derived measure
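A sketch that derives D1 through D6 from the primitives P1 through P7 and combines them, assuming DSM is the weighted sum of the six derived measures. Two encodings are assumptions, flagged in the comments: D1 is scored 0 for a top-down design and 1 otherwise, and D6 is taken as P7/P1 by analogy with D2 and D3. The weights are left to the analyst, as the text implies.

```python
from dataclasses import dataclass


@dataclass
class DesignPrimitives:
    p1: int  # total number of modules
    p2: int  # modules dependent on the input or output
    p3: int  # modules dependent on prior processing (state)
    p4: int  # database elements
    p5: int  # non-unique database elements
    p6: int  # database segments (partitions of the state)
    p7: int  # modules not single entrance/single exit
    top_down: bool  # design organized top-down


def derived_measures(p: DesignPrimitives) -> list[float]:
    d1 = 0.0 if p.top_down else 1.0  # ASSUMPTION: a top-down design scores 0
    return [
        d1,
        p.p2 / p.p1,  # D2 module dependence
        p.p3 / p.p1,  # D3 dependence on prior processing
        p.p5 / p.p4,  # D4 database size
        p.p6 / p.p4,  # D5 database compartmentalization
        p.p7 / p.p1,  # D6 ASSUMPTION: share of modules not single entrance/exit
    ]


def design_structure_measure(p: DesignPrimitives, weights: list[float]) -> float:
    """DSM = sum of W_i * D_i over the six derived measures."""
    d = derived_measures(p)
    if len(weights) != len(d):
        raise ValueError("need one weight per derived measure (six)")
    return sum(w * x for w, x in zip(weights, d))
```

Comparing the individual D_i values, not only the weighted total, is what points to the problem areas the text mentions.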

Several studies have investigated the effect of module size on the modularity (and hence reusability) of single modules. The conclusion, as given by [CARD], is that module size alone does not affect fault rate. Also, larger modules cost less per executable statement than smaller ones. These findings suggest that lines of design and lines of code are not good indicators of modularity. However, when used with other measurements (complexity, etc.), size measures can indicate that a module may need to be separated into smaller modules. The unit design phase is a better time than coding to perform this separation of function.

- number of design modules [NIST500-209] (Applied to system design)

- number of document pages [NIST500-209] (Applied to all designs)

- document lines-of-code [NIST500-209] (Applied to unit design)

- number of functions [NIST500-209] (Applied to preliminary and system designs)

- number of inputs and outputs [NIST500-209] (Applied to system and unit designs)

- number of interfaces [NIST500-209] (Applied to system and unit designs)

complexity metrics

- graph-theoretic complexity for architecture [IEEE982.1]

(Applied to system and unit designs)

There are three graph-theoretic complexity metrics:

Static complexity is used to measure the complexity of the software architecture, as represented by a network of modules, useful for design tradeoff analysis.

Generalized static complexity is used to measure the complexity of the software architecture, as represented by a network of modules and the resources used.

Dynamic complexity is used to measure the complexity of the software architecture as represented by a network of modules during execution.

K = number of resources, indexed by k = 1,...,K

E = number of edges, indexed by i = 1,...,E

N = number of modules, indexed by j = 1,...,N

ci = complexity for program invocation and return along the ith edge

rki = 1 if kth resource required for ith edge, 0 otherwise

dk = complexity for allocation of resource k

Calculation of Complexity Measures:

Static Complexity:

C = E - N + 1

Generalized Static Complexity:

$$C = \sum_{i=1}^{E} \left( c_i + \sum_{k=1}^{K} d_k \, r_{ki} \right)$$

Dynamic Complexity:

Dynamic complexity is calculated using the formula for static complexity at various points in time. The behavior of the measure is then used to indicate the evolution of the complexity of the software.
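Both static measures can be sketched in Python, assuming the module network is summarized by edge and module counts for the static form, and, for the generalized form, that each edge i contributes c_i plus d_k for every resource k it requires (r_ki = 1). Resource indices are zero-based here rather than 1..K. Function names are hypothetical.

```python
def static_complexity(num_edges: int, num_modules: int) -> int:
    """C = E - N + 1 for a network of N modules joined by E invocation edges."""
    return num_edges - num_modules + 1


def generalized_static_complexity(edge_costs, edge_resources, resource_costs):
    """Sum over edges i of c_i plus d_k for each resource k the edge requires.

    edge_costs:     c_i for each edge i
    edge_resources: for each edge, the set of (zero-based) resource indices k with r_ki = 1
    resource_costs: d_k for each resource k
    """
    if len(edge_costs) != len(edge_resources):
        raise ValueError("one resource set per edge is required")
    return sum(
        c + sum(resource_costs[k] for k in required)
        for c, required in zip(edge_costs, edge_resources)
    )


print(static_complexity(7, 5))  # 3
```

For the dynamic measure, applying `static_complexity` to (E, N) snapshots taken at different points during execution gives the series whose behavior the text describes.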

- number of entries/exits (fan-in/fan-out) per module [IEEE982.1] [NIST500-209] (Applied to unit design)

This metric can be used to determine the difficulty of the software architecture. It is assumed that a modular specification/design language is used. This metric can also be used to evaluate the encapsulation of the data at the design phase. If data is properly encapsulated, the number of entry and exit points for each module function will be small [IEEE982.2].

ei = number of entry points for the ith module

xi = number of exit points for the ith module

Calculation of Entries/Exits:

mi = ei + xi

- data or information flow complexity [IEEE982.1] (Applied to all designs)

This metric can be used to evaluate the information flow structure of large systems, the procedure and module information flow structure, and the complexity of interconnections between modules. This metric can also be used for code evaluations. See Quality Metrics for Code (section 3.4) for a description of the metric primitives and calculations. See [IEEE982.2] for a complete description of this metric.

- number of parameters per module [NIST500-209] (Applied to system and unit design)

- number of states or data partitions per parameter [NIST500-209] (Applied to unit design)

- decision count [CONTE] (Applied to unit design)

See Quality Metrics for Code (section 3.4) for a description of this metric. Together with the two preceding metrics, it can be used to identify, early in development, modules that are potentially complex or hard to test.

- external (De) complexity [ZAGE] (Applied to all designs)

Based on information available during architecture design, such as hierarchical module diagrams, dataflow, functional descriptions, and interface descriptions.

Calculation of De: De = e1 (inflow * outflow) + e2 (fan-in * fan-out)

inflow is the number of data entities passed to the module

outflow is the number of data entities passed from the module

fan-in is the number of superordinate modules directly connected to the module

fan-out is the number of subordinate modules directly connected to the module

e1 and e2 are weighting factors for the two terms

- internal (Di) complexity [ZAGE] (Applied to unit design)

Based on information available after detailed design, including the information used for De plus the chosen algorithms and possibly pseudo-code or program-design-language representations.

Calculation of Di: Di = i1 (CC) + i2 (DSM) + i3 (I/O)

CC (central calls) is the number of procedure or function invocations

DSM (data-structure manipulations) is the number of references to complex data types

I/O is the number of external device accesses

i1, i2, and i3 are weighting factors

- composite metric (D(G)) to measure design quality [ZAGE]:

D(G) = De + Di
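A sketch of De, Di, and D(G) for one module, assuming the structure chart is available as (superordinate, subordinate) pairs so that fan-in counts a module's distinct callers and fan-out its distinct callees. The weights default to 1.0 purely as placeholders; the text leaves e1, e2, i1, i2, and i3 unspecified. Module names and counts are hypothetical.

```python
from collections import Counter


def fan_in_fan_out(call_edges):
    """call_edges: (superordinate, subordinate) pairs from a structure chart.
    Duplicate pairs are counted once."""
    unique = set(call_edges)
    fan_out = Counter(caller for caller, _ in unique)
    fan_in = Counter(callee for _, callee in unique)
    return fan_in, fan_out


def external_complexity(inflow, outflow, fan_in, fan_out, e1=1.0, e2=1.0):
    """De = e1 (inflow * outflow) + e2 (fan-in * fan-out); weights are placeholders."""
    return e1 * (inflow * outflow) + e2 * (fan_in * fan_out)


def internal_complexity(central_calls, ds_manipulations, io_accesses, i1=1.0, i2=1.0, i3=1.0):
    """Di = i1 (CC) + i2 (DSM) + i3 (I/O); weights are placeholders."""
    return i1 * central_calls + i2 * ds_manipulations + i3 * io_accesses


edges = [("main", "parse"), ("main", "report"), ("parse", "io"), ("report", "io")]
fan_in, fan_out = fan_in_fan_out(edges)
de = external_complexity(inflow=3, outflow=2, fan_in=fan_in["parse"], fan_out=fan_out["parse"])
di = internal_complexity(central_calls=4, ds_manipulations=2, io_accesses=1)
print(de + di)  # D(G) for "parse": 14.0
```

De can be computed as soon as the architecture is drawn; Di is added once unit designs exist, which matches the "Applied to" notes above.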

readability metrics

- number of grammatically incorrect statements (Applied to all designs)

- number of misspellings (Applied to all designs)

- readability indices such as Flesch-Kincaid, Gunning's Fog Index [MURRAY] (Applied to preliminary and system designs)

See Quality Metrics for Requirements Documentation (section 3.2) for a description of these indices.

3.4 Quality Metrics for Code

Much of the research into metrics has focused on code metrics. Hence, there are many different kinds of code metrics, and many variations on common metrics. In terms of reusability, the most useful are product metrics that measure the size, complexity, and readability of the source program. For a component to be reusable, it must be understandable by software engineers. The component should also encapsulate as much implementation detail as possible. Well-defined, simple interfaces are desirable.

In assessing existing components for reusability, it is useful to examine the history of the component in actual use. Fault density, code-related problem counts, defect density, and efficiency are some of the metrics used for this assessment. The longer a component has been in actual use, the higher the confidence in the component's correctness, assuming low fault and defect counts. Also, the testability of the component is critical when reusing the software. A well-defined set of test cases aids in quickly assessing the component's use in a new environment. The testability of a component is determined in part by its complexity, as well as its size.

There are many methods used to calculate lines-of-code. Two documents, [IEEE1045] and [SEI], give methods which are used to ensure consistent counting of lines-of-code.

3.4.1 Completeness

number of ambiguous references [SAC]

References to inputs, functions, and outputs should be unique. An example of an ambiguous reference is a function being called one name by one module and a different name by another module.

number of improper data references [SAC]

All data references should be properly defined, computed, or obtained from identifiable external sources.

percentage of defined functions used [SAC]

All functions defined within the software should be used.

percentage of referenced functions defined [SAC]

All functions referenced within the software should be defined. There should be no dummy functions present.

percentage of conditional processing defined [SAC]

All conditional logic and alternative processing paths for each decision point should be defined.

3.4.2 Correctness

fault density [IEEE982.1]

This metric can be used to predict remaining faults by comparison with expected fault density, and determine if sufficient testing has been completed [IEEE982.2]. A fault density is calculated for each severity level.

Calculation of Fault Density:

Fd = F / KSLOC

F = total number of unique faults found in a given time interval resulting in failures of a specified severity level

KSLOC = number of source lines of executable code and non-executable data declarations in thousands
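Since a fault density is computed separately for each severity level, a small sketch helps keep the counting rule straight. It assumes faults are logged as (fault_id, severity) pairs for one time interval and that a fault reported more than once is counted once, keeping the first severity recorded. The function name and severity labels are hypothetical.

```python
from collections import Counter


def fault_density_by_severity(faults, ksloc):
    """Return {severity: Fd} with Fd = F / KSLOC for one time interval.

    faults: (fault_id, severity) pairs; repeated reports of the same
    fault_id are counted once (the first severity seen is kept).
    """
    if ksloc <= 0:
        raise ValueError("KSLOC must be positive")
    unique = {}
    for fault_id, severity in faults:
        unique.setdefault(fault_id, severity)
    counts = Counter(unique.values())
    return {severity: n / ksloc for severity, n in counts.items()}


log = [("F1", "serious"), ("F2", "minor"), ("F1", "serious"), ("F3", "minor")]
print(fault_density_by_severity(log, 2.0))  # {'serious': 0.5, 'minor': 1.0}
```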

number of code-related problems/errors reported [CONTE]

number of code-related problems fixed [CONTE]

number of program changes per time period [CONTE]

number of changed lines of code per time period [CONTE]

number of coding errors / total number of units [GPALS2]

defect density [IEEE982.1] [NIST500-209]

For code, the defect density is calculated after each code inspection of new development or large block modifications. See [IEEE982.2] for information on using this metric.

Di = total number of defects found during the ith code inspection

I = total number of inspections to date

KSLOC = number of source lines of executable code and non-executable data declarations in thousands

Calculation of Defect Density:

$$DD = \frac{\sum_{i=1}^{I} D_i}{KSLOC}$$

defect indices [IEEE982.1]

Defect indices provide a relative index of how correct the software is as it proceeds through the development cycle. For each phase of development, calculate index PI:

Di = Total number of defects detected during the ith phase

Si = Number of serious defects found

Mi = Number of medium defects found

Ti = Number of trivial defects found

PS = Size of product at the ith phase

W1 = Weighting factor for serious defects

W2 = Weighting factor for medium defects

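W3 = Weighting factor for trivial defects

The defect index (DI) is calculated at each phase by cumulatively adding i x PI_i (each phase index weighted by its phase number) as the software proceeds through development, and dividing the sum by the product size.

Calculation of Defect Index:

$$PI_i = W_1 \frac{S_i}{D_i} + W_2 \frac{M_i}{D_i} + W_3 \frac{T_i}{D_i}$$

$$DI = \frac{\sum_i i \cdot PI_i}{PS}$$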
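A sketch of the phase index and cumulative defect index, assuming the IEEE 982.1 forms PI_i = W1 (S_i/D_i) + W2 (M_i/D_i) + W3 (T_i/D_i) and DI = (sum of i x PI_i) / PS. It also assumes every defect is classified as exactly one of serious, medium, or trivial (so D_i = S_i + M_i + T_i) and that a phase with no defects contributes zero. Weights are supplied by the caller; the values in the usage line are arbitrary.

```python
def phase_index(serious, medium, trivial, weights):
    """PI_i = W1 (S_i/D_i) + W2 (M_i/D_i) + W3 (T_i/D_i), with D_i = S_i + M_i + T_i."""
    total = serious + medium + trivial
    if total == 0:
        return 0.0  # assumption: a defect-free phase contributes nothing
    w1, w2, w3 = weights
    return (w1 * serious + w2 * medium + w3 * trivial) / total


def defect_index(phases, product_size, weights):
    """DI = sum(i * PI_i) / PS over phases i = 1..n, given in development order.

    phases: list of (S_i, M_i, T_i) tuples.
    """
    if product_size <= 0:
        raise ValueError("product size must be positive")
    weighted = sum(
        i * phase_index(s, m, t, weights) for i, (s, m, t) in enumerate(phases, start=1)
    )
    return weighted / product_size


print(defect_index([(2, 3, 5), (1, 1, 2)], product_size=2.0, weights=(10, 3, 1)))
```

Recomputing DI after each phase, with the PS for that phase, gives the running index the text describes.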

3.4.3 Efficiency (of execution)

non-loop-dependent statements in loops: (number of modules with non-loop-dependent statements in loops) / (total number of modules) [SAC]

Practices such as calculating values treated as constants within loops should be avoided.
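An illustration of the practice this ratio penalizes, written in Python only for concreteness: the first function recomputes a value that does not change inside the loop, while the second computes it once before the loop. A module written like the first would count toward the numerator of the ratio. Function names and the cosine scaling are hypothetical.

```python
import math


def scale_naive(values, angle_deg):
    result = []
    for v in values:
        factor = math.cos(math.radians(angle_deg))  # does not depend on v: recomputed every pass
        result.append(v * factor)
    return result


def scale_hoisted(values, angle_deg):
    factor = math.cos(math.radians(angle_deg))  # loop-invariant value computed once
    return [v * factor for v in values]
```

Both functions return identical results; only the amount of work inside the loop differs.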

compound expression evaluation: (number of modules with repeated compound expression evaluation) / (total number of modules) [SAC]

Repeated evaluation of the same compound expression should be avoided.

total number of memory overlays [SAC]

The use of memory overlays imposes processing overhead and should be avoided.

amount of non-functional executable code: (number of modules with non-functional executable code) / (total number of modules) [SAC]

The presence of non-functional executable code is an obvious inefficiency. This condition often arises during maintenance or redesign updates with incomplete removal of obsolete code.

coding of decision statements: (number of modules with inefficient decision coding) / (total number of modules) [SAC]

Decision statements should be coded for efficient execution, e.g., the most frequently exercised alternative of an IF statement should normally be specified in the THEN clause, rather than in the ELSE clause.

data grouping: (number of modules with inefficient data grouping) / (total number of modules) [SAC]

Example of inefficient data grouping: complicated nesting of pointers and indices.

initialization of variables: (number of modules with variables not initialized when declared) / (total number of modules) [SAC]

Efficiency is lost when variables are initialized during execution or repeatedly initialized during iterative processing.

target CPU usage as percent of capacity [STEP]

actual CPU usage as percent of capacity [STEP]

projected CPU usage as percent of capacity [STEP]

target I/O usage as percent of capacity [STEP]

actual I/O usage as percent of capacity [STEP]

projected I/O usage as percent of capacity [STEP]

Target and actual CPU and I/O usage counts are useful in determining the degree to which CPU and I/O usage is approaching or exceeding the maximums specified in the requirements. As software modules are reused in new environments, it is necessary to assess the impact of the resource usage in the new environment. Ideally, projected CPU and I/O usage should be specified in the design phase, while upper bounds should be specified in the requirements.
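A small sketch of the percent-of-capacity comparison, assuming usage and capacity are expressed in the same (project-chosen) unit and that the target is stated as a percent of capacity. It reports whether the actual or projected figure exceeds that target, which is the check the paragraph describes when a component moves to a new environment. Names are illustrative.

```python
def percent_of_capacity(usage, capacity):
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * usage / capacity


def usage_report(target_pct, actual, projected, capacity):
    """Compare actual and projected usage with a target percent of capacity."""
    actual_pct = percent_of_capacity(actual, capacity)
    projected_pct = percent_of_capacity(projected, capacity)
    return {
        "actual_pct": actual_pct,
        "projected_pct": projected_pct,
        "actual_over_target": actual_pct > target_pct,
        "projected_over_target": projected_pct > target_pct,
    }


print(usage_report(target_pct=60.0, actual=90, projected=130, capacity=200))
```

The same comparison applies unchanged to the storage and RAM measures in the next subsection.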

3.4.4 Efficiency (of storage)

duplicate global data definitions: (number of modules with duplicated data definitions / (total number of modules) [SAC]

This metric is used to measure the frequency with which global data items and constants (e.g., pi, acceleration of gravity) are defined more than once within a software system. Duplicate data definitions consume additional storage, so the greater the value of the measure, the lower the storage efficiency.

duplicate code: (number of modules with duplicated code) / (total number of modules) [SAC]

This metric is used to measure the percentage of modules with duplicated code. Code for commonly used functions (e.g., vector dot product or arithmetic mean) is often duplicated and consumes additional storage. The higher the value of the measure, the lower the storage efficiency.

software requirements allocation [SAC]

This metric can be used to indicate whether a storage (sizing) requirement allocation was performed during the system design phase to allocate overall sizing or storage utilization requirements to individual modules. A value of 1 means yes, 0 means no.

dynamic memory management [SAC]

Generally, the use of dynamic memory management techniques (e.g., allocating buffer memory and releasing it as necessary) promotes efficient utilization of storage. This metric indicates whether such techniques are used: a value of 1 means yes, 0 means no.

storage optimizer [SAC]

This metric can be used to indicate whether a storage optimizing compiler or assembler is being used. A value of 1 means yes, 0 means no.

target upper bound storage usage [STEP]

actual storage usage [STEP]

percent actual of target upper bound storage usage [STEP]

projected storage usage [STEP]

target upper bound RAM usage [STEP]

actual RAM usage [STEP]

percent actual of target upper bound RAM usage [STEP]

projected RAM usage [STEP]

The target and actual storage and RAM usage measures are useful in determining how well software components "fit" into the allocations documented in the requirements. Storage counts the use of disk space and other mass storage, while RAM counts the use of Random Access Memory. Projected usage counts are useful during development to scale the usage counts to the full system based on partial system measurements. Ideally, projected storage and RAM usage should be specified in the design phase, while upper bounds should be specified in the requirements.
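The percent-of-target and projected-usage measures can be sketched as follows. This is a minimal illustration, not a method from [STEP]: it assumes, as a hypothetical simplification, that usage grows linearly with the fraction of the system measured, and the function names and figures are illustrative only.

```python
def percent_of_target(actual: float, target_upper_bound: float) -> float:
    """Return actual usage as a percentage of the target upper bound."""
    if target_upper_bound <= 0:
        raise ValueError("target upper bound must be positive")
    return 100.0 * actual / target_upper_bound


def projected_usage(measured_usage: float, fraction_measured: float) -> float:
    """Scale a partial-system measurement up to the full system.

    Hypothetical simplification: usage is assumed to grow linearly with
    the fraction of the system that has been built and measured.
    """
    if not 0 < fraction_measured <= 1:
        raise ValueError("fraction_measured must be in (0, 1]")
    return measured_usage / fraction_measured


# Illustrative numbers only: 48 MB of RAM measured against a 64 MB
# upper bound, with half of the components integrated.
actual_pct = percent_of_target(48, 64)             # 75.0
projected = projected_usage(48, 0.5)               # 96.0
projected_pct = percent_of_target(projected, 64)   # 150.0: the allocation would be exceeded
```

In this example the partial system is comfortably within its allocation, while the projection warns that the full system would not be.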

3.4.5 Adaptability

expandability

- processing independent of storage: (number of modules whose size constraints are hard-coded) / (total number of modules with such size constraints) [SAC] The processing performed by a module should be independent of storage size, buffer space, array sizes, etc. Provisions for these entities should be provided dynamically, e.g., array sizes passed as parameters.

- percentage of uncommitted memory: (amount of uncommitted memory) / (total memory available) [SAC]

- percentage of uncommitted processing capacity: (amount of uncommitted processing capacity) / (total processing capacity available) [SAC]

3.4.6 Generality

multiple usage metric: (number of modules referenced by more than one module) / (total number of modules) [SAC]

A module is more general if it is referenced by more than one module, so the larger the value of this metric, the greater the generality.

mixed function metric: (number of modules that mix functions) / (total number of modules) [SAC]

A module that performs input/output as well as processing is not as general as one which only performs I/O or only performs processing. The lower the value of this metric, the greater the generality.

data volume metric: (number of modules that are data volume limited) / (total number of modules) [SAC]

A module that is designed to process only a certain number of data item inputs is not as general as one that can accept an unlimited number of inputs.

data value metric: (number of modules that are data value limited) / (total number of modules) [SAC]

A module that is designed to process only a limited range of data item values is not as general as one that is capable of processing a broader range of values.

redefinition of constants metric: (number of constants that are redefined) / (total number of constants) [SAC]

A module should not redefine a constant for the purpose of changing the function of the module, e.g., changing the base of a logarithm function from 10 to e for the purpose of providing a natural log function. Such items should be defined as parameters, not constants.
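The logarithm example above can be sketched as a function whose base is a parameter with a default value, so that a natural logarithm is obtained by passing e rather than by redefining a constant. The function name `logarithm` is hypothetical.

```python
import math


def logarithm(x: float, base: float = 10.0) -> float:
    """Logarithm of x in the given base.

    The base is a parameter with a default value rather than a constant
    inside the function, so callers obtain a natural log by passing
    math.e instead of redefining a constant in the module.
    """
    if x <= 0 or base <= 0 or base == 1:
        raise ValueError("x and base must be positive, and base must not be 1")
    return math.log(x) / math.log(base)


common_log = logarithm(1000.0)                 # approximately 3.0 (base 10 by default)
natural_log = logarithm(math.e ** 2, math.e)   # approximately 2.0 (natural log)
```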

3.4.7 Maintainability

complexity

- decision count: count of IF, DO, WHILE, CASE, and other conditional and loop control statements [CONTE]

- number of I/O variables per unit [AMI]

- cyclomatic complexity [IEEE982.1] [CONTE] [MCCABE]

Cyclomatic complexity may be used to determine the structural complexity of a code module. It is calculated in a manner similar to the static complexity of the design, except that it is calculated from a flowgraph of the module rather than from the design.

Calculation of Cyclomatic Complexity (single-entry, single-exit flowgraph): v = e - n + 2

v = complexity of the graph

e = number of edges (program flows between nodes)

n = number of nodes (sequential groups of program statements)

If a strongly connected graph is constructed (one in which there is an edge between the exit node and entry node), the calculation is [IEEE982.2]:

v = e - n + 1

The cyclomatic complexity is also equivalent to the number of splitting nodes (S) in the graph plus 1: v = S + 1

(A splitting node is a node with more than one edge emanating from it.)

Because each splitting node is associated with a condition, the expression v = S + 1 can be calculated by counting the number of conditions in the source code [MCCABE]. Cyclomatic complexity can also be calculated by counting the number of regions in the graph [IEEE982.2] [MCCABE].
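The equivalent calculations can be checked on a small flowgraph. This sketch represents a flowgraph as a hypothetical dictionary mapping each node to its successors; the example module (an IF-THEN-ELSE followed by a WHILE loop) is illustrative. Note that v = S + 1 holds in this form when every splitting node has exactly two outgoing edges.

```python
from typing import Dict, List

# A flowgraph maps each node to the list of its successor nodes.
Flowgraph = Dict[str, List[str]]


def edge_count(graph: Flowgraph) -> int:
    return sum(len(successors) for successors in graph.values())


def cyclomatic_complexity(graph: Flowgraph) -> int:
    """v = e - n + 2 for a single-entry, single-exit flowgraph."""
    return edge_count(graph) - len(graph) + 2


def cyclomatic_complexity_strongly_connected(graph: Flowgraph, entry: str, exit_node: str) -> int:
    """Add an edge from the exit node to the entry node, then apply v = e - n + 1."""
    closed = {node: list(successors) for node, successors in graph.items()}
    closed[exit_node].append(entry)
    return edge_count(closed) - len(closed) + 1


def splitting_node_count(graph: Flowgraph) -> int:
    """S = number of nodes with more than one outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Hypothetical module: an IF-THEN-ELSE followed by a WHILE loop.
module = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["while"],
    "else": ["while"],
    "while": ["body", "exit"],
    "body": ["while"],
    "exit": [],
}

# All three calculations give v = 3 (8 edges, 7 nodes, 2 splitting nodes).
v = cyclomatic_complexity(module)
```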

The cyclomatic complexity for a multi-module program of m modules is the sum of the v's for the individual modules [CONTE]:

$$v_{program} = \sum_{i=1}^{m} v_i$$

Alternatively, $v_{program}$ can be calculated as [CONTE]:

$$v_{program} = \sum_{i=1}^{m} DE_i + m$$

where $DE_i$ is the decision count for the ith of the m modules, which is the same as the number of conditions in that module.
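Because each module contributes its decision count plus one, summing module complexities and adding the module count to the total decision count give the same program figure. A minimal sketch; the function names and decision counts are hypothetical.

```python
from typing import Sequence


def program_complexity(module_complexities: Sequence[int]) -> int:
    """v_program as the sum of the individual module complexities v_i."""
    return sum(module_complexities)


def program_complexity_from_decisions(decision_counts: Sequence[int]) -> int:
    """v_program = (sum of DE_i) + m, since each module contributes DE_i + 1."""
    return sum(decision_counts) + len(decision_counts)


decisions = [2, 0, 5]                        # hypothetical DE_i for three modules
per_module = [de + 1 for de in decisions]    # v_i = DE_i + 1 for each module
total = program_complexity(per_module)       # 10, same as from the decision counts
```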

- average nesting level [CONTE]

The nesting level of a statement is defined by the location of the statement within control structures. Statements in the main flow of the module are at level one. Statements within loops, conditional clauses, etc. are at higher nesting levels. The average nesting level is calculated as follows: For each statement, determine the nesting level. Average nesting level is the sum of all the nesting levels divided by the total number of statements. A low average nesting level is an indicator of lower complexity in the logic of the module.
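The average nesting level can be computed mechanically from a parse tree. The sketch below uses Python's standard `ast` module on Python source as an assumed stand-in for whatever language is being measured; it treats the bodies of IF, loop, WITH, and TRY statements as one level deeper. Because Python's parser represents `elif` as an `If` nested in the `orelse` branch, this sketch counts an `elif` one level deeper than its `if`, which is a modeling choice, not part of the [CONTE] definition.

```python
import ast
from typing import List, Sequence

# Compound statements whose bodies are treated as one level deeper.
_NESTING_TYPES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith, ast.Try)


def statement_levels(body: Sequence[ast.stmt], level: int = 1) -> List[int]:
    """Return the nesting level of every statement in body, recursively."""
    levels: List[int] = []
    for stmt in body:
        levels.append(level)
        if isinstance(stmt, _NESTING_TYPES):
            # If/For/While have body and orelse; With has body; Try adds finalbody and handlers.
            for field in ("body", "orelse", "finalbody"):
                levels.extend(statement_levels(getattr(stmt, field, []), level + 1))
            for handler in getattr(stmt, "handlers", []):
                levels.extend(statement_levels(handler.body, level + 1))
    return levels


def average_nesting_level(source: str) -> float:
    """Sum of statement nesting levels divided by the number of statements,
    for the first function defined in source (main flow = level one)."""
    func = ast.parse(source).body[0]
    if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("source must start with a function definition")
    levels = statement_levels(func.body)
    return sum(levels) / len(levels)


EXAMPLE = """
def f(x):
    y = 0
    if x > 0:
        y = 1
        for i in range(x):
            y += i
    return y
"""
# Levels are [1, 1, 2, 2, 3, 1], so the average is 10 / 6.
avg = average_nesting_level(EXAMPLE)
```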

- executable lines of code per module [STEP]

- Software Science Metrics [IEEE982.1]

$n_1$ = number of unique operators

$n_2$ = number of unique operands

$N_1$ = total number of operators

$N_2$ = total number of operands

program vocabulary: $l = n_1 + n_2$

observed program length: $N = N_1 + N_2$

estimated program length: $\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2$

Jensen's estimator of program length: $N_F = \log_2 (n_1!) + \log_2 (n_2!)$

program volume: $V = N \log_2 l$

program difficulty: $D = (n_1 / 2)(N_2 / n_2)$

program level: $L_1 = 1 / D$

effort: $E = V / L_1$
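The Software Science measures can be computed directly from the four basic counts. This sketch uses Halstead's difficulty $D = (n_1/2)(N_2/n_2)$ and volume $V = N \log_2 l$; how operators and operands are counted from source code is left out, and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HalsteadCounts:
    n1: int  # number of unique operators
    n2: int  # number of unique operands
    N1: int  # total number of operators
    N2: int  # total number of operands


def halstead_metrics(c: HalsteadCounts) -> Dict[str, float]:
    if min(c.n1, c.n2, c.N1, c.N2) <= 0:
        raise ValueError("all counts must be positive")
    vocabulary = c.n1 + c.n2                                  # l = n1 + n2
    length = c.N1 + c.N2                                      # N = N1 + N2
    estimated_length = c.n1 * math.log2(c.n1) + c.n2 * math.log2(c.n2)
    # log2(k!) computed as lgamma(k + 1) / ln 2 to avoid large factorials
    jensen_length = (math.lgamma(c.n1 + 1) + math.lgamma(c.n2 + 1)) / math.log(2)
    volume = length * math.log2(vocabulary)                   # V = N log2 l
    difficulty = (c.n1 / 2) * (c.N2 / c.n2)                   # D = (n1/2)(N2/n2)
    level = 1 / difficulty                                    # L1 = 1/D
    effort = volume / level                                   # E = V / L1
    return {
        "vocabulary": vocabulary,
        "length": length,
        "estimated_length": estimated_length,
        "jensen_length": jensen_length,
        "volume": volume,
        "difficulty": difficulty,
        "level": level,
        "effort": effort,
    }


# Hypothetical counts: 4 unique operators and operands, 8 occurrences of each.
metrics = halstead_metrics(HalsteadCounts(n1=4, n2=4, N1=8, N2=8))
# volume = 16 * log2(8) = 48, difficulty = 2 * 2 = 4, effort = 48 * 4 = 192
```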

- Data or Information Flow Complexity [IEEE982.1] [CONTE]

lfi = local flows into a procedure

datain = number of data structures the procedure accesses

lfo = local flows from a procedure

dataout = number of data structures that the procedure updates

length = number of source statements in a procedure, excluding comments

Information Flow Complexity: $IFC = (\mathit{fanin} \times \mathit{fanout})^2$

fanin = lfi + datain

fanout = lfo + dataout

Weighted IFC: $\mathit{length} \times (\mathit{fanin} \times \mathit{fanout})^2$
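The flow counts combine as follows. This is a minimal sketch; the `ProcedureFlows` record and the example counts are hypothetical, and extracting the flows from source code is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureFlows:
    lfi: int       # local flows into the procedure
    datain: int    # data structures the procedure accesses
    lfo: int       # local flows from the procedure
    dataout: int   # data structures the procedure updates
    length: int    # source statements, excluding comments


def information_flow_complexity(p: ProcedureFlows) -> int:
    fanin = p.lfi + p.datain
    fanout = p.lfo + p.dataout
    return (fanin * fanout) ** 2


def weighted_ifc(p: ProcedureFlows) -> int:
    return p.length * information_flow_complexity(p)


proc = ProcedureFlows(lfi=2, datain=1, lfo=1, dataout=2, length=10)
# fanin = 3, fanout = 3, so IFC = 81 and weighted IFC = 810
```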

- number of live variables [NIST500-209] [SQE] [CONTE]

A variable is live from its first to its last reference within a procedure. The average number of live variables is calculated as [CONTE]:

$$\overline{LV} = \frac{1}{n} \sum_{i=1}^{n} lv_i$$

lvi is the count of live variables in the i th executable statement

n is the total number of executable statements

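The average number of live variables for a program of m modules is the mean of the module averages [CONTE]:

$$\overline{LV}_{program} = \frac{1}{m} \sum_{i=1}^{m} \overline{LV}_i$$

where $\overline{LV}_i$ is the average number of live variables in the ith module.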
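Given the first and last reference of each variable, the live-variable counts follow directly. This sketch assumes statements are numbered 1 to n and that a variable is live at every statement from its first through its last reference, inclusive; the program-level figure is taken as the mean of the module averages. The names and example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (first reference, last reference) as 1-based executable-statement numbers
LiveRange = Tuple[int, int]


def live_counts(ranges: Dict[str, LiveRange], n: int) -> List[int]:
    """lv_i for i = 1..n: the number of variables live at statement i."""
    return [sum(1 for first, last in ranges.values() if first <= i <= last)
            for i in range(1, n + 1)]


def average_live_variables(ranges: Dict[str, LiveRange], n: int) -> float:
    """Average of lv_i over the n executable statements of one module."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(live_counts(ranges, n)) / n


def program_average_live_variables(module_averages: Sequence[float]) -> float:
    """Program-level figure, assumed to be the mean of the m module averages."""
    if not module_averages:
        raise ValueError("at least one module is required")
    return sum(module_averages) / len(module_averages)


# Hypothetical module of 5 statements: a is live over 1-3, b over 2-5, c only at 4.
ranges = {"a": (1, 3), "b": (2, 5), "c": (4, 4)}
counts = live_counts(ranges, 5)              # [1, 2, 2, 2, 1]
module_avg = average_live_variables(ranges, 5)   # 8 / 5
```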

- variable spans [NIST500-209] [SQE] [CONTE]

Variable span is the number of statements between two successive references to the same variable. For a program that references a variable in n statements, there are n - 1 spans for that variable. Average span size is calculated as the total of the span counts divided by the total number of spans. The average span size of a program of n spans is calculated as [CONTE]:

$$\overline{SP} = \frac{1}{n} \sum_{i=1}^{n} sp_i$$

where $sp_i$ is the size of the ith span.
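A minimal sketch of the span calculation follows. It assumes a span counts the statements strictly between two successive referencing statements (references at statements 4 and 7 give a span of 2); if the difference of the statement numbers is used instead, each span is one larger. The names and example data are hypothetical.

```python
from typing import Dict, List, Sequence


def spans(reference_statements: Sequence[int]) -> List[int]:
    """Spans between successive references to one variable.

    Assumption: a span counts the statements strictly between two
    successive referencing statements, so n references give n - 1 spans.
    """
    refs = sorted(set(reference_statements))
    return [later - earlier - 1 for earlier, later in zip(refs, refs[1:])]


def average_span(references: Dict[str, Sequence[int]]) -> float:
    """Total of all span sizes divided by the total number of spans."""
    all_spans = [s for refs in references.values() for s in spans(refs)]
    if not all_spans:
        raise ValueError("no variable is referenced more than once")
    return sum(all_spans) / len(all_spans)


# Hypothetical program: x is referenced at statements 1, 4, 5 and y at 2, 8.
refs = {"x": [1, 4, 5], "y": [2, 8]}
# x has spans [2, 0], y has span [5], so the average span is 7 / 3.
avg_span = average_span(refs)
```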

- variable scope [NIST500-209] [SQE]

Variable scope is the number of source statements between the first and last reference of a variable. With large scopes, the understandability and readability of the code is reduced.

- lines of code - total lines of code including comments

- total number of code lines [CONTE]

effort to fix bugs

- number of errors to be corrected [AMI]

- number of units that were modified [AMI]

- number of errors detected during system/integration tests [AMI]

3.4.8 Modularity

cohesion [NIST500-209] [SQE] [SPC]

See Quality Metrics for Design Documentation (section 3.3) for a description of cohesion.

coupling [NIST500-209] [SQE] [SPC]

See Quality Metrics for Design Documentation (section 3.3) for a description of coupling.

number of entries/exits per module [IEEE982.1] [NIST500-209]

It is desirable to have one entry and one exit point per major function, with exceptions for error exits. Also, the number of functions per module should be limited; a suggested maximum number of functions is five per module [IEEE982.2]. See Quality Metrics for Design Documentation (section 3.3) for the calculation of measures for this metric.

3.4.9 Portability

software independence

- number of operating systems software is compatible with [SAC]

- total number of system software utilities utilized [SAC] This metric can be used to measure the degree of dependence on system software utilities. The more usage is made of system software utilities, libraries, and operating system calls, the more dependent the system is on that particular software environment.

- common, standard subsets of language used: (number of modules utilizing non-standard constructs) / (total number of modules) [SAC] The usage of non-standard constructs or extensions of programming languages provided by particular compilers may impose difficulties in conversion of the system to new or upgraded software environments.

hardware independence

- open systems [SAC]

Are the programming languages and tools (e.g., compilers, database management systems, user interface shells) used by the implementation available on other machines? A value of 1 means yes, 0 means no.

- input/output references: (number of modules making I/O references) / (total number of modules) [SAC]

Input/output references or calls are frequently a cause of machine dependence. Minimization and localization of these references facilitates machine independence and conversion from one machine to another. [SAC]

- word/character size: (number of modules not following convention) / (total number of modules) [SAC]

Code that is dependent on machine word or character size should be avoided or parameterized to facilitate use on other machines.

3.4.10 Reliability

reliability models

The Jelinski-Moranda reliability model attempts to predict the time of the next failure by assuming that the times between failures are independent random variables with exponential distributions. One criticism of the model is that it assumes each fault is removed instantaneously and with certainty. Another, more serious, criticism is that the model assumes that all faults contribute equally to the unreliability of the program: the removal of any fault diminishes the rate of occurrence of failures by the same fixed amount [LITTLEWOOD].
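The equal-contribution assumption can be made concrete with the model's usual parameterization: with N initial faults and a per-fault contribution φ, the failure rate after i fixes is φ(N − i), and the mean time to the next failure is its reciprocal. This sketch assumes N and φ have already been estimated (estimation from failure data is not shown); the names and values are hypothetical.

```python
import math


def jm_failure_rate(total_faults: int, phi: float, faults_removed: int) -> float:
    """Jelinski-Moranda failure rate after faults_removed fixes: phi * (N - faults_removed).

    Every fix lowers the rate by exactly phi, which is the equal-contribution
    assumption criticized above.
    """
    remaining = total_faults - faults_removed
    if remaining < 0:
        raise ValueError("cannot remove more faults than exist")
    return phi * remaining


def jm_mean_time_to_next_failure(total_faults: int, phi: float, faults_removed: int) -> float:
    """Mean of the exponential time to the next failure, 1 / rate."""
    rate = jm_failure_rate(total_faults, phi, faults_removed)
    return math.inf if rate == 0 else 1.0 / rate


# Hypothetical parameters: 10 initial faults, each contributing 0.01 failures per hour.
rate_before_fixes = jm_failure_rate(10, 0.01, 0)          # 0.1
mttf_after_three = jm_mean_time_to_next_failure(10, 0.01, 3)
```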

The Littlewood-Verrall model attempts to capture the uncertainty of the fixing operation; fixes are not certain to improve the reliability of the program. The rate of occurrence of failures is treated as a sequence of independent stochastically decreasing random variables. There is uncertainty about the magnitude of improvement of each fix [LITTLEWOOD].

testability

- number of independent paths [CALDIERA]

- cyclomatic complexity [IEEE982.1] [CONTE] [MCCABE]

See [MCCABE] for details on using cyclomatic complexity in determining the testability of software modules. McCabe's techniques can be used to assess the testability of software based on its complexity.

3.4.11 Understandability

- function points [NIST500-209]

- lines of code [CONTE]

- function count [CONTE]

traceability metrics

- number of comment lines per total source lines of code [GPALS2]

- percent comment lines of total lines [STEP]

- correctness of comments

complexity metrics

- See complexity metrics defined above under Maintainability.

- number of tokens [CONTE] The Halstead software science measure of observed program length can be used as a readability indicator. Observed program length N = N1 + N2, where N1 is the total number of operators, and N2 is the total number of operands. Halstead originally did not count declaration statements, input/output statements, or statement labels. However, some later researchers count the operands in these types of statements.

readability metrics

- number of grammatically incorrect comments

- number of misspellings

- total number of characters [KHOSH]

- total number of comments [KHOSH]

- number of comment characters [KHOSH]

- number of code characters [KHOSH]

3.5 Quality Metrics for Test Documentation

Software testing metrics are used to assess the adequacy of the test procedures and test data in verifying the software code. In order to gain confidence in a software component's reusability, a comprehensive set of test cases is necessary. A direct relationship between test cases and components is necessary in order for the component to be adequately tested in a new environment. Component test cases should be traceable to the components, and should be maintained as the components are changed. Also, component test cases should be delivered with the components.

System test cases should be linked to requirements specifications, and ideally, to the domain of interest. In order for system test cases to be reusable, there must be a tie-in to a specific requirements area, and therefore, a specific application domain. System test plans may be extracted from several subdomains and regrouped to test subdomains in a new domain area.

3.5.1 Completeness

See [MCCABE] for details on using cyclomatic complexity to "measure the completeness of the testing that a programmer must satisfy." Specifically, branch coverage and path coverage are verified for completeness. McCabe's technique can be used to develop a set of test cases which test every outcome of each decision, and execute a minimal number of distinct paths.

test coverage [IEEE982.1]

Functional (modular) test coverage index = FE / FT, where FE is number of software functions (modules) tested and FT is total number of software functions (modules).

Test Sufficiency Indicator [AFSCP800-14] [IEEE982.1]

This indicator is useful in assessing the sufficiency of software integration and system testing, based on the prediction of the remaining software faults. If fewer faults than expected are detected (outside the minimum tolerance limit), an adequate number of tests may not have been designed. See [AFSCP800-14] for a complete discussion of this indicator.

PF = total number of predicted faults in the software

FP = number of faults detected before software integration testing

UI = number of units integrated

UT = total number of units in the Computer Software Configuration Item

FD = total number of faults detected to date during test

Remaining Faults FR = (PF - FP) x (UI/UT)

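Maximum Tolerance MAXT = c1 x FR

Minimum Tolerance MINT = c2 x FR

where c1 and c2 are the maximum and minimum tolerance coefficients. The number of faults detected to date, FD, is compared against these limits; an FD below MINT suggests that an adequate number of tests may not have been designed.

Percent of remaining faults to total predicted faults = (FR / PF) x 100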
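The indicator can be sketched as a single function. This is an illustration, not the [AFSCP800-14] procedure: it reads "percent of remaining faults to total predicted faults" as 100 x FR / PF, and the coefficient values in the example are placeholders chosen for illustration, not values taken from that pamphlet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SufficiencyIndicator:
    remaining_faults: float   # FR
    max_tolerance: float      # MAXT
    min_tolerance: float      # MINT
    percent_remaining: float  # 100 * FR / PF
    status: str               # where FD falls relative to [MINT, MAXT]


def sufficiency_indicator(pf: float, fp: float, ui: int, ut: int, fd: float,
                          c1: float, c2: float) -> SufficiencyIndicator:
    if pf <= 0 or ut <= 0:
        raise ValueError("PF and UT must be positive")
    fr = (pf - fp) * (ui / ut)   # FR = (PF - FP) x (UI/UT)
    maxt = c1 * fr               # MAXT = c1 x FR
    mint = c2 * fr               # MINT = c2 x FR
    if fd < mint:
        status = "below minimum tolerance"   # too few faults found: tests may be insufficient
    elif fd > maxt:
        status = "above maximum tolerance"
    else:
        status = "within tolerance"
    return SufficiencyIndicator(fr, maxt, mint, 100.0 * fr / pf, status)


# Hypothetical project: 100 predicted faults, 40 found before integration,
# 5 of 10 units integrated, 12 faults detected so far; c1 and c2 are placeholders.
result = sufficiency_indicator(pf=100, fp=40, ui=5, ut=10, fd=12, c1=1.2, c2=0.5)
# FR = 30, MINT = 15, so FD = 12 falls below the minimum tolerance.
```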

coverage metrics

- statement coverage: percentage of statements executed (to ensure that each statement has been tested at least once) [SQE]

- branch coverage: percentage of branches executed [SQE]

- path coverage: percentage of program paths executed, or the number of paths tested divided by total number of paths [SAC]

It is generally impractical and inefficient to test all paths in a program. The number of paths may be reduced by treating all possible loop iterations as one path.

- Test Coverage Indicator [AFSCP800-14] [IEEE982.1]

Data flow metrics [WEYUKER]

Data flow testing requires the selection of test data that exercise certain paths from a point in a program where a variable is defined, to points at which the variable definition is subsequently used.

- Categories for variable occurrences:

Definition: variable is given a new value

P-use: variable is used in predicate portion of a decision statement

C-use: all other variable uses, including variable occurrences in the right-hand side of an assignment statement, or an output statement

- The following data flow testing criteria are defined:

all-definitions: test data be included that causes the traversal of at least one subpath from each variable definition to some p-use or some c-use of that definition

all-c-uses: test data be included that causes the traversal of at least one path from each variable definition to every c-use of that definition

all-p-uses: test data be included that causes the traversal of at least one path from each variable definition to every p-use of that definition

all-uses: test data be included that causes the traversal of at least one subpath from each variable definition to every p-use and every c-use of that definition

all-du-paths: test data be included that causes the traversal of every simple subpath from each variable definition to every p-use and c-use of that definition

- Metrics

Percent of all-definitions covered by test scenarios

Percent of all-c-uses covered by test scenarios

Percent of all-p-uses covered by test scenarios

Percent of all-uses covered by test scenarios

Percent of all-du-paths covered by test scenarios
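For the pair-based criteria, the percentages reduce to set arithmetic once the definition-use pairs of a program and the pairs exercised by the test scenarios are known. This sketch assumes both are already available as hypothetical `DefUse` records (deriving them from a flowgraph is not shown). all-du-paths is omitted because it requires enumerating simple subpaths, and an empty category is treated as fully covered, which is a choice made for this sketch.

```python
from typing import Dict, Iterable, NamedTuple


class DefUse(NamedTuple):
    variable: str
    definition: int  # node at which the variable is given a new value
    use: int         # node at which that definition is used
    kind: str        # "c" for a c-use, "p" for a p-use


def _percent(covered: int, total: int) -> float:
    # An empty category is treated as vacuously satisfied (100%).
    return 100.0 * covered / total if total else 100.0


def dataflow_coverage(pairs: Iterable[DefUse], exercised: Iterable[DefUse]) -> Dict[str, float]:
    """Percent of each pair-based criterion covered by the exercised def-use pairs."""
    all_pairs = set(pairs)
    covered = set(exercised) & all_pairs
    definitions = {(p.variable, p.definition) for p in all_pairs}
    covered_defs = {(p.variable, p.definition) for p in covered}
    c_uses = {p for p in all_pairs if p.kind == "c"}
    p_uses = {p for p in all_pairs if p.kind == "p"}
    return {
        "all-definitions": _percent(len(covered_defs), len(definitions)),
        "all-c-uses": _percent(len(c_uses & covered), len(c_uses)),
        "all-p-uses": _percent(len(p_uses & covered), len(p_uses)),
        "all-uses": _percent(len(covered), len(all_pairs)),
    }


# Hypothetical pairs: x defined at nodes 1 and 4, y defined at node 3.
A = DefUse("x", 1, 2, "p")
B = DefUse("x", 1, 3, "c")
C = DefUse("y", 3, 5, "c")
D = DefUse("x", 4, 2, "p")
coverage = dataflow_coverage([A, B, C, D], exercised=[A, C])
# 2 of 3 definitions, 1 of 2 c-uses, 1 of 2 p-uses, 2 of 4 pairs covered
```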

percentage of defects uncovered in testing: (number of defects located by testing) / (total number of system defects) [PERRY]

3.5.2 Efficiency

execution time of test cases

(number of tests required) / (number of system errors) [PERRY]

This metric shows the number of tests occurring per detected error. The smaller the ratio, the greater the test efficiency. This shows the efficiency of tests in uncovering errors.

(number of defects uncovered) / (size of system) [PERRY]

This metric assumes that the number of defects in an application system is roughly proportional to its size.

3.5.3 Understandability