Previous IDL Analyst Reference Guide: Correlation and Covariance Next

IMSL_COVARIANCES

Syntax | Return Value | Arguments | Keywords | Discussion | Example | Errors | Version History

The IMSL_COVARIANCES function computes the sample variance-covariance or correlation matrix.


Note
This routine requires an IDL Analyst license. For more information, contact your ITT Visual Information Solutions sales or technical support representative.

Syntax

Result = IMSL_COVARIANCES(x [, /DOUBLE] [, VAR_COVAR=value] [, CORRECTED_SSCP=value] [, CORRELATION=value] [, STDEV_CORRELATION=value] [, FREQUENCIES=array] [, INCIDENCE_MAT=variable] [, MISSING_VAL=value] [, MEANS=variable] [, NMISSING=variable] [, NOBS=variable] [, SUM_WEIGHTS=variable] [, WEIGHT=array])

Return Value

If no keywords are used, IMSL_COVARIANCES returns a two-dimensional matrix containing the sample variance-covariance matrix of the observations in which value in element (i, j) corresponds to the sample covariance between the i-th and j-th variable.

Arguments

x

Two-dimensional matrix containing the data. The data value for the i-th observation of the j-th variable should be in x(i,j).

Keywords

DOUBLE

If present and nonzero, double precision is used.

VAR_COVAR

Variance-covariance matrix (default).


Note
Exactly one of these keywords — VAR_COVAR, CORRECTED_SSCP, CORRELATION, STDEV_CORRELATION — is used to specify the type of matrix to be computed.

CORRECTED_SSCP

Corrected sum-of-squares and crossproducts matrix.


Note
Exactly one of these keywords — VAR_COVAR, CORRECTED_SSCP, CORRELATION, STDEV_CORRELATION — is used to specify the type of matrix to be computed.

CORRELATION

Correlation matrix.


Note
Exactly one of these keywords — VAR_COVAR, CORRECTED_SSCP, CORRELATION, STDEV_CORRELATION — is used to specify the type of matrix to be computed.

STDEV_CORRELATION

Correlation matrix, except for diagonal elements which are standard deviations.


Note
Exactly one of these keywords — VAR_COVAR, CORRECTED_SSCP, CORRELATION, STDEV_CORRELATION — is used to specify the type of matrix to be computed.

FREQUENCIES

Array containing the vector of frequencies for the observation. Default: all observations have a frequency of 1.

INCIDENCE_MAT

Named variable into which the incidence matrix is stored. If Missing_Val is 0, the number of valid observations is returned through this keyword; otherwise, the nvar x nvar matrix, where nvar is the number of variables in x, contains the number of pairs of valid observations used in calculating the crossproducts for covariance.

MISSING_VAL

Scalar integer which defines the method used to exclude missing values in x from the computations, where NaN is interpreted as the missing value code. The methods are as follows:

MEANS

Named variable into which array containing the means of variables in x is stored. The i-th components of the array correspond to x(*, i).

NMISSING

Specifies a variable into which the total number of observations that contain any missing values (NaN) is stored.

NOBS

Named variable into which the sum of the frequencies is stored. If Missing_Val is 0, observations with missing values are not included in Nobs; otherwise, all observations are included except for observations with missing values for the weight or the frequency.

SUM_WEIGHTS

Specifies a variable into which the sum of the weights of all observations is stored. If keyword Missing_val is equal to 0, observations with missing values are not included in Sum_weights. Otherwise, all observations are included except for observations with missing values for the weight or the frequency.

WEIGHT

Array containing the vector of weights for the observation. Default: all observations have equal weights of 1.

Discussion

The IMSL_COVARIANCES function computes estimates of correlations, covariances, or sum of squares and crossproducts for a data matrix x. The means, (corrected) sum of squares, and (corrected) sums of crossproducts are computed using the method of provisional means.

Let:

denote the mean based on i observations for the k-th variable, fi and wi denote the frequency and weight of the i-th observation, respectively, and let cjki denote the sum of crossproducts (or sum of squares if j = k) based on i observations. Then, the method of provisional means finds new means and sums of crossproducts shown in the example below.

The means and crossproducts are initialized as:

 

where p denotes the number of variables. Letting xk, i + 1 denote the k-th variable on observation i + 1, each new observation leads to the following updates for:

and cjki using update constant r i + 1:

Syntax Notes

The IMSL_COVARIANCES function uses the following definition of a sample mean:

where nr is the number of cases. The formula below defines the sample covariance, sjk, between variables j and k.

The sample correlation between variables j and k, rjk, is defined below:

Example

This example illustrates the use of IMSL_COVARIANCES for the first 50 observations in the Fisher iris data (Fisher 1936). Note that the first variable is constant over the first 50 observations.

x = IMSL_STATDATA(3)  
x = x(0:49, *)   
cov = IMSL_COVARIANCES(x)  
; Call IMSL_COVARIANCES.   
PM, cov  
; Output the results.   
0.00000  0.00000   0.00000    0.00000    0.00000   
0.00000  0.124249  0.0992163  0.0163551  0.0103306  
0.00000  0.0992163 0.143690   0.0116980  0.00929796  
0.00000  0.0163551 0.0116980  0.0301592  0.00606939  
0.00000  0.0103306 0.00929796 0.00606939 0.0111061  

Errors

Warning Errors

STAT_CONSTANT_VARIABLE—Correlations are requested, but the observations on one or more variables are constant. The corresponding correlations are set to NaN.

Version History

6.4
Introduced

  IDL Online Help (March 06, 2007)