Albert J. Ahumada, Jr. and Andrew B. Watson
NASA Ames Research Center
Moffett Field, California
and
Heidi A. Peterson
IBM T. J. Watson Research Center
Yorktown Heights, New York
The discrete cosine transform (DCT) is widely used in image compression and is part of the JPEG and MPEG compression standards. The degree of compression and the amount of distortion in the decompressed image are determined by the quantization of the transform coefficients. The standards do not specify how the DCT coefficients should be quantized. Our approach is to set the quantization level for each coefficient so that the quantization error is at the threshold of visibility. Here we combine results from our previous work to form our current best detection model for DCT coefficient quantization noise. This model predicts sensitivity as a function of display parameters, enabling quantization matrices to be designed for display situations varying in luminance, veiling light, and spatial-frequency-related conditions (pixel size, viewing distance, and aspect ratio). It also allows arbitrary color-space directions for the representation of color.
1.1 DCT Image Compression
The discrete cosine transform (DCT) has become an
image compression standard.1, 2, 3
Typically the image is divided into 8 × 8-pixel blocks, each of which is transformed into 64 DCT coefficients. The DCT coefficients $I_{m,n}$ of an $N \times N$ block of image pixels $i_{j,k}$ are given by

$$I_{m,n} = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} i_{j,k}\, c_{j,m}\, c_{k,n}, \qquad m,n = 0, \ldots, N-1, \tag{1a}$$

where

$$c_{j,m} = a_m \cos\!\left( \frac{\pi m}{2N} \left( 2j + 1 \right) \right), \tag{1b}$$

and

$$a_m = \begin{cases} (1/N)^{0.5}, & m = 0, \\ (2/N)^{0.5}, & m > 0. \end{cases} \tag{1c}$$
The block of image pixels is reconstructed by the inverse transform:
$$i_{j,k} = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} I_{m,n}\, c_{j,m}\, c_{k,n}, \tag{2}$$
which for this normalization is the same as the forward transform. Quantization of the DCT coefficients achieves image compression, but it also generates distortion in the decompressed image. If a single coefficient is quantized and its block is reconstructed, the difference between the original image block and the reconstructed block is the error image. This error image has the form of the associated basis function, and its amplitude is proportional to the quantization error of the coefficient. Since the inverse transform is linear, the error image resulting from quantizing multiple coefficients is a sum of such images.
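As a concrete check of Eqs. (1)-(2), the transform pair can be transcribed directly. The sketch below is ours (function names included): the quadruple loop is O(N^4), which is fine for 8 × 8 blocks, though a production codec would use a fast factorization.

```python
import math

def dct_basis(N):
    """c[j][m] = a_m cos(pi m (2j + 1) / (2N)), Eqs. (1b)-(1c)."""
    c = [[0.0] * N for _ in range(N)]
    for m in range(N):
        a_m = math.sqrt((1.0 if m == 0 else 2.0) / N)
        for j in range(N):
            c[j][m] = a_m * math.cos(math.pi * m * (2 * j + 1) / (2 * N))
    return c

def dct2(block):
    """Forward 2-D DCT of an N x N block, Eq. (1a): returns I[m][n]."""
    N = len(block)
    c = dct_basis(N)
    return [[sum(block[j][k] * c[j][m] * c[k][n]
                 for j in range(N) for k in range(N))
             for n in range(N)]
            for m in range(N)]

def idct2(I):
    """Inverse 2-D DCT, Eq. (2): the same kernel, summed over m, n."""
    N = len(I)
    c = dct_basis(N)
    return [[sum(I[m][n] * c[j][m] * c[k][n]
                 for m in range(N) for n in range(N))
             for k in range(N)]
            for j in range(N)]
```

With this normalization the round trip `idct2(dct2(block))` reproduces the block to floating-point precision, and a constant block puts all of its energy in $I_{0,0}$.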
1.2 The Quantization Matrix
The JPEG compression standard1, 2 requires that uniform quantization be used for the DCT coefficients, but the quantizer step size for each coefficient is left to the user. The step size used for coefficient $I_{m,n}$ is denoted $Q_{m,n}$. A coefficient is quantized by the operation

$$S_{m,n} = \mathrm{Round}\left( I_{m,n} / Q_{m,n} \right), \tag{3a}$$

and restored (with the quantization error) by

$$I_{m,n} = S_{m,n}\, Q_{m,n}. \tag{3b}$$
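Eqs. (3a)-(3b) amount to divide-round-rescale. A minimal sketch (our naming; note that Python's built-in `round` ties to even, whereas a codec may round half away from zero):

```python
def quantize(I, Q):
    """S_mn = Round(I_mn / Q_mn), Eq. (3a)."""
    return [[round(i / q) for i, q in zip(I_row, Q_row)]
            for I_row, Q_row in zip(I, Q)]

def dequantize(S, Q):
    """Restored coefficients S_mn * Q_mn, Eq. (3b).
    The restoration error is at most Q_mn / 2 per coefficient."""
    return [[s * q for s, q in zip(S_row, Q_row)]
            for S_row, Q_row in zip(S, Q)]
```

The half-step error bound is what motivates the factor of 2 in the quantization matrix design of Section 3.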
Two example quantization matrices can be found in the JPEG standard;2 they appear in Table 1 following the references. These matrices were designed for a particular viewing situation, and no suggestions were provided for how they should be changed to accommodate different viewing conditions or compression in a different color space. Our research was initiated to provide quantization matrices suitable for compression in the RGB color representation.4 Subsequently, a theoretical framework was constructed and additional measurements were made.5-8 Here we summarize the resulting quantization matrix design technique, which can be applied under a wide variety of conditions: different display luminances, veiling luminances, spatial frequencies, and color spaces.
The basic idea of the technique is to develop a detection model that predicts the detectability of the artifacts in a perceptual space representation. This step is described in Section 2. A quantizer step size is then determined from the sensitivity of the perceptual space representation to the quantization distortion. This step is described in Section 3.
2.1 The Luminance Detection Model
The luminance detection model predicts the threshold for detecting the luminance error image generated by quantization of a single DCT coefficient $I_{m,n}$. We use the subscript $Y$ for luminance, since we assume that it is defined by the 1931 CIE standard.9 This error image is assumed to be below the threshold of visibility if its zero-to-peak luminance is less than a threshold $T_{Y,m,n}$ given by

$$\log T_{Y,m,n} = P(f_{m,n};\, b_Y, k_Y, f_Y) \tag{4}$$
$$= \begin{cases} \log b_Y, & f_{m,n} \le f_Y, \\ \log b_Y + k_Y \left( \log f_{m,n} - \log f_Y \right)^2, & f_{m,n} > f_Y. \end{cases}$$
This function P
represents a low-pass contrast sensitivity function
of spatial frequency.
Although luminance contrast sensitivity is more correctly
modeled as band-pass,
we choose a low-pass function for this application.
This ensures that no new artifacts become visible as
viewing distance increases.
A low-pass function is also convenient because
purely chromatic channels are low-pass in
this spatial frequency range.
We will use P
for the luminance and chrominance channels of our model.
The spatial frequency $f_{m,n}$ associated with the $m,n$-th basis function is given by

$$f_{m,n} = \frac{1}{2N} \left[ \left( \frac{m}{W_x} \right)^2 + \left( \frac{n}{W_y} \right)^2 \right]^{0.5}, \tag{5}$$

where $W_x$ and $W_y$ are the horizontal and vertical pixel spacings in degrees of visual angle.
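Eq. (5) in code form (a sketch; the default spacing of 0.028 deg matches the example in Section 3.1):

```python
import math

def dct_frequency(m, n, N=8, Wx=0.028, Wy=0.028):
    """Radial spatial frequency f_mn (cycles/deg) of basis (m, n), Eq. (5).
    Wx, Wy are the pixel spacings in degrees of visual angle."""
    return math.hypot(m / Wx, n / Wy) / (2.0 * N)
```

At this spacing the highest horizontal basis, (7, 0), sits at 7/(16 × 0.028) ≈ 15.6 cycles/deg, just below the 17.9 cycles/deg Nyquist limit.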
The term $b_Y$ has three components:

$$b_Y = s\, T_Y / q_{m,n}. \tag{6}$$

The parameter $s$ is a fraction, $0 < s \le 1$, that accounts for spatial summation of quantization errors over blocks. We set it to unity to model detection experiments with only one block.6 Our summation results suggest that it should equal the inverse fourth root of the number of blocks contributing to detection;8 we suggest the value $s = 0.25$, corresponding to a 16 × 16 array of blocks.
The factor $T_Y$ gives the dependence of the threshold on the mean image luminance $Y_0$:

$$T_Y = \begin{cases} Y_0^{a_T}\, Y_T^{1-a_T} / S_0, & Y_0 \le Y_T, \\ Y_0 / S_0, & Y_0 > Y_T, \end{cases} \tag{7}$$

where the suggested parameter values are $Y_T = 15$ cd/m², $S_0 = 40$, and $a_T = 0.65$.
The product of a cosine in the x with a cosine in the y direction can be expressed as the sum of two cosines of the same radial spatial frequency but differing in orientation. The factor
$$q_{m,n} = r + (1 - r) \left( 1 - \left[ \frac{2\, f_{m,0}\, f_{0,n}}{f_{m,n}^2} \right]^2 \right) \tag{8}$$

accounts for the imperfect summation of two such frequency components as a function of the angle between them. Based on the fourth-power summation rule for the two components when they are orthogonal, $r$ is set to 0.6. An additional oblique effect can be included by decreasing the value of $r$.
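Eq. (8) transcribed directly (our naming; when m = 0 or n = 0 only one component is present, so we return q = 1, which also avoids the f_{0,0} = 0 singularity):

```python
def orientation_factor(m, n, f, r=0.6):
    """q_mn of Eq. (8). f(m, n) must return the radial frequency
    of basis (m, n), e.g. via Eq. (5)."""
    if m == 0 or n == 0:
        return 1.0  # single component: no summation penalty
    sin_2theta = 2.0 * f(m, 0) * f(0, n) / f(m, n) ** 2
    return r + (1.0 - r) * (1.0 - sin_2theta ** 2)
```

For equal pixel spacings the bracketed term is the sine of twice the angle between the component pair, so q falls from 1 on the axes to r = 0.6 on the diagonal (m = n).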
The parameters $f_Y$ and $k_Y$ determine the shape of $P$ and depend on the mean luminance $Y_0$:

$$f_Y = \begin{cases} f_0\, Y_0^{a_f}\, Y_f^{-a_f}, & Y_0 \le Y_f, \\ f_0, & Y_0 > Y_f, \end{cases} \tag{9}$$

and

$$k_Y = \begin{cases} k_0\, Y_0^{a_k}\, Y_k^{-a_k}, & Y_0 \le Y_k, \\ k_0, & Y_0 > Y_k, \end{cases} \tag{10}$$

where $f_0 = 6.8$ cycles/deg, $a_f = 0.182$, $Y_f = 300$ cd/m², $k_0 = 2$, $a_k = 0.0706$, and $Y_k = 300$ cd/m².
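Putting Eqs. (4)-(10) together gives the full luminance threshold. The sketch below is ours: it assumes base-10 logarithms in Eqs. (4), (9), and (10), and its defaults combine the suggested parameter values with the viewing conditions of the Section 3.1 example (mean luminance 40 cd/m², 0.028-deg pixel spacing, s = 0.25, r = 0.6).

```python
import math

def luminance_threshold(m, n, Y0=40.0, N=8, Wx=0.028, Wy=0.028,
                        s=0.25, r=0.6,
                        YT=15.0, S0=40.0, aT=0.65,
                        f0=6.8, af=0.182, Yf=300.0,
                        k0=2.0, ak=0.0706, Yk=300.0):
    """T_Y,m,n: zero-to-peak luminance threshold (cd/m^2) for the
    error image of DCT basis (m, n), Eqs. (4)-(10)."""
    freq = lambda a, b: math.hypot(a / Wx, b / Wy) / (2.0 * N)   # Eq. (5)
    f_mn = freq(m, n)
    TY = (Y0 ** aT * YT ** (1.0 - aT) if Y0 <= YT else Y0) / S0  # Eq. (7)
    fY = f0 * (Y0 / Yf) ** af if Y0 <= Yf else f0                # Eq. (9)
    kY = k0 * (Y0 / Yk) ** ak if Y0 <= Yk else k0                # Eq. (10)
    if m == 0 or n == 0:                                         # Eq. (8)
        q = 1.0
    else:
        sin_2theta = 2.0 * freq(m, 0) * freq(0, n) / f_mn ** 2
        q = r + (1.0 - r) * (1.0 - sin_2theta ** 2)
    bY = s * TY / q                                              # Eq. (6)
    if f_mn <= fY:                                               # Eq. (4)
        return bY
    return bY * 10.0 ** (kY * math.log10(f_mn / fY) ** 2)
```

With these defaults the DC threshold is s·T_Y = 0.25 cd/m², and thresholds rise steeply beyond f_Y ≈ 4.7 cycles/deg.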
2.2 The Chrominance Detection Model
We now add two chromatic channels to the luminance-only model. From the large number of color spaces that have been proposed for chromatic discriminations, we have selected one close to that suggested by Boynton:9 a red-green opponent channel and a blue channel. Our channels are defined in terms of the CIE 1931 XYZ color space. The blue channel is just $Z$, and the opponent red-green channel $O$ is given by

$$O = 0.47\, X - 0.37\, Y - 0.10\, Z. \tag{11}$$

This opponent channel is approximately the Boynton9 (red cone) − 2 (green cone) channel.
Our model now needs the thresholds for quantization noise in the $O$ and $Z$ channels. For simplicity, we model the chromatic thresholds by

$$\log T_{O,m,n} = P(f_{m,n};\, 0.36\, b_Y,\, k_Y,\, f_Y/4), \tag{12}$$

and

$$\log T_{Z,m,n} = P(f_{m,n};\, 3.00\, b_Y,\, k_Y,\, f_Y/4). \tag{13}$$
The shapes of these functions are in rough agreement with experimental results,10 except that the measured slopes for the chromatic channels are steeper than that of the luminance channel. The reason for keeping them the same is to prevent strong quantization of purely chromatic channels, since there is a fair amount of individual variability in the exact direction of isoluminance.
Although we previously used $Z_0$ to set the level of the $Z$ threshold,7 we use $Y_0$ here under the assumption that the average image color is close to white, and hence that the two are roughly equal. Finally, we say that the errors from the quantization of a coefficient are visible if the error in any of the three channels is visible.
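Eqs. (12)-(13) reuse the luminance function P with rescaled levels and a corner frequency lowered by a factor of four. A sketch (our naming; again assuming base-10 logs), written in threshold rather than log-threshold form:

```python
import math

def P(f, b, k, fY):
    """Threshold form of the low-pass function P of Eq. (4)."""
    if f <= fY:
        return b
    return b * 10.0 ** (k * math.log10(f / fY) ** 2)

def chromatic_thresholds(f_mn, bY, kY, fY):
    """(T_O, T_Z) for basis frequency f_mn, Eqs. (12)-(13)."""
    return (P(f_mn, 0.36 * bY, kY, fY / 4.0),   # opponent red-green, Eq. (12)
            P(f_mn, 3.00 * bY, kY, fY / 4.0))   # blue (Z), Eq. (13)
```

Because the two chromatic channels share the same shape parameters, T_Z/T_O = 3.00/0.36 at every frequency; only the overall levels differ.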
Suppose that one color dimension $D$ in a color space linearly related to our $YOZ$ color space is to be quantized. Let $D_Y$, $D_O$, and $D_Z$ be the amplitudes of the errors in $YOZ$ space generated by a unit error in $D$. An error generated in the $D$ image by quantizing the $m,n$-th DCT coefficient is then below threshold if it is less than

$$T_{D,m,n} = \min\left( T_{Y,m,n}/|D_Y|,\; T_{O,m,n}/|D_O|,\; T_{Z,m,n}/|D_Z| \right). \tag{14}$$
The $D$ quantization matrix entries are obtained by dividing these thresholds by the DCT normalization constants ($a_m$ in Equation (1c)):

$$Q_{D,m,n} = 2\, T_{D,m,n} / (a_m\, a_n). \tag{15}$$

The factor of 2 results from the maximum quantization error being half the quantizer step size.
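Eqs. (14)-(15) in code form (a sketch; here T is any N × N matrix of thresholds already expressed in the units of the color dimension D):

```python
def color_threshold(TY, TO, TZ, DY, DO, DZ):
    """T_D,m,n of Eq. (14): the most sensitive channel sets the limit."""
    return min(TY / abs(DY), TO / abs(DO), TZ / abs(DZ))

def quantization_matrix(T, N=8):
    """Q_D,m,n = 2 T_D,m,n / (a_m a_n), Eq. (15)."""
    a = [((1.0 if m == 0 else 2.0) / N) ** 0.5 for m in range(N)]
    return [[2.0 * T[m][n] / (a[m] * a[n]) for n in range(N)]
            for m in range(N)]
```

Note that because $a_0 < a_m$ for $m > 0$, the DC entry receives the largest normalization boost for a given threshold.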
3.1 Quantization in YCrCb Space: An Example Computation
In an attempt to put all luminance information in a single channel, color images are often represented in the YCrCb color space for image compression.
Pennebaker and Mitchell2 give the transformation from RGB to YCrCb as

$$\begin{aligned} Y' &= 0.3\,R + 0.6\,G + 0.1\,B, \\ \mathrm{Cr} &= (R - Y')/1.6 + 0.5, \\ \mathrm{Cb} &= (B - Y')/2 + 0.5. \end{aligned} \tag{16}$$
Suppose that the viewing conditions are set so that the average image luminance $\bar{Y}$ is 40 cd/m², the pixel spacings are 0.028 deg, and the monitor calibration of the $XYZ$ outputs for unit $RGB$ inputs is given by the matrix

$$\begin{array}{c|ccc} & R & G & B \\ \hline X & 26.1 & 25.2 & 9.3 \\ Y & 13.3 & 48.9 & 4.7 \\ Z & 2.3 & 10.2 & 35.7 \end{array} \tag{17}$$
The values of $D_Y$, $D_O$, and $D_Z$ for each dimension turn out to be:

$$\begin{array}{c|ccc} & Y' & \mathrm{Cb} & \mathrm{Cr} \\ \hline D_Y & 66.9 & -7.0 & -17.8 \\ D_O & -1.1 & 0.6 & 17.1 \\ D_Z & 48.2 & 67.9 & -4.5 \end{array} \tag{18}$$
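The entries of Eq. (18) can be checked numerically: invert Eq. (16) to find the RGB change produced by a unit error in each YCrCb channel, map it through the calibration matrix of Eq. (17), and project onto the opponent channel with Eq. (11). A pure-Python sketch (function names are ours):

```python
# RGB -> XYZ monitor calibration of Eq. (17); rows are X, Y, Z.
M = [[26.1, 25.2, 9.3],
     [13.3, 48.9, 4.7],
     [2.3, 10.2, 35.7]]

def unit_error_rgb(channel):
    """(dR, dG, dB) caused by a unit error in one YCrCb channel,
    from inverting Eq. (16) with the other two channels held fixed."""
    if channel == "Y'":
        return (1.0, 1.0, 1.0)          # dG = (1 - 0.3 - 0.1) / 0.6 = 1
    if channel == "Cr":
        return (1.6, -0.8, 0.0)         # dR = 1.6, dG = -0.3 * 1.6 / 0.6
    if channel == "Cb":
        return (0.0, -1.0 / 3.0, 2.0)   # dB = 2, dG = -0.1 * 2 / 0.6
    raise ValueError(channel)

def yoz_error(channel):
    """(D_Y, D_O, D_Z) for a unit error in the given YCrCb channel."""
    dR, dG, dB = unit_error_rgb(channel)
    X, Y, Z = (row[0] * dR + row[1] * dG + row[2] * dB for row in M)
    O = 0.47 * X - 0.37 * Y - 0.10 * Z   # Eq. (11)
    return (Y, O, Z)
```

For example, `yoz_error("Y'")` returns (66.9, ≈ −1.1, 48.2), matching the first column of Eq. (18) to rounding.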
The quantization matrices appear in Table 2 following the references. They have been multiplied by 255 and then rounded.
We have presented a model for predicting visibility thresholds for DCT coefficient quantization error, from which quantization matrices for use in DCT-based compression can be designed. We regard these as preliminary results of work in progress. The quantization matrices computed by the techniques described above take no account of image content. At this meeting and elsewhere,11, 12 A. B. Watson shows how an extension of this model may be used to optimize quantization matrices for individual images or a class of images.
We appreciate the help of Jeffrey B. Mulligan. This work was supported in part by the IBM Independent Research and Development Program and by NASA RTOP Nos. 506-59-65 and 505-64-53.
This version (12/9/93) of a paper presented to the 1993 AIAA Computing in Aerospace Conference, 13 corrects a number of minor errors.
1. G. Wallace, "The JPEG still picture compression standard", Communications of the ACM, vol. 34, no. 4, pp. 30-44, 1991.
2. W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, van Nostrand Reinhold, New York, 1993.
3. D. LeGall, "MPEG: A video compression standard for multimedia applications", Communications of the ACM, vol. 34, no. 4, pp. 46-58, 1991.
4. H. A. Peterson, H. Peng, J. H. Morgan, W. B. Pennebaker, "Quantization of color image components in the DCT domain", in B. E. Rogowitz, M. H. Brill, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display II, Proc. SPIE, vol. 1453, pp. 210-222, 1991.
5. A. J. Ahumada, Jr., H. A. Peterson, "Luminance-model-based DCT quantization for color image compression," in B. E. Rogowitz, ed., Human Vision, Visual Processing, and Digital Display III, Proc. SPIE, vol. 1666, pp. 365-374, 1992.
6. H. A. Peterson, "DCT basis function visibility thresholds in RGB space," in J. Morreale, ed., 1992 SID International Symposium Digest of Technical Papers, Society for Information Display, Playa del Rey, CA, pp. 677-680, 1992.
7. H. A. Peterson, A. J. Ahumada, Jr., A. B. Watson, "An improved detection model for DCT coefficient quantization," in B. E. Rogowitz, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, Proc. SPIE, vol. 1913, paper 13, 1993.
8. H. A. Peterson, A. J. Ahumada, Jr., A. B. Watson, "The visibility of DCT quantization noise," in J. Morreale, ed., 1993 SID International Symposium Digest of Technical Papers, Society for Information Display, Playa del Rey, CA, pp. 942-945, 1993.
9. R. M. Boynton, Human Color Vision, Holt, Rinehart, and Winston, New York, 1979.
10. K. T. Mullen, "The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings," Journal of Physiology, vol. 359, pp. 381-400, 1985.
11. A. B. Watson, "DCT quantization matrices visually optimized for individual images," in B. E. Rogowitz, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, SPIE, Bellingham, WA, 1993.
12. A. B. Watson, "Visual optimization of DCT quantization matrices for individual images," AIAA Computing in Aerospace 9 Conference Proceedings, vol. CP939, American Institute of Aeronautics and Astronautics, Washington, D. C., pp. 286-291, 1993.
13. A. J. Ahumada, Jr., H. A. Peterson, "A visual detection model for DCT coefficient quantization", AIAA Computing in Aerospace 9 Conference Proceedings, vol. CP939, American Institute of Aeronautics and Astronautics, Washington, D. C., pp. 314-319, 1993.
Table 1. The default quantization matrices.
The $Q_{0,0}$ value is located in the upper left corner of each quantization matrix.
luminance
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

chrominance
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
Table 2. YCrCb quantization matrices.
The values in these matrices are obtained following the procedure
described in Section 3.
The $Q_{0,0}$ value is located in the upper left corner of each quantization matrix. As specified in the JPEG standard, the values have been rounded to the nearest integer. JPEG also requires that values in the quantization matrix be $\le 255$.
Y'
15 11 11 12 15 19 25 32
11 13 10 10 12 15 19 24
11 10 14 14 16 18 22 27
12 10 14 18 21 24 28 33
15 12 16 21 26 31 36 42
19 15 18 24 31 38 45 53
25 19 22 28 36 45 55 65
32 24 27 33 42 53 65 77

Cr
21 21 41 45 55 71 92 120
21 37 39 38 44 55 70 89
41 39 51 54 59 69 83 103
45 38 54 69 80 91 106 126
55 44 59 80 100 117 136 158
71 55 69 91 117 144 170 198
92 70 83 106 136 170 206 243
120 89 103 126 158 198 243 290

Cb
45 43 103 114 141 181 236 306
43 78 99 97 113 140 178 228
103 99 130 138 150 175 212 262
114 97 138 176 203 232 270 321
141 113 150 203 254 299 347 403
181 140 175 232 299 367 434 505
236 178 212 270 347 434 525 619
306 228 262 321 403 505 619 739