Albert J. Ahumada, Jr. and Andrew B. Watson
NASA Ames Research Center
Moffett Field, California
and
Heidi A. Peterson
IBM T. J. Watson Research Center
Yorktown Heights, New York
The discrete cosine transform (DCT) is widely used in image compression and is part of the JPEG and MPEG compression standards. The degree of compression and the amount of distortion in the decompressed image are determined by the quantization of the transform coefficients. The standards do not specify how the DCT coefficients should be quantized. Our approach is to set the quantization level for each coefficient so that the quantization error is at the threshold of visibility. Here we combine results from our previous work to form our current best detection model for DCT coefficient quantization noise. This model predicts sensitivity as a function of display parameters, enabling quantization matrices to be designed for display situations varying in luminance, veiling light, and spatial-frequency-related conditions (pixel size, viewing distance, and aspect ratio). It also allows arbitrary color-space directions for the representation of color.
1.1 DCT Image Compression
The discrete cosine transform (DCT) has become an
image compression standard.1, 2, 3
Typically the image is divided into 8 × 8-pixel blocks, each of which is transformed into 64 DCT coefficients. The DCT coefficients $I_{m,n}$ of an $N \times N$ block of image pixels $i_{j,k}$ are given by

$$I_{m,n} = \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} i_{j,k}\, c_{j,m}\, c_{k,n}, \qquad m,n = 0, \ldots, N-1, \tag{1a}$$

where

$$c_{j,m} = a_m \cos\!\left( \frac{\pi m}{2N} \left( 2j + 1 \right) \right), \tag{1b}$$

and

$$a_m = \begin{cases} (1/N)^{0.5}, & m = 0, \\ (2/N)^{0.5}, & m > 0. \end{cases} \tag{1c}$$
The block of image pixels is reconstructed by the inverse transform:
$$i_{j,k} = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} I_{m,n}\, c_{j,m}\, c_{k,n}, \tag{2}$$
which for this normalization is the same as the forward transform. Quantization of the DCT coefficients achieves image compression, but it also generates distortion in the decompressed image. If a single coefficient is quantized and its block is reconstructed, the difference between the original image block and the reconstructed block is the error image. This error image has the form of the associated basis function, and its amplitude is proportional to the quantization error of the coefficient. Since the inverse transform is linear, the error image resulting from quantizing multiple coefficients is a sum of such images.
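As a concrete check of Eqs. (1)-(2), the transform pair can be transcribed directly. The sketch below is ours (function names included): the quadruple loop is O(N^4), which is fine for 8 × 8 blocks, though a production codec would use a fast factorization.

```python
import math

def dct_basis(N):
    """c[j][m] = a_m cos(pi m (2j + 1) / (2N)), Eqs. (1b)-(1c)."""
    c = [[0.0] * N for _ in range(N)]
    for m in range(N):
        a_m = math.sqrt((1.0 if m == 0 else 2.0) / N)
        for j in range(N):
            c[j][m] = a_m * math.cos(math.pi * m * (2 * j + 1) / (2 * N))
    return c

def dct2(block):
    """Forward 2-D DCT of an N x N block, Eq. (1a): returns I[m][n]."""
    N = len(block)
    c = dct_basis(N)
    return [[sum(block[j][k] * c[j][m] * c[k][n]
                 for j in range(N) for k in range(N))
             for n in range(N)]
            for m in range(N)]

def idct2(I):
    """Inverse 2-D DCT, Eq. (2): the same kernel, summed over m, n."""
    N = len(I)
    c = dct_basis(N)
    return [[sum(I[m][n] * c[j][m] * c[k][n]
                 for m in range(N) for n in range(N))
             for k in range(N)]
            for j in range(N)]
```

With this normalization the round trip `idct2(dct2(block))` reproduces the block to floating-point precision, and a constant block puts all of its energy in $I_{0,0}$.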
1.2 The Quantization Matrix
The JPEG compression standard1, 2 requires that uniform quantization be used for the DCT coefficients, but the quantizer step size for each coefficient is left to the user. The step size used for coefficient $I_{m,n}$ is denoted $Q_{m,n}$. A coefficient is quantized by the operation

$$S_{m,n} = \mathrm{Round}\left( I_{m,n} / Q_{m,n} \right), \tag{3a}$$

and restored (with the quantization error) by

$$I_{m,n} = S_{m,n}\, Q_{m,n}. \tag{3b}$$
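Eqs. (3a)-(3b) amount to divide-round-rescale. A minimal sketch (our naming; note that Python's built-in `round` ties to even, whereas a codec may round half away from zero):

```python
def quantize(I, Q):
    """S_mn = Round(I_mn / Q_mn), Eq. (3a)."""
    return [[round(i / q) for i, q in zip(I_row, Q_row)]
            for I_row, Q_row in zip(I, Q)]

def dequantize(S, Q):
    """Restored coefficients S_mn * Q_mn, Eq. (3b).
    The restoration error is at most Q_mn / 2 per coefficient."""
    return [[s * q for s, q in zip(S_row, Q_row)]
            for S_row, Q_row in zip(S, Q)]
```

The half-step error bound is what motivates the factor of 2 in the quantization matrix design of Section 3.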
Two example quantization matrices can be found in the JPEG standard;2 they appear in Table 1 following the references. These matrices were designed for a particular viewing situation, and no suggestions were provided for how they should be changed to accommodate different viewing conditions or compression in a different color space. Our research was initiated to provide quantization matrices suitable for compression in the RGB color representation.4 Subsequently, a theoretical framework was constructed and additional measurements were made.5-8 Here we summarize the resulting quantization matrix design technique, which can be applied under a wide variety of conditions: different display luminances, veiling luminances, spatial frequencies, and color spaces.
The basic idea of the technique is to develop a detection model that predicts the detectability of the artifacts in a perceptual space representation. This step is described in Section 2. A quantizer step size is then determined from the sensitivity of the perceptual space representation to the quantization distortion. This step is described in Section 3.
2.1 The Luminance Detection Model
The luminance detection model predicts the threshold for detecting the luminance error image generated by quantization of a single DCT coefficient $I_{m,n}$. We use the subscript $Y$ for luminance, since we assume that it is defined by the 1931 CIE standard.9 This error image is assumed to be below the threshold of visibility if its zero-to-peak luminance is less than a threshold $T_{Y,m,n}$ given by

$$\log T_{Y,m,n} = P(f_{m,n};\, b_Y, k_Y, f_Y) \tag{4}$$
$$= \begin{cases} \log b_Y, & f_{m,n} \le f_Y, \\ \log b_Y + k_Y \left( \log f_{m,n} - \log f_Y \right)^2, & f_{m,n} > f_Y. \end{cases}$$
This function P
represents a low-pass contrast sensitivity function
of spatial frequency.
Although luminance contrast sensitivity is more correctly
modeled as band-pass,
we choose a low-pass function for this application.
This ensures that no new artifacts become visible as
viewing distance increases.
A low-pass function is also convenient because
purely chromatic channels are low-pass in
this spatial frequency range.
We will use P
for the luminance and chrominance channels of our model.
The spatial frequency $f_{m,n}$ associated with the $m,n$-th basis function is given by

$$f_{m,n} = \frac{1}{2N} \left[ \left( \frac{m}{W_x} \right)^2 + \left( \frac{n}{W_y} \right)^2 \right]^{0.5}, \tag{5}$$

where $W_x$ and $W_y$ are the horizontal and vertical pixel spacings in degrees of visual angle.
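Eq. (5) in code form (a sketch; the default spacing of 0.028 deg matches the example in Section 3.1):

```python
import math

def dct_frequency(m, n, N=8, Wx=0.028, Wy=0.028):
    """Radial spatial frequency f_mn (cycles/deg) of basis (m, n), Eq. (5).
    Wx, Wy are the pixel spacings in degrees of visual angle."""
    return math.hypot(m / Wx, n / Wy) / (2.0 * N)
```

At this spacing the highest horizontal basis, (7, 0), sits at 7/(16 × 0.028) ≈ 15.6 cycles/deg, just below the 17.9 cycles/deg Nyquist limit.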
The term $b_Y$ has three components:

$$b_Y = s\, T_Y / q_{m,n}. \tag{6}$$

The parameter $s$ is a fraction, $0 < s \le 1$, that accounts for spatial summation of quantization errors over blocks. We set it to unity to model detection experiments with only one block.6 Our summation results suggest that it should equal the inverse fourth root of the number of blocks contributing to detection;8 we suggest the value $s = 0.25$, corresponding to a 16 × 16 array of blocks.
The factor $T_Y$ gives the dependence of the threshold on the mean image luminance $Y_0$:

$$T_Y = \begin{cases} Y_0^{a_T}\, Y_T^{1-a_T} / S_0, & Y_0 \le Y_T, \\ Y_0 / S_0, & Y_0 > Y_T, \end{cases} \tag{7}$$

where the suggested parameter values are $Y_T = 15$ cd/m², $S_0 = 40$, and $a_T = 0.65$.
The product of a cosine in the x with a cosine in the y direction can be expressed as the sum of two cosines of the same radial spatial frequency but differing in orientation. The factor
$$q_{m,n} = r + (1 - r) \left( 1 - \left[ \frac{2\, f_{m,0}\, f_{0,n}}{f_{m,n}^2} \right]^2 \right) \tag{8}$$

accounts for the imperfect summation of two such frequency components as a function of the angle between them. Based on the fourth-power summation rule for the two components when they are orthogonal, $r$ is set to 0.6. An additional oblique effect can be included by decreasing the value of $r$.
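Eq. (8) transcribed directly (our naming; when m = 0 or n = 0 only one component is present, so we return q = 1, which also avoids the f_{0,0} = 0 singularity):

```python
def orientation_factor(m, n, f, r=0.6):
    """q_mn of Eq. (8). f(m, n) must return the radial frequency
    of basis (m, n), e.g. via Eq. (5)."""
    if m == 0 or n == 0:
        return 1.0  # single component: no summation penalty
    sin_2theta = 2.0 * f(m, 0) * f(0, n) / f(m, n) ** 2
    return r + (1.0 - r) * (1.0 - sin_2theta ** 2)
```

For equal pixel spacings the bracketed term is the sine of twice the angle between the component pair, so q falls from 1 on the axes to r = 0.6 on the diagonal (m = n).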
The parameters $f_Y$ and $k_Y$ determine the shape of $P$ and depend on the mean luminance $Y_0$:

$$f_Y = \begin{cases} f_0\, Y_0^{a_f}\, Y_f^{-a_f}, & Y_0 \le Y_f, \\ f_0, & Y_0 > Y_f, \end{cases} \tag{9}$$

and

$$k_Y = \begin{cases} k_0\, Y_0^{a_k}\, Y_k^{-a_k}, & Y_0 \le Y_k, \\ k_0, & Y_0 > Y_k, \end{cases} \tag{10}$$

where $f_0 = 6.8$ cycles/deg, $a_f = 0.182$, $Y_f = 300$ cd/m², $k_0 = 2$, $a_k = 0.0706$, and $Y_k = 300$ cd/m².
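Putting Eqs. (4)-(10) together gives the full luminance threshold. The sketch below is ours: it assumes base-10 logarithms in Eqs. (4), (9), and (10), and its defaults combine the suggested parameter values with the viewing conditions of the Section 3.1 example (mean luminance 40 cd/m², 0.028-deg pixel spacing, s = 0.25, r = 0.6).

```python
import math

def luminance_threshold(m, n, Y0=40.0, N=8, Wx=0.028, Wy=0.028,
                        s=0.25, r=0.6,
                        YT=15.0, S0=40.0, aT=0.65,
                        f0=6.8, af=0.182, Yf=300.0,
                        k0=2.0, ak=0.0706, Yk=300.0):
    """T_Y,m,n: zero-to-peak luminance threshold (cd/m^2) for the
    error image of DCT basis (m, n), Eqs. (4)-(10)."""
    freq = lambda a, b: math.hypot(a / Wx, b / Wy) / (2.0 * N)   # Eq. (5)
    f_mn = freq(m, n)
    TY = (Y0 ** aT * YT ** (1.0 - aT) if Y0 <= YT else Y0) / S0  # Eq. (7)
    fY = f0 * (Y0 / Yf) ** af if Y0 <= Yf else f0                # Eq. (9)
    kY = k0 * (Y0 / Yk) ** ak if Y0 <= Yk else k0                # Eq. (10)
    if m == 0 or n == 0:                                         # Eq. (8)
        q = 1.0
    else:
        sin_2theta = 2.0 * freq(m, 0) * freq(0, n) / f_mn ** 2
        q = r + (1.0 - r) * (1.0 - sin_2theta ** 2)
    bY = s * TY / q                                              # Eq. (6)
    if f_mn <= fY:                                               # Eq. (4)
        return bY
    return bY * 10.0 ** (kY * math.log10(f_mn / fY) ** 2)
```

With these defaults the DC threshold is s·T_Y = 0.25 cd/m², and thresholds rise steeply beyond f_Y ≈ 4.7 cycles/deg.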
2.2 The Chrominance Detection Model
We now add two chromatic channels to the luminance-only model. From the large number of color spaces that have been proposed for chromatic discriminations, we have selected one close to that suggested by Boynton:9 a red-green opponent channel and a blue channel. Our channels are defined in terms of the CIE 1931 XYZ color space. The blue channel is just $Z$, and the opponent red-green channel $O$ is given by

$$O = 0.47\, X - 0.37\, Y - 0.10\, Z. \tag{11}$$

This opponent channel is approximately the Boynton9 (red cone) − 2 (green cone) channel.
Our model now needs the thresholds for quantization noise in the $O$ and $Z$ channels. For simplicity, we model the chromatic thresholds by

$$\log T_{O,m,n} = P(f_{m,n};\, 0.36\, b_Y,\, k_Y,\, f_Y/4), \tag{12}$$

and

$$\log T_{Z,m,n} = P(f_{m,n};\, 3.00\, b_Y,\, k_Y,\, f_Y/4). \tag{13}$$
The shapes of these functions are in rough agreement with experimental results,10 except that the measured slopes for the chromatic channels are steeper than that of the luminance channel. The reason for keeping them the same is to prevent strong quantization of purely chromatic channels, since there is a fair amount of individual variability in the exact direction of isoluminance.
Although we previously used $Z_0$ to set the level of the $Z$ threshold,7 we use $Y_0$ here under the assumption that the average image color is close to white, and hence that the two are roughly equal. Finally, we say that the errors from the quantization of a coefficient are visible if the error in any of the three channels is visible.
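Eqs. (12)-(13) reuse the luminance function P with rescaled levels and a corner frequency lowered by a factor of four. A sketch (our naming; again assuming base-10 logs), written in threshold rather than log-threshold form:

```python
import math

def P(f, b, k, fY):
    """Threshold form of the low-pass function P of Eq. (4)."""
    if f <= fY:
        return b
    return b * 10.0 ** (k * math.log10(f / fY) ** 2)

def chromatic_thresholds(f_mn, bY, kY, fY):
    """(T_O, T_Z) for basis frequency f_mn, Eqs. (12)-(13)."""
    return (P(f_mn, 0.36 * bY, kY, fY / 4.0),   # opponent red-green, Eq. (12)
            P(f_mn, 3.00 * bY, kY, fY / 4.0))   # blue (Z), Eq. (13)
```

Because the two chromatic channels share the same shape parameters, T_Z/T_O = 3.00/0.36 at every frequency; only the overall levels differ.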
Suppose that one color dimension $D$ in a color space linearly related to our $YOZ$ color space is to be quantized. Let $D_Y$, $D_O$, and $D_Z$ be the amplitudes of the errors in $YOZ$ space generated by a unit error in $D$. An error generated in the $D$ image by quantizing the $m,n$-th DCT coefficient is then below threshold if it is less than

$$T_{D,m,n} = \min\left( T_{Y,m,n}/|D_Y|,\; T_{O,m,n}/|D_O|,\; T_{Z,m,n}/|D_Z| \right). \tag{14}$$
The $D$ quantization matrix entries are obtained by dividing these thresholds by the DCT normalization constants ($a_m$ in Equation (1c)):

$$Q_{D,m,n} = 2\, T_{D,m,n} / (a_m\, a_n). \tag{15}$$

The factor of 2 results from the maximum quantization error being half the quantizer step size.
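Eqs. (14)-(15) in code form (a sketch; here T is any N × N matrix of thresholds already expressed in the units of the color dimension D):

```python
def color_threshold(TY, TO, TZ, DY, DO, DZ):
    """T_D,m,n of Eq. (14): the most sensitive channel sets the limit."""
    return min(TY / abs(DY), TO / abs(DO), TZ / abs(DZ))

def quantization_matrix(T, N=8):
    """Q_D,m,n = 2 T_D,m,n / (a_m a_n), Eq. (15)."""
    a = [((1.0 if m == 0 else 2.0) / N) ** 0.5 for m in range(N)]
    return [[2.0 * T[m][n] / (a[m] * a[n]) for n in range(N)]
            for m in range(N)]
```

Note that because $a_0 < a_m$ for $m > 0$, the DC entry receives the largest normalization boost for a given threshold.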
3.1 Quantization in YCrCb Space: An Example Computation
In an attempt to put all luminance information in a single channel, color images are often represented in the YCrCb color space for image compression.
Pennebaker and Mitchell2 give the transformation from RGB to YCrCb as

$$\begin{aligned} Y' &= 0.3\,R + 0.6\,G + 0.1\,B, \\ \mathrm{Cr} &= (R - Y')/1.6 + 0.5, \\ \mathrm{Cb} &= (B - Y')/2 + 0.5. \end{aligned} \tag{16}$$
Suppose that the viewing conditions are set so that the average image luminance $\bar{Y}$ is 40 cd/m², the pixel spacings are 0.028 deg, and the monitor calibration of the $XYZ$ outputs for unit $RGB$ inputs is given by the matrix

$$\begin{array}{c|ccc} & R & G & B \\ \hline X & 26.1 & 25.2 & 9.3 \\ Y & 13.3 & 48.9 & 4.7 \\ Z & 2.3 & 10.2 & 35.7 \end{array} \tag{17}$$
The values of $D_Y$, $D_O$, and $D_Z$ for each dimension turn out to be:

$$\begin{array}{c|ccc} & Y' & \mathrm{Cb} & \mathrm{Cr} \\ \hline D_Y & 66.9 & -7.0 & -17.8 \\ D_O & -1.1 & 0.6 & 17.1 \\ D_Z & 48.2 & 67.9 & -4.5 \end{array} \tag{18}$$
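The entries of Eq. (18) can be checked numerically: invert Eq. (16) to find the RGB change produced by a unit error in each YCrCb channel, map it through the calibration matrix of Eq. (17), and project onto the opponent channel with Eq. (11). A pure-Python sketch (function names are ours):

```python
# RGB -> XYZ monitor calibration of Eq. (17); rows are X, Y, Z.
M = [[26.1, 25.2, 9.3],
     [13.3, 48.9, 4.7],
     [2.3, 10.2, 35.7]]

def unit_error_rgb(channel):
    """(dR, dG, dB) caused by a unit error in one YCrCb channel,
    from inverting Eq. (16) with the other two channels held fixed."""
    if channel == "Y'":
        return (1.0, 1.0, 1.0)          # dG = (1 - 0.3 - 0.1) / 0.6 = 1
    if channel == "Cr":
        return (1.6, -0.8, 0.0)         # dR = 1.6, dG = -0.3 * 1.6 / 0.6
    if channel == "Cb":
        return (0.0, -1.0 / 3.0, 2.0)   # dB = 2, dG = -0.1 * 2 / 0.6
    raise ValueError(channel)

def yoz_error(channel):
    """(D_Y, D_O, D_Z) for a unit error in the given YCrCb channel."""
    dR, dG, dB = unit_error_rgb(channel)
    X, Y, Z = (row[0] * dR + row[1] * dG + row[2] * dB for row in M)
    O = 0.47 * X - 0.37 * Y - 0.10 * Z   # Eq. (11)
    return (Y, O, Z)
```

For example, `yoz_error("Y'")` returns (66.9, ≈ −1.1, 48.2), matching the first column of Eq. (18) to rounding.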
The quantization matrices appear in Table 2 following the references. They have been multiplied by 255 and then rounded.
We have presented a model for predicting visibility thresholds for DCT coefficient quantization error, from which quantization matrices for use in DCT-based compression can be designed. We regard these as preliminary results of work in progress. The quantization matrices computed by the techniques described above take no account of image content. At this meeting and elsewhere,11, 12 A. B. Watson shows how an extension of this model may be used to optimize quantization matrices for individual images or a class of images.
We appreciate the help of Jeffrey B. Mulligan. This work was supported in part by the IBM Independent Research and Development Program and by NASA RTOP Nos. 506-59-65 and 505-64-53.
This version (12/9/93) of a paper presented to the 1993 AIAA Computing in Aerospace Conference, 13 corrects a number of minor errors.
1. G. Wallace, "The JPEG still picture compression standard", Communications of the ACM, vol. 34, no. 4, pp. 30-44, 1991.
2. W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, van Nostrand Reinhold, New York, 1993.
3. D. LeGall, "MPEG: A video compression standard for multimedia applications", Communications of the ACM, vol. 34, no. 4, pp. 46-58, 1991.
4. H. A. Peterson, H. Peng, J. H. Morgan, W. B. Pennebaker, "Quantization of color image components in the DCT domain", in B. E. Rogowitz, M. H. Brill, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display II, Proc. SPIE, vol. 1453, pp. 210-222, 1991.
5. A. J. Ahumada, Jr., H. A. Peterson, "Luminance-model-based DCT quantization for color image compression," in B. E. Rogowitz, ed., Human Vision, Visual Processing, and Digital Display III, Proc. SPIE, vol. 1666, pp. 365-374, 1992.
6. H. A. Peterson, "DCT basis function visibility thresholds in RGB space," in J. Morreale, ed., 1992 SID International Symposium Digest of Technical Papers, Society for Information Display, Playa del Rey, CA, pp. 677-680, 1992.
7. H. A. Peterson, A. J. Ahumada, Jr., A. B. Watson, "An improved detection model for DCT coefficient quantization," in B. E. Rogowitz, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, Proc. SPIE, vol. 1913, paper 13, 1993.
8. H. A. Peterson, A. J. Ahumada, Jr., A. B. Watson, "The visibility of DCT quantization noise," in J. Morreale, ed., 1993 SID International Symposium Digest of Technical Papers, Society for Information Display, Playa del Rey, CA, pp. 942-945, 1993.
9. R. M. Boynton, Human Color Vision, Holt, Rinehart, and Winston, New York, 1979.
10. K. T. Mullen, "The contrast sensitivity of human colour vision to red-green and blue-yellow chromatic gratings," Journal of Physiology, vol. 359, pp. 381-400, 1985.
11. A. B. Watson, "DCT quantization matrices visually optimized for individual images," in B. E. Rogowitz, J. P. Allebach, eds., Human Vision, Visual Processing, and Digital Display IV, SPIE, Bellingham, WA, 1993.
12. A. B. Watson, "Visual optimization of DCT quantization matrices for individual images," AIAA Computing in Aerospace 9 Conference Proceedings, vol. CP939, American Institute of Aeronautics and Astronautics, Washington, D. C., pp. 286-291, 1993.
13. A. J. Ahumada, Jr., H. A. Peterson, "A visual detection model for DCT coefficient quantization", AIAA Computing in Aerospace 9 Conference Proceedings, vol. CP939, American Institute of Aeronautics and Astronautics, Washington, D. C., pp. 314-319, 1993.
Table 1. The default quantization matrices.
The $Q_{0,0}$ value is located in the upper left corner of each quantization matrix.
luminance
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

chrominance
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
Table 2. YCrCb quantization matrices.
The values in these matrices are obtained following the procedure
described in Section 3.
The $Q_{0,0}$ value is located in the upper left corner of each quantization matrix. As specified in the JPEG standard, the values have been rounded to the nearest integer. JPEG also requires that values in the quantization matrix be $\le 255$.
Y'
15 11 11 12 15 19 25 32
11 13 10 10 12 15 19 24
11 10 14 14 16 18 22 27
12 10 14 18 21 24 28 33
15 12 16 21 26 31 36 42
19 15 18 24 31 38 45 53
25 19 22 28 36 45 55 65
32 24 27 33 42 53 65 77

Cr
21 21 41 45 55 71 92 120
21 37 39 38 44 55 70 89
41 39 51 54 59 69 83 103
45 38 54 69 80 91 106 126
55 44 59 80 100 117 136 158
71 55 69 91 117 144 170 198
92 70 83 106 136 170 206 243
120 89 103 126 158 198 243 290

Cb
45 43 103 114 141 181 236 306
43 78 99 97 113 140 178 228
103 99 130 138 150 175 212 262
114 97 138 176 203 232 270 321
141 113 150 203 254 299 347 403
181 140 175 232 299 367 434 505
236 178 212 270 347 434 525 619
306 228 262 321 403 505 619 739