Notes on image coaddition, PSF homogenization, and matched filters

1 Introduction

As seen from the Earth, stars appear as point sources; the diameter of the star is usually too small to be measured. On images from telescopes, however, they appear slightly blurry due to turbulance in the atmosphere and (in space or in small telescopes) limitations in the optics of the instrument. The apparent morphology of a point source such as a star in an image called the point spread function (PSF), or sometimes the point response function, of the image. The PSF of different exposures from the same telescope may vary significantly, and the PSF can also vary across a single image.

In the approximation that the PSF is uniform across an image, the effect of the atmosphere and instrument on the image is that of the application of a low-pass Fourier filter.

There is a significant body of work dedicated to altering the point spread function of images, particularly in approximating the true sky from available images. Much of this work has been done specifically to address the early optical problems of the Hubble Space Telescope; see the STScI conference on restoration of HST images. Good reference papers include the review paper by Hunt and Digital Image Processing by Gonzales and Woods.

Sky surveys have an additional motivation for studying PSF alteration. When multiple exposures are combined into a single image of the sky, and the exposures do not all have the same PSF, the PSF in the combined image will jump discontinuously at the edge of each exposure. In a large survey with many exposures at differing offsets, this is a significant problem.

These notes were prepared for a meeting to discuss the impact of PSF homogenization in such a survey. The plots and examples below were all generated using the R statistical computing software package, and the R code accompanies the plots.

2 A Matched Filter on a single exposure

2.1 The special case of an isolated point source in given location

For an image of a single point source, the total number of counts recorded in each pixel of an image is:

ltxpng/PSFhomogenization_0012.png

Note that h can be a point spread function (scaled by the photometric solution), but could apply to more arbitrary mappings.

The final image will also contain noise:

ltxpng/PSFhomogenization_0013.png

When we are concerned with objects near the limits of detection, the photon noise from the sky will dominate; the noise will be uniform over the image.

If we wanted to measure the flux from the point source, we might simply estimate it from the brightest pixel:

ltxpng/PSFhomogenization_0014.png

We are not limited to using just that single pixel; we can get such an estimate for any pixel with flux:

ltxpng/PSFhomogenization_0015.png

but the uncertaitly will not be uniform:

ltxpng/PSFhomogenization_0016.png

We can do a weighted mean of these different estimates to get an estimate with minimal uncertainty:

ltxpng/PSFhomogenization_0017.png

Note that this is the value of the convolution of the image by its point spread function, evaluated at the position of the point source, and scaled by constant that depends on the point spread function and photometric solution:

ltxpng/PSFhomogenization_0018.png

Because the noise variance is independent of position, the uncertainty in the estimate of the flux is the the standard deviation of the sky noise in the image, after convolution by the point spread function. Because we know this particulary weighting minimizes the uncertainty in the flux relative to the flux, we can conclude that this weighting maximizes the value of the flux relative to the sky noise. This is precisely what we want for object detection; the point spread function is our optimal matching filter.

2.2 Example

First, I define a Moffat PSF:

moffat <- function(r,fwhm,beta=4.765) {
  hwhm <- (fwhm/2.0)
  alpha <- hwhm*(2^(1.0/beta)-1)^(-0.5)
  p <- (1.0 + (r/alpha)^2.0)^(-1.0*beta)
  p/sum(p)
}

Plot the raw signal, with the sky sigma equal to the square root of the sky and the flux in a Moffat profile with a total sky flux 5 times the sky sigma:

set.seed(53727)
npoints <- 64
sigma <- 4.0
flux <- 5*sigma
x <- (-1*npoints/2):(npoints/2)
h <- moffat(x,4.25)
n <- rnorm(x,sd=sigma)
b <- sigma^2+flux*h
g <- b+n

png("sample.png",height=256,width=640)
par(mar=c(1.1,1.1,0.1,8),bty="l",xaxt="n",yaxt="n",mgp=c(0,0,0))
plot(x,g,type="s",ylab="flux")
mtext("Data with noise",4,0,las=1)
lines(x,b,col="red",type="s")
mtext("Data w/o noise",4,0,las=1,padj=2,col="red")
dev.off()

Our representation of the pixel transfer function is centered on the center of the array rather than the edge, so the default R convolution shifts everything. Define a new convolution to center it on the 0 index.

recenter <- function(x) {
  npts <- length(x)
  hnpts <- ceiling(npts/2)
  z <- c(x[(hnpts+1):npts],x[1:(hnpts)])
}

Now, apply the convolution

fhat <- recenter(convolve(g,h,type="circular"))

png("matched.png",height=256,width=640)
par(mar=c(1.1,1.1,0.1,8),bty="l",xaxt="n",yaxt="n",mgp=c(0,0,0))
plot(x,g,col="gray",type="s")
mtext("Data with noise",4,0,padj=-2,las=1,col="gray")
lines(x,fhat,type="s")
mtext("Matched filtered",4,0,padj=0,las=1)
lines(x,b,col="green",type="s")
mtext("Data w/o noise",4,0,las=1,padj=2,col="green")
dev.off()

2.3 The Matched filter in Fourier space

Frequency filtering alters both the signal and noise of an image, but because the power spectrum of the signal and noise differ, the filtering will alter the signal and noise differently. To detect an image, we need the signal from a given object to be as high as possible relative to the noise; in other words, we need to apply a filter that maximizes the signal to noise of the objects near the detection limit.

Near the detection limit, the noise will be approximately uniform, and the power spectrum of the noise will be flat; the Fourier transform of the noise will be of equal magnitude at all frequencies, and have a mean of zero. The Fourier transform of the signal, however, will be the Fourier transform of the object, multiplied by the Fourier transform of the PSF. In the case of a point source, the power spectrum will be the shape of the PSF, with a magnitude that varies with the flux of the object.

So, our goal is to derive a matched filter that maximizes the signal to noise. In other words, our filtered image:

ltxpng/PSFhomogenization_0019.png

should be a least squares fit, minimizing:

ltxpng/PSFhomogenization_0020.png

This is the Wiener filter, taking considering only low signal to noise objects as part of the signal. In Fourier space

ltxpng/PSFhomogenization_0001.png

If we are looking at low signal to noise objects (|B|² < < |N|² ) and we are looking at a random distribution of point sources (F(ν) is flat), then

ltxpng/PSFhomogenization_0021.png

and the matched filter is a mirror reflection of the point spread function, normalized by some factor (which doesn't affect the effectiveness of the matched filter).

2.4 A Fuzzy-minded explanation

What, then, does this mean? A pixel in the "true" image f(x) has contributions from each F(ν) Fourier components. Convolution by the point spread function by the instrument, atmosphere, etc. reduces this amount to H(ν)F(ν), and noise N(ν) is added to the result, giving us our observed image g(x), whose Fourier transform is:

ltxpng/PSFhomogenization_0022.png

Our best estimate of F(ν) will be

ltxpng/PSFhomogenization_0023.png

Our image g(x), however, does not have the same signal to noise in each frequency component. If we weight each frequency component by its (S/N)² rather than doing a direct inverse transform, then we get

ltxpng/PSFhomogenization_0002.png

which is just the inverse transform of our match filtered image.

2.5 An example

Going back to our previous example,

GG <- abs(fft(g))^2
BB <- abs(fft(b))^2
NN <- abs(fft(n))^2
FFhat <- abs(fft(fhat))^2
k <- 1:(npoints/2)

png("matchedps.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(k,GG[k],type="s",log="y",lwd=3,xlab="wavenumber",ylab="power")
lines(k,FFhat[k],type="s",col="blue")
lines(k,BB[k],type="s",col="green")
lines(k,NN[k],type="s",col="red")
leglabels<-c("Original data","Matched filtered data","Noise-free data", "Noise")
legcols<-c("black","blue","green","red")
legend(20,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

Of course, when we are doing our fit, we do not know specifically what all the noise values are, so we use our expectation value for our magnitude of the noise:

NNe <- rep(0,times=npoints+1)+npoints*(sigma^2)
GGe <- rep(0,times=npoints+1)+BB+NNe

png("matchedpsue.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(k,GG[k],type="s",log="y",lwd=2,col="gray",xlab="wavenumber",ylab="power")
lines(k,FFhat[k],type="s",col="blue")
lines(k,GGe[k],type="s",lwd=2)
lines(k,BB[k],type="s",col="green")
lines(k,NNe[k],type="s",col="red")
leglabels<-c("Original data","Matched filtered data","Expected signal+noise", "Noise-free signal", "Expected noise")
legcols<-c("gray","blue","black","green","red")
legend(20,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

3 Application of matched filters to resampled images

Interpolation by a sinc function that is critically sampled at a higher frequency than the point spread function, such that it doesn't smooth the original PSF in physical space, is benign with respect to the matched filter; the optimal matched filter for point sources is still the PSF (but in the image space of the interpolated image).

Recall that the Fourier transform of a sinc function is a square; interpolation by a sinc is equivalent to taking the fourier transform, possibly replacing frequencies higher that some threshhold with zero, padding the result with zeros to get an array of the desired size, and inverting the Fourier transform. For example, in the above example, the power spectro of the intrepolated images would look like this:

snpoints <- 100
sk <- 1:(snpoints/2)
sGG <- rep(0,times=snpoints+1)
sFFhat <- rep(0,times=snpoints+1)
sGGe <- rep(0,times=snpoints+1)
sBB <- rep(0,times=snpoints+1)
sNNe <- rep(0,times=snpoints+1)
sGG[k] <- GG[k]
sFFhat[k] <- FFhat[k]
sGGe[k] <- GGe[k]
sBB[k] <- BB[k]
sNNe[k] <- NNe[k]

png("matchedpssincresamp.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(sk,sGG[sk],type="s",log="y",lwd=2,col="gray",xlab="wavenumber",ylab="power")
lines(sk,sFFhat[sk],type="s",col="blue")
lines(sk,sGGe[sk],type="s",lwd=2)
lines(sk,sBB[sk],type="s",col="green")
lines(sk,sNNe[sk],type="s",col="red")
leglabels<-c("Original data","Matched filtered data","Expected signal+noise", "Noise-free signal", "Expected noise")
legcols<-c("gray","blue","black","green","red")
legend(30.5,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

The works because, although there is significant correlation in the noise in the interpolated image, if a sinc function with a reasonable width was used, all of the correlation is in frequencies where the matched filter has no power, and therefore these frequencies have zero weight in the filtered image.

Spline intrepolation gives nearly the same answer as sinc interpolation, so it is nearly benign as well:

spg <- spline(g,n=(snpoints+1))$y
sph <- spline(h,n=(snpoints+1))$y
spfhat <- recenter(convolve(spg,sph,type="circular"))
spGG <- abs(fft(spg))^2
spFFhat <- abs(fft(spfhat))^2

png("matchedpssplineresamp.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(sk,sFFhat[sk],type="s",log="y",lwd=2,col="lightblue",xlab="wavenumber",ylab="power")
lines(sk,sGG[sk],type="s",lwd=2,col="gray",xlab="wavenumber",ylab="power")
lines(sk,spGG[sk],type="s",lwd=2,col="black",xlab="wavenumber",ylab="power")
lines(sk,spFFhat[sk],type="s",col="red")
leglabels<-c("Sinc data","Filtered sinc data","Spline data","Filtered spline data")
legcols<-c("gray","lightblue","black","red")
legend(0,5e-01,leglabels,col=legcols,lty=1,bty="n")
dev.off()

Note that, while the high frequency power in the spline interpolated version does not go precisely to zero as it does in the sinc interpolated version, the contribution from these frequencies is still many orders of magnitude less than in lower frequencies.

4 Consequences of using a different filter for matching

Consider the situation where, instead of using a filter that matches our PSF, we use a different filter, H₀, for object detection, perhaps one that matches the global average PSF of many observations.

Instead of using an optimal image for object detection:

ltxpng/PSFhomogenization_0024.png

we get a slightly different image:

ltxpng/PSFhomogenization_0003.png

Because the H₀ is probably close to H, this is likely to be fairly benign. However, recall one important assumption we made in deriving this matched filter: that the image was a random distribution of point sources. Different matched filters will match objects with different morphologies. If, for example, H₀ is slightly wider than H, the filter may not be optimally suited for detecting point sources, but it may be better suited for detecting slightly extended sources. If H₀ has a different ellipticity than H, then it will be better at detecting objects with some ellipticities, and worse at detecting objects with others. So, if we use the same matched filter for all images, ignoring variations is PSF, then our limiting magnitude will depend on object morphology in some complicated way.

5 PSF Homogenization

PSF homogenization is the process of convolving an image by some kernel that results in an image with a desired PSF. This is easist to understand in Fourier space. Consider a starting image with PSF H:

ltxpng/PSFhomogenization_0025.png

If we want an image G_h(ν) such that

ltxpng/PSFhomogenization_0026.png

then we must multiply G(ν) by

ltxpng/PSFhomogenization_0027.png

to get

ltxpng/PSFhomogenization_0004.png

where

ltxpng/PSFhomogenization_0028.png

Note that the point spread function is in the denominator of the homogenization filter: the filter amplifies spatial frequencies where there is little power in the actual PSF relative to the desired PSF, while it suppresses frequencies where there is more power.

Consider the example we have been using:

GG <- abs(fft(g))^2
BB <- abs(fft(b))^2
NN <- abs(fft(n))^2
NNe <- rep(0,times=npoints+1)+npoints*(sigma^2)
k <- 1:(npoints/2)

png("hmsampstart.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(k,GG[k],type="s",log="y",lwd=2,xlab="wavenumber",ylab="power",col="grey")
lines(k,BB[k],type="s",col="lightgreen")
lines(k,NNe[k],type="s",col="pink")
leglabels<-c("Original data","Noise-free data", "Expected noise")
legcols<-c("gray","lightgreen","pink")
legend(20,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

Now, lets construct and apply a homogenization filter. Start with the case where the target PSF is wider (has weaker high frequency spatial components):

h0 <- moffat(x,6.0)
GGh <- abs(fft(h0)*fft(g)/fft(h))^2
NNh <- NNe*abs(fft(h0)/fft(h))^2
BBh <- abs(fft(h0)*fft(b)/fft(h))^2

png("hmsampwide.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(k,GG[k],type="s",log="y",lwd=2,xlab="wavenumber",ylab="power",col="grey")
lines(k,GGh[k],type="s",lwd=2)
lines(k,BBh[k],type="s",col="green")
lines(k,NNh[k],type="s",col="red")
leglabels<-c("Original data","Homogenized data","Homogenized signal", "Homogenized expected noise")
legcols<-c("gray","black","green","red")
legend(15,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

Now we move to the case where the target PSF is sharper (has stronger high frequency spatial components):

h0 <- moffat(x,2.5)
GGh <- abs(fft(h0)*fft(g)/fft(h))^2
NNh <- NNe*abs(fft(h0)/fft(h))^2
BBh <- abs(fft(h0)*fft(b)/fft(h))^2

png("hmsampsharp.png",height=256,width=640)
par(mar=c(4,4,0.1,0.1),bty="l")
plot(k,GG[k],type="s",log="y",lwd=2,xlab="wavenumber",ylab="power",col="grey")
lines(k,GGh[k],type="s",lwd=2)
lines(k,BBh[k],type="s",col="green")
lines(k,NNh[k],type="s",col="red")
leglabels<-c("Original data","Homogenized data","Homogenized signal", "Hmg. expected noise")
legcols<-c("gray","black","green","red")
legend(20,1e+06,leglabels,col=legcols,lty=1,bty="n")
dev.off()

Note that the high frequency noise gets greatly amplified. This effect can be reduced by a more careful choice for the form for the target PSF (for example by using a sinc instead of a gaussian), but to at least some extent amplification of high frequency noise is inherent in the the technique.

6 Naive matching filtering of a PSF homogenized image

Our derivation of the matching filter in previous sections assumed that the noize was uniform, at least over the range for the frequency range has finite values. This is clearly not the case for the PSF homogenized image. What happens if we ignore this, and pretend that the noise is indeed uniform after homogenization? The final, matched filtered image would look like this:

ltxpng/PSFhomogenization_0005.png

The fractional difference between applying an optimal matched filter and naively filtering a homogenized image is the square of the difference between the optimal filter and simple application of a standardized filter on non-homogenized images. When using a standardized PSF, homogenization pushes the weighting further from optimal, and exagerates the sensativity of the detection limit to object morphology.

What, then, is the optimal filter for an homogenized image?

ltxpng/PSFhomogenization_0006.png

So the optimal filter is

ltxpng/PSFhomogenization_0029.png

7 Coaddition of PSF homogenized images

7.1 Coaddition in frequency space

The effects of PSF homogenization are much clearer in frequency space than in image space, so in this discussion I will make use of the addition theorem (see Bracewell, The Fourier Transform and its Applications, p104):

ltxpng/PSFhomogenization_0007.png

7.2 Direct coaddition

What happens if we coadd images? If we use the traditional approach, adding the images:

ltxpng/PSFhomogenization_0008.png

with weights w₁ and w₂, perhaps setting w₂ = 1 - w₁, then we get

ltxpng/PSFhomogenization_0030.png

7.3 Coaddition after homogenization

After homogenization, we have:

ltxpng/PSFhomogenization_0009.png

and after coaddition:

ltxpng/PSFhomogenization_0031.png

Note that for any given ν, the coadded G_ah(ν) is still a weighted average of G₁(ν) and G₂(ν), but this time the weights are a function of ν. Note that the signal to noise G₁(ν) at a given ν is proportional to H₁(ν), but that H₁(ν) appears in the denominator of the varying weights: the weight of a given pixel in frequency space is inversly proportional to its signal to noise! At frequencies where G₁ has a high signal to noise but G₂ has a low one, the G₂ will dominate in the coadded image, and vice versa. This is exactly the opposite of the weighting one wants to maximize the signal to noise in the coadded image.

If you examine the weights of N₁(ν) and N₂(ν), you can see that the weights of either can become arbitrarily large, depentding on the corresponding H. Note that this is not possible in direct coaddition, where the weight is indepentend of frequency.

7.4 Coaddition after matched filtering

After application of the matched filter, we have

ltxpng/PSFhomogenization_0010.png

and after coaddition:

ltxpng/PSFhomogenization_0032.png

Now, the relative weights of G₁ and G₂ still vary with ν, but in a more apprepriate way: frequencies with more signal get weighted more heavily.

At frequencies where a given exposure has a small value for H(ν), note that the noise contribution to the coadded image is suppressed.

8 A Matched Filter on coadded, PSF homogenized exposures

Going back to the step in the matched filter derivation before we assume uniform noise, and substituting in the noise from above, we get:

ltxpng/PSFhomogenization_0033.png

Note that, for an ν for which H(ν) < < H₀(ν), that term is likely to dominate the denominator, reducing the weight for the frequency. For coadded homogenized images, the image with the worst seeing determines which frequencies can be used for detection.

If we make our target PSF such that it is wider than the input images, this will not be a problem. Instead, the numerator will drop high frequency components to small values.

9 A Matched Filter on naively coadded exposures

Traditionally, astronomers first align and add multiple exposures, and then apply a matched filter. Because the interpolation can be done such that the noise remains flat over the frequency range required, the simple use of the measured PSF as the matching filter on such an image is justified.

This approach may, however, may not be taking maximum advantage of the available data.

10 The Wiener Filter

Using Parseval's theorem (see Bracewell, The Fourier Transform and its Applications, p112):

ltxpng/PSFhomogenization_0034.png

and the addition theorem (see Bracewell, The Fourier Transform and its Applications, p104):

ltxpng/PSFhomogenization_0035.png

we know we can do the minimization in Fourier space:

ltxpng/PSFhomogenization_0036.png

where the convolution by the filter can be expressed more conveniently:

ltxpng/PSFhomogenization_0011.png

We can now minimize:

ltxpng/PSFhomogenization_0037.png

We then need to differentiate the above equation with respect to M, and set the derivative to 0 to find the minimum. If we assume the signal and noise are not correlated, this ultimately reduces to

ltxpng/PSFhomogenization_0038.png

If we are looking at low signal to noise objects,

ltxpng/PSFhomogenization_0039.png

ltxpng/PSFhomogenization_0040.png

If the noise is uniform,

ltxpng/PSFhomogenization_0041.png

and if the signal is a uniform distribution of point sources,

ltxpng/PSFhomogenization_0042.png

11 Detection limits expressed in terms of sky noise

After convolution by a matching filter:

ltxpng/PSFhomogenization_0043.png

the variance of uniform noisy sky will be

ltxpng/PSFhomogenization_0044.png

so the noise in the sky drops by a factor of the uniformly weighted mean of the squares of the filter values.

The peak of an object that matches the filter can be thought of as dropping weighted mean of these values, where filtering pixels with larger values get weighted more heavily; the peak drops by less than the noise, making the same object appear more standard deviations away from the noise.

The specification for detection depth as a function of "sky sigma" is therfore meaningless for comparing matching filters. Detection at a given sigma threshold is possible for any matching filter, but different filters will significantly effect the physical flux that corresponds to.

What a specification of depths at a given sky sigma does restrict is the number of false detections.