Astronomical Statistics


General References

Because astronomy is an observational, not an experimental, science, our conditions are not as well controlled as those of many physics experiments. In particular, astronomical measurements often involve large numbers of upper and lower limits. The statistical tools we use are often more obscure (to scientists, if not to statisticians) than those used by physicists (see the Statistical Consulting Center for Astronomy for examples of the types of statistical questions astronomers run up against).

Knowledge of statistics is absolutely essential for an observational astronomer. It is rare that one obtains sufficient signal-to-noise to dispense with error estimates. Telescope time is sufficiently oversubscribed that one often has time to obtain only marginal detections of sources, or is working at the very limits of detectability. Every measurement contains two things, signal and noise; you need to know how to separate them.

The most important thing you need to know is the question you are asking. A statistic is just a number: if you apply an inappropriate statistical technique, or if your model is wrong, then your answer is meaningless.

Remember that you cannot use statistics to prove that a model is correct; you can only use statistics to reject hypotheses. If you test the wrong model, then even if you cannot reject it, all bets are off. Increasing the number of trials improves estimates of means and variances only up to the limit set by your systematic errors. Nonstatistical fluctuations do exist. Tails of distributions are not necessarily representative of the true errors. Finally, it can be acceptable to throw away discrepant data: Chauvenet's criterion says that if you expect less than half an event in a given bin, you can disregard any data in that bin. But, before you do, ask yourself whether your model may be incorrect, or whether the distribution may be non-Gaussian.

Important Concepts

I. The Basics

You will need to be acquainted with all the concepts in Bevington & Robinson (or the earlier version by Bevington). What follows is a synopsis of the most important concepts:

II. Propagation of Errors

Whenever you attempt to measure the true brightness S of an object, there is some background (sky, or instrumental background, or a combination) B. You actually observe S+B. You can usually measure, or at least estimate, what B is independently. The true brightness S of a source is given by S=(S+B)-B. Counting statistics apply to both the observed brightness (S+B) and to the observed background B. The variance is then <em>s</em><sup>2</sup> = (S+B) + B = S + 2B (assuming only counting errors), and the signal-to-noise ratio is given by SNR = S/(S+2B)<sup>1/2</sup>. Note that this is always less than the SNR in the absence of background, which is the square root of the total number of counts.
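As a numerical sketch of this error propagation (the counts below are made up for illustration):

```python
import numpy as np

# Hypothetical counts, not from any real observation:
total = 5000.0   # counts in the source aperture: S + B
bkg = 3000.0     # counts in an equal-area background region: B

S = total - bkg           # background-subtracted source counts
var_S = total + bkg       # Var(S) = Var(S+B) + Var(B) = S + 2B for Poisson counts
snr = S / np.sqrt(var_S)  # signal-to-noise ratio = S / sqrt(S + 2B)

snr_no_bkg = np.sqrt(S)   # SNR the same source would have with zero background
```

Note that `snr` is always smaller than `snr_no_bkg`: the background contributes noise twice, once through the on-source measurement and once through the background estimate.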

III. Goodness of Fit Criteria and Confidence Contours

A statistic S is used to measure the goodness-of-fit of a model to some data. One minimizes S by adjusting model parameters to better fit the data. If one has N independent data points and m adjustable parameters in the model, one has <em>nu</em>=N-m degrees of freedom. In the case of data which are normally distributed, one expects that an acceptable model will have a reduced chi-square, <em>reduced chi-square</em> = <em>chi-square</em>/(N-m), of about unity. Note that this is only a rough rule of thumb: check Table C.4 in Bevington and Robinson. For small <em>nu</em>, <em>reduced chi-square</em>=1 gives a probability of about 30% of being exceeded by chance, while for large <em>nu</em> the probability approaches 50%.
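The chance probability of exceeding a given chi-square can be checked directly rather than looked up in a table (a Python sketch using scipy, with hypothetical numbers):

```python
from scipy.stats import chi2

N, m = 20, 3        # number of data points and adjustable parameters
nu = N - m          # degrees of freedom
chisq = 17.0        # hypothetical best-fit chi-square, so reduced chi-square = 1

red_chisq = chisq / nu
# Probability of exceeding this chi-square by chance (cf. Table C.4 in
# Bevington & Robinson):
p_exceed = chi2.sf(chisq, nu)
```

For this moderately large <em>nu</em> the survival probability is already close to 50%, illustrating the rule of thumb in the text.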

Confidence Contours


We often measure confidence levels in terms of <em>s</em>, the width of the Gaussian probability distribution function. Consequently, you often hear of 3<em>s</em> results, or 1<em>s</em> upper limits. 68% of the area lies within 1<em>s</em> of the mean, so a 1<em>s</em> deviation will occur by chance 32% of the time. 2<em>s</em> deviations occur with about 5% probability, 3<em>s</em> deviations have a 0.3% probability P of occurring by chance, and 4<em>s</em> deviations have P of about 0.006%. A quoted 3<em>s</em> detection means that the probability is 0.997 that the detection is real, and only 0.003 that it corresponds to a random noise fluctuation.

A 1<em>s</em> upper limit means that if you were to repeat the observation, 68% of the time the result would be less than that value, and 32% of the time it would be greater. This is the same as quoting a result with 68% confidence.

Note that, if you have a detector with one million detection elements, such as a 1000x1000 CCD, you expect roughly 2700 3<em>s</em> and roughly 60 4<em>s</em> deviations just by chance. This is why, in imaging surveys, the threshold for source detection is generally set at 5<em>s</em> or higher. Often it is set to the level at which the expectation is at most one spurious source.
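These Gaussian tail probabilities, and the expected number of spurious deviations, can be computed directly (a Python sketch; the detector size is the 1000x1000 example above):

```python
from scipy.stats import norm

# Two-sided chance probability of a deviation larger than n sigma:
p = {n: 2 * norm.sf(n) for n in (1, 2, 3, 4)}
# p[1] is about 0.32, p[2] about 0.046, p[3] about 0.0027, p[4] about 6e-5

# Expected spurious deviations in a detector with one million elements:
npix = 1000 * 1000
spurious_3sig = npix * p[3]               # a few thousand
spurious_5sig = npix * 2 * norm.sf(5)     # well under one source
```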

<em>chi-square</em>, or some other statistic S, can be used not only to find the best-fit point in parameter space (where S is minimized), but also to find the region of parameter space which provides acceptable fits. To do so, you must choose a rejection threshold for S.

You want to be able to reject unsuitable hypotheses, but not the correct hypothesis.

f(S) is the probability distribution of S. The probability of wrongful rejection is the integral of f(S) over some region dS. In general, we use <em>chi-square</em>: we assume that the data are normally distributed with variance equal to the mean, that the deviations have zero mean, and that the data points are independent of each other. In this case, the statistic S is distributed as <em>chi-square</em>. The level of significance of the test, <em>alpha</em>, is defined as the probability of exceeding the observed value of S by chance, i.e., the integral of f(S) from the observed value to infinity. In general, if <em>alpha</em> > 0.1, you should not reject the fit.

If you reject only those hypotheses with <em>alpha</em> < 0.1, you run a 10% risk of wrongly rejecting the correct hypothesis; that is, you have a 90% chance of not rejecting the correct hypothesis.

The Composite Hypothesis

If you have more than one adjustable parameter, you are testing a composite hypothesis. To reject the composite hypothesis, you must be able to reject the best fit obtained with all the parameters adjusted. Assume that there are p adjustable parameters. You can find the minimum value of your statistic, Smin, by adjusting all p parameters. Smin will be distributed as <em>chi-square</em> with N-p degrees of freedom. You can reject the model at significance level <em>alpha</em> if Smin > <em>chi-square</em><sub>N-p</sub>(<em>alpha</em>).
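A minimal sketch of this rejection test, with made-up values for N, p, Smin, and the significance level:

```python
from scipy.stats import chi2

N, p = 50, 4        # hypothetical number of data points and adjustable parameters
S_min = 75.0        # hypothetical minimum value of the fit statistic

alpha = 0.01        # chosen significance level of the test
# chi-square value exceeded by chance with probability alpha, for N-p dof:
threshold = chi2.isf(alpha, N - p)

reject = S_min > threshold   # True: reject the composite hypothesis at this alpha
```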

Constructing Confidence Contours

The acceptable region in parameter space is that region wherein your statistic S is less than some threshold value. Suppose you measure a quantity X from a sample x<sub>i</sub> with mean x. Then, with 1<em>s</em> confidence, x-<em>s</em> < X < x+<em>s</em>.

Given 2 parameters, the region x-<em>s</em><X<x+<em>s</em> and y-<em>s</em><Y<y+<em>s</em> encloses the 0.68x0.68, or 46%, confidence region. Joint confidence regions must therefore be larger than the corresponding single-parameter regions to enclose the same total probability, even if all the parameters are independent.

To determine a confidence contour, follow this procedure: Measure a statistic STrue for the p (unknown) true parameter values; STrue is distributed as <em>chi-square</em> with N degrees of freedom. Minimize S to find the best-fit parameters and the minimum value Smin; Smin is distributed as <em>chi-square</em> with N-p degrees of freedom.
<em>Delta</em>S = STrue-Smin is distributed as <em>chi-square</em><sub>p</sub>. This follows because a <em>chi-square</em><sub>N</sub> variable is, at least approximately, the sum of independent <em>chi-square</em><sub>N-p</sub> and <em>chi-square</em><sub>p</sub> variables.
Note that the <em>Delta</em>S statistic is independent of the Smin statistic.

Because <em>Delta</em>S is distributed as <em>chi-square</em><sub>p</sub>, one can determine the confidence attached to a given <em>Delta</em>S as the integral of f from 0 to <em>Delta</em>S, where f is the probability density of <em>chi-square</em> with p degrees of freedom.
For some chosen confidence level, you determine a limiting contour SL = Smin + T.

Margon, Lampton, & Bowyer (1976) provide a table giving values of T as a function of parameters p for various values of significance <em>alpha</em> and confidence = 1-<em>alpha</em>.

<em>alpha</em> Confidence p=1 p=2 p=3 p=4
0.80 0.20 (0.25<em>s</em>) 0.06 0.45 1.00 1.65
0.32 0.68 (1<em>s</em>) 1.00 2.3 3.5 4.7
0.10 0.90 (1.6<em>s</em>) 2.71 4.61 6.25 7.78
0.01 0.99 (2.6<em>s</em>) 6.63 9.21 11.3 13.3

Note that the rule of thumb - adding 1 to <em>chi-square</em> - corresponds to p=1 at the 1<em>s</em> level of confidence.
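The table values can be reproduced from the chi-square distribution (a Python sketch; `delta_T` is an illustrative name, not a standard function):

```python
from scipy.stats import chi2

def delta_T(confidence, p):
    """Threshold T such that the contour S_L = S_min + T encloses the joint
    confidence region for p interesting parameters."""
    return chi2.ppf(confidence, p)

# A few entries of the Margon, Lampton, & Bowyer table:
# delta_T(0.68, 1) is about 1.0   (the "add 1 to chi-square" rule of thumb)
# delta_T(0.90, 2) is about 4.61
# delta_T(0.99, 4) is about 13.3
```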

IV. Maximum Likelihood and the C statistic


From the point of view of an observational astronomer, the main limitations of <em>chi-square</em> and Gaussian statistics are that they require many counts per bin (so that the Poisson distribution is well approximated by a Gaussian) and that binning the data discards information. One can get around these problems by using Poisson statistics and a maximum likelihood analysis.

Maximum likelihood is discussed briefly in Bevington & Robinson. For a given set of outcomes, the probability of the set is the product of the probabilities of the individual outcomes, and the most likely set of model parameters is the one that maximizes this product. For example, suppose you have a Gaussian parent distribution with mean <em>u</em> and variance <em>s</em><sup>2</sup>. We do not know the mean, but can only estimate it (<em>u</em>') from the data. The Gaussian probability of observing x<sub>i</sub> is P<sub>i</sub> = [1/(<em>s</em>(2<em>pi</em>)<sup>1/2</sup>)] exp[-(x<sub>i</sub>-<em>u</em>')<sup>2</sup>/(2<em>s</em><sup>2</sup>)], and the best estimate of <em>u</em>' is the value that maximizes the product of the P<sub>i</sub> (equivalently, the sum of their logarithms).
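A numerical illustration of this idea, using simulated Gaussian data: maximizing the product of the P<sub>i</sub> (equivalently, the log-likelihood) over trial means recovers the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=5.0, scale=sigma, size=1000)  # simulated data, true mean 5

def log_likelihood(mu):
    # Log of the product of the Gaussian P_i, with constant terms dropped
    return -np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# Maximize the likelihood over a grid of trial means:
trial_mu = np.linspace(4.0, 6.0, 2001)
best_mu = trial_mu[np.argmax([log_likelihood(m) for m in trial_mu])]
# best_mu agrees with the sample mean x.mean() to within the grid spacing
```

For a Gaussian, this maximization can be done analytically and yields the sample mean; the grid search is just to make the mechanics explicit.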

V. Smoothing Data

Real data are often noisy. Real data are often oversampled. In either case, it is sometimes advisable to smooth the data before analysis. At the very least, smoothing the data gives you a better idea of what is really in the data.

Astronomical data are usually binned rather than continuous, that is, each event can be placed in a discrete bin. For example, a photon detected by a CCD detector falls in a particular pixel on the detector, with integral values of X and Y coordinates, or a flux is measured during a particular readout of the detector, which falls within a discrete time interval.

There are a number of ways to smooth the data: for example, boxcar (running-mean) smoothing, convolution with a Gaussian kernel, and median filtering.
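As an illustration, a simple boxcar (running-mean) smooth applied to simulated noisy, oversampled data:

```python
import numpy as np

def boxcar_smooth(y, width):
    """Smooth binned data with a running mean (boxcar) of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")

# Simulated data: a smooth signal plus noise
rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 6, 300)) + rng.normal(scale=0.5, size=300)
y_smooth = boxcar_smooth(y, 9)
# The smoothed array has the same length but reduced point-to-point scatter
```

Note that smoothing correlates adjacent bins, so any subsequent fit to smoothed data must account for the fact that the points are no longer independent.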

VI. How to Fit Data

In general, we use the technique of least-squares or chi-square minimization. This issue is covered in some detail by Bevington and Robinson (chapters 6-8).

Since there won't be a lot of time available in this course for developing fancy software, feel free to use existing code to fit data. But you should understand how the code works, and be sure that the results are reasonable.

In IDL, fitting routines such as LINFIT, POLY_FIT, GAUSSFIT, and CURVEFIT are useful.
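If you work in Python instead, scipy provides comparable functionality; a sketch of a weighted chi-square fit to made-up linear data:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b                      # a simple linear model

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 25)
sigma = np.full_like(x, 0.5)              # known 1-sigma measurement errors
y = model(x, 2.0, 1.0) + rng.normal(scale=0.5, size=x.size)

# With sigma and absolute_sigma=True, curve_fit minimizes chi-square
popt, pcov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))             # 1-sigma parameter uncertainties
chisq = np.sum(((y - model(x, *popt)) / sigma) ** 2)
```

Whatever code you use, check that the minimized chi-square is reasonable for the number of degrees of freedom before trusting the parameter errors.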

VII. Issues in Regression Analysis


In linear regression analysis, we try to determine the best linear relation Y(X) between Y and X. There are at least five different ways to do ordinary least squares (OLS) linear regression, and so long as the linear correlation coefficient |R| is less than unity, they generally give different answers. In fact, the five methods give identical answers for Y(X) only when |R|=1, or when |R| is less than 1 but the variances in X and Y are identical and the slope of the relation is unity.

The methods are: OLS of Y on X, OLS(Y|X); OLS of X on Y, OLS(X|Y); the OLS bisector; orthogonal regression; and reduced major-axis regression.

Reminder: in general, at least in astronomy, there are no perfect measurements. OLS(Y|X) is generally the best choice when the goal is to predict Y from X and the errors in X are unimportant. Note that, even if one or more of the underlying assumptions are violated, OLS(Y|X) can still be used to predict Y(X). Isobe et al. ran a series of Monte Carlo simulations on data consisting of N points, with uncertainties in both X and Y, to characterize the behavior of each method.

Feigelson & Babu (1992) discuss proper treatment of least-squares regression in the presence of known errors.
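The disagreement between OLS(Y|X) and OLS(X|Y) when both coordinates carry errors is easy to demonstrate (a Monte Carlo sketch in the spirit of, but not taken from, the simulations cited above; the true slope here is 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x_true = rng.normal(size=n)                         # true independent variable
x = x_true + rng.normal(scale=0.75, size=n)         # measured X, with errors
y = 2.0 * x_true + rng.normal(scale=0.75, size=n)   # measured Y, with errors

# OLS(Y|X): regress Y on X.  OLS(X|Y): regress X on Y, then invert the slope.
cov_xy = np.cov(x, y)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)     # biased low by the scatter in X
b_xy = np.var(y, ddof=1) / cov_xy     # biased high by the scatter in Y

# With scatter in both coordinates, the two slopes bracket the true slope
```

This is why the choice among the five methods matters: each answers a different question about the underlying relation.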


Data are truncated if an artificial upper or lower limit is imposed on the sample; an example is a magnitude-limited survey. Results derived from truncated samples can be biased, because objects beyond the limit are systematically excluded.

Malmquist Bias

Malmquist bias (ref) occurs when the truncation is a function of the independent variable: in a flux-limited survey, for example, intrinsically luminous objects are overrepresented at large distances.

Censored data

Data that include either upper or lower limits are called censored data. It is almost never appropriate to ignore the limits and fit only the detected points - the only exception would be the case of meaningless upper limits that lie far above all the other data points. Dealing with censored data is the subject of the next section.

VIII. Censored Data and Survival Analysis
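The standard nonparametric tool for censored data is the Kaplan-Meier product-limit estimator. A minimal sketch follows (a real analysis would use a tested package, and astronomical upper limits must first be negated to convert them to the right-censored form assumed here; ties between detections and limits are not handled):

```python
import numpy as np

def kaplan_meier(values, censored):
    """Product-limit (Kaplan-Meier) survival estimate for right-censored data.

    censored[i] is True if values[i] is a limit rather than a detection.
    Returns a list of (value, survival_probability) steps at each detection.
    """
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    c = np.asarray(censored, dtype=bool)[order]
    n = len(v)
    surv = 1.0
    steps = []
    for i in range(n):
        if not c[i]:               # only detections produce downward steps
            at_risk = n - i        # points still "at risk" (value >= v[i])
            surv *= 1.0 - 1.0 / at_risk
            steps.append((v[i], surv))
    return steps

# Tiny illustrative example: three detections and one censored point
steps = kaplan_meier([1.0, 2.0, 3.0, 4.0], [False, True, False, False])
```

The censored point at 2.0 produces no step of its own, but it does count among the points at risk for the detection at 1.0, which is how the limits inform the estimate.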



Return to PHY 445/515 Astronomy page