Astronomical Data Analysis and Software

Contents


I. Introduction

Astronomical data generally consist of either two-dimensional images (or 3-dimensional image cubes containing many images), or one-dimensional spectra. Spectra may either be temporal (light curves) or in wavelength or frequency. The basic astronomical measurements are position (astrometry) and flux (spectroscopy or photometry). Flux may be measured as a function of position, wavelength or time.

Reduction is the process of turning raw data into a calibrated product; anything after that is data analysis. Data you obtain from the Mt. Stony Brook Observatory must be reduced; data you download from the archivess is at least partially reduced and calibrated.

The data analysis is the heart of these labs. You need to decide what measurements are needed, and then how to interpret those measurements. Because of the volume of data, it is generally practical to analyse astronomical data only with a computer. There are software packages which haave been written specifically for astronomical data analysis (see section IV, but in this lab we will avoid canned software, and ask you to do some rudimentary programming. For this, we use IDL, the Interactive Data Language. It greatly simplifies the tasks of data visualization, which is a major part of data analysis. Students who are more comfortable with other languages (e.g., FORTRAN, BASIC, C) can undertake analysis in those languages.

If you need a refresher on the LINUX operating system, please read this.


II. A primer on IDL

IDL is a powerful language, with many intrinsic capabilities. It can be run interactively, line by line, from the terminal, or you can create programs (called procedures) which you can then run. The syntax is very similar to FORTRAN, so a user familiar with FORTRAN should have little difficulty coming up to speed in IDL (however, it takes years to become expert). Unlike Fortran or C, IDL code need not be compiled and linked. You can stop a procedure anywhere in the code, and examine intermediate results, change values, or do whatever you might think of doing. The best advice is, as often, to read the manuals before you begin. The "Using IDL" manual is the basic user's guide. A much more useful introduction to IDL can be found in "IDL Programming Techniques", by David W. Fanning, which is available in the lab. There are a number of good web sites devoted to IDL programming. Among these are:

But before you go to any of the above-mentioned sites, read the rest of this primer first.

We have upgraded the lab computers to the Linux operating system. You need to check the following before you get started:

You can run IDL in one of two ways: on the command line (my preference) or in the IDLDE widget window.

To run IDL, either type idl at the prompt in a terminal window, or click on the IDL icon on the desktop (this starts idlde). In the dlse envronment, you enter commands at the bottom of the screen, on the command input line. Commands (and their results) are echoed in the upper output window. Read Chapter 4 of "Using IDL" for details about the IDL Windows interface. It is possible to change the layout. There is on-line help available.

Once you start IDL, you should check your default directory. One way to find where you are is to type
CD,CUR=CUR & PRINT,CUR. You should be in /home/student, unless you earlier moved to a subdirectory. If you are not, get there by typing
CD,'~'. The CD command in IDL works the same as it dows in UNIX, except that the argument must be a string, and the delimiter is a comma. i.e., the UNIX command cd mydir translates to cd,'mydir'. You can also set the default directory in the idlde window by clicking on File, then Preferences, then going to the Startup page. Change the Working Directory, and then apply the change (click on Apply and OK).

In addition to the intrinsic capabilities there exists a large and growing library of user-supplied routines. The most useful for our purposes is the Astronomy Users Library. These routines are in /home/student/idlprocedures/astrolib. A brief description of the contents is in c:/home/student/idlprocedures/astrolib/contents.txt.

For the most part, you will only need primitive IDL commands for displaying and measuring your data. The data files are generally in FITS format, which you can easily read in IDL by following these instructions. The data will then consist of a header H (a string array) and a data array D, which may be multi-dimensional.

Pointers on Using IDL

Pointers on IDL graphics under Windows

Most of the graphics/display software in C:\idl is written for single-plane, 256 color graphics. The PCs in the Phy515/445 lab are three-plane with about 16 million colors. If your plots come up black, or you can't get color tables to work, try DEVICE,DECOMPOSED=0. Feel free to experiment with colors.

Unpacking Dimensions

It is easy to extract M-dimensional data from N-dimensional data. For example, given a 2-dimensional array of spectra D (as you get from the IUE high dispersion data), you can extract the nth row with the command S=D(*,n), or the nth column with the command S=D(n,*). Note that D(n,*) results in a 1 x m array. You can use the transpose function to turn this into an m-element vector.

To extract columns 2 and 3 from an n x m array D, use the command S=D(2:3,*). This results in a 2 x m array.
To extract columns 1, 3, and 4 from an n x m array D and create a 3 x m array, S, first create a vector of indices, as k = [1,3,4]. Then the command S=D(k,*) extracts the 3 columns.

Extracting rows is done exactly the same way.

Plotting and Measuring Data

For one-dimensional data (e.g., spectra), use the PLOT command. Set the plot scale and range with SET_XY.

Two-dimensional data can be displayed using the TV. Note that the window displays byte values, and PLOT does not do any byte-scaling. Use TVSCL,data or TV,BYTSCL(data) to scale your data appropriately. Note that, if you have a few pixels that are very much brighter than the others, you may not see much. This is because the full range is compressed into about 256 intensity levels. Use the < operator to truncate the image (e.g., TVSCL,data<500 will plot all pixels with values >500 as 500. Used in this manner, the operator does not overwrite the data.).

Using the Cursor to Measure Positions

The basic way to measure a value on the screen is with the CURSOR command. Type CURSOR,X,Y. Click the left mouse button at the point of interest, and then PRINT,X,Y to get the values back. On a 2-dimensional image, X and Y will be the pixel number, and the value at that point is IMAGE(X,Y).

In order to use the CURSOR command for images, you need to set up the plot limits properly (there may be more elegant methods, but this works!).

s=size(image)                  ;this returns a vector containing the size
                               ;of the image. s(0) is the number of dimensions,
                               ;which should be 2 for an image array
sx=s(1)                        ;size of X dimension
sy=s(2)                        ;size of Y dimension
window,xsize=sx,ysize=sy       ;create window the size of the image
!p.position=[0,0,1,1]          ;plot boundaries are window boundaries - there
                               ; are no margins
set_xy,0,sx-1,0,sy-1           ;set plot limits
!x.style=1 & !y.style=1        ; force exact plot boundaries
plot,indgen(10)                ;this dummy plot will define the plot limits
tvscl,image                    ; display image
x=sx/2 & y=sy/2                ; initialize to center of plot
cursor,x,y                     ; use cursor command to get position
This is encoded in the procedure TVPLOT, which is in the idl/tvlib directory. Type TVPLOT,/HELP. After plotting an image using TVPLOT, you may use the procedure PIXEL, to call the cursor and extract positions and array values. Type PIXEL,/HELP for instructions.

If your Image is Bigger than the Screen

The TV and TVPLOT commands plot the entire image: if it is larger than the screen you will see only part of the image. You can use the IDL REBIN command to shrink the image. REBIN rebins by an integral number of pixels. The number of pixels in the original image must be an integral multiple of the number of pixels in the rebinned image.

Use the help or size command to determine the length of the axes. If they are odd, truncate the image. For example, if your image is a 1663x1663 image, truncate it to 1662x1662 with the following:

my_new_array=my_old_array(0:1661,0:1661)
Then rebin it.
my_small_array=rebin(my_new_array,831,831)
will produce an 831 x 831 array.

If you want to shrink the image by a factor of 4, you would have had to truncate the original image to 1660x1660, and then rebin to 415 pixels.

Rebinning Caveats:

Making Hard-Copy Plots

The default plot device on the PCs is called X. To send a plot to the printer, use the printer device. You do this as follows:
   set_plot,'printer'       ;reset plot device
   device,/landscape        ;sets landscape format; default is portrait
   plot,x,y                 ;plot your data
   ...                      ;do other things, like oplot, or xyout
   device,/close            ;close device and send plot to the printer
   set_plot,'X'           ;reset the plot device to the terminal

you can save the plot as a postscript file by using the device name ps. If you do, the set_plot,'ps' command opens a file called idl.ps, and the device,/close command closes the file. You can then print this file, save it to floppy, or FTP it somewhere. Rename it if you want to save it, though. You can use the device command to set the name of the file - see the documentation.

IDL Structures

Structures are essentially arrays of arrays within IDL that can be used to simplify your variables. Say, for example, you have arrays of magnitudes in a number of bands (such as U, B, and V) for a number of stars. You can store this as single arrays (called, for example, U, B, and V). Or you can define a structure called to link the arrays together. You will see a single variable (say, mags) which is made up of three arrays, mags.u, mags.v, and mags.v. You can also store other useful information, such as the star names, in the same structure.

Structures are analagous to two dimensional arrays, but you can mix variable types within a structure.

You would define and populate this magnitude structure as follows

   mag={magnitudes,id:'',u:0.,b:0.,v:0.}   ;define the structure
   mag=replicate(mag,nstars)               ; there are nstars stars
   mag.id=id             ;id is a string array of star names 
   mag.u=u               ;u is array of U magnitudes
   mag.b=b               ;b is array of B magnitudes
   mag.v=v               ;v is array of V magnitudes

It is easy to extract an array or a value from a structure. In this example, the V band array can be extracted into the array V by v=mags.v. To extract the V magnitude for star i, use v=mags(i).v.

Read the IDL manual for more information.

Problems

On occasion, a procedure will encounter an error and stop. You will know this has happened when you see an error message followed by a traceback like
% Attempt to subscript K with  is out of range.
% Execution halted at:  DBHELP            192
  /data/gremlin/fwalter/astrolib/pro/database/dbhelp.pro
%                       $MAIN$
When this happens, you can either try to fix the problem, or bail out. The latter is generally recommended unless you know what you are doing. To bail out, type RETALL. This will close the procedure and return to the main level.

If you attempt to exit IDL after getting an error in a procedure, a dialog box may pop up asking you if you wish to save the changes to the procedure. Answer NO.


III. Uncertainties

All measurements include some uncertainty (often called the error), and astronomical measurements are notorious for having large uncertainties. The problem is that astronomy is observational and not experimental. We do not set the parameters; we cannot run controlled experiments. Given the demand on telescope time, it is often not possible to repeat observations. To determine the uncertainties on a measurement, one must understand all possible sources of error.

To first order, we rely on Poisson (counting) statistics. Most modern astronomical detectors are digital. Each count coming out of the detector represents a certain number of photons. The ratio of counts to photons is called the detector gain. This is largely fixed by the detector and the electronics, but may depend on, for example, the spectrum of the input photons. The uncertainty on the signal is given not by the square root of the number of counts, but by the square root of the number of detected photons. Since most detections involve 2 or more pixels, errors must be summed in quadrature to estimate to net uncertainty.

All astronomical measurements take place against some background. This background may consist of a number of components. There may be instrumental background (e.g., read noise in a CCD) or sky background (the sky is not black). The background generally must be subtracted from the measurement. For example, if you measure the flux in an emission line, it is often superposed on a continuum, or the image of an object is superposed on a non-zero background. Since these backgrounds contribute counts, and they have an associated measurement uncertainty. You must estimate the errors for the background region independently, and then propagate errors when you subtract the background (refer here for details).

Always include the uncertainty when you state a result.

IV. Software Packages for Astronomy

There are good software packages for astronomy. They tend to do certain things very well, but they sometimes don't do exactly what you want them to do (if you want to do something right, do it yourself!). Nonetheless, they are invaluable assets for the research astronomer. Among the packages available are

V. Useful Web Sites

There are a number of web sites that provide useful information. Among these are: