PHY 515: Archival Data Analysis Project
Goals of this project
- To formulate a problem that can be solved using archival data
- To learn how to access the NASA archives
- To learn how to analyze astrometric, spectroscopic and/or photometric data
- To understand FITS data format
- To apply multi-wavelength observations to attack a problem of your choosing
The grade will be based on
- Originality of project (10%)
- Downloading and use of multiwavelength data sets (30%)
- Data analysis (20%)
- Error analysis (20%)
- Writeup (20%)
The primary role of a scientist is not to solve problems, but to pose problems
that can be solved. A good scientist then goes out and solves those
problems. The purpose of this lab is to permit the student who has some
astronomical background to select (or perhaps to create) a problem of his or
her own choosing, and then to attack the problem with existing archival data.
There are few guidelines, hence this exercise is inappropriate for students
who have no astronomical background. The research need not be original - there
are few worthwhile, truly new projects that can be done in 3 weeks. Rather,
select some known target and undertake a basic investigation.
The first task is to identify a problem which can be solved by using data in
the NASA archives. There are no limitations to the scope of the problem, except
that you must be able to explain succinctly the problem, and how you will
solve it, and then you must do so within about 3 weeks. You can either
undertake a comprehensive analysis of a single data set, or a less detailed
multi-wavelength investigation involving 2 or more data sets.
You may analyze images, spectra, or photometry. Your analysis may be
astrometric, spectroscopic, or photometric in character. You may undertake
temporal variability studies, determinations of the properties of a class of
objects, or comparisons between objects.
Be aware that different missions archive data in different stages of
readiness. Also, your instructor and TA may be well versed in handling some
types of data, and completely ignorant of others. You are facing exactly the
same problems that any professional astronomer faces in dealing with a
new kind of data set.
You should read the page on astronomical
instruments. This will give you some idea of what types of data are
available in the archive, and may help you decide what to investigate.
- Compare the coronal and chromospheric spectra of an active star and an
inactive star. Use ROSAT PSPC and IUE data.
- Determine the power-law slope of the Crab nebula spectrum. Use X-ray,
UV, and IR photometry.
- Determine the temperature of a star from its spectral energy distribution.
- Examine the temporal variability of a cataclysmic variable in X-rays and
- Measure the radius and surface brightness distribution of a galaxy.
- Determine the redshift of a sample of AGNs (IUE/HST data).
- Compare the UV spectra of a sample of QSOs, Seyfert I, and Seyfert II
Spend a few days
examining the available data, and crafting your problem. Then write a
one-page proposal to your instructor explaining the problem and how you
will use archival data to address it. The proposal must outline the method
you will use to solve the problem, and must indicate what data you will be
using. The purpose of the proposal is to ensure that the problem is tractable.
Because of the limited amount of time available to do the lab, you need to
hand in your proposal no later than the second lab period. You should be
thinking about a project before the lab begins, and then spend the first
lab session examining the available data.
Your second task is then to download the data. After this, analyze what you
need to and then write up your results in the format of an ApJ Letter
(9 or fewer double-spaced typed pages, plus figures). Easy, eh?
There are three major archive sites you should check out.
This is not meant to be an inclusive list of data sets, but represents some of
the data which you may find most useful. Do not limit yourself to these
datasets, but explore the archives if these data do not include what you want.
The International Ultraviolet Explorer
A brief description of the satellite is here.
The IUE database is a large and uniform spectroscopic archive.
If you plan to work with IUE data, read this for
Read the ROSAT writeup in the
Astronomical Instrumentation page.
ROSAT data file names are of the format
RxYYYYYY_ext.FITS, where x indicates
the instrument (H for HRI, P for PSPC, F for PSPC with the Boron filter, which
cuts out soft photons below the carbon edge at 0.28 keV) and ext
indicates the type of file. YYYYYY is the observation number; if the
observation was made in many pieces, and the pieces were processed separately,
the pieces will be indicated by a further descriptor of the form a00 or
n01. All files are in FITS format;
some have extensions.
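The naming convention above can be decoded mechanically. The following is only a sketch in IDL: the procedure name rosat_name is hypothetical, and it assumes the name follows the RxYYYYYY_ext.FITS pattern exactly, with any further descriptor (a00, n01) placed before the underscore.

```idl
; Hypothetical helper: split a ROSAT file name into instrument,
; observation number, and file type. Assumes the RxYYYYYY_ext.FITS pattern.
pro rosat_name, fname, instrument, obsid, ftype
  name  = strupcase(file_basename(fname))   ; drop any directory path
  codes = ['H', 'P', 'F']
  names = ['HRI', 'PSPC', 'PSPC with Boron filter']
  k = where(codes eq strmid(name, 1, 1), nk) ; second character = instrument
  instrument = (nk gt 0) ? names[k[0]] : 'unknown'
  obsid = strmid(name, 2, 6)                 ; six-character observation number
  us  = strpos(name, '_')                    ; underscore before the file type
  dot = strpos(name, '.', /reverse_search)   ; last dot, before FITS
  ftype = strmid(name, us+1, dot-us-1)       ; e.g. BAS, IM1, MEX
end
```

For example, rosat_name, 'RP123456_BAS.FITS', inst, obs, ftype (a made-up file name) should return 'PSPC', '123456', and 'BAS'.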
The main file types are:
- *_BAS.FITS This file contains all the good photons. You can
construct images, light curves, and spectra using this file alone.
The good time intervals are stored in the first FITS extension. The
units are in seconds of spacecraft time. The time of the start of the
observation in seconds (as well as in UT) is in the header.
The second FITS extension is a binary table containing a list of
photon arrival times, positions, both in detector coordinates and in
RA/DEC, and pulse height information. PH is the raw pulse height; PI
(pulse invariant) is corrected for various detector effects, and should
be used for most analysis projects.
- *_IMx.FITS These are the image files. The HRI observations have a
single image file (IM1); PSPC observations have images in 3 broad
bands, IM1, IM2, IM3. The units are counts. The images are 512x512; pixel
sizes are in the header.
- *_BKx.FITS These are the background files. Each is a 512x512 image
in units of counts, giving the expected detector background. They
correspond to the *_IMx.FITS files.
- *_MEX.FITS This is the exposure map. The units are seconds. This
accounts for all vignetting by the telescope and the window support
structures. HRI observations do not have an exposure map.
- The image files (*_IMx.FITS) are in units of counts. You can convert this
to a count rate by dividing by the exposure map (units of seconds) in the
*_MEX.FITS file. Note that the round field of view is mapped into a square
array, with exposure time set to zero outside the field of view of the
detector. Near the edge of the field, vignetting (shadowing of the
mirror) greatly decreases the effective exposure time. If you simply
divide the image by the exposure map, you can introduce apparent spurious
sources near the edges of the detector, including the window support ribs,
because dividing a small number of counts by a very small exposure time
can yield a large apparent count rate. You will also waste time
dividing by zero (which generates error messages).
To avoid edge effects and division by zero, reset all the exposure values
less than some threshold to a very large value, so that the count rate at
those points becomes negligible. Use code like the following:
mex=float(mex) ;mex is the exposure array; work in floating point
tmax=max(mex) ;tmax is the maximum exposure time
k=where(mex lt 0.1*tmax,nk) ;identify points where the exposure time is
;less than 10% of the maximum.
if nk gt 0 then mex[k]=tmax*1000. ;set low exposures to 1000 times tmax
cr_image=image/mex ;create count-rate image
- The energies of the PSPC photons are encoded in the PI field of the
_BAS.FITS file. PI ranges from 0 to about 256 in units of 10 eV.
Values less than 11 or greater than 235 are usually discarded.
The PSPC images (IM1, IM2, IM3) represent the three broad energy bands.
Band 1 is soft, band 2 is hard, and band 3 (total) sums bands 1 and 2.
The exact PI ranges are given in the header.
- The HRI has no intrinsic energy resolution. HRI photon energies are
set purely by the mirror response, which peaks near 1.2 keV. To first
order, you may assume that all photons are 1.2 keV photons. The full
energy sensitivity of the HRI runs from about 0.1 to 2.2 keV.
- Background images (*_BKx.FITS) are most useful if you are studying diffuse
or extended structures; if you are measuring point sources you may
ignore the background files (but don't forget to subtract background if
you are measuring fluxes).
- If your photon arrival times all seem to be quantized to the nearest
second, print them in another format. The times stored are in seconds
since 1/1/1970; this is a large number, and IDL's default format
for printing double precision numbers shows only the integer part.
Try subtracting the time of the first photon from all arrival times - that
makes the numbers more tractable.
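The hints on PI selection and arrival times can be tried on a few made-up photons before you touch real data. This is a minimal sketch, not a prescribed method: the procedure name, the arrays time and pich, and the 2-second bin size are all hypothetical stand-ins for the corresponding columns of an event table and for choices you would make yourself.

```idl
; Sketch with synthetic photons (all values invented for illustration).
pro pspc_times_demo
  time = 1.0d8 + [0.25d, 0.75d, 1.5d, 3.25d, 3.5d, 7.0d] ; arrival times (s)
  pich = [5, 20, 100, 150, 240, 60]                       ; PI channels

  ; The default format hides the fractional seconds of large numbers;
  ; an explicit format shows them.
  print, time[0:1]
  print, time[0:1], format='(f16.3)'

  ; Work with times relative to the first photon.
  t = time - time[0]

  ; Keep only photons in the usually accepted PI range (11 to 235).
  good = where((pich ge 11) and (pich le 235), ngood)
  if ngood gt 0 then begin
    t_good = t[good]
    e_kev  = pich[good] * 0.01     ; one PI unit is about 10 eV
    print, e_kev
    ; Light curve: photons per 2-second bin (bin size is arbitrary).
    lc = histogram(t_good, binsize=2.0, min=0.0)
    print, lc
  endif
end
```

Compile the procedure and run it with pspc_times_demo. For real data, replace the two synthetic arrays with the time and PI columns read from the *_BAS.FITS file.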
Rossi X-Ray Timing Explorer
Read the RXTE
Getting Started Guide. There is also an
Analysis Cookbook available. Both these documents are maintained at the
Processing and Analysis section of the
Guest Observer Facility (GOF).
To download data from the archives, follow
directions for W3browse.
Unfortunately, RXTE data as archived are very complex. There is no standard
data product (although there are plans to make one), except for the All Sky
Monitor. It is strongly recommended that you NOT attempt to analyze RXTE data
until the standard data products are available, but rather work with EINSTEIN,
ROSAT, or ASCA X-ray data.
The Hubble Space Telescope
HST data are archived at the
Multimission Archive at
Space Telescope Science Institute (MAST).
If you plan to download HST data, read this.
You will need a password. See your instructor or the TA for the account
If you cannot find what you are looking for at one of these sites, it is
probably not available. Note that most data taken at ground-based telescopes
(KPNO, CTIO, Keck)
are not formally archived.
Solar data are
observations from the VLA are, but if you want to work with these data you
are on your own.
Data may be downloaded in a variety of ways. The archival sites listed above
will give you various options. The most common are:
- Direct download over the web. This is the easiest method, and works well
for pictures. It is not generally used for scientific data sets.
- Anonymous ftp. The data are placed on a disk at the archival site, and
you copy the files directly.
- Autonomous upload. You provide the archive site with your machine's
internet address, and your username and password. The data will be sent
to you by ftp. This is currently not available on the lab's PCs.
Recent datasets (especially Hubble Space Telescope data) can be large, and
can take some time to transfer. Transfers often go faster in the morning
(before the west coast wakes up) or after 5PM.
Most of the archival data are stored in FITS
format. These data are easily readable in either IDL or IRAF.
The data are often stored in compressed mode to save disk space and
decrease downloading time. Compressed data have filenames ending in
.Z (UNIX compression) or .gz (gzip compression).
The WinZip utility on the PC can be used to uncompress these files.
Once you have your data in FITS format, you can
read it using IDL. Then, analyze your data. A short
tutorial on how to do basic tasks in IDL is here.
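As a rough illustration of reading FITS files in IDL, the sketch below uses readfits, mrdfits, and sxpar from the IDL Astronomy User's Library. It assumes that library is installed on your machine. The procedure name and both file names are hypothetical placeholders, and the extension number 2 follows the description of the ROSAT *_BAS.FITS file; other files may be laid out differently.

```idl
; Sketch only: requires the IDL Astronomy User's Library.
; File names are hypothetical placeholders.
pro fits_demo
  ; Read an image and its header (a string array of keyword records).
  image = readfits('RP123456_IM1.FITS', hdr)
  help, image                                  ; shows array type and size
  print, sxpar(hdr, 'NAXIS1'), sxpar(hdr, 'NAXIS2')

  ; Read the binary table in the second extension as an array of structures.
  events = mrdfits('RP123456_BAS.FITS', 2, ehdr)
  help, events, /structure                     ; lists the column names
end
```

Listing the structure first is a deliberate choice: column names differ between missions, so it is safer to inspect them than to guess.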
"Our only product is paper" (J.L. Linsky)
is a somewhat outmoded, but still apt, comment on science.
You are to write up your results in the format of an Astrophysical
Journal paper. A standard layout is:
- Abstract, wherein you provide a short summary of the paper.
- Introduction, wherein you lay out the problem and describe past work.
- Data Description, wherein you state how you got the data, and how you
reduced and analyzed it. You should provide sufficient detail that a
fellow student can reproduce your results.
- Discussion, wherein you describe the results of the analysis, and discuss
any relevance to earlier data. This may include your conclusions.
- Summary, wherein you summarize your conclusions and reiterate the
importance or scientific impact of this work.
Be sure to include references to the literature, and figures and tables
as appropriate. If you write IDL procedures (or other code) for the data
analysis, please include the code as an appendix.
If we get LaTeX running on the PCs, then you will be expected to write your
paper in AASTeX format. Instructions will be forthcoming.