A RDBMS for the JCMT

Introduction

The Director of the JCMT proposed in 1993 that the responsibilities for a JCMT data archive be moved to Hawaii, and funds were set aside for a new archiving system. Through discussions in Hawaii this project evolved into the use of a commercial Relational Database Management System (RDBMS) for the archiving of both astronomical and engineering data of the JCMT. The necessary software and hardware have now been acquired and implementation of the RDBMS is in progress. In this article I want to give a brief outline of the project, as well as the changes that users can expect over the course of the next year in terms of the operations at the telescope and the archiving of their data.

But first I should perhaps say a little bit about the background of the project. Given that the JAC is moving towards UNIX, this immediately raised the question of what to do with the current VMS-based system ARCQUERY. Based on my experience with a commercial database system at the Owens Valley MM-array, I strongly suggested we pursue this route rather than trying to do a rewrite of ARCQUERY. Given the hundreds, if not thousands, of person-years invested in the commercial packages, there is no chance that any effort the JCMT can come up with is going to produce something that can compete. Just like high-level programming languages, commercial RDBMSs offer a good isolation from the underlying hardware, i.e. an almost complete transparency regarding the actual location of the database on the network and the actual method of storing the information on disk.

In addition, the commercial packages have moved beyond a mere archival functionality and are dealing with the on-line and real-time management of information such as world-wide ATM transactions. Just recording a transaction is not enough: the RDBMS has to check authorization and availability of funds, guard against simultaneous access, and make the updated information instantly available. Moreover, the system has to survive crashes, ATM malfunctions, and lost communication links (next time you open an account ask the bank whether they first update your account balance or first pay you the money: both options are in use). Coming from such a money-sensitive background, RDBMSs have evolved to be extremely robust and failure-resistant. Also, in general there is multi-platform support and a sophisticated user interface dealing both with simple character-based VT100 terminals and with the latest X displays.

Aside from managing an astronomical archive, a RDBMS is an extremely powerful tool to also manage all the configuration and engineering data associated with the telescope. It is in this use of a RDBMS, rather than the archive, that I see substantial advantages it might bring both for the JCMT staff and for the users. We have dubbed such a system a `Telescope Management System' (TMS; for lack of a better term) as opposed to the `Archive', which becomes only one part of it. Below I will explain each of these in more detail, as well as the overall proposal of what I think the actual implementation will look like.

After an evaluation of a number of commercial RDBMSs, the JCMT has purchased SYBASE from Sybase, Inc. Not only was this RDBMS technically and financially the most attractive, it is the one in use by e.g. the Canadian Astronomical Data Centre and Owens Valley, and the one on which I have hands-on experience.

Overall scheme

The over-riding philosophy behind the plans is to put the local effort where it uniquely benefits the JCMT observing and operations, and not where it re-invents wheels. Not surprisingly, this also coincides with my own interests. As a consequence, we will try to adopt an existing design for the Archive and put all the effort in the Telescope Management System. Specifically, discussions are going on with Dennis Crabtree of the Canadian Astronomical Data Centre (CADC) to utilize their extensive resources for the long-term, on-line archiving of JCMT data and to adopt the STARCAT utility for use at the JCMT. STARCAT is a sophisticated tool for accessing, browsing, previewing, selecting, and manipulating astronomical data, much beyond what we would be able to provide through local efforts alone.

Conceptually the JCMT Archive consists of two parts: a `catalogue' and the `data'. For simplicity, the `catalogue' can be thought of as all the header information associated with the `data', such as source names, positions and telescope configuration.

Figure 1 schematically shows the global design idea that forms the basis for current developments. This design incorporates input from various discussions with CADC, but there is no official agreement, and any real involvement of CADC will be decided upon once the local development has progressed to the point where such an involvement becomes relevant. The design consists of three or four components:

• the Telescope Management System: this is a RDBMS which runs on a SUN Sparc computer at the summit. Its design is optimized for interacting with the real-time operating system of the telescope and, in addition to astronomical data items, it may archive information which is relevant for hardware monitoring and engineering purposes. A subset of its information will be forwarded automatically to the Archive at the JAC.

• the Archive: this RDBMS runs on a SUN Sparc at the JAC and in design will be compatible with the CADC Archive. It will hold all astronomical data on-line during the proprietary period. Besides STARCAT, tools will be provided to extract data into GSD, HDS, or FITS files for authorized users.

• CADC: this RDBMS provides the long-term on-line storage. CADC has the hardware resources to provide such a service. The idea is that the `catalogue' will be forwarded from the JCMT Archive e.g. every night and the `data' once the proprietary period has expired.

• ROE: this is an optional RDBMS which exactly mirrors the JCMT Archive and to which data is automatically forwarded. The necessity for a RDBMS locally in the UK strongly depends on the reliability and the development of bandwidth of the Internet. In principle local RDBMSs could be located at any place deemed necessary, but ROE is an obvious candidate.

The Telescope Management System

In general, astronomical observations produce a collection of observation files (e.g. FITS, GSD) which store the actual data but also the information about the hardware configuration of the telescope and e.g. weather conditions. Whereas such a setup is quite acceptable for astronomers, who in general only marginally use all the configuration information, it is highly inconvenient for the scientific staff and engineers who are responsible for the performance of the telescope. The simple task of checking e.g. a receiver temperature over several months requires information to be extracted from a huge number of individual files. Moreover, events at the telescope do not necessarily happen in step with `10-min' observations: the frontend or backend configuration may only change a few times during the night, whereas some atmospheric conditions one might want to sample on a time scale of a few seconds.

The TMS differs in setup from such a `traditional' astronomical archive in that it can deal with a hierarchy of the different levels of `header' information. The first step is to recognize these levels. This is not too difficult if one realizes that configuration changes at the telescope typically occur for the major hardware components as a whole: the parameters for the secondary mirror change because it is chopped in a different mode, or the frontend changes because the astronomer changes lines. Hence, the TMS typically has a table for each of the hardware components, so that a change of the telescope configuration typically results in a new entry in one table only, with the other tables remaining unchanged. In addition the TMS has tables to log the information associated with each scan. Clearly, it is important in this scheme to keep track of which rows of the tables go together to form an observation. Fortunately, this is exactly what the 'relational' of a 'R'DBMS is about. Also, the whole design is something which can be made transparent to the user, who typically will see a single, large table with an entry for each scan.

Figure 2 shows the tables as currently implemented to handle DAS data. The tables belong to three categories. The first set (colored gray) stores information associated with each scan, the second set (the wagon wheel on the right) stores overall setup and telescope configuration information, and the third, along the top, keeps administrative information (such as PATT numbers, project titles etc.).

wea#  utstart          utstop           tamb   pressure  humidity
1     5/5/94 3:15am    5/5/94 3:41am    6.07   627.4     26
2     5/5/94 3:41am    5/5/94 4:03am    5.60   627.5     29
3     5/5/94 4:03am    5/5/94 4:43am    4.04   627.9     31
4     5/5/94 4:43am    5/5/94 8:38am    2.96   627.9     32
5     5/5/94 8:38am    5/5/94 10:17am   1.69   628.7     31
6     5/5/94 10:17am   null             2.28   628.2     29

Table 1.

The centre of all this is formed by the INH table (INtegration Header), which for each scan stores scan-specific header information and indices to entries in many of the other tables. For instance, it uses the index wea# to point to the WEAther table which looks like Table 1.

An index wea# = 3 in the INH table points to a unique entry in the WEA table, and similarly for most other tables. For redundancy, the same unique connection can be made using the UT timestamps within each table.
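The two ways of connecting a scan to its weather entry can be sketched with Python's built-in sqlite3 module standing in for SYBASE. The WEA rows are those of Table 1; the reduced INH layout, the column names `wea_no`, `scan_no` and `ut`, and the four scans are assumptions made purely for illustration, not the actual TMS schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE WEA (
        wea_no   INTEGER PRIMARY KEY,   -- 'wea#' in Table 1
        utstart  TEXT NOT NULL,
        utstop   TEXT,                  -- NULL while the entry is still current
        tamb     REAL, pressure REAL, humidity REAL
    );
    -- Hypothetical, much-reduced INH: one row per scan.
    CREATE TABLE INH (
        scan_no  INTEGER PRIMARY KEY,
        ut       TEXT NOT NULL,
        wea_no   INTEGER REFERENCES WEA(wea_no)
    );
""")

# Table 1, with the timestamps rewritten in ISO form so they sort as text.
con.executemany("INSERT INTO WEA VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "1994-05-05 03:15", "1994-05-05 03:41", 6.07, 627.4, 26),
    (2, "1994-05-05 03:41", "1994-05-05 04:03", 5.60, 627.5, 29),
    (3, "1994-05-05 04:03", "1994-05-05 04:43", 4.04, 627.9, 31),
    (4, "1994-05-05 04:43", "1994-05-05 08:38", 2.96, 627.9, 32),
    (5, "1994-05-05 08:38", "1994-05-05 10:17", 1.69, 628.7, 31),
    (6, "1994-05-05 10:17", None,               2.28, 628.2, 29),
])

# Made-up scans; each carries the index of the weather entry valid at its UT.
con.executemany("INSERT INTO INH VALUES (?, ?, ?)", [
    (101, "1994-05-05 03:20", 1),
    (102, "1994-05-05 04:10", 3),
    (103, "1994-05-05 04:30", 3),
    (104, "1994-05-05 10:30", 6),
])

# Connection via the index: this is the flat, one-row-per-scan view a user sees.
by_index = con.execute("""
    SELECT i.scan_no, i.ut, w.tamb, w.pressure, w.humidity
    FROM INH AS i JOIN WEA AS w ON w.wea_no = i.wea_no
    ORDER BY i.scan_no
""").fetchall()

# Redundant connection via the UT timestamps alone.
by_time = con.execute("""
    SELECT i.scan_no, i.ut, w.tamb, w.pressure, w.humidity
    FROM INH AS i JOIN WEA AS w
      ON i.ut >= w.utstart AND (w.utstop IS NULL OR i.ut < w.utstop)
    ORDER BY i.scan_no
""").fetchall()

for row in by_index:
    print(row)
```

In this toy data set both joins return the same four rows, and scans 102 and 103 share the single weather entry wea# = 3.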

Not to worry, all this complexity is hidden from the normal user, but the above concepts are needed to understand the benefits of the TMS. First of all, it is an efficient method of storing information: the above period of about eight hours, during which about 25 DAS scans were taken, resulted in only 6 unique entries in the weather table, because the DB was instructed only to log weather changes exceeding certain limits. Similarly, only 4 unique entries appeared in the backend configuration table. Each table changes with a frequency which corresponds to the actual events associated with its real-world component, rather than to the number of scans observed.
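The idea of logging only those weather changes that exceed certain limits can be sketched in a few lines of Python. The limits, the intermediate samples, and the function name `log_changes` are assumptions chosen for illustration; the article does not state the thresholds actually used.

```python
def log_changes(samples, limits):
    """Keep a sample only if some quantity differs from the last *logged*
    sample by more than its limit. The first sample is always kept."""
    logged = []
    for ut, reading in samples:
        if not logged:
            logged.append((ut, reading))
            continue
        last = logged[-1][1]
        if any(abs(reading[key] - last[key]) > limits[key] for key in limits):
            logged.append((ut, reading))
    return logged


# Hypothetical thresholds, in the same units as the Table 1 columns.
limits = {"tamb": 0.4, "pressure": 0.5, "humidity": 2}

# Three readings taken from Table 1 plus two made-up intermediate ones.
samples = [
    ("03:15", {"tamb": 6.07, "pressure": 627.4, "humidity": 26}),
    ("03:25", {"tamb": 6.00, "pressure": 627.4, "humidity": 26}),  # made up
    ("03:41", {"tamb": 5.60, "pressure": 627.5, "humidity": 29}),
    ("03:50", {"tamb": 5.50, "pressure": 627.6, "humidity": 29}),  # made up
    ("04:03", {"tamb": 4.04, "pressure": 627.9, "humidity": 31}),
]

logged = log_changes(samples, limits)
print([ut for ut, _ in logged])
```

With these assumed limits only the 03:15, 03:41 and 04:03 readings produce a table entry; the two small intermediate fluctuations are dropped.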

Secondly, it is a modular setup. Currently all the entries are constructed by reading the DAS GSD files. However, in the future components like the weather, tau, and seeing monitors may write directly to their associated tables in the database. The telescope STORAGE task won't have to deal with those items, making it more efficient and hopefully faster. This strategy could be extended to all components that are set up prior to the observation and do not change during the observation: e.g. the smu, frontend and backend configurations, focus and pointing parameters. Rather than being logged with the data, the changes can be logged upon setup.

Thirdly, the modular design promotes the adaptability of the database. New tables can quite easily be added to the structure, a situation which may be relevant when e.g. new receivers are added or specific hardware problems need to be tracked down. Existing tables can be modified (columns added or dropped) without having to touch unrelated information.
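As a rough illustration of this adaptability, again using sqlite3 as a stand-in for SYBASE, the sketch below adds a column to one component table and creates a new table while a second table is left untouched. The BE and TAU layouts and the `windspeed` column are hypothetical and not taken from the actual TMS design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE WEA (wea_no INTEGER PRIMARY KEY,
                      tamb REAL, pressure REAL, humidity REAL);
    -- Hypothetical backend table with a made-up value.
    CREATE TABLE BE  (be_no INTEGER PRIMARY KEY, bandwidth REAL);
    INSERT INTO WEA VALUES (1, 6.07, 627.4, 26);
    INSERT INTO BE  VALUES (1, 250.0);
""")


def columns(table):
    """Column names of `table`, in definition order."""
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]


be_before = columns("BE")

# Add a column to one component table; existing rows read back NULL for it.
con.execute("ALTER TABLE WEA ADD COLUMN windspeed REAL")

# Add a whole new component table, e.g. for a tau monitor.
con.execute("CREATE TABLE TAU (tau_no INTEGER PRIMARY KEY, ut TEXT, tau REAL)")

print(columns("WEA"))
print(columns("BE") == be_before)
print(con.execute("SELECT windspeed FROM WEA").fetchone())
```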

From this explanation it should be clear that the TMS is designed with the operation of the telescope in mind, rather than the astronomical observations. This is exactly opposite to what is currently happening with archiving to GSD files. It is the hope that the availability of a TMS will significantly improve the performance of and knowledge about the telescope and its equipment. For instance, a tool may be provided to both the observers and the TO to check the receiver temperature, since a simple (SQL) statement like:

    select ut, trx from TMS
    where frontend = 'RXC2'
      and obsfreq > 489 and obsfreq < 491
      and ut > '6/1/1994'

will show the receiver temperature of C2 within the stated frequency range as a function of time since June 1994. And this information will be delivered within seconds.
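To try the receiver temperature query without a SYBASE server, it can be run against a toy table using Python's sqlite3 module. The flat `TMS` table below mimics the single large view a user would see; every row in it is invented, and the dates are written in ISO form (an assumption of this sketch) so that they compare correctly as text in SQLite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TMS (ut TEXT, frontend TEXT, obsfreq REAL, trx REAL)")

# Invented rows; the trx values carry no meaning beyond the example.
con.executemany("INSERT INTO TMS VALUES (?, ?, ?, ?)", [
    ("1994-05-20", "RXC2", 490.0, 1450.0),  # too early
    ("1994-06-03", "RXC2", 489.5, 1500.0),  # matches
    ("1994-06-10", "RXC2", 492.0, 1600.0),  # outside the frequency range
    ("1994-06-12", "RXB3", 490.2,  900.0),  # different frontend
    ("1994-07-01", "RXC2", 490.5, 1480.0),  # matches
])

# Same selection as the article's statement, with the literals as parameters.
history = con.execute(
    """
    SELECT ut, trx FROM TMS
    WHERE frontend = ?
      AND obsfreq > ? AND obsfreq < ?
      AND ut > ?
    ORDER BY ut
    """,
    ("RXC2", 489, 491, "1994-06-01"),
).fetchall()

for ut, trx in history:
    print(ut, trx)
```

Only the two RXC2 rows inside the frequency window and after 1 June 1994 are returned, in time order.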

Having prompt information like this at your fingertips during observing can be very helpful in judging observations and problems. Obviously, for standard applications like the one above the users will have utilities available and won't have to issue the raw SQL statements. By the way, it will be possible for information from the database to be automatically extracted at the end of each scan and written to a GSD file, so that the end result for the observer will look the same whether the STORAGE task or the database is used. However, we imagine that observers will start to find it more convenient to access the data directly from the database, with all the facilities this will provide.

The JCMT Archive

The JCMT Archive will look much more like a traditional observation file. Selected items from the TMS will automatically be combined and forwarded to the JAC as a FITS table. FITS tables are the file format that CADC has adopted and differ from regular FITS files in that they store wavelength-Tk pairs (DAS data) rather than Tk values only. Moreover, multiple sets can exist within a file. In this they very much resemble GSD files, and a one-to-one mapping exists between the two file formats. Regular FITS files are not suitable because it is not possible to store the overlapping sections of a DAS spectrum in a single file. In order to go from the FITS tables to FITS files, the SPECX DAS-MERGE (and CONCAT) function, familiar to most DAS users, has to be performed. Although at least for 94B we expect to still keep the GSD files alongside the DB, I intend to provide tools which flexibly allow data to be extracted from the DB in a number of formats: FITS tables, GSD, and eventually FITS files (which thus implies a DB CONCAT and DAS-MERGE). Also, in collaboration with Rachael Padman we may provide for SPECX to read directly from the DB.

SCUBA data will come in the form of a set of gridded maps for which the FITS standard is much better defined. Most likely, these will essentially be stored as regular FITS images and be made available as HDS or FITS files. The data volume of the SCUBA images may be too large to store on-line for timescales longer than the proprietary period. If this is the case, CADC will keep a `preview' version of the image on-line, which has been reduced in volume by a factor of 10-30 through a process of lossy compression. Having inspected the preview image, the user will have to file a request for the actual data, and the images will then be made available, most likely through the Internet. The details of this are still a couple of years in the future, but this is the outline.

Aside from the observations, information such as dated efficiencies and beam maps will also be made available through the archive.

Irrespective of how and in what format the data are going to be stored in the JCMT Archive, we intend that STARCAT will be available to browse the `catalogue' part and inspect the `data'. On X-compatible displays STARCAT uses generally available X tools (Xmosaic, xv, saoimage) to access the archives. Although it was not set up to be very user-friendly, during a demo Dennis managed to `baseline' and `bin' IUE spectra on-line in the archive using only these general tools. Hence, it may very well be that users will be able to concat/merge and baseline the spectra before extracting them from the JCMT Archive.

Similarly, CADC has pipeline processing (using IRAF) set up on some of their images to do mosaicing and flat-fielding on-line. Among others, STARCAT currently gives access to HST, IRAS, IUE, and CFHT data, as well as a number of `standard' catalogues. STARCAT also has a European counterpart at ESO (ST-ECF) where it is being used and developed for data from the ESO telescopes. In the near future probably all archives will be transparently accessible through either site. For more information on STARCAT, look at its `xmosaic' page at

`http://cadcwww.dao.nrc.ca/CADC-services.html'.

The Catalogue

The `catalogue' part of the JCMT Archive essentially will be the header information associated with the actual astronomical data. In principle this information could become public immediately following the observation, even while the actual data is still proprietary for a year. This raises a concern for those who would like to keep e.g. sources, positions, and maybe observing strategies secret. Obviously a mechanism will be put in place to prevent anybody but the PATT project members from accessing the astronomical `data' during the proprietary period. This could be extended to most of the `catalogue' during the same time, but such a policy runs counter to the whole philosophy behind the DB effort. Current plans are to make the catalogue publicly available immediately, but to round the positional information to e.g. the nearest degree during the proprietary period.
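A minimal sketch of such a rounding rule is given below. It assumes positions held in decimal degrees and a proprietary period of exactly 365 days; the function name `public_position` and the example coordinates are hypothetical, and the real mechanism may well differ.

```python
import math
from datetime import date, timedelta

# "Proprietary for a year", taken here as 365 days (an assumption).
PROPRIETARY_PERIOD = timedelta(days=365)


def round_half_up(x):
    """Round to the nearest integer, with halves going up
    (Python's built-in round() rounds halves to the even neighbour)."""
    return math.floor(x + 0.5)


def public_position(ra_deg, dec_deg, observed, today):
    """Position as shown in the public catalogue on `today`."""
    if today - observed < PROPRIETARY_PERIOD:
        # Still proprietary: degrade to the nearest degree, wrapping RA at 360.
        return float(round_half_up(ra_deg) % 360), float(round_half_up(dec_deg))
    return ra_deg, dec_deg


observed = date(1994, 5, 5)
print(public_position(83.822, -5.391, observed, date(1994, 9, 1)))
print(public_position(83.822, -5.391, observed, date(1995, 6, 1)))
```

During the proprietary period the catalogue would show only the rounded position; once the period has expired the full position is returned unchanged.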

Database Tools

SYBASE has a Client-Server structure. What this means is that a continuously running Server process (a `daemon') handles requests from user initiated Client processes. Clients connect to the Server and communicate with it essentially using SQL statements, like the examples above, although in general the SQL part will be hidden from the user. The Server will take care of all the problems associated with accessing the data in a multi-user and distributed environment.

One advantage of this setup is that Clients can be run locally anywhere on the Internet but connect transparently to the Server e.g. at the JAC or CADC. The JCMT has purchased software to be able to build Clients for Suns, VMS hosts, and DEC OSF/1 hosts, which can be installed at the user's institute (which involves a simple copy since they will be statically linked executables). For Suns we have additional software which enables us to build Forms-based X tools.

Conclusion

I hope that this discussion of the JCMT TMS and Archive has given some idea about the direction the project is taking. Out of necessity I have only touched on many areas and not been able to fully discuss all the issues involved. If you have questions or concerns, feel free to contact me.

Remo Tilanus, JAC

(rpt@jach.hawaii.edu)

Contact: Antonio Chrysostomou. Updated: Tue Aug 17 17:32:12 HST 2004
