The INAOE Science Group and Virtual Observatories
Abstract
The astronomical community is not prepared for the tidal wave of high quality observational data that is fast approaching.
In the last few years, observational cosmology has matured into a dominant branch of astrophysics, driven by the commissioning of large telescopes and by growing interest in major observational projects that aim to understand the structure and evolution of the Universe. With the GTM, Mexico will be a key contributor to this field. Today, almost all major observational projects involve, among other things, collecting and compiling large amounts of data. The astronomical community is not prepared to handle these large data sets, simply because it lacks the data management and analysis capabilities to do so.
Our group's main objective is to provide the infrastructure for training interdisciplinary groups, giving them experience in computing, astrophysics, statistics, dynamics and visualization so that they can take advantage of the flow of data, expected to be massive, from present and future cosmology experiments carried out from the ground and from space.
The ultimate objective is to develop a Virtual Observatory that will provide access not only to databases, but also to the infrastructure and software needed to carry out related research projects.
Why Create a Science Group with Virtual Observatories?
In the last few years there has been a huge increase in the amount of data available for astronomical research. This is mostly due to many new opportunities, to technological advances in sensors and storage devices, and to the growing automation of data acquisition and processing.
As a result, in the next few years astronomers will be able to generate calibrated data faster than they can process and analyze them in detail.
New large data releases such as 2dF, SDSS, 2MASS, VIRMOS and DEEP2 are revolutionizing astronomy by providing large amounts of high-quality data in addition to the previously existing data. To these large amounts of optical data, obtained with medium-sized telescopes, we must add the vast databases that will be produced by the new generation of large ground-based and space telescopes in X-rays, the optical, the infrared and radio (mm).
The nature of these databases (wide wavelength coverage) will make it a great challenge to unify optical, infrared, millimeter and X-ray observations obtained from observatories in space and on the ground, including radio telescopes such as the Large Millimeter Telescope (Gran Telescopio Milimétrico, GTM).
This challenge is of the utmost importance and is very complex. Within the next five years, the data available in databases will grow to unprecedented levels, reaching tens of thousands of parameters for 100 million astronomical objects, that is, TERADATASETS. This already complex situation will be made harder by measurement errors, deviations and trends in the data; however, the main difficulty will be the lack of standardization across databases. Each database will have its own set of parameters and its own access software, which will complicate cross-correlations between them.
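To make the scale concrete, the short Python sketch below estimates the raw size of such a catalogue. It assumes one possible reading of the figures quoted above (about 10,000 parameters stored for each of 100 million objects) and 4 bytes per stored value; both the reading and the storage width are illustrative assumptions, and `catalogue_size_bytes` is a hypothetical helper name.

```python
def catalogue_size_bytes(n_objects, n_params, bytes_per_value=4):
    """Raw size of a dense table with n_objects rows and n_params columns."""
    return n_objects * n_params * bytes_per_value


# Assumed figures: 100 million objects, 10,000 parameters each, 4-byte values.
size = catalogue_size_bytes(100_000_000, 10_000)
print(f"{size / 1e12:.1f} TB")  # prints "4.0 TB"
```

Under these assumptions a single catalogue already reaches the terabyte range before any indexing or metadata is added.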
New applications are needed to keep pace with the constant growth of data. These applications must create and manage queries, intelligently and automatically, and in turn visualize and analyze the parameter space and variations of large databases. This will be the main task of the Virtual Observatory (OV), which is very important for countries like Mexico that lack access to major international observational facilities.
INAOE is well equipped to play an important role in this new field. Its Astrophysics and Computing Departments are comparable in quality and size to international astronomy departments. Together they will allow the rapid and efficient development of specific interdisciplinary groups such as the one presented here. The collaboration with the astronomy departments of Cambridge University and University College London provides access, among other things, to many British and European databases.
Regarding education and training opportunities, the combination of statistics, visualization and scientific rigor will result in a powerful synergy with broad applications for society, not only in pure research. A broad range of fields, from economics to banking to medicine, will benefit from experts in this new field. Our group will be well positioned to spread this knowledge throughout Mexican society.
A short-term objective is to serve as the foundation for a Virtual Observatory of Mexico (OVM). This will give the national astronomical community access to data and will bring together all the infrastructure needed to carry out research projects at the same level as other Virtual Observatories. The plan also includes incorporating the OVM into the International Virtual Observatory Alliance (IVOA), mainly to facilitate coordination and to complete the group's objectives in as little time as possible.
Objectives
We have selected a learning plan that aims for ambitious results even though it starts with relatively modest objectives. We have done this with a view to intermediate results in the form of high-quality publications and the training of postgraduate students.
We are interested in developing and comparing diverse statistical methods that allow us to analyze multiple TERADATASETS.
The following are some of our main objectives:
- Developing an interface between the user and different databases (based on remote analysis).
- Defining the parameters that characterize prominent features, and dividing databases into samples and subsamples of up to 1000 gigabytes each.
- Developing visualization tools.
- Unsupervised classification and determination of the number of object types present in the data.
- Searching for unusual data, including the ability to detect new types of objects.
- Searching for irregularities and correlations in the data.
- Supporting supervised analysis.
- Developing and analyzing, in parallel and with similar techniques, databases resulting from numerical simulations, in particular the results of new stellar population synthesis models at high spectral resolution.
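As an illustration of the "search for unusual data" objective, the following sketch flags objects whose value of a single parameter lies far from the sample median, measured in units of the median absolute deviation (MAD). It is only a minimal one-parameter example and not the group's method; the function name `flag_outliers`, the threshold of 5 and the toy magnitudes are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=5.0):
    """Return indices of values more than `threshold` scaled MADs from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the sample, so nothing can be ranked as unusual
    # The factor 1.4826 rescales the MAD to a standard deviation for Gaussian data.
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]


# Toy sample (hypothetical magnitudes): one object is far brighter than the rest.
magnitudes = [18.1, 18.3, 17.9, 18.0, 18.2, 12.5, 18.1, 17.8]
print(flag_outliers(magnitudes))  # prints [5]
```

A median-based scale is used here because a single extreme object barely changes it, whereas a mean and standard deviation would be pulled toward the very outliers being sought.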
The results from this initial analysis will provide new databases for detailed follow-up projects. These will be very relevant to subjects of interest to us, such as the evolution of the properties of emission-line galaxies and their potential use in studying the evolution and geometry of the Universe, as well as stellar population studies (normal and active) and the measurement of the helium abundance.
Steps to take
Observational Databases
Our initial plan is based on local analysis of public data, which completely eliminates the need to transfer sets of tens of gigabytes of data through a network that is often overloaded. Such transfers are the real bottleneck in many areas of computational physics, where almost all the data usually have to be held in memory to carry out even the simplest analysis. The software being developed will run on inexpensive systems with large storage capacity. To begin this process we have chosen the Sloan Digital Sky Survey (SDSS), the most ambitious collection of spectroscopic data and images of the nearby Universe, which will also include luminous, distant objects.
At the same time we can start to explore techniques and methods for analyzing relatively small subsamples (100,000 objects with approximately 100 parameters each). Once we have achieved easy access and querying, we can apply the same techniques to bigger samples. Later in the learning process, given our lack of experience in this particular area, we can begin to work on visualization and presentation.
The products currently available for the SDSS data include a searchable catalogue containing the detected objects and the images, parameters and spectral attributes associated with them. It also contains three-color images in JPEG format, data images in FITS format and spectra in both GIF and FITS. The first release of SDSS data, the Early Data Release (EDR), covers 462 square degrees of the sky and includes over 55,000 spectra of galaxies, quasars and stars. Our approach to the research problem will be to use the EDR for our initial learning phase.
A considerable amount of work has already been done by Ofer Lahav and his collaborators at the IoA Cambridge (Lahav and his group have since moved to University College London). They have analyzed and classified the 2dF galaxy spectra using principal component analysis (PCA) as well as other methods (see e.g. Lahav, 2001). The 2dF survey was later subdivided to obtain luminosity functions by spectral class (Madgwick et al. 2001 and another in preparation). A first attempt at detecting unusual objects has already been made. Some unsupervised classification methods have been applied to astronomical samples by Olac Fuentes.
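For readers unfamiliar with PCA, the sketch below finds the first principal component of a small table of "spectra" (rows are objects, columns are flux bins) and projects one object onto it. This is not the code used in the 2dF analysis: a plain power iteration on the covariance matrix is swapped in for a full eigen-decomposition to keep the example dependency-free, and the function names and toy data are hypothetical.

```python
def first_principal_component(rows, n_iter=200):
    """Column means and leading eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    # Assumes the data have non-zero variance and that the starting vector
    # is not orthogonal to the leading eigenvector.
    vec = [1.0] * d
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    return means, vec


def project(row, means, component):
    """Score of one object along the given component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


# Toy "spectra": every object is a scaled copy of the same shape (1, 2, 2).
spectra = [[t, 2.0 * t, 2.0 * t] for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
means, pc1 = first_principal_component(spectra)
print([round(c, 3) for c in pc1])
print(round(project(spectra[-1], means, pc1), 3))
```

In a real survey each row would hold hundreds of flux bins, and the scores along the first few components would serve as a compact description of each spectrum.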
We have planned a two-pronged approach to the analysis of both the empirical and the theoretical databases.
- The objective (“unsupervised”) approach, where the data “speak for themselves” (e.g. PCA).
- The supervised (physically motivated) analysis, for instance the analysis of the indices and intensities of traditional lines such as H-alpha, [OIII], Mg2, etc.
We think this combination of methods is important for obtaining correlations between physical parameters that are of astrophysical relevance. These new correlations would constitute the basis for subsequent studies of the evolution of galaxies and of environmental effects.
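As an example of a physically motivated measurement of the kind used in the supervised approach, the sketch below computes the equivalent width of a spectral line by integrating 1 - F/Fc over a wavelength window with the trapezoidal rule. It assumes a known, constant continuum level Fc and uses the convention in which emission lines give negative values; the function name, the window and the toy spectrum around H-alpha (6563 Angstrom) are illustrative assumptions.

```python
def equivalent_width(wavelengths, fluxes, continuum, lo, hi):
    """Trapezoidal integral of (1 - flux/continuum) for lo <= wavelength <= hi."""
    pts = [(w, 1.0 - f / continuum)
           for w, f in zip(wavelengths, fluxes) if lo <= w <= hi]
    return sum(0.5 * (y0 + y1) * (w1 - w0)
               for (w0, y0), (w1, y1) in zip(pts, pts[1:]))


# Toy spectrum (hypothetical): a flat continuum of 1.0 with one emission peak.
wavelengths = [6555, 6559, 6563, 6567, 6571]  # Angstrom
fluxes = [1.0, 1.0, 3.0, 1.0, 1.0]
ew = equivalent_width(wavelengths, fluxes, continuum=1.0, lo=6555, hi=6571)
print(ew)  # prints -8.0 (Angstrom; negative because the line is in emission)
```

In practice the continuum would be fitted from line-free regions on either side of the window rather than supplied as a constant.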
The results of the multivariate study of DR1 will be used to select interesting subsamples of emission-line galaxies in order to study, among other things:
- The ages of starburst galaxies (galaxies with bursts of star formation) and HII galaxies.
- The distribution of chemical element abundances and its evolution with look-back time.
- Their use as distance estimators at high redshift and in the determination of cosmological parameters.
The follow-up studies will be carried out using large facilities. They consist of high signal-to-noise spectroscopic observations, with multiple apertures, of a moderate number of objects selected from the SDSS and 2dF surveys. Since these surveys are based on fiber observations with 3-4 meter telescopes, the quality of the data will be substantially improved by aperture spectroscopy on 8 meter telescopes.
Theoretical Databases
Another central aspect of our work is the analysis of databases produced by stellar population synthesis models. The collaboration between INAOE and Padova has produced the first stellar population synthesis models with very high spectral resolution.
The analysis of these models has high priority in our immediate work plans, and we will apply supervised and unsupervised techniques similar to those used in the analysis of the empirical databases.
Address: Luis Enrique Erro # 1, Tonantzintla, Puebla, Mexico ZIP Code. 72840 Tel: (222) 266.31.00 Contact: difusion@inaoep.mx
This work is licensed under a Creative Commons License Attribution-Noncommercial-No Derivative Works 2.5 Mexico.