Broadly speaking, astronomical catalogues are lists of celestial bodies. They have been around for thousands of years: the earliest known star catalogues were made in Ancient Babylon during the second millennium BC. Up to and including the catalogue of the Danish astronomer Tycho Brahe, compiled in the 16th century, these catalogues were based on observations made with the naked eye. The invention of the very first telescope in 1608 in the Netherlands changed the way we look at space forever, and allowed the development of more complete and detailed astronomical catalogues over the next few centuries. Today, rather than being engraved on clay or printed on paper, they tend to be generated as computer files, but they are still an important tool for studying the universe.
We were given a collection of 19 catalogues as FITS files, each one corresponding to a specific filter. Each catalogue contains thousands of emissions detected across the GOODS-S field, and a number of properties associated with them, such as observed equivalent width or coordinates.
Figure 1: The GOODS fields. Source: Mauduit et al., 2012.
Our goal is to identify which of these emissions are Lyman-alpha emitters (LAEs), and which are not. However, before making such a selection, an intermediate step is needed. Not all emissions correspond to emission lines, and it is crucial to separate true line emitters from noise on the one hand, and from cosmic rays, image defects, and other artefacts on the other. To perform this selection, we use different “cuts”, as explained by Emily in a previous blog post. The first one, the Σ cut (Sigma cut), is used to identify with a high level of confidence the emissions that correspond to noise and that should be eliminated from our list of emitters. The second one, the EW cut (equivalent width cut), allows us to further refine our list of line emitters by excluding various artefacts. For example, if, for a specific filter, we select a Σ cut of 3 and an EW cut of 100, it means that out of all the emissions detected by this filter, we will only consider the ones that have Σ > 3 and EW_obs > 100 (observed equivalent width > 100) to be “candidate line emitters”. Furthermore, to be selected as a candidate line emitter, an emission must have a positive spectroscopic redshift (specz > 0).
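To make the example above concrete, here is a minimal sketch of how the three conditions combine. It is not the selection we actually ran (that was done in Topcat); the toy catalogue, its values, the `id` column and the `-99.0` placeholder for a missing redshift are all invented for illustration, and the column names `sigma_NB`, `EW_obs` and `specz` simply follow the labels used in this post.

```python
# Toy catalogue: one dictionary per detected emission.
# All values are made up for illustration; -99.0 is an assumed
# placeholder for "no spectroscopic redshift available".
catalogue = [
    {"id": 1, "sigma_NB": 5.2, "EW_obs": 180.0, "specz": 2.95},
    {"id": 2, "sigma_NB": 2.1, "EW_obs": 250.0, "specz": 0.31},   # fails the Sigma cut
    {"id": 3, "sigma_NB": 4.0, "EW_obs": 60.0, "specz": 0.30},    # fails the EW cut
    {"id": 4, "sigma_NB": 6.3, "EW_obs": 120.0, "specz": -99.0},  # fails specz > 0
    {"id": 5, "sigma_NB": 3.5, "EW_obs": 101.0, "specz": 2.10},
]


def select_candidates(rows, sigma_cut, ew_cut):
    """Keep only rows that pass the Sigma cut, the EW cut and specz > 0."""
    return [
        row for row in rows
        if row["sigma_NB"] > sigma_cut
        and row["EW_obs"] > ew_cut
        and row["specz"] > 0
    ]


candidates = select_candidates(catalogue, sigma_cut=3, ew_cut=100)
print([row["id"] for row in candidates])
```

With a Σ cut of 3 and an EW cut of 100, only the first and last toy emissions survive all three conditions.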
Using the software Topcat, we selected a few filters and we started experimenting by creating scatter plots and histograms with different Σ and EW cuts, to see how varying these values would affect the selected samples.
Figure 2: Scatter plot for IA427, Σ>2. Green dots are the candidate line emitters, while green and red dots together represent all emissions.
Figure 3: Count against spectroscopic redshift for IA427, Σ>2 and specz > 0.
Figure 4: Scatter plot for IA427, Σ>4. Green dots are the candidate line emitters, while green and red dots together represent all emissions.
Figure 5: Count against spectroscopic redshift, for IA427: Σ>4 and specz > 0.
Figure 6: Scatter plot for IA445. Red: Emissions before EW cut; green: emissions for EW_0 > 30; pink: emissions for EW_0 > 90.
It is also possible to combine both cuts.
Figure 7: Scatter plot for IA427, Σ>3. Green dots are the candidate line emitters, while green and red dots together represent all emissions.
Figure 8: Scatter plot for IA427, EW_obs > 50. Blue dots are the candidate line emitters, while blue and red dots together represent all emissions.
Figure 9: Scatter plot for IA427, Σ>3 and EW_obs > 50. Grey dots are the candidate line emitters, while grey and red dots together represent all emissions.
After this period of trials, we had become a bit more familiar with the data and the concepts we were working with. We decided to take a more quantitative approach in order to identify line emitters.
The histograms of selected emitters were expected to look like the following:
Figure 10: Typical histogram after selection of line emitters. Credit: David Sobral.
As illustrated in Fig. 10, the distribution of sources as a function of spectroscopic redshift (specz) should be discrete: the sources should cluster in a few narrow redshift ranges, one for each emission line that the filter can detect. Any histogram showing a roughly continuous distribution of sources with redshift indicates that the cut is not efficient.
Let us choose IA484 to illustrate how this works in practice. This filter is centred around the wavelength 4849.2 Å, and has a width of about 229 Å, which means that the wavelengths that can potentially be detected by this specific filter stretch from 4734.65 Å to 4963.75 Å. Following the advice of our supervisor David, we started calculating, for each filter, the redshift at which different emission lines (H-alpha, Oxygen III, Oxygen II, Carbon IV, and Lyman-alpha) would be observed at these wavelengths, using the formula z = λ_obs / λ_rest − 1, where λ_obs is the observed wavelength and λ_rest is the rest-frame wavelength of the line.
Rest frame wavelength (Å)
|Emission line||Lyman-alpha||H-alpha||OII||OIII||CIV|
|Min redshift (to achieve an observed wavelength of 4734.65 Å)||2.90||-0.24||0.27||-0.05||2.06|
|Max redshift (to achieve an observed wavelength of 4963.75 Å)||3.08||-0.29||0.33||-0.01||2.20|
Table 1: Expected redshifts for different emission lines, for IA484. The redshifts obtained for H-alpha and OIII are negative, so these two lines will not be detected by the IA484 filter. For this filter, the Lyman-alpha line should be between z = 2.90 and 3.08.
We obtained tables such as Table 1, containing values for the minimum and maximum redshifts associated with each emission line. Initially, we had started calculating these values for different filters by hand, but it ended up being a tedious task. We decided to write a Python script to automate the process, which made things quicker and easier. We then plotted new histograms, trying to identify emission lines, and modifying the values of our EW and Σ cuts in order to get a discrete distribution of sources.
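As a rough idea of what such a script can look like, here is a minimal sketch. It is not the script we actually wrote. The rest-frame wavelengths below are commonly quoted approximate values that we are assuming for illustration, so the last digit of a result can differ slightly from the tables in this post; the function names are also hypothetical.

```python
# Approximate rest-frame wavelengths in Angstrom.
# These are assumed values for illustration, not taken from our tables.
REST_WAVELENGTHS = {
    "Lyman-alpha": 1215.67,
    "CIV": 1549.0,
    "OII": 3727.0,
    "OIII": 5007.0,
    "H-alpha": 6563.0,
}


def redshift(observed, rest):
    """Redshift at which a line emitted at `rest` is observed at `observed`."""
    return observed / rest - 1.0


def redshift_ranges(lambda_min, lambda_max, lines=REST_WAVELENGTHS):
    """Map each line name to its (min, max) redshift for a filter's wavelength range."""
    return {
        name: (redshift(lambda_min, rest), redshift(lambda_max, rest))
        for name, rest in lines.items()
    }


# Wavelength range of IA484 quoted in the text above.
ranges = redshift_ranges(4734.65, 4963.75)
for name, (z_min, z_max) in ranges.items():
    # A line can only be detected if at least part of its range is at z > 0.
    status = "detectable" if z_max > 0 else "not detectable"
    print(f"{name:12s} {z_min:6.2f} {z_max:6.2f}  {status}")
```

Looping the same function over the wavelength range of every filter is enough to build one such table per filter.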
Figure 11: Count against spectroscopic redshift for IA427, for sigma_NB > 3 AND EW_obs > 50 AND specz > 0.
After having created several plots for a number of different cuts, we needed to find a way to determine the best combination of cuts. The two most important factors to take into account were how much noise remained in the sample, and how many actual line emitters had been excluded from it. To quantify this, we calculated two other values: the purity, which is the fraction of the sources remaining after a cut that are genuine emitters rather than noise, and the completeness, which is the fraction of the emitters present before the cut that are still present after it. We plotted completeness against purity to obtain new graphs.
Figure 12: Completeness against purity (or “accuracy”) in %, for IA709
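The sketch below shows one way these two quantities can be computed for a single pair of cuts. It rests on an assumption that is ours for this example only: a source counts as a genuine emitter when its specz falls inside one of the expected redshift windows of the filter (here the OII, CIV and Lyman-alpha windows of Table 1). The toy rows and the function names are invented for illustration.

```python
# Expected redshift windows for IA484, taken from Table 1 (OII, CIV, Lyman-alpha).
WINDOWS = [(0.27, 0.33), (2.06, 2.20), (2.90, 3.08)]

# Toy catalogue with invented values.
rows = [
    {"sigma_NB": 5.0, "EW_obs": 120.0, "specz": 2.95},
    {"sigma_NB": 4.2, "EW_obs": 80.0, "specz": 0.30},
    {"sigma_NB": 3.5, "EW_obs": 60.0, "specz": 1.10},
    {"sigma_NB": 2.0, "EW_obs": 200.0, "specz": 2.10},
    {"sigma_NB": 6.0, "EW_obs": 30.0, "specz": 0.75},
    {"sigma_NB": 3.8, "EW_obs": 70.0, "specz": 2.15},
]


def in_any_window(z, windows):
    """Assumed definition of a genuine emitter: specz lies in an expected window."""
    return any(z_min <= z <= z_max for z_min, z_max in windows)


def purity_completeness(rows, sigma_cut, ew_cut, windows):
    """Return (purity, completeness) as fractions between 0 and 1."""
    real_before = [r for r in rows if in_any_window(r["specz"], windows)]
    selected = [
        r for r in rows
        if r["sigma_NB"] > sigma_cut and r["EW_obs"] > ew_cut and r["specz"] > 0
    ]
    real_after = [r for r in selected if in_any_window(r["specz"], windows)]
    # Guard against empty samples to avoid dividing by zero.
    purity = len(real_after) / len(selected) if selected else 0.0
    completeness = len(real_after) / len(real_before) if real_before else 0.0
    return purity, completeness


purity, completeness = purity_completeness(rows, sigma_cut=3, ew_cut=50, windows=WINDOWS)
print(f"purity = {purity:.2f}, completeness = {completeness:.2f}")
```

In this toy sample, four sources pass the cuts and three of them are genuine, while one of the four genuine emitters is lost to the Σ cut, so both values come out at 0.75.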
We also normalised the values in order to get another type of plot.
Figure 13: Variation of completeness (in orange) and purity (in blue) depending on the EW cut, for normalised values.
We modified our Python script so that it would be able to calculate the number of emitters, purity, completeness and effectiveness (purity multiplied by completeness) for a large number of different combinations of EW and Sigma cuts, including for high values that we had not tried before. For a selected filter, the script creates a .ascii file containing a catalogue of emitters that can later be opened in Topcat, which makes it easy to obtain 3D plots to represent EW_Cut, Sigma_Cut, and relative effectiveness (the effectiveness value we calculated divided by the one obtained by our supervisor).
Table 2: Sample from the IA856 catalogue
Figure 14: 3D plot for IA856
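A grid scan of this kind can be sketched as follows. This is a simplified stand-in rather than our real script: the toy rows are invented, a genuine emitter is again assumed to be a source whose specz falls in one of the Table 1 windows, and the column names other than `Sigma_Cut` and `EW_Cut` are hypothetical. The relative effectiveness is left out because it needs our supervisor's reference value.

```python
import itertools

# Expected redshift windows for IA484 from Table 1 (OII, CIV, Lyman-alpha).
WINDOWS = [(0.27, 0.33), (2.06, 2.20), (2.90, 3.08)]

# Toy rows as (sigma_NB, EW_obs, specz); values invented for illustration.
ROWS = [
    (5.0, 120.0, 2.95), (4.2, 80.0, 0.30), (3.5, 60.0, 1.10),
    (2.0, 200.0, 2.10), (6.0, 30.0, 0.75), (3.8, 70.0, 2.15),
]


def is_real(z):
    """Assumed definition of a genuine emitter: specz lies in an expected window."""
    return any(z_min <= z <= z_max for z_min, z_max in WINDOWS)


def scan(rows, sigma_cuts, ew_cuts):
    """Evaluate every (Sigma cut, EW cut) pair and return one tuple per pair."""
    n_real = sum(is_real(z) for _, _, z in rows)
    results = []
    for s_cut, ew_cut in itertools.product(sigma_cuts, ew_cuts):
        kept = [z for s, ew, z in rows if s > s_cut and ew > ew_cut and z > 0]
        n_kept_real = sum(is_real(z) for z in kept)
        purity = n_kept_real / len(kept) if kept else 0.0
        completeness = n_kept_real / n_real if n_real else 0.0
        results.append(
            (s_cut, ew_cut, len(kept), purity, completeness, purity * completeness)
        )
    return results


grid = scan(ROWS, sigma_cuts=[2, 3, 4], ew_cuts=[25, 50, 100])
best = max(grid, key=lambda r: r[-1])  # highest effectiveness

# Whitespace-separated text with a header line; saving it to a file is left out here.
lines = ["Sigma_Cut EW_Cut N_emitters purity completeness effectiveness"]
lines += [f"{s} {ew} {n} {p:.3f} {c:.3f} {e:.3f}" for s, ew, n, p, c, e in grid]
table_text = "\n".join(lines)
print(table_text)
print("best combination:", best[:2])
```

Each row of the output corresponds to one point of a 3D plot like Fig. 14, with the two cuts on the horizontal axes and the effectiveness on the vertical axis.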
During the remainder of the internship, we will try to produce better 3D plots in order to identify which combination of EW and Sigma cuts is the most efficient for each filter. The next step will then be to identify Lyman-alpha emitters (LAEs) among the selected emitters by applying other selection criteria and performing visual inspections.