SC4K: M* and UV LFs – Part 1, Weeks 1 and 2

The aim of this internship is to obtain stellar mass functions (SMFs) of Lyman-alpha emitter galaxies (LAEs), to analyse how these SMFs evolve over cosmic time, and to compare them with the SMFs of the general galaxy population. I have been investigating 3908 LAEs selected in the SC4K cut of the COSMOS survey, performed by my internship supervisor, Dr. David Sobral (Sobral et al 2017). The SC4K cut used 16 different narrow-band and medium-band filters over the COSMOS field, covering a redshift range of z ≈ 2–6.

I have been provided with the SC4K catalogue, which contains numerous properties for each LAE. However, the properties pertinent to my investigation at the moment are log(M/MSun), Lyα flux and Lyα redshift.

Since my investigation relies heavily on the log(M/MSun) value of each LAE, my first job was to determine which LAEs had values that could be trusted. The log(M/MSun) values were obtained by fitting spectral energy distributions (SEDs) to the measured flux data points of each LAE (credit to Sergio Da Graca Santos). Because this was an automated process, it was always likely to produce some anomalous results.

For example, even an initial inspection showed some log(M/MSun) values > 30. This implies a stellar mass of order 10³⁰ solar masses, which is more than the mass of the entire observable universe, let alone an LAE, which is expected to be of order 10⁸–10⁹ solar masses. It was therefore decided that it would be prudent to visually inspect the SEDs of the LAEs with the most extreme predicted masses. I removed those with poorly fitting SEDs, or with SEDs that led to unphysical masses, until I was satisfied that at least the majority of those remaining could be trusted.
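The inspection order described above can be sketched as follows. This is a minimal illustration, not the script actually used. The function name and the `n_inspect` parameter are hypothetical, and it assumes the masses are held in a numpy array. It simply lists sources from the most to the least massive fit, so that the most extreme SEDs are looked at first.

```python
import numpy as np

def inspection_order(log_mass, n_inspect=None):
    """Return source indices sorted from the largest to the smallest log(M/MSun).

    Non-finite masses are dropped. n_inspect (hypothetical) optionally limits
    how many of the most extreme sources are returned for visual inspection.
    """
    log_mass = np.asarray(log_mass, dtype=float)
    finite = np.flatnonzero(np.isfinite(log_mass))
    # argsort is ascending, so reverse it to put the most massive fits first
    order = finite[np.argsort(log_mass[finite])[::-1]]
    return order if n_inspect is None else order[:n_inspect]
```

For example, `inspection_order([9.2, 32.05, 10.1, 13.4], 2)` picks out the sources with log masses 32.05 and 13.4. These toy values are only for illustration.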


Figure 1: An example of an SED which led to an unphysical mass value of log(M/MSun) = 32.05, most likely due to the small number of data points.

Figure 2: An example of a ‘Debatably Bad’ (DB) fitting SED that was discarded.

Figure 3: An example of a well-fitting SED with a reasonable stellar mass; the corresponding LAE was therefore kept.

To sort through the SEDs, I devised a classification system to ‘more accurately’ (the quotation marks reflect my supervisor’s opinion of my system) classify each SED, and thus the validity of each LAE. The system is as follows:

The most reliable SEDs, and thus LAEs, were assigned ‘G’, i.e. Good.

The next subset, SEDs that followed most of the data points and limits, were assigned ‘DG’, Debatably Good.

The next is the reverse of this: a few points/limits were followed, but not enough for the fit to be reliable (see Figure 2). These were assigned ‘DB’, Debatably Bad.

Finally, the most egregious SEDs, such as the one shown in Figure 1, which produce completely unreasonable mass values, were assigned ‘B’, i.e. Bad.

Alongside these classes, I added a flag marking both DBs and Bs, so that they could easily be removed later in the code.
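A flag like the one described above could look like this. It is a minimal sketch, assuming the visual-inspection labels are stored as strings (‘G’, ‘DG’, ‘DB’, ‘B’) in the same order as the catalogue rows. The names `REJECTED_CLASSES` and `flag_bad_seds` are hypothetical.

```python
import numpy as np

# Hypothetical set of inspection classes that should be removed from the sample
REJECTED_CLASSES = {"DB", "B"}

def flag_bad_seds(labels):
    """Return a boolean array that is True for sources classed as DB or B."""
    return np.array([label in REJECTED_CLASSES for label in labels], dtype=bool)
```

Keeping the flag as a boolean array means the reduced sample is just `log_mass[~flag]`, and the same mask can be applied to every other catalogue column.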


Figure 4: A histogram showing the effect that removing all of the Bs and DBs from the catalogue has on the distribution of masses.

As can be seen, the data reduction by my system has little effect at the expected LAE masses, i.e. 10⁸–10⁹ solar masses, but all LAEs with masses above 10¹² solar masses were removed.

Splitting the reduced data on a filter-by-filter basis produces the following:


Figure 5: A histogram showing the mass distribution for each filter in the reduced SC4K catalogue.
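A per-filter split like the one in Figure 5 can be sketched as below. This is an illustration only. It assumes each source’s filter name (e.g. IA427, NB392) is available as a string alongside its log mass, and the function name is hypothetical. Using one shared set of bin edges keeps the filters directly comparable.

```python
import numpy as np

def mass_histograms_by_filter(log_mass, filters, bin_edges):
    """Count sources per log-mass bin separately for each filter.

    Returns a dict mapping filter name -> array of counts, using the same
    bin edges for every filter.
    """
    log_mass = np.asarray(log_mass, dtype=float)
    filters = np.asarray(filters)
    return {
        str(name): np.histogram(log_mass[filters == name], bins=bin_edges)[0]
        for name in np.unique(filters)
    }
```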

For my stellar mass function, I plot the log of the mass function against log(M/MSun). It is created in a similar fashion to a histogram, in that the data are split into bins. However, instead of merely counting the number of sources per bin, the mass function equation is applied to each individual count (LAE).

This needs to be done source by source, as each source has a unique completeness. The exception is dlogM, which is simply the width of each bin used to split the data and is therefore common to every source in that bin.
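Written out, a standard form of the mass function that matches this description is given below. Each LAE i in a bin contributes the inverse of its completeness C_i times its survey volume V_i, and the sum is divided by the bin width. The symbol names are my own, and the exact form used in this work may differ in detail.

```latex
\Phi(\log M) \;=\; \frac{1}{\Delta \log M} \sum_{i \,\in\, \mathrm{bin}} \frac{1}{C_i \, V_i},
\qquad \text{plotted as } \log_{10} \Phi \text{ against } \log(M/M_\odot)
```

Setting every C_i = 1 and every V_i to a single general volume V reduces this to N / (V ΔlogM), which is simply a scaled histogram.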

The following plot was my first attempt. It applies the mass function equation without completeness and with a single general volume; the completeness and filter-specific volumes were added later because they are more complex to apply.
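The binning-plus-weighting procedure can be sketched with numpy’s weighted histogram. This is a minimal illustration, not the code used for the figures. The function name is hypothetical, and it assumes completeness and volume are given per source or as single scalars. Each source is weighted by 1/(C V), the weights are summed per bin, and the sums are divided by the bin width.

```python
import numpy as np

def stellar_mass_function(log_mass, completeness, volume, bin_edges):
    """Weighted-histogram SMF: phi = sum_i 1 / (C_i V_i) / dlogM per bin.

    completeness and volume may be per-source arrays or scalars (e.g. 1.0 and a
    single general volume for a first attempt). Returns bin centres, phi and raw counts.
    """
    log_mass = np.asarray(log_mass, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    c = np.asarray(completeness, dtype=float)
    v = np.asarray(volume, dtype=float)
    # One weight per source, broadcasting scalars if necessary
    weights = np.broadcast_to(1.0 / (c * v), log_mass.shape)
    weighted_sum, _ = np.histogram(log_mass, bins=bin_edges, weights=weights)
    counts, _ = np.histogram(log_mass, bins=bin_edges)
    dlogm = np.diff(bin_edges)  # bin widths, shared by all sources in a bin
    centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centres, weighted_sum / dlogm, counts
```

For a plot like Figure 6, call it with `completeness=1.0` and one general volume. Then take `np.log10` of phi only for bins that actually contain sources, since empty bins have phi = 0.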


Figure 6: My first SMF, neglecting completeness, the respective volumes and errors.

The volume observed by each filter was taken from the paper ‘Slicing COSMOS with SC4K: the evolution of typical Lyα emitters and the Lyα escape fraction from z ∼ 2 to z ∼ 6’ (Sobral et al 2017). The completeness of each source was determined from both the filter in which it was observed and its observed Lyα flux, as each filter has its own completeness function, for example:


Figure 7: The completeness function of the medium-band filter IA427, showing completeness against log(flux) of the Lyα line.
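Looking up a per-source completeness from curves like the one in Figure 7 can be sketched with linear interpolation. This sketch assumes each filter’s curve is available as tabulated (log flux, completeness) pairs in a dict keyed by filter name. Both the dict layout and the function name are hypothetical, and linear interpolation is my choice, not necessarily what SC4K uses. Note that `np.interp` clamps fluxes outside the tabulated range to the end values.

```python
import numpy as np

def completeness_for_sources(log_flux, filters, curves):
    """Interpolate each source's completeness from its filter's tabulated curve.

    curves: dict mapping filter name -> (log_flux_grid, completeness_grid),
    with log_flux_grid increasing. Values outside the grid are clamped.
    """
    log_flux = np.asarray(log_flux, dtype=float)
    filters = np.asarray(filters)
    out = np.empty_like(log_flux)
    for name in np.unique(filters):
        sel = filters == name
        grid_flux, grid_comp = curves[str(name)]
        out[sel] = np.interp(log_flux[sel], grid_flux, grid_comp)
    return out
```

The values in a call such as `completeness_for_sources([-16.75], ["IA427"], {"IA427": ([-17.0, -16.5, -16.0], [0.1, 0.6, 0.95])})` are toy placeholders, not the real IA427 curve.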

After adding both completeness and the respective volumes, I have so far been able to obtain a basic stellar mass function for 13 of the 16 filters used in the SC4K cut, and an SMF for 5 of the 6 redshifts at which the LAEs are believed to be located. The three remaining filters, the narrow-band filters NB392, NB501 and NB816, are excluded for now because they have more complex and erratic completeness functions.


Figure 8: The SMF for filter IA427, including both completeness and respective volumes, but still neglecting errors.

The obvious omission from the previous SMF is the lack of errors. To add errors I used the random (Poisson) error method: for N counts in a bin, the error is √N. Applying the mass function equation to this error, using average values for volume and completeness, gives the error on each point in the final SMF.


Figure 9: The SMF for filter IA427, including both completeness and respective volumes, and now including errors.

It is worth noting that, because I am using the random error method, I have had to set the errors to -1 for any bin containing only one source in order to obtain log errors. For a single source the √N error equals the value itself, so the lower end of the error bar is exactly zero, and taking its log causes a maths error.
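The error step above can be sketched as follows. In this sketch, σ = √N / (⟨C⟩ ⟨V⟩ ΔlogM) uses per-bin average completeness and volume, and is converted into asymmetric log10 error bars. It also shows why single-source bins break: there φ − σ = 0. The function name and the `fill` parameter are hypothetical. `fill=-1.0` mirrors the workaround described in the text, though exactly how that value is then used when plotting is an assumption here.

```python
import numpy as np

def log_poisson_errors(phi, counts, mean_completeness, mean_volume, dlogm, fill=-1.0):
    """Asymmetric log10 errors on phi from Poisson sqrt(N) counting errors.

    sigma = sqrt(N) / (<C> <V> dlogM). Bins with no sources get NaN; bins where
    phi - sigma <= 0 (e.g. a single source) get `fill` as their lower error.
    """
    phi = np.asarray(phi, dtype=float)
    counts = np.asarray(counts, dtype=float)
    sigma = np.sqrt(counts) / (np.asarray(mean_completeness, dtype=float)
                               * np.asarray(mean_volume, dtype=float) * dlogm)
    upper = np.full(phi.shape, np.nan)
    lower = np.full(phi.shape, np.nan)
    has = counts > 0
    upper[has] = np.log10(phi[has] + sigma[has]) - np.log10(phi[has])
    ok = has & (phi - sigma > 0)  # the log of phi - sigma is only defined here
    lower[ok] = np.log10(phi[ok]) - np.log10(phi[ok] - sigma[ok])
    lower[has & ~ok] = fill  # e.g. N = 1, where phi - sigma is exactly zero
    return upper, lower
```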


Figure 10: The SMF for filter IA427, including both completeness and respective volumes, and now including errors, showing the blank spots of bins with no sources, and the fact that SMFs of different filters/redshifts use non-uniform axis limits.

At this point, I have SMFs for 13 of the 16 filters used in the SC4K cut (two of which are shown in Figures 9 and 10), and for 5 of the 6 redshifts. Going forward, my immediate plans are as follows.

First, I will include upper-limit notation for bins containing no sources (illustrated in Figure 10). This upper limit will be placed as if the bin contained a single source.

Second, I will give all of the SMFs for each filter/redshift uniform axis limits. Currently the bin locations are determined solely by the smallest and largest values in the provided data.

Third, I will neglect log M values below 9 for the moment. These low masses have large completeness corrections (due to their low flux), so it is unwise to trust them; this also brings my work closer to the existing literature in this area.
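Two of these plans can be sketched briefly: common bin edges starting at log M = 9, and single-source upper limits for empty bins. This is only a sketch. The upper edge of 12 and the width of 0.25 dex are hypothetical defaults, and so are the function names. It also assumes one representative completeness and volume per filter for the upper limit, i.e. φ_UL = 1 / (C V ΔlogM).

```python
import numpy as np

def uniform_bin_edges(log_m_min=9.0, log_m_max=12.0, width=0.25):
    """Shared bin edges for every filter/redshift; range and width are assumptions."""
    n_bins = int(round((log_m_max - log_m_min) / width))
    return log_m_min + width * np.arange(n_bins + 1)

def single_source_upper_limits(counts, completeness, volume, dlogm):
    """Upper limit on phi for empty bins, placed as if each held one source.

    completeness and volume are single representative values for the filter;
    non-empty bins are returned as NaN.
    """
    counts = np.asarray(counts)
    limits = np.full(counts.shape, np.nan)
    limits[counts == 0] = 1.0 / (completeness * volume * dlogm)
    return limits
```

Fixing the edges once and reusing them for every filter and redshift makes the SMFs directly comparable. It also means that the empty bins needing upper limits are always in the same places.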

In the long term, I plan to improve my plots by fitting Schechter functions. I also aim to compare my SMFs for LAEs at each redshift with the SMFs for all galaxies observed, to see whether a pattern emerges. Furthermore, I wish to see how the stellar mass density of the observed LAEs compares to that of all galaxies at each redshift.
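For the planned Schechter fits, one common choice is to fit the function in its log-mass form, Φ(log M) = ln(10) φ* (M/M*)^(α+1) exp(−M/M*), directly to log10 Φ. The sketch below does this with `scipy.optimize.curve_fit`. The parameterisation (fitting log φ*, log M* and α) is an assumption, not a decision made in this work. The demonstration fits noise-free synthetic points generated from the model itself, not real SC4K data.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_schechter(log_m, log_phi_star, log_m_star, alpha):
    """log10 of the Schechter function per dex:
    ln(10) * phi* * x**(alpha + 1) * exp(-x), with x = M / M*."""
    x = 10.0 ** (log_m - log_m_star)
    return (np.log10(np.log(10.0)) + log_phi_star
            + (alpha + 1.0) * (log_m - log_m_star) - x / np.log(10.0))

# Synthetic check only: generate points from known parameters, then refit them.
true_params = (-3.5, 10.5, -1.5)
log_m = np.linspace(9.0, 11.5, 15)
log_phi = log_schechter(log_m, *true_params)
best_fit, covariance = curve_fit(log_schechter, log_m, log_phi, p0=(-3.0, 10.3, -1.3))
```

With real SMFs, the log errors from the previous step would be passed through `curve_fit`’s `sigma` argument. Bins treated as upper limits would need separate handling.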

I have greatly enjoyed my first two weeks of this internship; the opportunity to perform ‘real science’ is amazing.

If you have been interested in the work I have been doing, stay tuned, as I shall upload further updates in the next week or two.

-Josh