SC4K: M* and UV LFs – Part 3, Weeks 4, 5 and 6

This is the eagerly anticipated third and possibly final instalment of the world renowned SC4K: M* and UV LFs blog.

Since the last blog post I have accomplished the goals that I set myself at the end of it, and more, and this post documents those steps.

Firstly, I did in fact manage to edit my code so that, when applying a Schechter fit to a mass function, the upper limits are accounted for. This is done by creating ‘invisible’ data points, shown as stars in Figure 1, at negligible phi values (in this case 4 below the log phi value of the upper limit), with error bars that extend up to the upper limit. These ‘invisible’ points and their errors are then included when fitting, which lets us account for the upper limits without giving them the same weight as actual bins that contain sources.

Figure 1: The SMF for redshift bin z=5.4±0.5, now with the upper limits being accounted for when fitting the Schechter function, and with the ‘invisible’ points depicted as stars.

Without the ‘invisible’ points drawn, the same SMF appears as in Figure 2.

Figure 2: The SMF for redshift bin z=5.4±0.5, with the upper limits still accounted for when fitting the Schechter function, but with the ‘invisible’ points removed from the plot.
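The upper-limit handling described above can be sketched roughly as follows. This is a minimal illustration and not the actual internship code: the data values are made up, the offset of 4 is assumed to be in dex, and the names log_schechter, OFFSET and the fixed ALPHA are hypothetical choices for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

ALPHA = -1.4  # faint-end slope held fixed (value quoted in the Part 2 post)

def log_schechter(log_m, log_phi_star, log_m_star):
    """log10 of a Schechter function per dex of stellar mass."""
    x = 10.0 ** (log_m - log_m_star)
    return (log_phi_star + np.log10(np.log(10.0))
            + (ALPHA + 1.0) * np.log10(x) - x * np.log10(np.e))

# Illustrative binned SMF (made-up numbers, not SC4K data)
log_m = np.array([9.5, 9.9, 10.3, 10.7])
log_phi = log_schechter(log_m, -4.3, 10.6)
log_phi_err = np.full_like(log_m, 0.1)

# One empty bin with an upper limit of log(phi) = -6.0 at log(M) = 11.1
ul_log_m = np.array([11.1])
ul_log_phi = np.array([-6.0])
OFFSET = 4.0  # assumed: placeholder sits 4 dex below the limit

# Placeholder far below the limit, with an error bar reaching up to it
fake_log_phi = ul_log_phi - OFFSET
fake_err = np.full_like(ul_log_phi, OFFSET)

x_all = np.concatenate([log_m, ul_log_m])
y_all = np.concatenate([log_phi, fake_log_phi])
err_all = np.concatenate([log_phi_err, fake_err])

# The large sigma means the placeholder barely constrains the fit
popt, pcov = curve_fit(log_schechter, x_all, y_all, p0=[-4.0, 10.5],
                       sigma=err_all, absolute_sigma=True)
```

Because the placeholder carries an error bar forty times larger than the real bins in this example, it nudges the fit towards respecting the limit without dominating it.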

As stated in the last blog, we wanted to create SMFs for each individual redshift slice. An example of this is shown in Figure 3, the SMF for the filter IA484, or redshift slice z=2.98.

Figure 3: The SMF for redshift bin z=2.98, corresponding to filter IA484, plotted and fitted like the redshift bins shown previously.

It is worth noting that at this point in the internship I was provided with an updated catalogue of mass values for the LAEs, so any changes in the shape of SMFs may be due to that fact. (Credit: Sergio Da Graca Santos)

A problem with fitting stellar mass functions is that, going to lower and lower masses, a dip is eventually observed. This dip is not caused by a genuine reduction in the number of LAEs at these masses, but by the effect of completeness, i.e. the less massive and fainter LAEs are much less likely to be observed than their more massive counterparts. Up until now we have fitted over the range 9-11 in log solar masses for all redshift bins and slices, but to account for the completeness ‘dip’ in each individual slice/bin, we decided to set a minimum mass above which each slice/bin is fitted. This minimum mass is depicted in the following figures by a vertical dashed line.

Figure 4: The SMF of redshift slice z=3.15, corresponding to filter IA505, with the minimum mass now shown by a vertical dashed line, in this case at log(M_star)=9.4.
Figure 5: The SMF of redshift bin z=2.5±0.1, with the minimum mass now shown by a vertical dashed line, in this case at log(M_star)=9.1. The masses have been extended down to 8.5 in order to show the dip.
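As a rough sketch of the minimum mass cut, the bins passed to the fit can simply be masked per slice/bin. The dictionary min_mass, its labels and the data values below are hypothetical, with the two thresholds taken from the captions of Figures 4 and 5.

```python
import numpy as np

# Illustrative bin centres and log(phi) values (made-up numbers)
log_m = np.array([8.6, 8.9, 9.2, 9.5, 9.8, 10.1])
log_phi = np.array([-4.9, -4.4, -4.1, -4.2, -4.4, -4.7])

# Hypothetical lookup of the minimum mass for each slice/bin
min_mass = {"z=2.5": 9.1, "IA505": 9.4}

def select_for_fit(log_m, log_phi, label):
    """Keep only the bins at or above the minimum mass of this slice/bin."""
    keep = log_m >= min_mass[label]
    return log_m[keep], log_phi[keep]

fit_m, fit_phi = select_for_fit(log_m, log_phi, "z=2.5")
```

The points below the threshold stay in the arrays used for plotting, so the dip can still be shown while being left out of the fit.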

After this I produced a new stellar mass density vs redshift plot, now using the fits from the new minimum mass method and including both the redshift bins and the redshift slices, shown as yellow stars and blue circles respectively in Figure 6.

Figure 6: The initial SMD plot, including both redshift bins and redshift slices, showing a strange phenomenon whereby some redshift bin densities appear to be above the densities of their constituent slices.

As you may have noticed, there is an unusual phenomenon for the redshift bins containing more than one redshift slice: they appear to have a much higher density than the redshift slices they comprise.

Is this a new breakthrough in how we think of space?

The answer is of course no. In fact, the problem was a simple error in how I calculated the volume used to create the phi values of the redshift bins. For a redshift bin, the volume surveyed for each source is the sum of the surveyed volumes of its constituent redshift slices, but in my divine wisdom I had decided to use the average of the volumes instead. This was a very frustrating error, which puzzled me for hours, as my fellow interns could attest.
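To see why this inflates the densities, here is a small worked example with made-up volumes and counts (completeness is ignored for simplicity, and all names are hypothetical):

```python
import numpy as np

# Hypothetical comoving volumes (Mpc^3) of three slices making up one bin
slice_volumes = np.array([4.0e6, 4.3e6, 4.2e6])
n_sources = 120   # sources in one mass bin of the combined redshift bin
dlogm = 0.2       # width of the mass bin in dex

phi_wrong = n_sources / (slice_volumes.mean() * dlogm)  # averaged volume
phi_right = n_sources / (slice_volumes.sum() * dlogm)   # summed volume

# The averaged volume is too small by a factor of the number of slices
ratio = phi_wrong / phi_right
```

With three slices the phi values, and hence the integrated density, come out three times too high, which matches the behaviour seen in Figure 6 for bins made of several slices.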

Correcting this mistake I was able to produce a much more realistic SMD, as shown below in Figure 7.

Figure 7: The revised stellar mass density plot for all redshift slices and bins after fixing the issue with the volume used in the phi calculation, thus creating a more realistic plot.

As can be seen, the redshift bins’ density values now mostly lie within the range of their constituent redshift slices’ density values, as was expected. The density values for LAEs also seem to stay approximately constant over redshift.

We then decided that it would be useful to also examine how the Schechter parameters, M* and phi*, themselves evolve with redshift, and thus we created the following figures.

Figure 8: A plot depicting the evolution with redshift of the Schechter parameter phi*, showing an initial slope from z~2.5 to a plateau at z~4.5.
Figure 9: A plot depicting the evolution of the Schechter parameter M*, showing a seemingly consistent value of ~10.6.

From Figures 8 and 9 we observed that M* seems to stay relatively consistent over redshift, while phi* appears to show a negative slope from z~2.5 until z~4.5, where it seems to reach a plateau.

Then it was decided that the best way to display how the SMFs evolve with redshift was to plot them all together as a grid, similar to those seen in the SC4K paper (see Figure 8 of that paper). The first attempt at the resulting grid of all of the SMFs for all of the redshift slices and bins is shown in Figure 10.

Figure 10: The first stellar mass function grid plot, showing all 13 redshift slices and the 5 redshift bins.

Figure 10 represents my first attempt at this kind of plot, but after a discussion with my supervisor I realised there were a few flaws that I could correct in order to make it paper ready. The completeness points are coloured when they should really be colourless to show they are much less important; the Schechter fit does not extend to the minimum mass line; and the data points to the left of the line are the same colour as those that the Schechter fit accounts for, when fading them could help show that they are not included when fitting.

After addressing a couple of these flaws, the grid produced was as follows in Figure 11, with a more complete edition being shown in Figure 12.

Figure 11: An intermediate SMF grid plot, where we increased the number of bins to better evaluate the source spread. The non-completeness points have now been coloured white to better illustrate their insignificance, and the Schechter fit extends to the minimum mass line.
Figure 12: The final grid plot including both redshift slices and redshift bins. It was at this point that it was decided that, in order to show the data more clearly, the redshift slices and bins should be split.

It was then decided that combining both the redshift slices and the bins was unnecessary in such a large plot, so I split them. The individual redshift slices are still displayed as in Figure 12, but for the redshift bins we decided that showing the change in the Schechter fit was the only thing necessary. The resulting plots are shown in Figures 16 and 17.

While working on the grid, I also improved the plots of the evolution of the stellar mass density and the Schechter parameters with redshift, shown in the following figures. They now clearly show which data points have been found using which fixed values. The shaded region in the Schechter parameter plots shows the values of these parameters up to redshift z=5, found by Iary Davidson et al. in 2017 for galaxies in the COSMOS survey (see here for the paper).

Figure 13: The updated log plot of the evolution of the Schechter parameter M* as a function of redshift, showing both which values are fixed in the creation of a data point, and how my data compares to that of Davidson 2017.
Figure 14: Log plot of the evolution of the phi* Schechter parameter over redshift, when compared to literature data obtained from Davidson 2017.

As can be seen quite clearly, the evolution of the Schechter parameters in my data agrees quite nicely with the results found by Davidson, especially for phi* in Figure 14, as the slope going to z~4.5 is seen in both. For M*, they found a relatively constant value as well.

The stellar mass density plot stays much the same, as I have not yet been able to compare it to previous results from the Davidson paper and others, but this is in progress.

Figure 15: The current version of the stellar mass density plot, showing the evolution of the stellar mass density of LAEs over redshift, better illustrating which values are created using fixed M* values.

It was after the completion of these plots that I was provided with the completeness functions for 2 of the 3 filters that I had to exclude previously, NB392 and NB816. This allowed me to create SMFs for them and improve my results. However, while NB816 had a single, simple completeness function (i.e. completeness against Lyman-alpha flux), NB392 had several, depending upon which area of the survey a source was observed in. This is because NB392 was observed in the CALYMHA survey (see here). As such, I had to include a clause in the code so that, if a source was detected by NB392, the COSMOS ID of the source is used to determine which completeness function must be used.
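The NB392 clause might look something like the sketch below. Everything here is hypothetical: the region names, the ID-to-region lookup, and the completeness curves are invented for illustration, and the real code may determine the region from the COSMOS ID in a different way.

```python
import numpy as np

def make_completeness(log_flux_grid, completeness_grid):
    """Return a function interpolating completeness against log(Lya flux)."""
    return lambda log_flux: float(
        np.interp(log_flux, log_flux_grid, completeness_grid))

grid = np.array([-17.0, -16.5, -16.0])

# One curve per filter, except NB392, which has one per survey region
# (all curve values are made up)
single_curves = {"NB816": make_completeness(grid, [0.2, 0.6, 1.0])}
nb392_curves = {
    "deep": make_completeness(grid, [0.4, 0.8, 1.0]),
    "shallow": make_completeness(grid, [0.1, 0.5, 0.9]),
}

# Hypothetical lookup from source ID to the NB392 region it falls in
nb392_region_by_id = {101: "deep", 102: "shallow"}

def completeness(source_id, filter_name, log_flux):
    """Pick the right completeness curve, branching only for NB392."""
    if filter_name == "NB392":
        region = nb392_region_by_id[source_id]
        return nb392_curves[region](log_flux)
    return single_curves[filter_name](log_flux)
```

For example, completeness(101, "NB392", -16.5) and completeness(102, "NB392", -16.5) use different curves for the same flux, while every other filter ignores the source ID.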

Finally, Figures 16 and 17 show the new and improved stellar mass function grid plot, including all the redshift slices possible (including NB392 and NB816), and the SMF for the redshift bins depicting only their respective fits.

In the new SMF grid, the fit line and shaded region are red for plots that use fixed M* values as well as fixed alpha values, and they are blue for fits using only fixed alpha values. 

Figure 16: The current iteration of the SMF grid plot, with the red shaded fits illustrating that these fits were created using a fixed M* value. The additional SMFs for filter bands NB392 (z=2.22) and NB816 (z=5.71) have been included in this version.

You might notice that the difference between the completeness and non-completeness points for the newly added NB392/z=2.22 slice is a lot less than for the rest of the slices. This highlighted a problem in my selection process up until this point: I should have set a completeness limit as a means to eliminate sources that have such a low completeness that they may be skewing the data. After a discussion with my supervisor we decided that a 30% completeness limit would eliminate the most disturbing of sources. As such, when looking at these last plots, take them with a pinch of salt, as they shall be altered after I apply this limit.
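A minimal sketch of how such a limit could be applied is given below, using made-up completeness values. Whether a source at exactly 30% is kept is an assumption here, as is the name COMPLETENESS_LIMIT.

```python
import numpy as np

# Illustrative per-source values (made-up numbers)
completeness = np.array([0.05, 0.25, 0.30, 0.62, 0.95])
log_mass = np.array([9.0, 9.3, 9.6, 10.0, 10.4])

COMPLETENESS_LIMIT = 0.30  # assumed inclusive

keep = completeness >= COMPLETENESS_LIMIT
log_mass_kept = log_mass[keep]

# Each surviving source is weighted by 1 / completeness
weights = 1.0 / completeness[keep]

# For comparison, the weight the least complete source would have had
largest_dropped_weight = 1.0 / completeness[~keep].min()
```

In this example the source at 5% completeness would have counted as 20 sources on its own, which is the kind of skew the limit is meant to remove.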

Figure 17 was also created before the addition of the completeness limit, but the general idea of the plot can be gathered, if not the scientific result.

Figure 17: The preliminary version of the plot comparing the fits for all the redshift bins. This plot will need changing to account for the 30% completeness limit, and as such needs to be taken with a pinch of salt.

The next steps are as follows: apply the 30% completeness limit to all of the data, include the data for the new redshift slices in the SMFs, and thus create new Schechter parameter and stellar mass density plots. We also decided that an interesting thing to investigate is the SMF of only high luminosity galaxies (i.e. L > 10^42.8 erg/s), to see whether they behave any differently. I have also recently obtained a dataset of mass values for the SC4K cut that were determined not by SED fitting with MAGPHYS but with CIGALE (a python Code Investigating GALaxy Emission). Not the best acronym, but still very cool, and it will be interesting to see how much these results differ from those I have used.

I am very excited to write my first first-author paper that will be published, and I must say thank you to my supervisor Dr. David Sobral for this opportunity, and also to the Ogden Trust for giving me the means to take it.

Finally, I have to mention the great fellow interns that I have been working alongside during this internship: Emma Dodd, Heather Wade, Harry Baker, Cass Barlow-Hall, and Amaia Imaz Blanco. It has been great to get to know you over the last few weeks!

This may or may not be my last blog; it depends on how long it takes me to write my paper alongside my literature review for my Masters.

Thank you very much for reading.

-Josh Butterworth

SC4K: M* and UV LFs – Part 2, Week 3

This blog post comes off the back of a relatively productive week where I made steady progress towards the goals expressed at the end of the last blog post (see here, if you missed it).

Firstly, as I stated last week, I decided to re-create my stellar mass functions neglecting log mass values < 9 due to their large completeness corrections, an example of which is shown below in Figure 1.

Figure 1: The stellar mass function for filter band IA427 of LAEs found in the SC4K cut, now neglecting values below 9 due to their large completeness corrections and thus unreliability.

I also applied markers to display upper limits on mass bins which contain no sources, to better illustrate the distribution of LAE masses. An example is given in Figure 2.

Figure 2: The stellar mass function for filter IA624, showing the addition of a downwards triangle depicting an upper limit of a mass bin containing no sources.

From this point on I switched from using the filters in which the sources were discovered to using the redshift bins described in Sobral et al 2017 (z=2.5, 3.1, 3.9, 4.7, 5.4) as my method of separating my sources into separate plots. I also decided to stop the plot at log mass values of 11, as only a few sources above this value survived the purge of the SED examination, and those that remain have a relatively high chance of being ‘lower redshift’ interlopers that appear to be more ‘massive’ than they actually are. An example of my new SMF is Figure 3, for z~2.5.

Figure 3: The SMF for redshift bin z~2.5, plotted on the mass range 9-11.

The next step of my project was to fit Schechter functions to my data (the logarithmic version used is shown below), and thus obtain the values of alpha (the faint-end slope parameter) and the Schechter parameters, the characteristic mass and the normalisation, that best fit my SMFs at each redshift. The fit was completed, and the Schechter parameters obtained, using the curve_fit function of the Python package SciPy. Since my plots are logarithmic in nature, the parameter values obtained were the logarithmic equivalents, i.e. log(phi_0) for phi_0; however, the desired Schechter values were obtained via simple manipulation.

As an example, the Schechter fit for redshift bin z=2.5±0.1 is shown in Figure 4. It is worth noting that the slight change in shape of the data points between Figures 3 and 4 is due to the change in the number of bins from 10 to 15. The parameter alpha was chosen to be -1.4 for this plot, as previous fit attempts with higher values were less accurate. The Schechter parameter values obtained from this particular fit were log(phi_0)~-4.26 and log(M_star)~10.1.

Figure 4: An SMF for redshift z~2.5, including a Schechter fit with alpha=-1.4, log(phi_0)~-4.26, and log(M_star)~10.1.
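The fitting step described above can be sketched as follows. The logarithmic Schechter form used here is the standard one per dex of mass and is assumed to match the version used in the project; the data are synthetic, generated from the parameter values quoted in the caption, and the function name log_schechter is a hypothetical choice.

```python
import numpy as np
from scipy.optimize import curve_fit

ALPHA = -1.4  # faint-end slope, fixed by hand as described in the text

def log_schechter(log_m, log_phi_star, log_m_star):
    """log10 of phi(M) per dex: ln(10) phi* x^(alpha+1) exp(-x), x = M/M*."""
    x = 10.0 ** (log_m - log_m_star)
    return (log_phi_star + np.log10(np.log(10.0))
            + (ALPHA + 1.0) * np.log10(x) - x * np.log10(np.e))

# Synthetic SMF: 15 bins on the range 9-11 with a little scatter
rng = np.random.default_rng(0)
log_m = np.linspace(9.0, 11.0, 15)
log_phi_err = np.full(log_m.size, 0.05)
log_phi = log_schechter(log_m, -4.26, 10.1) + rng.normal(0.0, 0.05, log_m.size)

popt, pcov = curve_fit(log_schechter, log_m, log_phi, p0=[-4.0, 10.5],
                       sigma=log_phi_err, absolute_sigma=True)

# 1 sigma uncertainties come from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))

# The 'simple manipulation' back to linear values
phi_star = 10.0 ** popt[0]
m_star = 10.0 ** popt[1]
```

Fitting in log space keeps the faint and bright bins on a comparable footing, at the cost of returning log(phi_0) and log(M_star) rather than the linear parameters.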

It is worth noting a flaw in my plotting technique at this point in time: currently my Schechter fit neglects the upper limit points of bins which contain no sources. This will hopefully be rectified in the next week by setting an ‘invisible data point’ at a negligible mass function value, e.g. -9, with error bars that extend up to the upper limit. This should allow the Schechter fit to account for these upper limits without their existence having a drastic effect. An example of this flaw can be seen in the following figure of the SMF of redshift bin z=5.4±0.5.

Figure 5: The SMF for redshift bin z=5.4±0.5, showing how the Schechter fit neglects the upper limits.

The next property of the LAEs I wished to investigate was their stellar mass density (SMD), and how this evolves with redshift. The stellar mass density can be obtained by integrating the Schechter fit of an SMF, weighted by mass. This can be done analytically using the following equation, which uses the incomplete gamma function and the parameters obtained from the Schechter fit.
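For reference, the standard analytic result for a Schechter function is rho = phi* x M* x Gamma(alpha + 2, M_min / M*), where Gamma(a, x) is the upper incomplete gamma function. The sketch below assumes this form and an integration from a lower mass limit to infinity; the limits actually used in the project are not stated, so log_m_min = 9 is only an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc

def stellar_mass_density(log_phi_star, log_m_star, alpha, log_m_min):
    """Mass-weighted integral of a Schechter function above log_m_min."""
    a = alpha + 2.0                      # must be > 0 for gammaincc
    x_min = 10.0 ** (log_m_min - log_m_star)
    # gammaincc is regularised, so multiply by gamma(a) to undo that
    upper_incomplete = gamma(a) * gammaincc(a, x_min)
    return 10.0 ** log_phi_star * 10.0 ** log_m_star * upper_incomplete

rho = stellar_mass_density(-4.26, 10.1, -1.4, 9.0)

# Cross-check against a direct numerical integration of x^(alpha+1) e^(-x)
numeric, _ = quad(lambda x: x ** (-1.4 + 1.0) * np.exp(-x),
                  10.0 ** (9.0 - 10.1), np.inf)
rho_numeric = 10.0 ** (-4.26) * 10.0 ** 10.1 * numeric
```

If phi* is in Mpc^-3 and M* in solar masses, rho comes out in solar masses per Mpc^3.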

Using this integration technique on the respective alpha and Schechter parameters of each redshift bin, I was able to produce the following plot of stellar mass density against redshift.

Figure 6: A plot, without errors, showing the evolution of stellar mass density of my LAE data with redshift.

The next step after obtaining this stellar mass density plot is to get errors for each point. After a failed error propagation attempt, I decided to use the Monte Carlo method to obtain errors. When I obtained the Schechter parameter values from the curve_fit function, I was also provided with the 1 sigma standard deviation of each of the parameters (the square roots of the diagonal of the covariance matrix it returns). Using these standard deviations, with the fitted parameter values as the means, I was able to model both parameters, or the logs of the parameters in this case, as Gaussians (see examples below). By taking the 50th percentile as the best value, and the 16th and 84th percentiles as the lower and upper bounds, I obtained an estimate of each parameter along with a 1 sigma error, following the statistical 68% rule.

Figures 7a & 7b: The Gaussian distributions from which I obtained errors using the Monte Carlo method, to apply to the stellar mass density data points shown in Figure 6.

I also applied an error to my alpha parameter value, due to the relatively arbitrary way in which I picked its value by how I deemed the fit to look. The error was obtained by modelling alpha as a uniform distribution between -1.5 and -1.3, so as to keep -1.4 as the mean but also to include -1.3, which was my initial alpha value.
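The Monte Carlo step can be sketched as below. The standard deviations are illustrative values, the parameters are drawn independently (ignoring any covariance between them, which is an assumption), and the density is computed with the analytic incomplete gamma form integrated above an assumed lower limit of log M = 9.

```python
import numpy as np
from scipy.special import gamma, gammaincc

rng = np.random.default_rng(42)
N_DRAWS = 10_000

# Gaussian draws around the fitted log parameters (sigmas are made up)
log_phi_star = rng.normal(-4.26, 0.05, N_DRAWS)
log_m_star = rng.normal(10.1, 0.04, N_DRAWS)
# Uniform draws for the hand-picked faint-end slope
alpha = rng.uniform(-1.5, -1.3, N_DRAWS)

# Stellar mass density for every draw, in log space
a = alpha + 2.0
x_min = 10.0 ** (9.0 - log_m_star)
log_rho = log_phi_star + log_m_star + np.log10(gamma(a) * gammaincc(a, x_min))

# Median as the best value, 16th/84th percentiles as the 1 sigma bounds
p16, p50, p84 = np.percentile(log_rho, [16, 50, 84])
err_lo, err_hi = p50 - p16, p84 - p50
```

Taking percentiles of the resulting distribution gives asymmetric error bars for free, which is convenient when the quantity is a non-linear function of the fitted parameters.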

Using all of the errors my final stellar mass density plot appeared as follows:

Figure 8: The stellar mass density against redshift plot for the 5 redshift bins, now with errors included.

As a preliminary examination of this plot, we can observe that the density of LAEs remains approximately constant across all redshifts. The actual behaviour shall become clearer once I split the redshift bins back into the individual redshifts of the filter bands used to detect each source, which is something I plan to do. Also, the catalogue which I had used initially has recently been updated, and so I shall be running these new mass values through my code; however, it is unlikely that this will have much of an effect.

The aims for the following week will be to edit my Schechter fits such that they account for the upper limit points in my SMFs. I also plan to split up the redshift bins into their respective filter band redshift values, as this will provide us with more data points on our SMD plots, and thus a more detailed picture of how Lyman-alpha emitter stellar mass density evolves with redshift. Following on from that, I will compare these stellar mass densities at each redshift to the stellar mass density of all galaxies at these redshifts, which will allow me to explore how the proportion of LAEs to all galaxies has evolved over this time period.

Thank you for reading this blog post. This internship is a great opportunity for me, and it gives me great pleasure to be able to share my progress.


SC4K: M* and UV LFs – Part 1, Weeks 1 and 2

The aim of this internship is to obtain stellar mass functions (SMFs) of Lyman-alpha emitter galaxies (LAEs); to analyse how these SMFs evolve on the cosmic timescale, and to observe how these SMFs compare to those of the more common galaxy. I have been investigating 3908 LAEs, obtained by the SC4K cut of the COSMOS survey performed by my internship supervisor, Dr. David Sobral (Sobral et al 2017). The SC4K cut was completed using 16 different narrow-band and medium-band filters over the COSMOS field, with a redshift range of z ≈ 2 – 6.

I have been provided with the SC4K catalogue, which contains numerous properties for each LAE; however, the properties pertinent to my investigation for the moment are log(M/MSun), Lyα flux, and Lyα redshift.

Since my investigation relies heavily upon the log(M/MSun) values for each LAE, my first job was to determine which LAEs had values that could be trusted. The log(M/MSun) values of each LAE were obtained from spectral energy distributions (SEDs) that were fit to known flux data points for each LAE (credit to Sergio Da Graca Santos). However, since this was an automated system, it was always likely to produce some anomalous results. For example, from a mere initial inspection of the log(M/MSun) values there were some that were > 30, implying a mass of the order of 10^30 solar masses; this is more analogous to the mass of the universe than to that of an LAE, which is expected to be of the order of 10^8/10^9 solar masses. Thus it was decided that it would be prudent to visually inspect the SEDs of the LAEs with the most extreme predicted masses, and to remove those which had either poorly fitting SEDs or SEDs which led to an unphysical mass value, until I was satisfied that at least the majority of those remaining could be trusted.

Figure 1: An example of an SED which led to an unphysical mass value of log(M/MSun)=32.05, most likely due to the small number of data points.

Figure 2: An example of a ‘Debatably Poor’ fitting SED that was discarded.

Figure 3: An example of a decent fitting SED, with a reasonable solar mass value, and thus the corresponding LAE was kept.

In order to sort through the SEDs, I invented a classification system to ‘more accurately’ (quotation marks are due to my supervisor’s opinion of my system) classify each SED, and thus each LAE’s validity. The system is as follows. The most reliable SEDs, and thus LAEs, were assigned ‘G’, i.e. Good. The next subset were SEDs that followed most of the data points and limits, and these were assigned ‘DG’, meaning Debatably Good. The next is the reverse of this, e.g. a couple of points/limits were followed but not enough for the fit to be reliable (see Figure 2); these were assigned ‘DB’, Debatably Bad. Finally, the most egregious SEDs, such as the one shown in Figure 1, which produce completely unreasonable mass values, were assigned ‘B’, i.e. Bad. Alongside these I used a flagging system which marked both DBs and Bs, for easier removal of both later while coding.

Figure 4: A histogram depicting the effect that removing all of the Bs and DBs from the catalogue has upon the spread of masses.

As can be seen, the data reduction by my system has little effect at the expected LAE masses, i.e. 10^8/10^9 solar masses, but all LAEs with masses above 10^12 solar masses were removed.

Splitting the reduced data on a filter-by-filter basis produces the following:

Figure 5: A histogram showing the distribution of mass per used filter in the reduced SC4K catalogue.

For my stellar mass function, I plot the log of the mass function (shown below) against log(M/MSun). It is created in a similar fashion to a histogram, in that the data is split into bins, but instead of merely counting the number of sources per bin, the mass function equation is applied to each individual source (LAE). This needs to be done individually, with the exception of dlogM, as each source has a unique completeness. The dlogM is merely the width of each bin used to split the data, and thus is more general.
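One common way to write such a mass function is phi = (1 / dlogM) x sum over sources of 1 / (completeness x volume), and the sketch below assumes that form rather than reproducing the exact equation used in the project. The function name mass_function and the data values are hypothetical.

```python
import numpy as np

def mass_function(log_mass, completeness, volume, bin_edges):
    """Binned mass function with per-source completeness and volume."""
    dlogm = np.diff(bin_edges)                 # bin widths in dex
    weights = 1.0 / (completeness * volume)    # one weight per source
    summed, _ = np.histogram(log_mass, bins=bin_edges, weights=weights)
    counts, _ = np.histogram(log_mass, bins=bin_edges)
    return summed / dlogm, counts

# Six made-up sources, all in a survey volume of 1e6 Mpc^3
log_mass = np.array([9.1, 9.2, 9.6, 9.7, 9.8, 10.3])
completeness = np.array([0.5, 0.5, 0.8, 1.0, 1.0, 1.0])
volume = np.full(log_mass.size, 1.0e6)
edges = np.array([9.0, 9.5, 10.0, 10.5])

phi, counts = mass_function(log_mass, completeness, volume, edges)
log_phi = np.log10(phi)
```

The first bin holds only two sources but ends up with the highest phi, because each of them has a completeness of 0.5 and so counts twice.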

The following plot was obtained as a first attempt, applying the mass function equation without completeness and with a general volume (the proper completeness and volumes were added later, due to their more complex application).

Figure 6: My first SMF, neglecting completeness and respective volumes, as well as errors.

The respective volume that each filter observed was obtained from the paper ‘Slicing COSMOS with SC4K: the evolution of typical Lyα emitters and the Lyα escape fraction from z ∼ 2 to z ∼ 6’ (Sobral et al 2017). The completeness for each source was determined from both the filter in which it was observed and the Lyα flux it was observed to have, as each filter has its own completeness function, for example:

Figure 7: The completeness function of the medium-band filter IA427. The axes show completeness against log(flux) of the Lyα line.

Upon adding both completeness and respective volumes, I have thus far been able to obtain a basic stellar mass function for 13 of the 16 filters used in the SC4K cut, and an SMF for 5 of the 6 redshifts at which the LAEs are believed to be located. (The rest are missing due to the narrow-band filters NB392, NB501, and NB816 possessing more complex and erratic completeness functions.)

Figure 8: The SMF for filter IA427, including both completeness and respective volumes, but still neglecting errors.

The obvious omission from the previous SMF is the errors. In order to add errors I used the random error method: for N counts in a bin, the error is √N. Applying the mass function equation to this error (using an average value for volume and completeness) gives the error of each point in the final SMF.

Figure 9: The SMF for filter IA427, including both completeness and respective volumes, and now including errors.

It is worth noting that, since I am using the random error method, in order to obtain log errors I have had to set the errors to -1 for any bin containing only 1 source, as otherwise the lower error bound is N - √N = 0, which causes a maths error by trying to take the log of zero.
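The single-source problem is easy to see in a small sketch. Here the √N error is applied as a fractional error on phi, which is a simplification of the average volume and completeness approach described above, and the numbers are made up.

```python
import numpy as np

counts = np.array([1, 4, 9, 25])
phi = np.array([1.0e-6, 4.0e-6, 9.0e-6, 2.5e-5])

frac = 1.0 / np.sqrt(counts)        # sqrt(N) / N
phi_hi = phi * (1.0 + frac)
phi_lo = phi * (1.0 - frac)         # exactly zero when N = 1

log_phi = np.log10(phi)
err_up = np.log10(phi_hi) - log_phi

# log10(0) is -inf, so silence the warning and patch those bins afterwards
with np.errstate(divide="ignore"):
    err_down = log_phi - np.log10(phi_lo)

# Flag single-source bins with -1, as described in the text
err_down[counts == 1] = -1.0
```

For the bin with four sources the lower error is log10(2), about 0.30 dex, while the single-source bin would have had an infinite lower error bar without the flag.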

Figure 10: The SMF for filter IA427, including both completeness and respective volumes, and now including errors, showing the blank spots of bins with no sources, and the fact that SMFs of different filters/redshifts use non-uniform axis limits.

At this point, I have SMFs for 13 of the 16 filters used in the SC4K cut (two of which are shown in Figures 9/10), and for 5 of the 6 redshifts. Going forward, the immediate plans are to include upper limit notation for bins containing no sources (illustrated in Figure 10); this upper limit will be placed as if the bin contained a single source. I will also set the SMFs of each filter/redshift to uniform axis limits, as currently the bin locations are just determined by the smallest and largest values in the provided data. I will also neglect log M values below 9 for the moment as, due to the large completeness corrections of these values (due to their low flux), it is unwise to trust them; this also brings my work closer to the literature already looking into this area.

In the long term, I plan to improve my plots by fitting Schechter functions. I also aim to compare my SMFs for LAEs at each redshift to the SMFs for all galaxies observed, and to see whether a pattern emerges. Furthermore, I wish to see how the stellar mass density of the LAEs observed compares to that of all galaxies at each redshift.

I have greatly enjoyed my first 2 weeks of this internship; the opportunity to perform ‘real science’ is amazing.

If you have been interested by the work I have been doing, stay tuned, as I shall upload further updates in the next week or two.