This past week we have kept working on finding the ideal cuts for each of the filters, in order to get a data catalogue that strikes the right balance between including as many of the real line emitters as possible and not being full of 'trash' data.
In order to do this, we needed to make 3D plots of EW_Cut, Sigma_Cut, and the relative effectiveness (a value obtained by multiplying purity and completeness and comparing the result to a basic cut previously made by our supervisor, David), and then find the peak in that data. To start with, we weren't getting clear peaks because of various things we hadn't included or considered.
One of the things we needed to include was subtracting the 'false' detections caused by a spread of noise (visible as the 'bulge' in Figure 1). These 'false' detections are sources that are in the right redshift range to be an emitter, but do not have the expected Sigma and EW values. We expected a distribution of fake emitters centred around Sigma = 0 and EW = 0. A minimum Sigma and EW cut already eliminated the bulk of these fake emitters, but to fine-tune the selection further, some sources at EW and Sigma values above our cut must also be accounted for, so that the reliability of our completeness and purity values is good enough. Since we only plotted cuts with Sigma > 3 and EW > 100 (the reason for this is explained below), our subtraction worked as follows: we counted how many sources fell below Sigma < -3 and EW_obs < -100 within the redshift range of each line emitter, and assumed that the spread of 'fake emitters' is approximately symmetric about EW = 0. We then subtracted this number from the number of sources found above the same cuts on the positive side, so that the fake emitters are statistically accounted for and the completeness and purity values take this spread into consideration.
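As a rough sketch of this subtraction (assuming the catalogue columns are held in NumPy arrays, with hypothetical names rather than the ones from our actual script):

```python
import numpy as np

def corrected_count(sigma, ew, z, z_min, z_max, sigma_cut, ew_cut):
    """Count sources above (sigma_cut, ew_cut) within the redshift range,
    minus the mirrored count below (-sigma_cut, -ew_cut), assuming the
    noise spread of fake emitters is symmetric about Sigma = 0 and EW = 0."""
    in_z = (z > z_min) & (z < z_max)
    # sources passing the positive cuts (real emitters plus some fakes)
    above = np.sum(in_z & (sigma > sigma_cut) & (ew > ew_cut))
    # sources in the mirrored negative region (fakes only, by assumption)
    below = np.sum(in_z & (sigma < -sigma_cut) & (ew < -ew_cut))
    return above - below
```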
When plotting the graphs, we only included cuts of at least Sigma > 3 and EW > 100. This is because lower cuts remove very few sources, so the completeness is very high; although the purity is then very low, the overall effectiveness still comes out relatively high because of the completeness. We would rather have a somewhat lower effectiveness where purity and completeness are more balanced. Most of the noise has a sigma of 2.5 or lower, which is why we picked a minimum sigma of 3 for our graphs.
For much of this process, we used an ever-improving code to make our work more efficient (we're now on version 8, part 3!). The first Python script we wrote had only one goal: to determine at which redshifts we expected to observe each emission line (Lyman-alpha, H-alpha, [OIII], [OIV] and [CIV]). For each filter, it took into account the centre wavelength and the FWHM, calculated the minimum and maximum wavelengths covered by the filter (centre wavelength ± FWHM/2), and converted these into the corresponding redshifts using the formula: z = λ_obs / λ_rest − 1.
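A minimal sketch of what that first script did (the filter values and the [OIV] rest wavelength below are illustrative placeholders, not the actual survey numbers):

```python
# For each emission line, compute the redshift range in which it would
# land inside a narrow-band filter of given centre wavelength and FWHM.

# Rest-frame wavelengths in Angstroms ([OIV] value is an illustrative placeholder)
REST_WAVELENGTHS = {
    "Lyman-alpha": 1215.67,
    "CIV": 1549.48,
    "OIV": 1402.0,
    "OIII": 5006.84,
    "H-alpha": 6562.8,
}

def redshift_range(centre, fwhm, rest_wavelength):
    """Return (z_min, z_max) for a line observed through a filter with the
    given centre wavelength and FWHM (all wavelengths in Angstroms)."""
    lam_min = centre - fwhm / 2.0
    lam_max = centre + fwhm / 2.0
    z_min = lam_min / rest_wavelength - 1.0
    z_max = lam_max / rest_wavelength - 1.0
    return z_min, z_max

if __name__ == "__main__":
    # A hypothetical filter centred at 3930 A with a 100 A FWHM
    for line, lam_rest in REST_WAVELENGTHS.items():
        z_min, z_max = redshift_range(3930.0, 100.0, lam_rest)
        if z_max >= 0:  # keep only physically observable ranges
            print(f"{line}: z = {z_min:.3f} to {z_max:.3f}")
```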
We then decided to improve the script by adding new parts that would also, for a selected filter, calculate purity, completeness, effectiveness (purity x completeness) and relative effectiveness for a number of combinations of EW and Sigma cuts. We would thus obtain catalogues allowing us to plot the EW cut, the Sigma cut and the relative effectiveness on the same 3D plot for each filter: this plot should exhibit a clear peak in relative effectiveness at a certain combination of EW and Sigma cuts.
Initially, we used a code structure based on multiple while loops: we had three different increments, and three loops nested within one another, to check whether each emission satisfied each combination of EW and Sigma cuts. Another loop then calculated purity, completeness and effectiveness, and a final loop displayed the results in textual form.
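In simplified form, that original slow structure looked something like this (variable names are illustrative, not our actual code):

```python
# Three nested while loops: for every Sigma cut, for every EW cut,
# walk through every single emission and test it against the cuts.
def count_with_loops(sigmas, ews, sigma_cuts, ew_cuts):
    results = {}
    i = 0
    while i < len(sigma_cuts):
        j = 0
        while j < len(ew_cuts):
            count = 0
            k = 0
            while k < len(sigmas):
                # every emission is re-checked for every cut combination,
                # which is what made the script so slow
                if sigmas[k] > sigma_cuts[i] and ews[k] > ew_cuts[j]:
                    count += 1
                k += 1
            results[(sigma_cuts[i], ew_cuts[j])] = count
            j += 1
        i += 1
    return results
```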
One of the earliest modifications we made was to change how the results were displayed: we quickly found out that data expressed in sentences is not very handy when you later have to analyse or plot it… We therefore modified the script to generate an ASCII file containing all the data we needed. This file could then be opened in Topcat, the software we used to plot our data.
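For instance, writing a whitespace-separated text file with a commented header, a format Topcat can read as plain ASCII, could look like this (column names are illustrative):

```python
# Write the cut-exploration results as a plain-text table for Topcat.
def write_ascii(path, rows):
    """rows: list of (sigma_cut, ew_cut, purity, completeness, effectiveness)."""
    with open(path, "w") as f:
        # a '#' header line names the columns
        f.write("# Sigma_Cut EW_Cut Purity Completeness Effectiveness\n")
        for sigma_cut, ew_cut, purity, completeness, effectiveness in rows:
            f.write(f"{sigma_cut:.2f} {ew_cut:.1f} {purity:.4f} "
                    f"{completeness:.4f} {effectiveness:.4f}\n")
```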
Another problem we encountered was the time it took to run the code: for each filter, dozens of minutes were needed to get the full results. After listening to advice from David, our supervisor, we realised that although our code worked and produced the desired catalogues, it was not efficient at all. We therefore completely restructured it: instead of three nested while loops checking whether each emission satisfied each combination of cuts, we directly summed all the emissions matching our criteria.
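The faster idea can be sketched as follows: the innermost per-emission loop is replaced by a single boolean-mask sum for each cut combination. The names, and the exact purity/completeness definitions (real selected over total selected, and real selected over total real), are our illustrative assumptions:

```python
import numpy as np

def effectiveness_grid(sigma, ew, is_real, sigma_cuts, ew_cuts):
    """For each (Sigma, EW) cut combination, return purity, completeness
    and effectiveness (purity x completeness) as 2D arrays.
    is_real flags sources known to be genuine line emitters."""
    n_real_total = np.sum(is_real)
    purity = np.zeros((len(sigma_cuts), len(ew_cuts)))
    completeness = np.zeros_like(purity)
    for i, s_cut in enumerate(sigma_cuts):
        for j, e_cut in enumerate(ew_cuts):
            # one vectorised mask replaces the old per-emission while loop
            selected = (sigma > s_cut) & (ew > e_cut)
            n_selected = np.sum(selected)
            n_real_selected = np.sum(selected & is_real)
            purity[i, j] = n_real_selected / n_selected if n_selected else 0.0
            completeness[i, j] = n_real_selected / n_real_total
    return purity, completeness, purity * completeness
```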
So… What’s Next?
After determining the best EW and Sigma cuts per filter, we will find an average of these cuts, so that we have a single homogeneous cut for every filter and can compare the data between filters. We will also compare the effectiveness of this cut with the cut used in Sobral et al. 2018, to make sure that changing it is worth losing the ability to compare between fields. After that we will move on to the next step: narrowing the selection down to Lyman-alpha emitters, not just any line emitter. This will be done in 3 steps:
- Use the sources with available spectroscopic redshift to determine which are within the right redshift range to definitely be Ly-alpha emitters
- Find an appropriate photometric redshift range that won't eliminate many line emitters, in a similar way to how we are finding the most appropriate EW and Sigma cuts
- Apply the Lyman Break Technique
More on these steps and more detailed explanations coming next week, so keep an eye on the blog. For now: keep being curious, keep learning, because we're trying to do just that!
-Amaia, Louis, Emily