TEABAG: The Environment Around Bright Ancient Galaxies

The Unwilling

Let the challenge………. BEGIN!

Raven – Raven c. 2002

Week 1: The End

But it’s actually the beginning

Largely unprepared and slightly clueless, the TEABAG team was created with the aim of observing the area surrounding three galaxies. Over the next few weeks we aim to examine the environments around the bright, distant galaxies CR7, VR7 and MASOSA. These galaxies are some of the most distant we can observe, seen when the Universe was young, and they likely helped reionise the neutral hydrogen and helium present shortly after the Big Bang.

The start of something…

We have 720 individual images to look through, hunting for any potential galaxies emitting at the wavelength corresponding to [CII] emission. This line (rest wavelength 158 μm) is linked to star formation and is a good indicator that a galaxy is present. One potential source of error is CO emission from closer galaxies, whose lines can mimic redshifted [CII]. To account for this we are using Hubble data to check whether any foreground galaxies lie in front of our sources.
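As a rough illustration of this cross-check (not our actual analysis code), the observed frequency of a line pins down the redshift it would have under each possible identification. The rest frequencies below are standard values; the observed frequency is made up:

```python
# Sketch: could an observed line be a low-z CO interloper rather than
# [CII] at high redshift? Rest frequencies are standard values.

CII_REST_GHZ = 1900.537      # [CII] 158 um fine-structure line
CO_10_REST_GHZ = 115.271     # CO(1-0); CO(J -> J-1) sits at roughly J times this

def cii_redshift(nu_obs_ghz):
    """Redshift if the observed line is [CII]."""
    return CII_REST_GHZ / nu_obs_ghz - 1.0

def co_interlopers(nu_obs_ghz, j_max=10):
    """Possible CO(J -> J-1) identifications for the same observed frequency."""
    out = []
    for j in range(1, j_max + 1):
        nu_rest = j * CO_10_REST_GHZ   # approximate rotational ladder
        z = nu_rest / nu_obs_ghz - 1.0
        if z >= 0:
            out.append((j, round(z, 3)))
    return out

nu_obs = 250.0                    # GHz, an example frequency near [CII] at z ~ 6.6
print(cii_redshift(nu_obs))       # z if the line is [CII] (about 6.60)
print(co_interlopers(nu_obs))     # lower-z CO options to rule out with HST/COSMOS
```

Each CO option corresponds to a much closer galaxy, which is why a foreground detection in the Hubble imaging is enough to disqualify a candidate.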

Week 2: The Phantom Code

Initially, we began combing through large amounts of data by hand. With no noise reduction or immediate strategy, any single bright spot could be suggested as a source. Despite being slightly monotonous, manually identifying candidates acts as the baseline for any automated efforts we might attempt in the future.

Manual identification (left) vs. identification by GAIA (right)

It was an uphill battle…

Bentley – Coding Lead

In an effort to introduce noise reduction, our coding team worked for many hours on an ultimately ill-fated attempt to measure the flux in regions across each slice. Eventually, through resilience and determination, a breakthrough resulted in data being collected, ready to be processed and used to make everyone’s lives slightly easier.


Matt – Error Lead?

To make sense of our initial data, we started producing a data cube of all our potential candidates. GAIA may have thrown formatting issues at us, but a data cube containing the candidates was made. It shows every bright spot in 10 slices of one set of data. Given that we have 360 slices to look through (and just as many inverted slices), automating this process will be the main focus of the next few weeks.

The cube above shows the enormity of the task at hand. Whilst every point in the cube has potential to be a source, the reality is only one or two are likely to be candidates. This data needs to be heavily reduced through the use of noise reduction before any useful analysis can be done.
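As a sketch of what "measuring the flux in regions across each slice" might look like (simulated data, not our cubes), summing the pixels in a small box around a position for every slice picks out where along the cube a source peaks:

```python
import numpy as np

# Mock data cube: (slice, y, x). A fake source is injected into slice 4.
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 1.0, size=(10, 64, 64))
cube[4, 30:34, 30:34] += 5.0

def region_flux(cube, y, x, half=2):
    """Summed flux in a small box around (y, x) for every slice."""
    return cube[:, y - half:y + half, x - half:x + half].sum(axis=(1, 2))

flux = region_flux(cube, 32, 32)
print(int(flux.argmax()))   # the slice where the injected source peaks
```

A real version would loop this over a grid of regions per slice, which is roughly the shape of the task the coding team wrestled with.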

Weeks 3 and 4: The Code Wars

Programming is notorious for being tedious, with code never working and errors larger than the code itself. However, manually scouring through the data is arguably worse, so we somehow needed to automate the entire process.

It’s going better than it was

Bentley – Coding Lead

Enter SExtractor, a program capable of noise reduction, source identification and data extraction. Once we finally figured out what we were doing (thanks to an endless supply of documentation and some example code), our Python code started spewing out lines of positions, fluxes and frequencies for varying amounts of noise reduction.

To ensure SExtractor finds a reasonable number of candidates, further manual calibration was needed to select the correct settings. These make sure SExtractor doesn’t cut out any likely candidates whilst ignoring most of the noise in each file.

Candidates at varying levels of noise reduction. Higher levels are at stricter noise reduction.

Through much debate and a small amount of scientific reasoning, we determined the settings for SExtractor to use. We ultimately decided on a 25 px minimum detection area, and found candidates over varying levels of noise reduction; candidates most likely to be real sources will be detected at 4–5σ. By running the code on each cube, the program identifies possible candidates according to the settings we have selected. SExtractor’s candidates can then be checked manually to see whether they are likely to be sources or not.

Potential candidates identified by SExtractor at 4 sigma noise reduction.
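For a flavour of what those settings are doing (this is a toy re-implementation in NumPy, not SExtractor itself, and the image is simulated), detection amounts to thresholding at a multiple of the noise and keeping only connected groups of pixels at least as big as the minimum detection area:

```python
import numpy as np
from collections import deque

# Simulated image: background noise, one extended fake source (~49 px),
# and one single hot pixel that should be rejected by the area cut.
rng = np.random.default_rng(1)
img = rng.normal(0.0, 1.0, size=(80, 80))
img[20:27, 20:27] += 10.0
img[60, 60] += 12.0

def detect(img, sigma, min_area=25):
    """Toy version of SExtractor's DETECT_THRESH / DETECT_MINAREA logic:
    connected groups of pixels above sigma * rms with >= min_area pixels.
    (Using the global std as the rms is a simplification.)"""
    rms = img.std()
    mask = img > sigma * rms
    seen = np.zeros_like(mask, dtype=bool)
    sources = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        group, queue = [], deque([(y, x)])   # flood fill over 4-neighbours
        seen[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            group.append((cy, cx))
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if 0 <= ny < img.shape[0] and 0 <= nx < img.shape[1] \
                        and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(group) >= min_area:
            sources.append(group)
    return sources

print(len(detect(img, 4.0)))   # the extended source survives; the hot pixel does not
```

Raising `sigma` trades completeness for purity, which is exactly the calibration debate described above.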

Our next steps involve identifying all potential candidates for the three galaxies, finding their spectra and using COSMOS/HST data to identify whether they are distant [CII]-emitting galaxies or closer CO emitters. After this we need to find the redshift and luminosity of each galaxy to help classify them.

Week 5: A New Code

With the bulk of our coding efforts behind us and our candidates successfully identified, we can move on to finding spectra for each source. This, of course, requires a new script to analyse each candidate, measuring the flux at each frequency and producing further lines of text to be plotted.

Since SExtractor cannot identify the frequency on its own, a separate calculation needs to be performed to find the frequency of each slice. Fortunately, our data contains the information necessary to work out these frequencies, and yet another block of code works out the corresponding frequency for each slice.
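A minimal sketch of that slice-to-frequency calculation, using FITS-style header keywords (the values below are invented for illustration; the real ones come from each cube’s header, e.g. via astropy.io.fits):

```python
# Hypothetical header values; real cubes carry these keywords themselves.
header = {
    "CRVAL3": 250.0e9,   # reference frequency in Hz
    "CRPIX3": 1.0,       # reference pixel (FITS pixels are 1-indexed)
    "CDELT3": 15.6e6,    # channel width in Hz
    "NAXIS3": 360,       # number of slices in the cube
}

def slice_frequency(header, i):
    """Frequency (Hz) of slice i, counting slices from 1 as FITS does."""
    return header["CRVAL3"] + (i - header["CRPIX3"]) * header["CDELT3"]

freqs = [slice_frequency(header, i) for i in range(1, header["NAXIS3"] + 1)]
print(freqs[0] / 1e9)   # frequency of the first slice, in GHz
```

Each detection's slice index can then be mapped straight to a line frequency for the spectra.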

else: print (‘I will do nothing but I will not crash!’)

David Sobral on coding

Meanwhile, manual observation is continuing to progress, with comparisons being made between our regular and inverted data. This is important for statistical analysis and can be used as a rough estimate of how many real sources are likely to be present. For example, the data below suggests we may find about 35 potential candidates surrounding CR7, although this is only an estimate and the actual number is likely to be lower.
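The logic behind that estimate fits in a few lines (the counts below are hypothetical, chosen to match the rough figure above): detections in the inverted cube are pure noise, so roughly as many of the positive detections should be noise too.

```python
# Hypothetical counts, for illustration only.
n_positive = 60   # candidates found in the normal cube
n_negative = 25   # candidates found in the inverted (negative) cube

# The inverted cube contains no real sources, so its count estimates how
# many of the positive detections are noise. The difference is a rough
# estimate of the number of real sources.
expected_real = n_positive - n_negative
print(expected_real)   # about 35 candidates likely to be real
```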

Oh I’m sorry, I won’t do the work next time.

Kyle – Data Lead
A small crop of the data collected and counted over the last few weeks

Next week, more data crunching needs to be done: duplicate candidates need to be found so that no source is counted twice. Helpfully, the presence of a source in multiple slices is strong evidence that it is a valid candidate.
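One way the duplicate-finding might be sketched (mock detections and an assumed matching radius, not our real settings): group detections that land at nearly the same position in different slices, so each group counts as one source.

```python
import math

# Mock detections as (slice, x, y); the first three are one source seen
# in consecutive slices, the last is a separate source.
candidates = [
    (10, 101.0, 54.0),
    (11, 101.4, 53.7),
    (12, 100.8, 54.2),
    (30, 20.0, 80.0),
]

def merge_duplicates(candidates, radius=2.0):
    """Group detections within `radius` pixels of a group's latest position.
    The radius is an assumption and would need tuning on real data."""
    groups = []
    for s, x, y in sorted(candidates):
        for group in groups:
            _, gx, gy = group[-1]
            if math.hypot(x - gx, y - gy) <= radius:
                group.append((s, x, y))
                break
        else:
            groups.append([(s, x, y)])
    return groups

groups = merge_duplicates(candidates)
print(len(groups))                    # 2 unique sources
print(max(len(g) for g in groups))    # one source spans 3 slices: a stronger candidate
```

The group size doubles as the "seen in multiple slices" evidence mentioned above.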

Week 6: Revenge of the Code(rs)

In a sudden wave of optimism and progress, our coding team cleared one of their most important hurdles so far: a program that measures the flux of a given candidate across the entire frequency range, which allows us to check each candidate’s validity.

The spectrum of a potential candidate, with a clear peak at one frequency.

Whilst this is an important step, the data needs to be converted into more useful forms: notably luminosity and wavelength, as well as redshift. The conversion from frequency to wavelength is trivial, and redshift can be calculated from either value. Redshift is most useful as a comparison to the target galaxies, since knowing the redshift of each source gives its position along the line of sight relative to the target. For example, a redshift of 6.1 would place a galaxy in front of CR7, whilst a redshift of 6.8 would place it behind.

The equation for redshift
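In code, the frequency-to-wavelength and frequency-to-redshift conversions look something like this (the observed frequency and the value used for CR7’s redshift are illustrative assumptions):

```python
C_UM_GHZ = 299792.458    # speed of light, in um * GHz
CII_REST_GHZ = 1900.537  # rest frequency of the [CII] 158 um line
Z_CR7 = 6.6              # assumed redshift for CR7, for comparison only

def observed_wavelength_um(nu_obs_ghz):
    """Trivial frequency -> wavelength conversion: lambda = c / nu."""
    return C_UM_GHZ / nu_obs_ghz

def redshift(nu_obs_ghz):
    """z = (nu_rest - nu_obs) / nu_obs, assuming the line is [CII]."""
    return CII_REST_GHZ / nu_obs_ghz - 1.0

z = redshift(267.5)   # hypothetical observed line frequency in GHz
print(round(z, 2))                                        # 6.1
print("in front of CR7" if z < Z_CR7 else "behind CR7")   # lower z = closer to us
```

This reproduces the example above: a line observed at the right frequency for z = 6.1 sits in front of CR7.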

Meanwhile, Team Data have been cleaning up the potential candidate lists, removing duplicates and candidates not visible in the PB-corrected files. These are mostly candidates around the edge of the non-corrected files, as the corrected and non-corrected files are different sizes. After purging over half the initial candidates in each cube, the amount of data we are expecting to analyse has dropped significantly.

Week 7: Return of the GAIA

Why don’t astronomers like vegetable soup?

They prefer a meteor soup…

Catherine – Administrator

In our last week of scheduled labs, spirits were adequate and mood was reasonably good. Pushing the lab forward an hour meant we were all fairly tired, but we pressed on. Team Data and friends were using COSMOS data to check whether our potential candidates were sources or not, and going through spectra of each, whilst Team Code were generating signal to noise ratios to help identify which sources are our most reliable.

A histogram showing the number of positive sources compared to the number of negative sources for various signal to noise ratios.

By comparing the positive and negative sources at various signal-to-noise ratios, we are able to estimate a cut-off point, below which sources are less likely to be real.
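A minimal sketch of that cut-off estimate (the per-bin counts are illustrative, not our data): pick the lowest S/N bin where the ratio of negative to positive detections falls below an acceptable contamination level.

```python
# Hypothetical histogram counts per signal-to-noise bin.
sn_bins   = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
positives = [40,  22,  12,   7,   5,   4]   # detections in the normal cubes
negatives = [38,  20,   8,   1,   0,   0]   # detections in the inverted cubes

def cutoff(sn_bins, positives, negatives, max_contamination=0.2):
    """Lowest S/N bin where the pure-noise (negative) fraction is acceptable.
    The 20% contamination level is an assumed choice."""
    for sn, p, n in zip(sn_bins, positives, negatives):
        if p > 0 and n / p <= max_contamination:
            return sn
    return None

print(cutoff(sn_bins, positives, negatives))   # 4.5
```

Below the cut-off the negative counts nearly match the positive ones, so detections there are as likely to be noise as real sources.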