Weeks 1 & 2: Alpha Edition

I started on the first day of the internship on the busy Extrav week at the university not really remembering what I had signed up for several months prior, just that it would involve lots of Python coding, which I couldn’t complain about!

Coding is not, at least to my mind, like riding a bike. You do not just learn it in childhood, take the stabilisers off at 9 and then, say 10 years later, ride just as you used to on summery days along the bridle paths that criss-cross the Derbyshire Dales, feeling like you can go as fast as the great locomotives that thundered down those same routes decades before. No, with coding it seems your brain has to be rewired every time you come back to it, even after the shortest spell away.

Therefore my first task was to appraise the numerous Python scripts stored in the XGAL group’s Box folder, which would also double up as a re-acquaintance with coding. These have been produced over the years by staff, PhD students, interns and students of the Astrophysics Group Project to reduce images, produce catalogues, fit functions to data and create graphs of data. This took several days but allowed me to discover the huge amount of cool code that has been written and could be turned into new, generic code that everyone could use.

LancAstro.py will be the main focus of this internship; a Python package of modules that will be useful for staff and students alike. I envisioned it to be constructed of the following pieces of code, or modules:

  • HistoPlot.py To construct bins from data and plot histograms
  • ScatterPlot.py To plot scatter points with error bars
  • FunctionPlot.py To plot functions
  • 3DPlot.py To plot 3D ‘data cubes’
  • Fitter.py To fit to data for analysis
  • SpectralFit.py Fit to lines in spectra
  • ImageRedux.py To reduce images
  • CatExtract.py To make catalogues of stars from an image
  • SSE.py To extract info for a single source
  • BootStrapper.py To produce a region of confidence about points
Fig 1: My optimistic Gantt chart for the LancAstro internship project

With this plan laid out (as detailed in the Gantt chart in Fig 1) I set off on making the first module: HistoPlot. For this I took a Python script from the Box folder that already made a histogram for a specific data set, and tried to make it generic.

However, I quickly ran into issues. The variables in the code had been named specifically for the task the original author had created it for, so without context it was difficult to work out how it actually worked!

Fig 2: Example of how HistoPlot creates bins from the data. It does this by looping through each data point and checking if it is within +/- width of the bins from the centre of each bin and adding this to the count for the bin.

With a bit of help though, I realised it was actually marvellously simple to code a bin constructor in Python. The code loops through each data point to be binned (this is the first ‘for’ loop in Fig 2), then for each data point loops through the centre of each bin and checks whether the point lies between the centre of the bin minus the width of the bins and the centre plus the width of the bins.
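The two nested loops described above can be sketched roughly as follows. This is not the code from Fig 2: the function name `make_bins` is borrowed from later in the post, but the arguments and the choice to test against half the bin width (so that neighbouring bins do not overlap) are my own assumptions for illustration.

```python
def make_bins(data, centres, width):
    """Count how many data points fall in each bin.

    Assumption: `width` is the full bin width, so a point belongs to a bin
    if it lies within half a width of that bin's centre.
    """
    counts = [0] * len(centres)
    half = width / 2.0
    for value in data:                        # first loop: every data point
        for i, centre in enumerate(centres):  # second loop: every bin centre
            if centre - half <= value < centre + half:
                counts[i] += 1
                break                         # a point can only sit in one bin
    return counts
```

Points that fall outside every bin are simply not counted. For large data sets a vectorised routine would be much faster, but the double loop keeps the logic easy to follow.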

The code then has to convert this into arrays that can be plotted to look like a histogram. This requires it to add the ‘corners’ of each bin so that when plotted as a line, it will look like a bar-histogram.

With this working, the next step was to plot the counts for each bin. It took a while to realise that the points on the x-axis at the leftmost and rightmost edges of the plot must be included so that the plotted lines touch the x-axis and complete the bars.
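As a rough sketch of the ‘corners’ idea (not the actual HistoPlot code), the bin centres and counts can be turned into x and y lists that trace the outline of the histogram, starting and ending on the x-axis. The function name `step_outline` and its arguments are hypothetical, and `width` is assumed to be the full bin width.

```python
def step_outline(centres, counts, width):
    """Turn bin centres and counts into the corner points of a histogram outline."""
    half = width / 2.0
    xs = [centres[0] - half]    # start on the x-axis at the leftmost edge
    ys = [0]
    for centre, count in zip(centres, counts):
        # two corners per bin: top-left and top-right
        xs.extend([centre - half, centre + half])
        ys.extend([count, count])
    xs.append(centres[-1] + half)   # drop back to the x-axis at the rightmost edge
    ys.append(0)
    return xs, ys
```

Plotting `xs` against `ys` as a single line then gives the stepped outline, and the same two lists can be used to fill underneath it.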

Fig 3: Histogram of SC4K data processed by Sergio Santos and then Josh Butterworth, binned and plotted using an early version of HistoPlot.py

To test HistoPlot, I used SC4K data processed by Sergio Santos into SEDs and then by Josh Butterworth for his internship on SC4K: M* and UV LFs. As can be seen in Fig 3, at first I just plotted one variable. I wanted the code to be as generic as possible though, so I next needed to make it plot multiple variables on the same plot from data loaded in from text files or FITS files; the FITS loading method is shown in Fig 4.

Fig 4: Method in HistoPlot to load columns of data from a FITS file. The user has to pass arrays with the paths to the files, the names of the files and the names of the columns to extract; the extracted columns are put in a 2D array called DATASET to plot with

The code then has to loop through each variable to plot, send it to make_bins to construct the bins as shown in Fig 2, and then plot the result. It also fills underneath in the same colour as the line. The ‘handle’ for each plot is returned so the legend can be made with all the plots. The handle is an object created with each plot in pyplot that tells pyplot.legend() what each legend entry should look like. Be warned, legend handles are weird and can be quite temperamental, as I found out, but eventually I managed to produce the plot in Fig 5.

Fig 5: SC4K data binned into redshifts by Josh Butterworth and then binned into a histogram by my code!

To make the code easier for its users to understand and edit, I moved as many as I could of the variables that control the binning of the data and the parameters of the plot and figure to the top of the code.

Fig 6: Parameter definitions for HistoPlot in one easy to read place

I then thought I’d spruce up the code by putting a loading bar into the terminal output to update the user on the progress of loading the data, or plotting the histogram. I found some code for this on StackOverflow.com and created a module with it that then could be imported into other LancAstro modules, as shown below!

Fig 7: GIF of the terminal output for making a histogram with the progress bar!

I’m pretty pleased that I got that code to work!
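For anyone curious, a terminal progress bar of this kind can be sketched in a few lines. This is a minimal illustration and not the StackOverflow code I actually used; the names `progress_bar` and `show_progress` are made up for this example.

```python
import sys


def progress_bar(done, total, width=30):
    """Return a one-line text bar, e.g. '[#####-----]  50%'."""
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return "[" + bar + "] " + f"{int(100 * fraction):3d}%"


def show_progress(done, total, stream=sys.stdout):
    """Redraw the bar on the same terminal line."""
    # '\r' moves the cursor back to the start of the line so the bar
    # overwrites itself instead of printing a new line each time.
    stream.write("\r" + progress_bar(done, total))
    if done >= total:
        stream.write("\n")
    stream.flush()
```

Calling `show_progress(i + 1, n)` inside a loop over `n` files or data points redraws the bar in place after each step.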

Fig 8: A finished figure of the same SC4K data from Josh’s project as shown in Fig 5

This had taken me up to the beginning of the second full week of the internship. There had been a week’s break in my internship after Week 1 for the National Astronomy Meeting, which was taking place at Lancaster this year. During that break, I thought I’d try and make some space mission style patches for the XGAL internship and the LancAstro package!

I made these in Inkscape using pre-made graphics for a mod of a game called Kerbal Space Program (KSP). I think I should make some t-shirts and mugs with these on!

Going into Week 2, HistoPlot.py was now in what I deemed V1 ready and so it was time to move on and make ScatterPlot. I thought this probably wouldn’t take much time as the two modules would share a lot of the same code; the same loading data methods, very similar plotting methods, the same figure creation parameters and methods etc. However, it turned out that quite a bit of the code for HistoPlot was fairly broken!

I had made a few methods to automatically adjust the axis dimensions to fit the range of all the variables, but these turned out to cut the axes short. Another method was meant to check that the array holding a given parameter for each variable (such as the colour of each plot) was the same length as the number of variables. If it was not (which I deemed likely: someone plotting 14 different variable sets, say, probably couldn’t be bothered to type out that many marker styles), the code would take the first entry of the array, set every entry to it and make the array the correct length. The way I had coded this, however, meant that it often appended whole arrays of numbers or adjusted the wrong value.
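The intended behaviour of that length check can be sketched like this. It is an illustration of the idea rather than the fixed LancAstro code, and the function name `match_length` is hypothetical.

```python
def match_length(values, n_variables):
    """Return a list with exactly one entry per variable.

    If `values` is a single item, or a list of the wrong length, its first
    entry is repeated for every variable.
    """
    # Treat a bare value (including a string such as "red") as a one-item list.
    if not isinstance(values, (list, tuple)):
        values = [values]
    values = list(values)
    if not values:
        raise ValueError("at least one value is needed")
    if len(values) == n_variables:
        return values
    # Build a fresh list rather than appending to the old one, which avoids
    # accidentally nesting whole lists inside it.
    return [values[0]] * n_variables
```

So `match_length(["red"], 3)` gives three copies of `"red"`, while a list that is already the right length is returned unchanged.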

The user also had to send a lot of arguments when they wanted to use the module, because I hadn’t realised that in Python you can give arguments default values to make them optional. I had managed to create the monster I had sworn to try and vanquish! A user had to pass the data and parameters in a very particular way, otherwise it would not work.
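Optional arguments in Python are just parameters with default values, so the caller only has to supply what they want to change. A small sketch, where the function `scatter_settings` and all of its parameters are invented for illustration and are not the real ScatterPlot interface:

```python
def scatter_settings(x, y, colour="black", marker="o", y_err=None, labels=("x", "y")):
    """Collect plot settings, filling in defaults for anything not given."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    return {
        "n_points": len(x),
        "colour": colour,                # default used unless the caller overrides it
        "marker": marker,
        "has_errors": y_err is not None,  # None means 'no error bars supplied'
        "labels": tuple(labels),
    }
```

A call such as `scatter_settings(x, y)` works with no extra typing, and `scatter_settings(x, y, marker="x", y_err=errors)` changes only the named settings.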

Fig 9: A successful figure made with ScatterPlot from SC4K data processed by Sergio Santos and Josh Butterworth

However, after realising the error of my ways, I managed to get ScatterPlot to work, including asymmetrical errors on some test data from Josh’s project.

Therefore, at the end of Week 2, the state of LancAstro was this:

As one can see, I’m fairly behind schedule but the more I learn about how Python works, the quicker I should be able to progress.

