Week 5: Release Edition

The eagerly awaited final installment of LancAstro.py blog is here! Or is it the final installment? This week, LancAstro gets an overhaul and some extra modules with some bells and whistles. With the completion of the LancAstro Example Library last week, I had the chance to move back onto my LancAstro.py package, my baby! With the newly christened Plot2D up and running, the next logical step seemed to be to go 3D!

At first glance through the MatPlotLib documentation, it looked like hacking Plot2D into Plot3D would be a doddle, so much so that I could just merge them into one all powerful LancPlotter that could do all the dimensions! It seemed like all one had to do was add an extra argument in plt.errorbar() for z and therefore I presumed z errors. Seems the logical course of action right? WRONG! Turns out that the syntax for 3D is needlessly complicated and frustrating!

To plot something in 3D, you’ve got to use a separate 3D axes object which has its own entirely separate methods and syntax to the standard pyplot methods I’ve been using in Plot2D. This therefore completely bricks my code!

# Sets up the figure size from parameters before plotting commences  
fig = plt.figure(figsize=figsize)

# Setting this axis to '3d' produces a Axes3D object    
ax = fig.add_subplot(111, projection='3d')

So I had to go through and rewrite the majority of my code to fit this ridiculous syntax.

# Have to use scatter() rather than errorbar() for 3D
PLOT = ax.scatter(xs=np.array(x[i]),ys=np.array(y[i]),zs=np.array(z[i]),
# Though I did add a colourbar to the z-axis data by using cmap and parsing
# PLOT into this:
cb = plt.colorbar(PLOT)

It took many failed attempts till it ran and even when it did it often looked terrible. This was down to yet more syntax that one would think was generic to MatPlotLib but wasn’t. For instance, you’d think that all these rcParams I’ve been using for x and y would equally work for z by just following the same logic of the syntax. Wrong again. It’s a completely different style! WHY?

# ============================================================================
#                           PARAMETERS
# ============================================================================
# FIGURE AND AXIS SIZES ======================================================
mpl.rcParams['xtick.labelsize'] = 10        # Size of x-tick labels
mpl.rcParams['ytick.labelsize'] = 10        # Size of y-tick labels
# mpl.rcParams['ztick.labelsize'] = 10 <== This is not legit 😦

It very much seems that even though this 3D plotting is cool and quite powerful, it does seem from what I’ve read around that it is a bit of an after thought tacked onto MatPlotLib’s library and therefore has some very strange and quirky syntax and lacking some rather obvious features.

For example, the labels of the axis are defaulted to be placed level but this often looks awful because as the the axis are all at angles, they run into them.

Fig 1: As you can see the labels often run into the axis

You’d think there was a simple keyword argument (**kwargs as its known in computer science) to set the labels to be parallel to the axis they’re associated with? Nope, apparently it’s a persistent problem people have tried to work around, often involving long, complicated bits of code.

As you can probably tell by this point, it was week 5, the syntax was bonkers and involving ripping up half my code to make it work and the lab was boiling hot in a heat wave so my sanity was fairly questionable at this point! But, my hard-work and fever-ish mentality would not be in vain, for the finished results were pretty cool!

Fig 2: Visualisation of SC4K data processed by Sergio Santos and Josh Butterworth using Plot3D. The colourbar represents the redshift of the galaxies

I also worked on developing MultiPlot, a module to create a grid of subplots, inspired by the grid created for the SC4K Sobral 2018 et.al paper which I reworked the code for last week. Unlike the 3D plotting thankfully, this could reuse the Plot2D code.

Essentially, each subplot is just a plot like any other, the difference being that with each subplot, a subplot object is created.

# Plots all the subplots
for i in range(len(x)):
    # Creates a subplot object that defines how many rows and columns the grid has
    # i+1 defines what number in the grid this plot is 
    # Therefore this ties all subplots together
    # Calls plot method to plot each variable from the dataset    
    HANDLES = laplt.plot(x[i],y[i],x_error=x_error[i],y_error=y_error[i],

You tell it how many columns and rows this grid of subplots will have, then whenever you make a subplot, you tell it which number this one is and it is then added to the grid!

It was therefore easy to make a grid of plots using the same SC4K data processed by Sergio and Josh that I’ve been using to test all my code. The issue with this however is all the little ancillary bits to make a graph like the axis labels and ticks etc. If left to its own devices, all these subplot’s labels and ticks overrun into a horrid mess. The way round this is having to supply a 2D array which defines the shape of the grid which can then be used to set it up.

if i+1 in left_edge:
    plt.ylabel(r'%s'%y_label, {'color'    : '%s'%y_colour,  'fontsize'   : y_size })

if i+1 in bottom_edge:
    plt.xlabel(r'%s'%x_label, {'color'    : "%s"%x_colour,  'fontsize'   : x_size })

if i+1 not in left_edge:

if i+1 not in bottom_edge:

This can then also be used to work out the edges of the grid so that each subplot can be checked to see if its on an edge and thereby turn its labels and ticks on/off accordingly.

Fig 3: Grid of SC4K data processed by Sergio Santos and Josh Butterworth at 12 different wavelength bands

With this working and my time left in Lancaster before heading home for the ‘Summer’ quickly running out, I thought that I’d have to leave with one last eye-catching hurrah to cover up that my internship had run a *little* behind schedule (if you remember back to the optomistic gantt chart from week 1&2). NB: summer mainly meant a 2nd internship at home (because 1 isn’t enough right?) and panic writing my lit review.

So I came up with a way to make gifs of 3D visualisations of data cubes. Basically some code that punched out the same data cube at different orientations. This turned out to be surprisingly trivial. I added a new method to Plot3D which was essentially a modified copy of create_figure() that was excecuted from a script that fed it the data and gave it the angle(s) to stay fixed at while the other rotational axis angle would be incremented on every iteration.

# Plotting done before this but not saved to file
# Then this creates a new save of the plot but at an incremented angle
# This gives the effect of panning about the data without having to replot 
# the data everytime
for i in np.arange(0,no_frames,1):
    cpfig = fig

    # Sets elevation and azimuthal angle of plot
    if azim == None: 
    if elev == None: 
    if elev == None and azim == None:


    if print_on == True:
        pb.printProgressBar(i+1,no_frames+2,prefix='FRAME NO.%d SAVED'%i,
                            suffix='COMPLETE', length=40)


After some experiment with this, the best results seemed to come from letting both azimuthal and elevation angles to be incremented together, giving a good panning about the data cube.

And the results, drum roll please, are shown in this neat gif below!

Fig 4: gif of the same SC4K data cube from before with frames at intervals of azimuthal and elevation angles

I have to say, I was pretty pleased with myself!

So with Friday drawing to a close, my farewells given to my friends in the lab and my laptop packed away for the journey home it would seem that that was that for the LancAstro 2019 internship.

This internship has been a fantastic opputunity to expand my skills and experience in Python and coding in general. It’s taught me how to plot using Python, a skill I’m sure will come in handy countless times in the future. I’ve experienced the highs and lows of trying to make a package of scripts/ modules; the challenge of trying to make them generic and not turn them into a mess of code that only I can understand. And I’ve had fantastic fun sitting in a lab full of intelligent, funny and charming people; us all sharing in the trials and tribulations that work like this brings and the exholtation of success. The satisfaction of that graph finally looking right, to the frustration of a circle being plotted as a 3-sided, open square with curved sides (shout-out to Charlie Alexander there who joined us in the lab working for Dr Licia Ray 😉 ), and of course a continous stream of tea, coffee, biscuits and baked goods that kept us soldiering on. So thank you to all the interns in the astro lab this summer who made working their a real joy! You can check out the blogs of what they’ve done this summer which I had the opputunity to see being worked on, here:

I’d like to give a special thank you to Dr David Sobral who without his incredible hard-work and dedication we wouldn’t have these opputunities like the XGAL internships that provide more than just work in the summer, but also a fantastically designed programme which includes:

  • Weekly XGAL meetings to share our projects with himself and his PhD students and see their work
  • Lunches/ tea and coffee sessions every week with the rest of the astrophysics academics which gives us a chance to hear about their research and hear what academic life is like
  • A weekly paper review where we all present recent papers, giving us a chance to peer into the latest science.

But wait, the work continues! I know there are many improvements to be made to the code, bugs to squash, modules to add and an all important manual for users to be able to try and deceipher this web of modules I’ve knotted to make. I’ll also be able to apply some of the new skills I’ve learnt so far on my 2nd internship of the summer like using GitHub and PyCharm! So the blog will continue for another bonus summer holiday installment; LancAstro: Plotting Beneath the Decks on the High Seas

– Harry

Weeks 3 & 4: Beta Edition

Picking up from the last two weeks of the internship; the first modules of LancAstro had been created and seemed to be working. Now it was time to create the more complex scripts envisioned, such as 3D plotting, scripts to fit functions to data, regression etc.

To give me a better chance of doing this, my supervisor David decided that it would be best to go through example scripts he had to make specific figures, like those for papers he had wrote. My task would be to go through them and organise each one to have the same generic structure.

Firstly a ‘pre-amble’. This sets out the name of the script, the date it was last edited, its author and version number, plus a short description of what it does. Lets use the Ionisation_n_Excitations.py script as an example of this standard format:

# ============================================================================
# ======================= Ionisation_n_Excitations ===========================
# Author: Dr David Sobral
# Version: 
# Date: 22.07.2019
# Script to demonstrate the Boltzmann and Saha functions with ionised hydrogen
# Edited by Harry Baker as part of the LancAstro Example Library
# for the XGAL Internships 2019 

Next comes the imports where packages are imported into Python and named. These were corrected for Python 3.7 use and removed any instances of

from numpy import *


import numpy as np

or anything similar as using * means methods are called without designating which package they are from which can lead to errors if two packages share methods with the same names.

# ============================================================================
# 							  IMPORTS	
# ============================================================================

from astropy.io import fits as pyfits	# Import pyfits to handle FITS files
import matplotlib as mpl 				# Import matplotlib for various stuff
import matplotlib.pyplot as plt 		# Import pyplot for plotting
import numpy as np 						# Import numpy for maths functions
mpl.rc('text', usetex = True) 			# Set to use LaTeX font formatting

Now follows the functions or methods. Each one has a ‘docstring’ which is a comment section under the definition of the method which gives a description of its function, then lays out the arguments it takes and what it should return.

# ============================================================================
# ============================================================================

def Boltzmann(T,g1=2.,g2=8.,DE=10.2):
	""" Creates the Boltzmann distribution

		T: Temperature



		The Boltzmann distribution
	Kb = 1.38*10.**(-23.)
	Ev_to_J = 1.6*10.**(-19.)
	return (g2/g1)*np.e**(-(DE*Ev_to_J)/(T*Kb))

As you can see, I didn’t fully understand the arguements for each method well enough to put down their descriptions in the docstring but hopefully people could see the general idea of how to lay one out.

Python does allow for some pretty cool use of docstrings. If written correctly, a user can type


into the terminal and it will return the docstring for that method, so you can see what arguments it should have etc.

All the parameters or settings that are used come next. This mainly consists of a series of commands which update the Matplotlib.rcParams which set out all the default settings of how Matplotlib outputs figures. These include axis tick sizes, label sizes, font sizes, etc.

By standardising these across all scripts, figures can always be produced which have the same style, lets say XGAL style!

# ============================================================================
# ============================================================================
# FIGURE AND AXIS SIZES ======================================================
mpl.rcParams['xtick.labelsize'] = 14		# Size of x-tick labels
mpl.rcParams['ytick.labelsize'] = 14		# Size of y-tick labels
mpl.rcParams['axes.labelsize'] = 14			# Size of axes labels
mpl.rcParams['image.origin'] = 'lower'		# 

mpl.rcParams['axes.linewidth'] = 2.5		# Size of axes line width
mpl.rcParams['xtick.major.size'] = 14		# Size of major x-ticks	
mpl.rcParams['xtick.minor.size'] = 6		# Size of minor x-ticks
mpl.rcParams['xtick.major.width'] = 2.5		# Width of major x-ticks  
mpl.rcParams['xtick.minor.width'] = 1.5		# Width of minor x-ticks
mpl.rcParams['ytick.major.size'] = 14		# Size of major y-ticks
mpl.rcParams['ytick.minor.size'] = 6		# Size of minor y-ticks
mpl.rcParams['ytick.major.width'] = 2.5		# Width of major y-ticks
mpl.rcParams['ytick.minor.width'] = 1.5		# Width of minor y-ticks

# Sets font style and size
mpl.rcParams.update({'font.size': 18, 'font.weight': 'bold'})

# FIGURE =====================================================================
filename = 'Excited_H_demo.pdf'
figsize = (14.7,6)

# AXIS DIMENSIONS ============================================================
x_min = 0.0                				# Minimum x-axis value to plot
x_max = 20900            				# Maximum x-axis value to plot
y_min = -0.01           				# Minimum y-axis value to plot
y_max = 1.08        					# Maximum y-axis value to plot

axis_ranges = [x_min,x_max,y_min,y_max] # Dimensions of axes

# AXIS TICKS =================================================================
# Positions of x-ticks
x_ticks = np.arange(0,21000,2000)

# Positions of y-ticks        		  
y_ticks = np.round(np.arange(0,1.1,0.2),1)

You can see how this section also defines the axis-tick positions and labels and the range of the plot so that in the main, the commands to set these are always the same for every script, thereby making it much clearer to the user how a figure is actually set up and created.

The final part is the ‘main’. This is where the actual program is executed, mainly consisting of creating/ loading data, processing it to be ready to plot, then plotting it and producing and customising the figure ready to be saved.

# ============================================================================
# ============================================================================
# PLOTTING ===================================================================

f, ax1 = plt.subplots(1,1, sharey=True, figsize=figsize)

# Adjust spacing between sub-figures
plt.subplots_adjust(wspace = 0.02)

# Create a range of temperatures from 0K-30000K in increments of 100K
x = np.arange(1.0,30000.,100)

# Create the Boltzmann distribution with given temperature range
y = Boltzmann(x)

# Produces tbe Saha partition function with given temperature range
y2,y3 = Saha(x)

# Plot Boltzmann distribution
Boltz = ax1.plot(x,y*1000.,color = 'b', linestyle = '-',linewidth = 3.9,

At first, editing scripts was quite laborious. Many were written in Python 2.7. That’s fine, fairly easy to convert over to Python 3.7 by removing/ editing the print statements for example of a classic difference between the languages:

print "Hello world!"        # Python 2.x
print("Hello world!")       # Python 3.x

However, the script may run then but there will be a few subtle differences between the original figure and the new one. Small things like ticks missing on the top and right axis, or labels a few points to the side of where they were etc. These often-required hours of tinkering to find a small piece of syntax that may fix it or having to adjust the co-ordinates of a label several times till it lined up again, such as the example of Fig 1.

Fig 1: Figure to demonstrate Wien’s Law. Converting over from Python 2 to 3, all the labels for each band of the EM spectrum were mis-aligned so their co-ordinates had to be manually redefined

After the first few days though I managed to get into the rhythm of it by finding the common commands to convert over to Python 3 that would often mean the new figure was identical to the original first time. The larger task was re-organising the code into the format I spoke of earlier. This required me to scour through the code to find where each bit should belong. Also tricky was attempting to write the docstrings for functions that I couldn’t understand the meaning of. At least though the main purpose of this exercise was to produce example code of how to make different python figures so how the data was created/ handled didn’t really matter. And, as I endeavoured to find out how these different scripts worked, I was learning more Python skills which was also the point of doing this.

Fig 2: Example of a fairly complicated figure with subplots and different x-axis made from a script updated to Python 3 taken from Sobral et. al 2018 paper

By the end of the 4th week of the internship, I had fully re-organised and updated 11 scripts, 2 of which are for figures in a 2018 paper by Sobral et. al concerning SC4K data.

Now I had to compile these into some documentation that would show the example figure and which script made it, and explain how these are laid out and work. So, I produced a document in LaTeX with some cool boxes to highlight how to say make a legend or multiple plots. I put the XGAL logo I made in the last blog in there too on the front cover for good measure and hey presto!

Also during the fortnight, I’ve been improving and tinkering with the existing modules of LancAstro.py. Heather and Amaia have been working on outreach projects (check out their XGAL from Lancaster to the World blog), one of which required a gif showing how a spectrum changes with increasing redshift with filter profiles plotted on the same figure too. This seemed like a perfect job for ScatterPlot! With a bit of fettling, and script that executed ScatterPlot in a for loop with increasing redshift, outputting the figure to an iterate-able filename, a series of frames were created!

ScatterPlot had done its job. It had gone once more into the breach and survived with some new options and parameters to show for it!

My reward for its outstanding performance? ScatterPlot being reorganised into Plot2D, for a safe and secure LancAstro!

This could now plot any array of data, line, scatter etc with different point styles, colours, labels, errorbars. You name it, Plot2D could *probably* do it!

And that rounds up the fortnight’s work. Stay tuned for LancAstro: Release Edition which will detail the final week of my internship as LancAstro gets new modules and ventures into the 3rd dimension!  


Weeks 1 & 2: Alpha Edition

I started on the first day of the internship on the busy Extrav week at the university not really remembering what I had signed up for several months prior, just that it would involve lots of Python coding, which I couldn’t complain about!

Coding is not, at least to my mind, like riding a bike; you do not just learn it in childhood, take the stabilisers off at 9 and yet ride say 10 years later just like you used to be able to all that time ago on summery days along bridal paths that criss-crossed the Derbyshire Dales and feel like you can go as fast as the great locomotives used to thunder down those same routes decades prior. No, with coding it seems your brain has to be re-wired everytime you come back to it, even after the shortest spell away.

Therefore my first task was to conduct an appraisal of the numerous Python scripts that are stored on the Box folder of the XGAL group which would also double up as a re-acquaintance with coding. These have been produced over the years by staff, PhD students, interns and students of the Astrophysics Group Project to reduce images, produce catalogues, fit functions to data and create graphs of data. This took several days but allowed me to discover the huge amount of cool code that has been made and could be utilised to make new, generic code that everyone could use.

LancAstro.py will be the main focus of this internship; a Python package of modules that will be useful for staff and students alike. I envisioned it to be constructed of the following pieces of code, or modules:

  • HistoPlot.py To construct bins from data plot histograms
  • ScatterPlot.py To plot scatter points with error bars
  • FunctionPlot.py To plot functions
  • 3DPlot.py To plot 3D ‘data cubes’
  • Fitter.py To fit to data for analysis
  • SpectralFit.py Fit to lines in spectra
  • ImageRedux.py To reduce images
  • CatExtract.py To make catalogues of stars from an image
  • SSE.py To extract info for a single source
  • BootStrapper.py To produce a region of confidence about points
Fig 1: My optomistic gantt chart for the LancAstro internship project

With this plan laid out (as detailed in the gantt chart in Fig 1) I set off on making the first module; HistoPlot. For this I took a Python script from the Box folder that already made a histogram for a specific data set, and attempted to try and make it generic.

However, I quickly ran into issues. The variables in the code had been named specifically for the task the original author had created it for, so without context it was difficult to work out how it actually worked!

Fig 2: Example of how HistoPlot creates bins from the data. It does this by looping through each data point and checking if it is within +/- width of the bins from the centre of each bin and adding this to the count for the bin.

With a bit of help though, I realised it was actually marvelously simple to code into Python a bin constructor. The code simply loops through each data point one would wish to bin (this is the first ‘for’ loop in Fig 2), then loops through the centre of each bin with each data point and checks whether it is less than/ more than the centre of the bin +/- the width of the bins.

The code then has to convert this into arrays that can be plotted to look like a histogram. This requires it to add the ‘corners’ of each bin so that when plotted as a line, it will look like a bar-histogram.

With this working, the next step was to plot the counts for each bin. It took awhile to realise that the points on the x-axis for the left most and right most edge of the plot must be included so the lines plotted touch the x-axis and complete the bars.

Fig 3: Histogram of SC4K data processed by Sergio Santos and then Josh Butterworth, binned and plotted using an early version of HistoPlot.py

To test HistoPlot, I used SC4K data processed by Sergio Santos into SEDs and then by Josh Butterworth for his internship on SC4K: M* and UV LFs. As can be seen in Fig 3, at first I just plotted one variable but I wanted the code to be as generic as possible so I now needed to make it plot multiple variables on the same plot from data loaded in from text files or FITS files, which is shown in Fig 4.

Fig 4: Method in HistoPlot to load columns of data from a FITS file. The user has to parse arrays with the paths to the files, the names of the files and the names of the columns to extract and put in a 2D array called DATASET to plot with

The code then has to loop through each variable to plot, send this to make_bins to construct the bins as shown in Fig 2, and then plot this result. It also fills underneath in the same colour as the line. The ‘handle’ for each plot is returned to make the legend with all the plots. The handle is an object created with each plot in PyPlot that tells pyplot.legend() what each legend entry should look like. Be warned, legend handles are weird and can be quite tempermental as I found out but eventually I managed to produce this plot in Fig 5.

Fig 5: SC4K data binned into redshifts by Josh Butterworth and then binned into a histogram by my code!

To make the code easier to understand and edit by its users, I moved as many of the variables as I could to control the binning of data and parameters of the plot and figure to the top of the code.

Fig 6: Parameter definitions for HistoPlot in one easy to read place

I then thought I’d spruce up the code by putting a loading bar into the terminal output to update the user on the progress of loading the data, or plotting the histogram. I found some code for this on StackOverflow.com and created a module with it that then could be imported into other LancAstro modules, as shown below!

Fig 7: GIF of the the terminal output for making a histogram with the progress bar!

I’m pretty pleased that I got that code to work!

Fig 8: A finished figure of the same SC4K data from Josh’s project as shown in Fig 5

This had taken me up to the beginning of the second full week of the internship. There had been a week’s break in my internship after Week 1 for the National Astronomey Meeting that was taking place at Lancaster this year. As part of that break, I thought I’d try and make some space mission style patches for the XGAL internship and the LancAstro package!

I made these in Inkscape using pre-made graphics for a mod of a game called Kerbal Space Program (KSP) . I think I should make some t-shirts and mugs with these on!

Going into Week 2, HistoPlot.py was now in what I deemed V1 ready and so it was time to move on and make ScatterPlot. I thought this probably wouldn’t take much time as the two modules would share a lot of the same code; the same loading data methods, very similar plotting methods, the same figure creation parameters and methods etc. However, it turned out that quite a bit of the code for HistoPlot was fairly broken!

I had made a few methods to automatically adjust the axis dimesions to fit the range of all the variables but this turned out to cut the axis short. Another method was meant to check the length of the array which held different parameters for each variable (such as the colour of each plot for example) and that this was the same length as the number of variables. If this was not true (which I deemed likely as if someone wanted to plot 14 different variable sets for instance, they probably couldn’t be bothered to type out that many marker styles for instance) the code would get the first entry of the array and set this to all entries and make the array the correct length. The way I had coded this however meant that it often appended whole arrays of numbers or adjusted the wrong value.

The user also had to send a lot of arguements when they wanted to use the module because I didn’t realise that in Python you can set optional arguements. I had managed to create the monster I had sworn to try and vanquish! A user had to parse the data or parameters in a very particular way, otherwise it would not work.

Fig 9: A successful figure made with ScatterPlot from SC4K data processed by Sergio Santos and Josh Butterworth

However, after realising the error of my ways, I managed to get ScatterPlot to work, including asymmetrical errors on some test data from Josh’s project.

Therefore, at the end of Week 2, the state of LancAstro was this:

As one can see, I’m fairly behind schedule but the more I learn about how Python works, the quicker I should be able to progress.