COM6018 Data Science with Python

Week 10 - More assignment feedback

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved
COM6018 Data Science with Python

Overview

  • Group Feedback from Assignment 1 - Q3 and Q4
  • Review of Lab Class 8 (Curve Fitting with Scikit-Learn)
  • Bits and Bobs
  • Assignment 2

Group Feedback from Assignment 1

Things to consider

  • Is the data correct?
  • Do the plots allow easy comparison?
  • Are the fonts large enough?
  • Is the title clear?
  • Have the axes been labeled?
  • Are the axis units clear?
  • Have good line styles been used?
  • Are the lines labeled clearly?
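
As a minimal sketch of this checklist in matplotlib, the example below gives a plot a clear title, labelled axes with units, a larger font, distinct line styles and a legend. The two series and their values are made up for illustration; they are not taken from the assignment data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two made-up series, purely for illustration
years = np.arange(1950, 2021, 10)
series_a = np.linspace(2.0, 6.0, len(years))
series_b = np.linspace(1.0, 3.5, len(years))

plt.rcParams.update({"font.size": 14})  # larger default font for all plot text

fig, ax = plt.subplots(figsize=(8, 5))
# Different line styles and markers keep the lines distinguishable, even in greyscale
ax.plot(years, series_a, linestyle="-", marker="o", label="Country A (hypothetical)")
ax.plot(years, series_b, linestyle="--", marker="s", label="Country B (hypothetical)")

ax.set_title("Per capita CO2 emissions, 1950-2020")
ax.set_xlabel("Year")
ax.set_ylabel("CO2 emissions per capita (tonnes)")  # the unit is stated on the axis
ax.legend()
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("checklist_example.png")
```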

Q3

  • Plot the per capita CO2 emissions against the GDP per capita for each country.

  • Design your plot so that the size of the marker is proportional to the population.

  • Only consider countries with a population of at least 5 million people.

  • Comment on the relationship between the two variables and how it has changed over time.
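
One possible approach to Q3 is sketched below. It assumes a pandas DataFrame with hypothetical columns `country`, `year`, `population`, `gdp_per_capita` and `co2_per_capita` filled with made-up values; the real dataset will use different names and numbers. The helper `countries_for_year` is also hypothetical. It keeps countries with at least 5 million people, and draws one panel per year with shared axes so the years can be compared directly.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: made-up values and assumed column names
df = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [1990, 1990, 1990, 2020, 2020, 2020],
    "population": [4e6, 9e6, 50e6, 6e6, 11e6, 60e6],
    "gdp_per_capita": [2000, 9000, 25000, 5000, 15000, 40000],
    "co2_per_capita": [0.5, 3.0, 9.0, 1.2, 4.0, 7.5],
})


def countries_for_year(df, year, min_population=5_000_000):
    """Return the rows for one year, keeping countries with at least min_population people."""
    mask = (df["year"] == year) & (df["population"] >= min_population)
    return df[mask]


years = [1990, 2020]
# Shared axes make the panels directly comparable across years
fig, axes = plt.subplots(1, len(years), figsize=(12, 5), sharex=True, sharey=True)
for ax, year in zip(axes, years):
    subset = countries_for_year(df, year)
    # Marker area (points^2) proportional to population; 1e5 is an arbitrary scale factor
    sizes = subset["population"] / 1e5
    ax.scatter(subset["gdp_per_capita"], subset["co2_per_capita"], s=sizes, alpha=0.5)
    ax.set_xscale("log")  # GDP per capita spans several orders of magnitude
    ax.set_title(f"{year} (population >= 5 million)")
    ax.set_xlabel("GDP per capita (USD)")  # currency unit is an assumption
axes[0].set_ylabel("CO2 emissions per capita (tonnes)")
fig.tight_layout()
```

Because matplotlib's `s` argument sets the marker area rather than its diameter, scaling `s` linearly with population makes the area proportional to population, which is usually the easier encoding to read.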

Q3 plots

Q4

  • Make a plot that compares the distribution of GDP per capita across the countries
    in the world at 10-yearly intervals from 1950 to 2020.
  • Comment on how the distribution has changed over time.
  • In particular, does it appear that wealth inequality has increased or decreased over time?
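
A minimal sketch for Q4, under stated assumptions: the GDP per capita values below are random log-normal numbers standing in for the real data, and the Gini coefficient is used here as one possible summary of inequality (the assignment does not prescribe a measure). This version weights every country equally rather than by population. Side-by-side box plots on a log axis show how the whole distribution shifts and spreads between decades.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # Sum of |x_i - x_j| over all ordered pairs, normalised by 2 * n^2 * mean
    total_abs_diff = np.abs(x[:, None] - x[None, :]).sum()
    return total_abs_diff / (2 * n ** 2 * x.mean())


# Hypothetical data: 150 made-up log-normal GDP per capita values per decade
rng = np.random.default_rng(1)
decades = list(range(1950, 2021, 10))
gdp = {
    year: rng.lognormal(mean=8 + 0.02 * (year - 1950), sigma=1.0, size=150)
    for year in decades
}

fig, ax = plt.subplots(figsize=(10, 5))
ax.boxplot([gdp[year] for year in decades])  # one box per decade at x = 1..8
ax.set_xticks(range(1, len(decades) + 1))
ax.set_xticklabels([str(year) for year in decades])
ax.set_yscale("log")  # GDP per capita is heavily right-skewed
ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita (USD, log scale)")
ax.set_title("Distribution of GDP per capita across countries")
fig.tight_layout()

# A single number per decade makes the trend in inequality easy to compare
for year in decades:
    print(year, f"Gini = {gini(gdp[year]):.3f}")
```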

Q4 plots

Review of Lab Class 8

Using Scikit-Learn to fit the CO2 atmospheric concentration data.

We will then use our fitted model to predict future atmospheric CO2 concentration.

On what date will the CO2 atmospheric concentration reach 450 ppm?
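
The released solution notebook is the reference for the lab. What follows is only a minimal sketch of the final step, assuming a quadratic trend fitted with a Scikit-Learn pipeline. The series is synthetic (a made-up trend plus seasonal cycle and noise), so the printed year is not a real forecast. The fitted trend is evaluated on a fine future grid, and the first grid point at or above 450 ppm is reported as a decimal year.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the lab data: decimal years and CO2 in ppm (made-up values)
rng = np.random.default_rng(42)
t = np.arange(1960, 2024, 1 / 12)  # monthly samples as decimal years
co2 = (316 + 0.75 * (t - 1960) + 0.012 * (t - 1960) ** 2
       + 3.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size))

# Centre time to keep the polynomial features well conditioned
T0 = 2000.0
X = (t - T0).reshape(-1, 1)

trend = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
trend.fit(X, co2)

# Evaluate the fitted trend on a daily grid and find the first time it reaches the target
target = 450.0
future = np.arange(2024, 2100, 1 / 365)
pred = trend.predict((future - T0).reshape(-1, 1))
reached = pred >= target
crossing = future[np.argmax(reached)] if reached.any() else None

if crossing is None:
    print(f"Trend does not reach {target} ppm before {future[-1]:.0f}")
else:
    print(f"Fitted trend first reaches {target} ppm at about {crossing:.2f} (decimal year)")
```

A grid search is used here for simplicity; solving the fitted polynomial for its root would give the same answer more precisely. Because of the seasonal cycle, individual monthly values will first touch 450 ppm somewhat earlier than the smooth trend does.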

The Task

The stages of the task are as follows:

  • Loading the CO2 data.
  • Fitting a polynomial curve to describe the growth trend.
  • Fitting a periodic function to describe the seasonal variation.
  • Tuning the model hyperparameters (e.g. the order of the polynomial).
  • Evaluating the model.
  • Using the model to make a prediction.
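
The fitting, tuning and evaluation stages above can be sketched together as follows. This is not the lab's solution. It assumes a model with polynomial trend terms plus one annual sine/cosine pair, uses synthetic data, and relies on a hypothetical helper `make_features`. The polynomial order is chosen by RMSE on the most recent years, which are held out, so the score measures extrapolation, which is what prediction requires.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def make_features(t, degree, t0=2000.0):
    """Polynomial trend terms in centred time plus one annual sine/cosine pair."""
    s = t - t0
    poly = np.column_stack([s ** k for k in range(1, degree + 1)])
    seasonal = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    return np.hstack([poly, seasonal])


# Synthetic stand-in for the CO2 series (made-up values, not real measurements)
rng = np.random.default_rng(0)
t = np.arange(1960, 2024, 1 / 12)
co2 = (316 + 0.75 * (t - 1960) + 0.012 * (t - 1960) ** 2
       + 3.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size))

# Time-ordered split: train on earlier data, validate on the final decade
train = t < 2014
val = ~train

scores = {}
for degree in range(1, 6):
    model = LinearRegression()
    model.fit(make_features(t[train], degree), co2[train])
    pred = model.predict(make_features(t[val], degree))
    scores[degree] = np.sqrt(mean_squared_error(co2[val], pred))  # RMSE in ppm

best_degree = min(scores, key=scores.get)
print({d: round(float(s), 3) for d, s in scores.items()}, "best degree:", best_degree)
```

A random train/test split would mostly test interpolation between nearby months and would tend to favour higher orders that extrapolate poorly, which is why the held-out data here are the latest years.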

The Solution Notebook

The solutions to the lab have been released.

Open the Solution Notebook

Bits and Bobs

Persisting Models in Scikit-Learn

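  • Once trained, a model can be saved to disk and loaded again later.

  • We can use this to distribute a trained model to other people.

  • There are two common approaches to saving a model in Scikit-Learn:

    • Pickle
    • Joblib
  • Joblib is usually preferred because it is more efficient than plain pickle for objects that carry large NumPy arrays, as trained models often do.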

Saving and loading a model with pickle

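pickle is a module in the Python standard library for saving Python objects to disk and loading them back.

If we have a trained model called model, we can save it to disk with:

import pickle

# Write the model in binary mode; 'with' closes the file afterwards
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)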

Then we can load the model later with:

import pickle

# Read the pickle file from disk
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Saving and loading a model with joblib

If we have a trained model called model, we can save it to disk with:

import joblib

joblib.dump(model, 'model.joblib')

Then we can load the model later with:

import joblib

model = joblib.load('model.joblib')

# or using a file handle; 'with' closes the file afterwards
with open('model.joblib', 'rb') as f:
    model = joblib.load(f)
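
As a quick sanity check, the sketch below trains a tiny LinearRegression on made-up data, saves it with joblib into a temporary directory, reloads it and confirms that the reloaded copy gives identical predictions. The file name and data are illustrative only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a tiny model on made-up data following y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.joblib")
    joblib.dump(model, path)       # save the trained model to disk
    restored = joblib.load(path)   # load it back into a new object

# The reloaded model should reproduce the original predictions exactly
X_new = np.array([[4.0], [10.0]])
same = np.array_equal(model.predict(X_new), restored.predict(X_new))
print("Predictions identical after reload:", same)
```

Loading a pickle or joblib file can execute arbitrary code, so only load model files from sources you trust. The Scikit-Learn documentation also advises loading a saved model with the same Scikit-Learn version that saved it.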

Next Steps

  • Complete/Review Lab Class 8:

    • Complete the Curve Fitting lab.
  • Lab Class on Wednesday:

    • Will provide support for getting started with the assignment.