COM6018 Data Science with Python

Week 7 - Introducing Scikit-Learn

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

Overview

  • Review of Lab Class 5
  • Some common Pandas mistakes
  • Principles of Supervised Learning
  • Introducing Scikit-Learn
  • Preview of Lab Class 6

Review of Lab Class 5

  • You were provided with several complex datasets.
  • You were asked to use Matplotlib to reconstruct a set of given plots.

The Solution Notebook

The solutions to the lab have been released.

Open the Solution Notebook

Common Pandas Mistakes

Addressing problems that some of you encountered using Pandas.

  • The Pandas library is very powerful, but it can be tricky to use.
  • The most common problems arise from confusing views with copies.
  • This is explained with examples in the notebook linked below.

Open Pandas Mistakes Notebook
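A minimal sketch of the view/copy problem, using a small made-up DataFrame. Selecting rows with a boolean mask returns a new object, so chained indexing such as df[mask]["b"] = 0 assigns into that temporary object and leaves df unchanged (pandas typically warns, and the exact warning depends on the pandas version). A single .loc call, or an explicit .copy() when you really want an independent subset, makes the intent unambiguous.

```python
import pandas as pd

# Small made-up DataFrame for illustration
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Risky: chained indexing. df[df["a"] > 2] is a new object, so this
# assignment does not update df (left commented out on purpose).
# df[df["a"] > 2]["b"] = 0

# Safe: one .loc call selects the rows and the column and assigns in place
df.loc[df["a"] > 2, "b"] = 0

# Safe: take an explicit copy when you want an independent subset
subset = df[df["a"] > 2].copy()
subset["b"] = -1  # changes only subset; df keeps b == [10, 20, 0, 0]
```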

Introducing Scikit-Learn

Over the next few weeks we will be using the Scikit-Learn library to address machine learning problems.

What is Scikit-Learn?

  • Scikit-Learn is a Python library for machine learning.
  • It provides a consistent API for many different machine learning algorithms.
  • It is built on top of NumPy, SciPy and Matplotlib.
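A sketch of what the consistent API means in practice. It uses the Iris dataset bundled with Scikit-Learn (an illustrative choice, not the data used in this module): two quite different classifiers are trained and evaluated with exactly the same fit / predict / score calls, so swapping one algorithm for another is a one-line change.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: the small Iris dataset that ships with Scikit-Learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

predictions = {}
scores = {}
for model in [KNeighborsClassifier(n_neighbors=5),
              DecisionTreeClassifier(random_state=0)]:
    name = type(model).__name__
    model.fit(X_train, y_train)                     # same call for every estimator
    predictions[name] = model.predict(X_test)       # same call for every estimator
    scores[name] = model.score(X_test, y_test)      # mean accuracy on the test set
```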

Machine Learning terminology

Supervised versus Unsupervised Learning

  • Supervised learning uses labelled data to learn a function that maps inputs to outputs (e.g. classification, regression).
  • Unsupervised learning finds structure in data without the need for labels (e.g. clustering, dimensionality reduction, anomaly detection, generative modelling).
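To make the distinction concrete, a minimal sketch on made-up 2-D data. The supervised model is fitted with both the inputs X and the labels y; the unsupervised clustering model is fitted with X alone and assigns its own, arbitrary, cluster numbers. LogisticRegression and KMeans are illustrative choices only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Made-up data: two well-separated groups of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)  # labels, only used by the supervised model

# Supervised: fit sees the inputs X AND the labels y
clf = LogisticRegression().fit(X, y)

# Unsupervised: fit sees only X; cluster ids 0/1 are not tied to y's meaning
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```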

Machine Learning terminology

Classification vs Regression

  • Classification - take an input and predict a discrete output (e.g. is this email spam or not).
  • Regression - take an input and predict a continuous output (e.g. what is the price of this house).
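A minimal sketch on a single made-up feature: the classifier returns one of the discrete labels it was trained on, whereas the regressor returns a real number. KNeighborsClassifier and LinearRegression are just two illustrative estimators, and the data are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Made-up inputs: one feature per example
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification: discrete targets (e.g. 0 = not spam, 1 = spam)
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
label = clf.predict([[5.5]])[0]  # 1: its 3 nearest neighbours (5, 6, 4) are all labelled 1

# Regression: continuous targets (e.g. a price), here exactly 50 * x + 50
y_reg = np.array([100.0, 150.0, 200.0, 250.0, 300.0, 350.0])
reg = LinearRegression().fit(X, y_reg)
price = reg.predict([[5.5]])[0]  # a real number, close to 325.0
```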

We are going to introduce Scikit-Learn using a classification problem.

Scikit-Learn Introductory Tutorial

Link to the Scikit-Learn tutorial

Lab Class 6 Preview

We will use Scikit-Learn to build a landmine detection and classification model.

Features:

  • Voltage (V): Output voltage value from a magnetic sensor.
  • High (H): The height of the sensor from the ground.
  • Soil Type (S): A value that corresponds to the amount of moisture in the soil.

The output label that we wish to predict is one of 5 classes:

  • 1: No Landmine
  • 2: Anti-Tank Mine
  • 3: Anti-Personnel Mine
  • 4: Booby-trapped Anti-Tank Mine
  • 5: M14 Anti-Personnel Mine
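Scikit-Learn expects the features as a 2-D array X (one row per measurement, one column per feature) and the labels as a 1-D array y. A minimal sketch of that split, assuming the data have been loaded into a pandas DataFrame with columns named V, H, S and M; these column names are an assumption, so check them against the lab's data file. The class-name dictionary simply mirrors the list above, and split_features_labels is a hypothetical helper.

```python
import pandas as pd

# Class numbers and names, copied from the list above
CLASS_NAMES = {
    1: "No Landmine",
    2: "Anti-Tank Mine",
    3: "Anti-Personnel Mine",
    4: "Booby-trapped Anti-Tank Mine",
    5: "M14 Anti-Personnel Mine",
}


def split_features_labels(df: pd.DataFrame,
                          feature_cols=("V", "H", "S"),
                          label_col="M"):
    """Hypothetical helper: return (X, y); default column names are assumptions."""
    X = df[list(feature_cols)].to_numpy()  # shape (n_samples, 3)
    y = df[label_col].to_numpy()           # shape (n_samples,)
    return X, y
```

Usage: X, y = split_features_labels(df), after which [CLASS_NAMES[c] for c in y.tolist()] turns the numeric labels back into readable names.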

Lab Class 6 Preview

We will work through all the stages of building a classification system:

  • Visualising the data
  • Generating the training and test data
  • Training a model
  • Tuning the model's hyperparameters
  • Evaluating and analysing the model's performance
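Most of these stages map onto a handful of Scikit-Learn calls. A minimal sketch, using synthetic data from make_classification as a stand-in for the landmine data (3 features, 5 classes) and a k-nearest-neighbour classifier as an illustrative model; the lab may use different data, models and parameter grids, and the visualisation stage is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the landmine data: 300 samples, 3 features, 5 classes
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=5,
                           n_clusters_per_class=1, random_state=0)

# Generate training and test data; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Train and tune: 5-fold cross-validation over k, using the training set only
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)  # refits the best k on all training data

# Evaluate the tuned model once on the held-out test set
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
```

Keeping the test set out of the grid search means the final accuracy is not biased by the choice of k.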

Lab Class 6

Link to lab class

Next Steps

  • Prepare for the Week 7 lab class:
    • Read through the Scikit-Learn tutorial.