COM6018 Data Science with Python

Week 7 - Introducing Scikit-Learn

Jon Barker

Copyright © 2023–2025 Jon Barker, University of Sheffield. All rights reserved.

Overview

  • Review of Lab Class 5
  • Some common Pandas mistakes
  • Principles of Supervised Learning
  • Introducing Scikit-Learn
  • Preview of Lab Class 6

Review of Lab Class 5

  • You were provided with several complex datasets.
  • You were asked to use Matplotlib to reconstruct plots like the ones below.


The Solution Notebook

The solutions to the lab have been released.

Open the Solution Notebook


Common Pandas Mistakes

Addressing problems that some of you encountered using Pandas.

  • The Pandas library is very powerful, but it can be tricky to use.
  • The most common problems arise from confusing views and copies of a DataFrame.
  • This is explained with examples in the notebook linked below.
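
As a minimal sketch of the view-versus-copy issue (not taken from the linked notebook; the DataFrame and its column names are made up for illustration), the example below contrasts an explicit copy with an in-place assignment through a single .loc call:

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 55, 80]})

# Ambiguous: chained indexing such as
#     df[df["score"] > 50]["score"] = 0
# assigns to a temporary object, so df may be left unchanged
# (and pandas may emit a warning, depending on the version).

# Clear intent 1: take an explicit copy when an independent DataFrame is wanted.
high = df[df["score"] > 50].copy()
high["score"] = 100  # modifies high only; df is untouched

# Clear intent 2: one .loc call selects rows and column and assigns in place.
df.loc[df["score"] > 50, "score"] = 0

print(df)
print(high)
```

Using .copy() or a single .loc assignment makes it explicit whether the original DataFrame is meant to change.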

Open Pandas Mistakes Notebook


Introducing Scikit-Learn

Over the next few weeks we will be using the Scikit-Learn library to address machine learning problems.

What is Scikit-Learn?

  • Scikit-Learn is a Python library for machine learning.
  • It provides a consistent API for many different machine learning algorithms.
  • It is built on top of NumPy, SciPy and Matplotlib.
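
The "consistent API" point can be illustrated with a short sketch. The choice of the bundled iris dataset and of these two classifiers is an assumption made for the example, not something prescribed by the lecture; the point is only that both models are driven through the same fit and predict calls:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small dataset bundled with Scikit-Learn (no download needed).
X, y = load_iris(return_X_y=True)

# Two quite different algorithms, used through the same interface.
models = [
    KNeighborsClassifier(n_neighbors=3),
    DecisionTreeClassifier(random_state=0),
]

predictions = {}
for model in models:
    model.fit(X, y)                      # learn from inputs X and labels y
    name = type(model).__name__
    predictions[name] = model.predict(X[:5])  # predict labels for five samples
    print(name, predictions[name])
```

Swapping one algorithm for another usually means changing only the line that constructs the model.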

Machine Learning terminology

Supervised versus Unsupervised Learning

  • Supervised learning uses labelled data to learn a function that maps inputs to outputs (classification, regression).
  • Unsupervised learning finds structure in the data without the need for labels (clustering, dimensionality reduction, anomaly detection, generative modelling).
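
In Scikit-Learn the distinction shows up in what is passed to fit. The sketch below is illustrative only: the iris data, LogisticRegression and KMeans are assumed choices, not the ones used in the lab.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: fit receives both the inputs X and the labels y.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
predicted_labels = clf.predict(X)

# Unsupervised: fit receives only X; the labels y are never seen.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print(predicted_labels[:5])
print(cluster_ids[:5])
```

Note that the cluster ids are arbitrary integers; they group similar samples but do not correspond to the class labels in y.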

Machine Learning terminology

Classification vs Regression

  • Classification - take an input and predict a discrete output (e.g. is this email spam or not).
  • Regression - take an input and predict a continuous output (e.g. what is the price of this house).
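
A small sketch of the difference, using synthetic data generated purely for illustration (the data, the threshold at 5 and the choice of linear models are all assumptions): the regressor returns real-valued predictions, while the classifier returns one of a fixed set of labels.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic one-dimensional inputs, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))

# Regression target: a continuous value (a noisy straight line).
y_continuous = 3.0 * X[:, 0] + rng.normal(0, 1, size=100)

# Classification target: a discrete label (0 or 1).
y_discrete = (X[:, 0] > 5).astype(int)

reg = LinearRegression().fit(X, y_continuous)
clf = LogisticRegression().fit(X, y_discrete)

X_new = np.array([[2.0], [8.0]])
reg_out = reg.predict(X_new)  # floating-point values
clf_out = clf.predict(X_new)  # class labels drawn from {0, 1}
print(reg_out)
print(clf_out)
```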

We are going to introduce Scikit-Learn using a classification problem.


Scikit-Learn Introductory Tutorial

Link to the Scikit-Learn tutorial


Lab Class 6 Preview

We will use Scikit-Learn to build a landmine detection and classification model.

Features:

  • Voltage (V): Output voltage value from a magnetic sensor.
  • High (H): The height of the sensor from the ground.
  • Soil Type (S): A value that corresponds to the amount of moisture in the soil.

The output label that we wish to predict is one of 5 classes:

  • 1: No Landmine
  • 2: Anti-Tank Mine
  • 3: Anti-Personnel Mine
  • 4: Booby trapped Anti-Tank Mine
  • 5: M14 Anti-Personnel Mine

Lab Class 6 Preview

We will work through all the stages of building a system:

  • Visualising the data
  • Generating the training and test data
  • Training a model
  • Tuning the model's hyperparameters
  • Evaluating and analysing the model's performance
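
The stages above can be sketched end to end as follows. This is not the lab solution: the data is a synthetic stand-in produced by make_classification (3 features, 5 classes, labelled 0 to 4 rather than 1 to 5), the k-nearest-neighbours classifier and the hyperparameter grid are assumed choices, and the visualisation step is omitted to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with 3 features and 5 classes (NOT the real landmine data).
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0,
    n_repeated=0, n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Generate the training and test data, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train a model and tune a hyperparameter using cross-validation
# on the training data only.
param_grid = {"n_neighbors": [1, 3, 5, 7]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate and analyse performance on the held-out test data.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", accuracy)
print(cm)
```

Keeping the test data out of the tuning step means the final accuracy is an estimate of performance on unseen data.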

Lab Class 6

Link to lab class


Next Steps

  • Prepare for the Week 7 lab class:
    • Read through the Scikit-Learn tutorial.