COM6018 Data Science with Python

Week 4 - Introducing Pandas

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

Overview

  • Review of Lab Class 3
  • Common traps when using NumPy
  • Introducing Pandas
  • Preview of Lab Class 4
  • Some words about the 1st assignment

Review of Lab Class 3

  • We continued our analysis of the atmospheric gas concentration data.
  • We wanted to better understand the oscillations in the data.
  • We are interested in extracting information such as:
    • The period of the oscillations.
    • The amplitude of the oscillations.
    • Dates at which the peaks and dips occur.

The Problem

Finding the peaks and dips in the oscillations is not straightforward.

Problem:

  • The data is very noisy.
  • The daily measurements can bounce up and down in a seemingly random way.
  • There are many local peaks and dips that are not part of the oscillations we are interested in.

We need to smooth the data to remove the noise.


Tasks

  • Reading datasets from their files.
  • Smoothing the data:
    • Using a simple moving average.
    • By applying a window weighting function.
  • Finding the peaks and dips in the smoothed data.
  • Visualising the peaks and dips in the data.
  • Answering questions about the period and size of the oscillations, etc.

We will do all of this using NumPy.
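
The smoothing and peak-finding tasks above can be sketched roughly as follows. This is a minimal illustration, not the lab solution: the data is a synthetic stand-in (a yearly sine wave plus random noise), and the helper names `smooth` and `local_peaks`, the window width of 61 samples, and the choice of a Hann window are all assumptions made for the example.

```python
import numpy as np

# Synthetic stand-in for the lab data (hypothetical): three years of daily
# samples of a yearly oscillation with added random noise.
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
signal = np.sin(2 * np.pi * days / 365)
noisy = signal + 0.3 * rng.standard_normal(days.size)


def smooth(x, window):
    """Smooth x by convolving it with a window normalised to sum to 1."""
    weights = window / window.sum()
    # mode="same" keeps the output the same length as x, but the values near
    # each end are computed with implicit zero padding and are less reliable.
    return np.convolve(x, weights, mode="same")


def local_peaks(x):
    """Return indices of samples strictly greater than both neighbours."""
    interior = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    return np.flatnonzero(interior) + 1


width = 61  # assumed window length in samples
moving_avg = smooth(noisy, np.ones(width))   # simple moving average
weighted = smooth(noisy, np.hanning(width))  # Hann window weighting

peaks_raw = local_peaks(noisy)
peaks_smooth = local_peaks(weighted)
dips_smooth = local_peaks(-weighted)  # dips are peaks of the negated signal

print(f"local peaks before smoothing: {len(peaks_raw)}")
print(f"local peaks after smoothing:  {len(peaks_smooth)}")
```

Smoothing removes most of the spurious local peaks, but some may remain, particularly near the ends of the array where the window runs off the data.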


The Solution Notebook

Open the Solution Notebook


Common Traps when using NumPy

Open Notebook


Introducing Pandas

What is Pandas?

  • Data Tool: Pandas is an open-source Python library for data manipulation and analysis.

  • Tabular Structures: It provides DataFrames for easy handling of structured data.

  • Analysis Functions: It offers a wide range of functions for cleaning, transforming, and analysing data.

  • Integration Capability: It integrates with popular Python libraries such as NumPy and Matplotlib.

  • Time Series Support: Suitable for time series analysis and handling panel data.
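
As a rough illustration of the points above, the sketch below builds a small DataFrame, cleans it, indexes it by date, and passes the result to NumPy. The column names (`date`, `co2_ppm`) and the values are made up for the example and are not taken from the course datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; names and values are illustrative only.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
        ),
        "co2_ppm": [421.0, np.nan, 423.0, 422.0],
    }
)

# Cleaning: drop rows where the measurement is missing.
clean = df.dropna(subset=["co2_ppm"])

# Time series support: use the dates as the index.
series = clean.set_index("date")["co2_ppm"]

# Analysis: summary statistics are available as methods.
mean_ppm = series.mean()

# Integration: hand the underlying values to NumPy when needed.
values = series.to_numpy()

print(series)
print("mean:", mean_ppm)
```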


Pandas Tutorial

Link to the Pandas tutorial


Lab Class 4 Preview

  • We will use Pandas to extend our analysis of greenhouse gas data.
  • We will see how using Pandas makes it very easy to read and manipulate datasets.
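
To give a flavour of what reading and manipulating a dataset with Pandas can look like, here is a small sketch. The CSV content, the column names (`date`, `ch4_ppb`), and the window size are invented for the example; the lab's actual files and columns may differ. The text is read from an in-memory string so that the example is self-contained, but `pd.read_csv` accepts a file path in the same way.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """date,ch4_ppb
2024-01-01,1920.5
2024-01-02,1921.0
2024-01-03,1919.5
2024-01-04,1922.0
"""

# Read the data, parse the date column, and use it as the index.
gas = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Select rows by date range (label-based slicing includes both end points).
first_two = gas.loc["2024-01-01":"2024-01-02"]

# Add a derived column with a unit conversion.
gas["ch4_ppm"] = gas["ch4_ppb"] / 1000

# A centred 3-day moving average; the first and last rows have no full window.
gas["smoothed"] = gas["ch4_ppb"].rolling(window=3, center=True).mean()

print(gas)
```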

Lab Class 4

Link to lab class


Assignment 1

  • Will be released next Monday.
  • Worth 40% of the module mark.
  • Will be due on Friday Week 6, November 8th (i.e., you will have 2 weeks to complete it).

Assignment 1

  • You will be given a dataset and a template Jupyter notebook.
  • The template will ask a series of 4 questions about the dataset.
  • You will need to use Pandas and Matplotlib to answer the questions.
  • Each question will be answered by producing a plot and a short paragraph of text.
  • Your mark will be based on:
    • the quality of the code,
    • the clarity of the plots, and
    • the written answer to each question, i.e., the text.

Next Steps

  • Review the Week 3 materials:
    • Review the Lab 3 solution notebook if you have not already done so.
    • Read the Week 3 tutorial on NumPy.
  • Prepare for the Week 4 lab class:
    • Read through the Pandas tutorial.
  • Check out the reading list for Week 4 on Blackboard.
