COM6018 Data Science with Python

Week 5 - Introducing Matplotlib and Seaborn

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

Overview

  • Review of Lab Class 4
  • Principles of Data Visualization
  • Introducing Matplotlib and Seaborn
  • Preview of Lab Class 5
  • The 1st Assignment

Review of Lab Class 4

  • We continued our analysis of the atmospheric gas concentration data.
  • We used Pandas to repeat the work we had done in Lab Class 2.
  • We introduced more datasets, i.e., SF6 and N2O.

The Problem

We want to combine the gas concentration datasets to produce a single plot.


Tasks

  • Reading the data from the CSV files.
  • Renaming fields to be consistent across the datasets.
  • Dealing with missing data.
  • Converting all gas concentrations to the same units.
  • Merging the datasets.
  • Computing the carbon dioxide equivalent (CO2e) concentration.
  • Making the plot.

We did all of this in NumPy in lab 2 (lots of code!). We will now do it in Pandas (much less code!).
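
The tasks above can be sketched in Pandas as follows. This is a minimal illustration, not the lab solution: the CSV contents, the column names (`mean`, `average`), the `-999.0` missing-value marker, and the `N2O_WEIGHT` factor are all made up for the example, and the CO2e step uses a simple weighted sum as a stand-in for whatever formula the lab uses.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up CSV snippets standing in for the lab's files (values are illustrative only).
co2_csv = io.StringIO("year,mean\n2000,369.7\n2001,371.3\n2002,373.5\n")    # ppm
n2o_csv = io.StringIO("year,average\n2000,316.0\n2001,-999.0\n2002,317.4\n")  # ppb

# 1. Read the data, treating the assumed -999.0 sentinel as missing.
co2 = pd.read_csv(co2_csv)
n2o = pd.read_csv(n2o_csv, na_values=["-999.0"])

# 2. Rename fields so that each dataset uses a consistent naming scheme.
co2 = co2.rename(columns={"mean": "co2_ppm"})
n2o = n2o.rename(columns={"average": "n2o_ppb"})

# 3. Deal with missing data (here: linear interpolation between neighbours).
n2o["n2o_ppb"] = n2o["n2o_ppb"].interpolate()

# 4. Convert to common units (ppb -> ppm).
n2o["n2o_ppm"] = n2o["n2o_ppb"] / 1000.0

# 5. Merge the datasets on the shared 'year' column.
gases = co2.merge(n2o[["year", "n2o_ppm"]], on="year", how="inner")

# 6. Compute a CO2-equivalent concentration.
#    N2O_WEIGHT is a placeholder weighting factor, not the value used in the lab.
N2O_WEIGHT = 273.0
gases["co2e_ppm"] = gases["co2_ppm"] + N2O_WEIGHT * gases["n2o_ppm"]

# 7. Make the plot.
fig, ax = plt.subplots()
ax.plot(gases["year"], gases["co2_ppm"], label="CO2")
ax.plot(gases["year"], gases["co2e_ppm"], label="CO2e")
ax.set_xlabel("Year")
ax.set_ylabel("Concentration (ppm)")
ax.legend()
```

In a notebook the `matplotlib.use("Agg")` line can be dropped and the figure will display inline. Each step is a single Pandas call, which is where the saving over the NumPy version comes from.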


The Solution Notebook

Open the Solution Notebook


Principles of Data Visualization

  • Data visualization is a key part of data science.
  • We need to understand how to present data in a way that is meaningful and informative.

https://www.biostat.wisc.edu/~kbroman/presentations/graphs2017.pdf


Introducing Matplotlib and Seaborn

What is Matplotlib?

  • Matplotlib is a low-level Python library for creating plots.
  • It is the most widely used plotting library in Python.
  • It is very powerful and flexible.
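
As a rough illustration of what "low-level" means here, the sketch below builds a simple line plot with Matplotlib's figure and axes objects, setting each label explicitly. The data (a sine curve) and all label text are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: one period of a sine wave.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create the figure and a single set of axes, then draw onto the axes.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, color="tab:blue", linewidth=2, label="sin(x)")

# Every element of the plot is set by hand.
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.set_title("A simple Matplotlib line plot")
ax.legend()
fig.tight_layout()
```

Nothing is labelled or styled unless you ask for it, which is what makes Matplotlib flexible but also more verbose than higher-level libraries.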

What is Seaborn?

  • Seaborn is a high-level Python data visualization library built on Matplotlib.
  • It is particularly good for statistical data visualization (e.g., examining the distribution of data).

Matplotlib and Seaborn Tutorial

Link to the Matplotlib tutorial


Lab Class 5 Preview

  • You will be provided with several complex datasets.
  • You will use Matplotlib to reconstruct a series of plots.


Lab Class 5

Link to lab class


Assignment 1

  • This has now been released.
  • Worth 40% of the module mark.
  • Due on Friday 8th November.

Next Steps

  • Review the Week 4 materials:

    • Review the Lab 4 solution notebook if you have not already done so.
    • Make sure you have read the Week 4 tutorial on Pandas.
  • Prepare for the Week 5 lab class:

    • Read through the Matplotlib tutorial.
  • Read through the assignment instructions. Bring any questions to the lab class on Friday.
