COM6018 Data Science with Python

Week 11 - Evaluating Classifiers

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

Overview

  • Evaluating Classifiers
  • Assignment 2 - Q&A
  • Using LaTeX to write your report

Reading Week

  • No lecture/labs next week.
  • Use the time to work on the assignment.

Evaluating Classifiers

  • In previous labs, we have been evaluating classifiers using the accuracy metric.

  • This is a good starting point, but it is not always the best metric to use.

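  • Accuracy assumes that all errors are equally costly.

  • This is not the case in many real-world applications. For example, in medical screening, missing a disease (a false negative) is usually far more costly than a false alarm (a false positive).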
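The following small illustration uses made-up labels. Two hypothetical classifiers make different kinds of error but receive exactly the same accuracy. It uses scikit-learn's `accuracy_score` and `confusion_matrix`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for 10 test examples (1 = positive, 0 = negative).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Classifier A misses two positives (two false negatives).
y_pred_a = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
# Classifier B raises two false alarms (two false positives).
y_pred_b = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

for name, y_pred in [("A", y_pred_a), ("B", y_pred_b)]:
    acc = accuracy_score(y_true, y_pred)
    # For binary labels, ravel() on the 2x2 confusion matrix gives (tn, fp, fn, tp).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"{name}: accuracy={acc:.2f}, FP={fp}, FN={fn}")
```

Both classifiers score an accuracy of 0.80. However, A makes two false negatives and B makes two false positives, and accuracy alone cannot tell these apart.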


Evaluating Classifiers

In many cases, we are more interested in one type of error than another.

We will consider the following metrics:

  • False Positive Rate (FPR), False Negative Rate (FNR) and the ROC curve.
  • Precision, Recall and Precision-Recall curves.

These metrics are defined for binary classifiers but can be extended to multi-class classifiers.
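The following minimal sketch shows how these quantities are computed with scikit-learn. The scores are made up, and the 0.5 decision threshold is an assumption. From the confusion-matrix counts:

- FPR = FP / (FP + TN)
- FNR = FN / (FN + TP)
- precision = TP / (TP + FP)
- recall = TP / (TP + FN) = 1 - FNR

The ROC curve plots the true positive rate (recall) against the FPR as the decision threshold is varied. The precision-recall curve plots precision against recall in the same way.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Hypothetical ground truth and classifier scores (e.g. predict_proba output
# for the positive class); these numbers are made up for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.2, 0.9, 0.55])

# Turn scores into hard decisions using an assumed threshold of 0.5.
y_pred = (scores >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)        # fraction of negatives wrongly accepted
fnr = fn / (fn + tp)        # fraction of positives wrongly rejected
precision = tp / (tp + fp)  # fraction of predicted positives that are truly positive
recall = tp / (tp + fn)     # fraction of true positives that were detected (= 1 - FNR)

# scikit-learn's built-in versions agree with the hand-computed values.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))

# Sweeping the threshold over the scores traces out the ROC and PR curves.
roc_fpr, roc_tpr, roc_thresholds = roc_curve(y_true, scores)
pr_precision, pr_recall, pr_thresholds = precision_recall_curve(y_true, scores)

print(f"FPR={fpr:.2f} FNR={fnr:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"ROC AUC={roc_auc_score(y_true, scores):.2f}")
```

The arrays returned by `roc_curve` and `precision_recall_curve` can be passed directly to `matplotlib` to draw the curves.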


Evaluating Classifiers

Tutorial 090 - Evaluating Classifier Performance

Link to tutorial


Assignment 2 - Reminder

The task is to build a face verification system that takes a pair of images and predicts whether or not they show the same person.

You have been provided with:

  • A training data set.
  • An evaluation data set.
  • A baseline system that uses a 1-nearest neighbour classifier.

You need to build your own system. Its performance will be evaluated on a hidden test set.
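One way to frame face verification as a binary classification problem is to turn each image pair into a single feature vector and classify that vector. The sketch below is hypothetical and is not the provided baseline. It makes the following assumptions:

- Images are greyscale NumPy arrays.
- The pair representation is the absolute pixel-wise difference (`make_pair_features` is an illustrative name).
- The data are synthetic.

It then trains scikit-learn's `KNeighborsClassifier` with `n_neighbors=1`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)


def make_pair_features(images_a, images_b):
    """Hypothetical pair representation: absolute pixel-wise difference.

    images_a, images_b: arrays of shape (n_pairs, height, width).
    Returns an (n_pairs, height * width) feature matrix.
    """
    diff = np.abs(images_a.astype(float) - images_b.astype(float))
    return diff.reshape(len(diff), -1)


# Synthetic stand-in data: 200 pairs of 8x8 "images".
n_pairs, h, w = 200, 8, 8
images_a = rng.random((n_pairs, h, w))
labels = rng.integers(0, 2, size=n_pairs)  # 1 = same person, 0 = different
# Same-person pairs: a slightly perturbed copy; different-person pairs: unrelated images.
noise = rng.normal(scale=0.05, size=(n_pairs, h, w))
images_b = np.where(labels[:, None, None] == 1, images_a + noise, rng.random((n_pairs, h, w)))

X = make_pair_features(images_a, images_b)
X_train, X_eval = X[:150], X[150:]
y_train, y_eval = labels[:150], labels[150:]

# 1-nearest-neighbour classifier over the pair features.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)
print("evaluation accuracy:", clf.score(X_eval, y_eval))
```

Real face images are far less cooperative than this synthetic data. The choice of pair representation and any feature extraction before the classifier are exactly the design decisions the report should describe.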


Rules

  • Your model file should be named 'model.joblib' and must not exceed 80 MB in size.
  • You can only train your model using the provided training data (augmentation is allowed).
  • You cannot use any pre-trained models.
  • You may only use the Python standard library and the following packages: numpy, matplotlib,
    seaborn, pandas, scikit-learn, joblib and Pillow (for image processing).
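The following sketch saves a model with `joblib` and checks it against the size limit. It assumes that 80 MB means 80,000,000 bytes, which is the stricter reading. The model is a stand-in k-NN trained on random data.

```python
import os

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Assumption: 80 MB interpreted as 80,000,000 bytes (the stricter reading).
MAX_MODEL_BYTES = 80_000_000

# Stand-in model trained on random data, purely for illustration.
rng = np.random.default_rng(0)
X_demo = rng.random((100, 64))
model = KNeighborsClassifier(n_neighbors=1).fit(X_demo, rng.integers(0, 2, 100))

# compress=3 trades some save/load time for a smaller file.
joblib.dump(model, "model.joblib", compress=3)

size = os.path.getsize("model.joblib")
print(f"model.joblib is {size / 1_000_000:.3f} MB")
if size > MAX_MODEL_BYTES:
    raise RuntimeError("model.joblib exceeds the 80 MB limit")

# Reload to check that the saved file round-trips correctly.
reloaded = joblib.load("model.joblib")
```

A nearest-neighbour model stores its entire training set, so its file size grows with the amount of (augmented) training data. The size limit is therefore worth checking whenever the training set changes.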

Report Structure

The report must be no more than two pages (sides) long and must include the following sections:

  • Abstract
  • Introduction
  • System Description
  • Experiments
  • Results and Analysis
  • Conclusions
  • References

Report Structure

  • Introduction: A brief description of the face verification problem.
  • System Description: A complete description of your verification system pipeline, highlighting the critical hyperparameters that need to be optimised.
  • Experiments: A description of the experiments you conducted to tune your system's hyperparameters, including the construction of your training dataset. Be explicit about which hyperparameters you experimented with and how they influenced the performance of your model. Include results that justify your final choices.
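For the Experiments section, one standard way to tune and record hyperparameters is scikit-learn's `GridSearchCV`. It scores every combination in a parameter grid using cross-validation. In this sketch, the pipeline (standardisation, PCA, k-NN), the grid values and the synthetic `make_classification` data are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for pair features built from the training data.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid; parameter names follow the "<step>__<param>" convention.
param_grid = {
    "pca__n_components": [5, 10, 20],
    "knn__n_neighbors": [1, 3, 5, 9],
}

# 5-fold cross-validation on the training data only.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")

# cv_results_ holds the mean score of every combination; tabulating these
# is one way to show how each hyperparameter influenced performance.
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(params, f"{score:.3f}")
```

Tuning by cross-validation on the training data keeps the evaluation set free for the final comparison against the baseline.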

Report Structure (cont.)

  • Results and Analysis: Provide an analysis of the performance of your final system on the evaluation data, including a comparison to the baseline system. Include a table that reports the accuracy of your model. Additionally, provide a brief discussion of any observed trends or insights, highlighting factors that may have influenced the results.
  • Conclusions: A summary of your work, including suggestions for further improvements.
  • References: A list of any references cited in your report.

Final Submission

You will need to submit the following files:

  • train.py - The Python script that trains your classifier. Include clear comments explaining the training
    process and key steps.
  • report.pdf - a PDF report that describes your system and the experiments you have carried out.
  • model.joblib - a joblib file containing your trained model.

The assignment is due by 15:00 on Wednesday, 18th December. Standard lateness penalties will be applied.


Assessment

The final mark will be based on the following criteria:

  • The quality and clarity of your code (20/60)
  • The quality and clarity of the written report (30/60)
  • The performance of your classifier on the hidden test set (10/60)

Using LaTeX to write your report

  • The assignment includes a template for the report in LaTeX.

  • You can upload this to Overleaf and edit it online.

Link to Overleaf
