100 Evaluating Classifiers
COM6018
Copyright © 2023–2025 Jon Barker, University of Sheffield. All rights reserved.
1. Introduction
This notebook has been written to accompany the speech classification assignment. Unlike the previous lab notebooks, it is not a step-by-step guide to a solution. Instead, it contains some notes and snippets of code that you may find useful. The lab takes ideas from the lecture on classifier evaluation and applies them to the assignment data.
2. Loading the data provided
This lab uses the assignment data files. If you do not already have them, you will need to download them from the following Google Drive link:
https://drive.google.com/drive/folders/1Z_ZcrIbshgAFdj0wy8ou8bKlvi6BXHXe
In this lab, we will be using the following files for the ‘FBANK’ feature set:
fbank_speed.train.joblib - the full training dataset for the speed transformation
fbank_speed.test1.joblib - the test set for the speed transformation
and
fbank_tempo.train.joblib - the full training dataset for the tempo transformation
fbank_tempo.test1.joblib - the test set for the tempo transformation
If you have not downloaded these already, do so now and store them in the same directory as this notebook.
We will then load the datasets and the baseline model.
2.1 The training data
We will first load the training data:
import joblib
data_train = joblib.load('fbank_speed.train.joblib')
print(data_train.keys())
print(data_train['features'].shape)
print(data_train['target'].shape)
The data is stored in a 2-D array of 4155 rows of 6464 values, where each row holds a filterbank with 64 frequency channels by 101 time frames (64 x 101 = 6464). For convenience, we will reshape this into a 3-D array of 4155 x 64 x 101, i.e., 4155 samples, each an image of 64 by 101 pixels. We will then store the reshaped data back in the data_train dictionary under a key called 'images'.
data_train['images'] = data_train['features'].reshape((-1, 64, 101))
print(data_train['images'].shape)
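The reshape works because each 6464-value row is, as the reshape above assumes, a 64 x 101 filterbank flattened one channel at a time (NumPy's default row-major order). The sketch below uses small random arrays as a stand-in for the real features, to show how an element of the image maps to the flat row and that reshaping and flattening are exact inverses.

```python
import numpy as np

N_CHANNELS, N_FRAMES = 64, 101

# Random stand-in for data_train['features']: 3 samples, each a flattened 64 x 101 filterbank
rng = np.random.default_rng(0)
demo_features = rng.random((3, N_CHANNELS * N_FRAMES))

# Reshape to (samples, channels, frames); NumPy uses row-major (C) order by default
demo_images = demo_features.reshape((-1, N_CHANNELS, N_FRAMES))

# Element [n, c, t] of an image is element c * N_FRAMES + t of the flat row n
assert demo_images[1, 5, 7] == demo_features[1, 5 * N_FRAMES + 7]

# Flattening each image again recovers the original feature matrix exactly
flat_again = demo_images.reshape((len(demo_images), -1))
print(demo_images.shape, flat_again.shape)      # (3, 64, 101) (3, 6464)
print(np.array_equal(flat_again, demo_features))  # True
```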
We can now write a function that takes the set of images and an index, and displays the corresponding filterbank image.
import matplotlib.pyplot as plt
import numpy as np
def display_filterbank(images, n):
    """Display the nth filterbank image"""
    # Note that we use 'gray_r' to display the image with higher values as darker pixels
    # This is the conventional way to display spectrograms and filterbanks
    # Also set origin='lower' to have low frequencies at the bottom
    plt.imshow(images[n], cmap='gray_r', origin='lower')
    plt.show()
# show the image at index 100
display_filterbank(data_train['images'], 100)
2.2 Loading the evaluation data
We will now load the evaluation data and reshape it in the same way as the training data. Note that the evaluation data contains 1000 samples.
data_eval = joblib.load('fbank_speed.test1.joblib')
data_eval['images'] = data_eval['features'].reshape((-1, 64, 101))
To check that this has worked, we will use our previous display function to show the image at index 100.
display_filterbank(data_eval['images'], 100)
Try calling the display function with different values for the index.
# WRITE SOLUTION HERE
Next, we will use a simple user interface to browse through the training images. This uses the ipywidgets package to create interactive widgets such as sliders.
If you installed ipywidgets into your environment before starting Jupyter Notebook or VS Code, then the import in the next cell should work without error. If you get an error, you will need to install the package and restart Jupyter Notebook or VS Code.
from ipywidgets import Dropdown, IntSlider, interact
If the above does not work, quit your Jupyter environment and install the ipywidgets package using the following commands in your terminal:
uv add ipywidgets
uv sync --activate
Then restart your Jupyter environment and try the above code again.
from ipywidgets import Dropdown, IntSlider, interact
index_slider = IntSlider(value=0, min=0, max=len(data_train['images']) - 1, description="Image Index")
def display_filterbank_wrapper(n):
    """Provides a 1-parameter interface for display_filterbank"""
    display_filterbank(data_train['images'], n)
interact(display_filterbank_wrapper, n=index_slider);
Below is a more complete demo that adds a title showing the sample's class, axis labels and a colour bar to the plot.
# Set the data to use for the interactive display
fbank_data = data_train
N_CHANNELS = 64
N_SAMPLES = len(fbank_data["features"])
CLASSES = ('VERY SLOW', 'SLOW', 'NORMAL', 'FAST', 'VERY FAST')
@interact(SAMPLE_INDEX=IntSlider(min=0, max=N_SAMPLES-1, step=1, value=0))
def plot_fbank(SAMPLE_INDEX):
    fbank_2d = fbank_data["images"][SAMPLE_INDEX]
    target = int(fbank_data["target"][SAMPLE_INDEX])
    plt.figure(figsize=(10, 4))
    plt.imshow(fbank_2d, aspect='auto', origin='lower', cmap='gray_r')
    plt.title(f"FBANK for Sample Index {SAMPLE_INDEX}, class = {CLASSES[target]}")
    plt.xlabel("Frame index")
    plt.ylabel("Filterbank channel")
    plt.colorbar()
    plt.show()
2.3 The baseline model
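A baseline model has been trained for you and distributed with the assignment. If you have downloaded the assignment code, copy the model into the same directory as this notebook. For the tempo data, the baseline training script and model are:
src/baseline/train_tempo.py
models/baseline_fbank/model.tempo.joblib
The code below uses the corresponding speed files, train_speed.py and model.speed.joblib, which should also be copied into this directory.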
import joblib
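# Import the training script so that anything the saved model refers to (e.g., custom transformers) is defined before loading
from train_speed import *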
model = joblib.load('model.speed.joblib')
print(model)
Note that the print statement above provides a description of the model. You can see that it is a Pipeline with a single step, the KNN classifier, constructed with \(k=1\). The performance of this model is not great, and it should be easy for you to improve on it.
We can check the performance by using the model's score method and passing the evaluation data, as follows:
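Besides reading the printed description, a pipeline can be inspected programmatically. The sketch below builds a stand-in pipeline with the same single-step structure; the step name 'knn' is a hypothetical choice, so print the real model to see its actual step names. It reads the final estimator from the steps list and its k from get_params.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Stand-in with the same structure as the baseline; the step name 'knn' is hypothetical
demo_model = Pipeline([('knn', KNeighborsClassifier(n_neighbors=1))])

# .steps is a list of (name, estimator) tuples; the last one is the classifier
step_names = [name for name, _ in demo_model.steps]
final_step = demo_model.steps[-1][1]
print(step_names, final_step.n_neighbors)  # ['knn'] 1

# get_params() exposes nested parameters as '<step name>__<parameter>'
print(demo_model.get_params()['knn__n_neighbors'])  # 1
```

With the loaded baseline, model.steps[-1][1].n_neighbors should report the same k as the printout, whatever the step is called.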
percent_correct = model.score(data_eval['features'], data_eval['target']) * 100
print(f'The classifier is {percent_correct:.2f}% correct')
You should get a score of 79%. This is far above chance (20% for five equally likely classes), but there is still room for improvement.
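The 20% chance figure assumes the five classes are equally frequent. If they are not, a fairer reference point is the accuracy of always predicting the most common class. A minimal sketch using sklearn's DummyClassifier on made-up labels (the class counts are illustrative, not those of the assignment data):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: five classes, 20 examples each (balanced)
y_balanced = np.repeat(np.arange(5), 20)
# Made-up labels: the same five classes, but class 2 is over-represented
y_skewed = np.concatenate([np.repeat([0, 1, 3, 4], 15), np.repeat(2, 40)])

def majority_baseline(y):
    """Accuracy of always predicting the most frequent class in y."""
    X = np.zeros((len(y), 1))  # the features are ignored by this strategy
    dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
    return dummy.score(X, y)

print(majority_baseline(y_balanced))  # 0.2
print(majority_baseline(y_skewed))    # 0.4
```

For the assignment you would fit the dummy on data_train['target'] and score it on data_eval['target'].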
3. Retraining your own KNN model
We should be able to replicate the baseline model score by training our own KNN.
3.1 Training a KNN
Below we make a classifier, train it and then evaluate it.
# Import necessary libraries
from sklearn.metrics import accuracy_score, classification_report
from sklearn.neighbors import KNeighborsClassifier
# Create a KNN classifier with k=1
knn = KNeighborsClassifier(n_neighbors=1)
# Train the model
knn.fit(data_train['features'], data_train['target'])
# Compute the accuracy of the test set predictions
accuracy = knn.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
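A single accuracy figure hides which classes are being confused with one another; a confusion matrix and sklearn's classification_report show this per class. The sketch below uses synthetic five-class data from make_classification as a stand-in for the filterbank features.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the filterbank data: 5 balanced classes, 20 features
X_demo, y_demo = make_classification(n_samples=500, n_features=20, n_informative=10,
                                     n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          stratify=y_demo, random_state=0)

demo_knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
y_hat = demo_knn.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
print(cm)
# Precision, recall and F1 score for each class
print(classification_report(y_te, y_hat))

# Correct predictions lie on the diagonal, so trace / total is the accuracy
print(cm.trace() / cm.sum(), accuracy_score(y_te, y_hat))
```

With the real data, the equivalent call would be classification_report(data_eval['target'], knn.predict(data_eval['features'])); passing target_names=CLASSES as well assumes the targets are 0 to 4 in the order of CLASSES.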
3.2 Repeating using a pipeline
We will now repeat the process using a pipeline. The pipeline will have just the classifier and no preprocessing steps. This is not very useful but it illustrates the idea.
from sklearn.pipeline import Pipeline
# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
('classifier', KNeighborsClassifier(n_neighbors=1))
])
# Train the pipeline
knn_pipeline.fit(data_train['features'], data_train['target'])
# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
We should get the same result as before.
But we can now add preprocessing steps to the pipeline with very little modification to the code. For example, we can standardise each pixel feature to zero mean and unit variance with StandardScaler:
from sklearn.preprocessing import StandardScaler
# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
('scaler', StandardScaler()), # Step 1: Normalize the data
('classifier', KNeighborsClassifier(n_neighbors=1)) # Step 2: Perform the classification
])
# Train the pipeline
knn_pipeline.fit(data_train['features'], data_train['target'])
# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
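StandardScaler learns a mean and standard deviation for each feature in fit, and transform subtracts that mean and divides by that standard deviation. Inside a pipeline, these statistics come only from the data passed to fit, so the evaluation data is scaled with the training statistics. A small check on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up "training" and "evaluation" data: 4 and 2 samples, 3 features each
X_toy_train = np.array([[1.0, 10.0, 0.0],
                        [2.0, 20.0, 0.0],
                        [3.0, 30.0, 1.0],
                        [4.0, 40.0, 1.0]])
X_toy_eval = np.array([[2.5, 25.0, 0.5],
                       [5.0, 50.0, 2.0]])

# fit learns mean_ and scale_ from the training data only
toy_scaler = StandardScaler().fit(X_toy_train)

# transform(X) is (X - mean) / std, using the training statistics (population std)
manual = (X_toy_eval - X_toy_train.mean(axis=0)) / X_toy_train.std(axis=0)
print(np.allclose(toy_scaler.transform(X_toy_eval), manual))  # True
# The first evaluation row equals the training mean, so it maps to approximately 0
print(toy_scaler.transform(X_toy_eval)[0])
```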
3.3 Writing your own pipeline processing steps
What if you want to add your own data processing step to the pipeline, i.e., one that is not already provided by sklearn?
For example, let us imagine that we write a function that downsamples the images by taking every nth pixel. We will write a function that takes the data as input, processes it and then returns the processed data.
def transform(data, factor):
    """ A crude downsampling of the images"""
    return data[:, ::factor]
This function is fine, but to make it compatible with the sklearn pipeline framework, we need to wrap it in a 'custom transformer' class. A transformer is a class that provides a 'transform' method that applies the transformation. It can also have some parameters (which might be learnable) and a 'fit' method that is used to learn them.
The class has to inherit from a pair of sklearn classes: BaseEstimator and TransformerMixin. The code looks like this:
from sklearn.base import BaseEstimator, TransformerMixin
class MyDownsample(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor # The downsampling factor
    def fit(self, X, y=None):
        return self # There is no fitting
    def transform(self, data, y=None):
        """Downsample the data"""
        return data[:,::self.factor]
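Inheriting from these two classes is what makes the transformer pipeline-friendly: BaseEstimator provides get_params and set_params (it reads the constructor arguments, which is why __init__ should simply store them under the same names), and TransformerMixin provides fit_transform, which calls fit and then transform. A quick check on a small array, with the class repeated so that the snippet runs on its own:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MyDownsample(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor  # The downsampling factor

    def fit(self, X, y=None):
        return self  # There is nothing to learn

    def transform(self, data, y=None):
        """Keep every factor-th column of the data."""
        return data[:, ::self.factor]

X_toy = np.arange(12).reshape(2, 6)  # 2 samples, 6 features

ds = MyDownsample(factor=3)
print(ds.get_params())           # {'factor': 3}, provided by BaseEstimator
print(ds.fit_transform(X_toy))   # keeps columns 0 and 3; provided by TransformerMixin

ds.set_params(factor=2)          # also provided by BaseEstimator
print(ds.transform(X_toy).shape) # (2, 3)
```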
We can now use this in our pipeline as follows:
# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
('scaler', StandardScaler()), # Step 1: Normalize the data
('downsample', MyDownsample(factor=2)), # Step 2: Downsample by a factor of 2
('classifier', KNeighborsClassifier(n_neighbors=1)) # Step 3: Perform the classification
])
# Train the pipeline
knn_pipeline.fit(data_train['features'], data_train['target'])
# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
You can save this pipeline to a model file in the usual way. If someone wants to load and use your model file then they will be able to do so as long as:
Dependencies are installed: The person receiving the model must have the same Python environment with compatible versions of scikit-learn and any other libraries (e.g., numpy, joblib).
Custom transformers are available: If your pipeline includes a custom transformer like MyDownsample, they need access to its code to deserialize and use the model. This is because joblib serializes the references to the class, not the class definition itself.
This will be fine for the assignment because you have been asked to provide your train.py code. But you need to ensure that any custom transformer classes that you have written are included in the train.py file.
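Note that MyDownsample slices the flattened 6464-value rows, so with factor=2 it keeps every other value of the flattened vector rather than every other channel and frame (because 101 is odd, the kept frames alternate between even and odd from one channel to the next). If a true 2-D downsampling is wanted, one option is a transformer that reshapes each row back to 64 x 101 first. The Downsample2D class below is a hypothetical sketch, not part of the assignment code, tried out on random data:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

class Downsample2D(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: downsample each flattened filterbank in 2-D."""

    def __init__(self, n_channels=64, n_frames=101, channel_step=2, frame_step=2):
        self.n_channels = n_channels
        self.n_frames = n_frames
        self.channel_step = channel_step
        self.frame_step = frame_step

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        # (samples, 6464) -> (samples, channels, frames)
        images = np.asarray(X).reshape((-1, self.n_channels, self.n_frames))
        # Keep every channel_step-th channel and every frame_step-th frame
        small = images[:, ::self.channel_step, ::self.frame_step]
        # Flatten back to one row per sample for the classifier
        return small.reshape((len(small), -1))

# Random stand-in data: 20 flattened 64 x 101 filterbanks, 5 classes
rng = np.random.default_rng(0)
X_rand = rng.random((20, 64 * 101))
y_rand = np.repeat(np.arange(5), 4)

X_small = Downsample2D().fit_transform(X_rand)
print(X_small.shape)  # (20, 1632), i.e. 32 channels x 51 frames

demo_pipe = Pipeline([
    ('downsample', Downsample2D(channel_step=2, frame_step=2)),
    ('classifier', KNeighborsClassifier(n_neighbors=1)),
])
demo_pipe.fit(X_rand, y_rand)
print(demo_pipe.score(X_rand, y_rand))  # 1.0 on its own training data with k=1
```

As with MyDownsample, a class like this would need to live in your train.py so that a saved model using it can be loaded.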
In cases like the above, where the transformer has no parameters that need to be fitted, you can simplify the code by using sklearn's 'FunctionTransformer' class, which turns a plain function into a pipeline transformer for you.
In this case, you do not need to define a new class; you can just define the function and wrap it with FunctionTransformer, like this:
from sklearn.preprocessing import FunctionTransformer
# Below is the function that will be used in the pipeline...
def my_downsample(data, factor):
    return data[:, ::factor]
# ... and here is how it is added to the pipeline. Note how FunctionTransformer wraps around my_downsample
knn_pipeline = Pipeline([
('scaler', StandardScaler()), # Step 1: Normalize the data
('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})), # Step 2: Downsample
('classifier', KNeighborsClassifier(n_neighbors=1)) # Step 3: Perform the classification
])
# Train the pipeline in the usual way
knn_pipeline.fit(data_train['features'], data_train['target'])
# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
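Again, trained pipelines that use functions wrapped with FunctionTransformer can be saved to joblib files and shared with others, as long as you also provide your code. Like custom classes, the function is saved by reference, so it should be defined with def at the top level of your script rather than as a lambda.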
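Whichever way the pipeline is built, saving and reloading it uses joblib.dump and joblib.load. The sketch below fits a small pipeline on random data, saves it to a temporary file (the file name is arbitrary), reloads it and checks that the copy makes identical predictions. It uses only built-in sklearn classes; if the pipeline contained MyDownsample or my_downsample, they would need to be defined or importable before joblib.load is called.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data with 5 classes
rng = np.random.default_rng(1)
X_save = rng.random((50, 10))
y_save = rng.integers(0, 5, size=50)

demo_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', KNeighborsClassifier(n_neighbors=1)),
]).fit(X_save, y_save)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_model.joblib')  # arbitrary file name
    joblib.dump(demo_pipeline, path)   # save the fitted pipeline
    reloaded = joblib.load(path)       # load it back

same = np.array_equal(demo_pipeline.predict(X_save), reloaded.predict(X_save))
print(same)  # True
```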
4. Evaluating using an ROC curve
In the following, I have provided an example of how to generate an ROC curve.
First, KNNs are not naturally probabilistic classifiers, so they do not provide a very meaningful probability value when predict_proba is called: the value returned is simply the fraction of the k nearest neighbours that belong to each class. A more graded value can be obtained with a large value of k, but a large k gives poorer performance. So, to make a better illustration, I am swapping to a LogisticRegression classifier.
The code below follows almost exactly the code in this week’s tutorial.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
# A pipeline that downsamples the data and then applies logistic regression
lr_pipeline = Pipeline([
    ('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})), # Step 1: Downsample
    ('classifier', LogisticRegression(C=0.01, solver='liblinear')) # Step 2: Perform the classification
])
# Train the pipeline in the usual way
lr_pipeline.fit(data_train['features'], data_train['target'])
# Evaluate the pipeline (percentage of evaluation samples correctly classified)
accuracy = lr_pipeline.score(data_eval['features'], data_eval['target']) * 100
print(accuracy)
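Returning briefly to the point about KNNs: KNeighborsClassifier.predict_proba (with its default weights='uniform') returns, for each class, the fraction of the k nearest neighbours belonging to it. With k=1 every value is 0 or 1, and with k=5 the values are restricted to multiples of 0.2, so sweeping a threshold over them yields only a few points on an ROC curve. A quick check on random data (an illustration, not the assignment data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data with 5 classes
rng = np.random.default_rng(2)
X_demo_train = rng.random((100, 4))
y_demo_train = rng.integers(0, 5, size=100)
X_demo_test = rng.random((30, 4))

probas = {}
for k in (1, 5):
    demo_knn = KNeighborsClassifier(n_neighbors=k).fit(X_demo_train, y_demo_train)
    probas[k] = demo_knn.predict_proba(X_demo_test)
    # The distinct values appearing anywhere in the probability matrix
    print(k, np.unique(np.round(probas[k], 3)))
```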
We can now (i) use predict_proba to get a score for each class; (ii) treat one class as 'positive' and all the others as 'negative', and calculate the FPR and TPR for different score thresholds; and (iii) compute the area under the curve (AUC).
from sklearn.metrics import roc_curve, auc
import numpy as np
y_true = data_eval['target']
# Choose which class you want to treat as "positive"
positive_class = 2
# Binarise: 1 for positive_class, 0 for everything else
y_true_bin = (y_true == positive_class).astype(int)
eval_scores = lr_pipeline.predict_proba(data_eval['features'])
y_score = eval_scores[:, list(lr_pipeline.classes_).index(positive_class)] # probability of that class (columns follow lr_pipeline.classes_)
fpr, tpr, _ = roc_curve(y_true_bin, y_score)
roc_auc = auc(fpr, tpr)
The code below then plots the ROC curve in a similar style to the tutorial.
# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', linestyle='--', lw=2, label='Random Guess')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend(loc='lower right')
plt.grid()
plt.show()
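To see what roc_curve is doing, the sketch below recomputes the TPR and FPR by hand on a handful of made-up labels and scores: for each threshold t, a sample is called positive if its score is at least t. Passing drop_intermediate=False makes roc_curve return every threshold, and auc is then the trapezoidal area under the (FPR, TPR) points. The variables are prefixed with toy_ so they do not overwrite the real ones above.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up binary labels (1 = positive class) and classifier scores
toy_labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

toy_fpr, toy_tpr, toy_thresholds = roc_curve(toy_labels, toy_scores, drop_intermediate=False)

n_pos = toy_labels.sum()
n_neg = len(toy_labels) - n_pos
manual_tpr, manual_fpr = [], []
for t in toy_thresholds:
    predicted_pos = toy_scores >= t
    manual_tpr.append(np.sum(predicted_pos & (toy_labels == 1)) / n_pos)
    manual_fpr.append(np.sum(predicted_pos & (toy_labels == 0)) / n_neg)

print(np.allclose(toy_tpr, manual_tpr), np.allclose(toy_fpr, manual_fpr))  # True True
print(auc(toy_fpr, toy_tpr), roc_auc_score(toy_labels, toy_scores))        # both 0.875
```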
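A single ROC curve only covers one class. To summarise all five classes at once, roc_auc_score can average over them, either one-vs-rest or one-vs-one:
from sklearn.metrics import roc_auc_score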
eval_scores = lr_pipeline.predict_proba(data_eval['features'])
y_true = data_eval['target']
roc_auc_ovr = roc_auc_score(y_true, eval_scores, multi_class='ovr') # one-vs-rest
# or:
roc_auc_ovo = roc_auc_score(y_true, eval_scores, multi_class='ovo') # one-vs-one
print(f'ROC AUC (OVR): {roc_auc_ovr:.2f}')
print(f'ROC AUC (OVO): {roc_auc_ovo:.2f}')
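With multi_class='ovr' and the default average='macro', the score is the unweighted mean of the binary one-vs-rest AUCs, one per class, each computed in the same way as the single-class example earlier. The sketch below checks this on synthetic five-class data from make_classification, used as a stand-in for the filterbank features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 5 classes, 20 features
X_mc, y_mc = make_classification(n_samples=600, n_features=20, n_informative=8,
                                 n_classes=5, random_state=0)
X_mc_train, X_mc_test, y_mc_train, y_mc_test = train_test_split(
    X_mc, y_mc, test_size=0.5, stratify=y_mc, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_mc_train, y_mc_train)
scores = clf.predict_proba(X_mc_test)

per_class_auc = []
for col, cls in enumerate(clf.classes_):
    # Binary problem: this class versus all the others
    per_class_auc.append(roc_auc_score(y_mc_test == cls, scores[:, col]))

ovr_macro = roc_auc_score(y_mc_test, scores, multi_class='ovr')
print(np.round(per_class_auc, 3))
print(np.mean(per_class_auc), ovr_macro)  # the two values agree
```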
The notebook has been using the speed-modified dataset. You can go back to the start and change it to load the more challenging tempo-modified dataset instead; expect performance to drop accordingly.
5. Running a Python script from the command line
As an exercise, extract the code from this notebook into a .py file. You can use the template below to help you get started.
"""Python script for evaluating a model"""
import joblib
# Add all the import statements that you need.
# Add all the function definitions that you need
def main():
    """Function to evaluate a model."""
    # This is the first function that gets called. Start adding code here.
if __name__ == "__main__":
    main()
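One possible way to fill in the template is sketched below; it is an illustration, not a model answer. It assumes the script is saved as evaluate.py (a hypothetical name), takes the model file and data file as two command-line arguments, and assumes the data file is a dictionary with 'features' and 'target' keys, like the files loaded above. If your model uses custom transformers, add the corresponding import (as with from train_speed import * earlier) so that joblib.load can find them.

```python
"""Python script for evaluating a model (a sketch)."""
import sys

import joblib
from sklearn.metrics import classification_report


def evaluate(model_file, data_file):
    """Load a saved model and a data file; return the accuracy as a percentage."""
    model = joblib.load(model_file)
    data = joblib.load(data_file)  # assumed: dict with 'features' and 'target' keys
    predictions = model.predict(data['features'])
    print(classification_report(data['target'], predictions))
    return model.score(data['features'], data['target']) * 100


def main(argv=None):
    """Function to evaluate a model."""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 2:
        print('usage: python evaluate.py MODEL_FILE DATA_FILE')
        return None
    accuracy = evaluate(args[0], args[1])
    print(f'The classifier is {accuracy:.2f}% correct')
    return accuracy


if __name__ == "__main__":
    main()
```

From the terminal this would be run as, for example, python evaluate.py model.speed.joblib fbank_speed.test1.joblib.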