100 Evaluating Classifiers
COM6018
Copyright © 2024 Jon Barker, University of Sheffield. All rights reserved.
1. Introduction
This notebook has been written to accompany the Face Verification assignment. Unlike the previous lab notebooks, it is not a step-by-step guide to a solution. Instead, it contains some notes and snippets of code that you may find useful. The lab uses ideas from the lecture on classifier evaluation and applies them to the assignment data.
2. Loading the data provided
The assignment makes use of a number of data files that you will need to download from the following location:
https://drive.google.com/drive/folders/10y3e2zKkh0lVpRZ3WC21Uu-v-EcbBSYs?usp=sharing
As described in the assignment handout, you are provided with the following files:
- train.joblib: the full training dataset
- eval1.joblib: a dataset for evaluating your model
- baseline_model.joblib: a pre-trained kNN model, i.e., a baseline solution
If you have not downloaded these already, do so now and store them in the same directory as this notebook.
We will then load the datasets and the baseline model.
2.1 The training data
We will first load the training data.
import joblib
data_train = joblib.load('train.joblib')
print(data_train.keys())
print(data_train['data'].shape)
print(data_train['target'].shape)
The data is stored in a 2-D array of 2200 rows of 5828 values, where each row holds the pixels of a pair of images, each 62 rows by 47 columns (2 x 62 x 47 = 5828). For convenience we will reshape this into a 4-D array of shape 2200 x 2 x 62 x 47, i.e., 2200 samples of 2 images of 62 by 47 pixels. We will then store the reshaped data back in the data_train dictionary under a key called 'images'.
data_train['images'] = data_train['data'].reshape((2200, 2, 62, 47))
print(data_train['images'].shape)
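As a quick sanity check of the reshape, the sketch below uses a small random array as a hypothetical stand-in for data_train['data'], so it runs without the assignment files. It confirms that each 5828-value row splits into two consecutive blocks of 62 x 47 = 2914 pixels, each read in row-major order. It also passes -1 in place of the number of pairs, which lets NumPy infer that number, so the same call would work for both the 2200 training pairs and the 1000 evaluation pairs.

```python
import numpy as np

# Hypothetical stand-in for data_train['data']: 3 pairs of flattened 62 x 47 images
n_pairs, height, width = 3, 62, 47
flat = np.random.default_rng(0).random((n_pairs, 2 * height * width))
print(flat.shape)  # (3, 5828)

# -1 lets NumPy infer the number of pairs from the array size
images = flat.reshape((-1, 2, height, width))
print(images.shape)  # (3, 2, 62, 47)

# The first image of a pair is the first 2914 values of its row, in row-major order
assert np.array_equal(images[1, 0], flat[1, :height * width].reshape(height, width))
# The second image is the remaining 2914 values
assert np.array_equal(images[1, 1], flat[1, height * width:].reshape(height, width))
```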
We can now write a function that takes any image pair and displays the two images side by side.
import matplotlib.pyplot as plt
def display_image_pair(images, n):
    """Display the nth image pair side by side"""
    plt.subplot(1, 2, 1)
    plt.imshow(images[n, 0], cmap='gray')
    plt.subplot(1, 2, 2)
    plt.imshow(images[n, 1], cmap='gray')
    plt.show()
# show the image pair at index 100
display_image_pair(data_train['images'], 100)
2.2 Loading the evaluation data
We will now load the evaluation data and reshape it in the same way as the training data. Note that the evaluation data contains 1000 pairs.
data_eval = joblib.load('eval1.joblib')
data_eval['images'] = data_eval['data'].reshape((1000, 2, 62, 47))
To check that this has worked, we will use our display function to show the image pair at index 100.
display_image_pair(data_eval['images'], 100)
Try calling the display function with different values for the index.
# SOLUTION
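Rather than calling the display function once per index, it can be convenient to show several pairs in a single figure. This is a minimal sketch using a hypothetical helper, display_image_pairs, which is not part of the assignment code. It runs here on random images; with the real data you would call display_image_pairs(data_train['images'], [0, 100, 200]).

```python
import numpy as np
import matplotlib.pyplot as plt

def display_image_pairs(images, indices):
    """Show several image pairs, one pair per row (left and right image of each pair)."""
    fig, axes = plt.subplots(len(indices), 2, figsize=(4, 2 * len(indices)), squeeze=False)
    for row, n in enumerate(indices):
        for col in range(2):
            axes[row, col].imshow(images[n, col], cmap='gray')
            axes[row, col].set_title(f'pair {n}, image {col}')
            axes[row, col].axis('off')
    fig.tight_layout()
    return fig

# Synthetic stand-in for data_train['images'] so the sketch runs on its own
fake_images = np.random.default_rng(1).random((10, 2, 62, 47))
fig = display_image_pairs(fake_images, [0, 3, 7])
plt.show()
```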
If your Jupyter environment supports widgets, the following code lets you browse through the training image pairs.
# First we need to pip install the ipywidgets package (-q suppresses the pip output)
!pip -q install ipywidgets
from ipywidgets import interact, IntSlider
index_slider = IntSlider(value=0, min=0, max=2199, description="Image Pair Index")
def display_image_pair_wrapper(n):
    """Provides a 1-parameter interface for display_image_pair"""
    display_image_pair(data_train['images'], n)
interact(display_image_pair_wrapper, n=index_slider);
2.3 The baseline model
A baseline model has been trained for you and stored in the file baseline_model.joblib. This can be loaded using the joblib library. The code below loads the model.
import joblib
model = joblib.load('baseline_model.joblib')
print(model)
Note that the print statement above provides a description of the model. You can see that it is a Pipeline with a single step, the KNN classifier, constructed with \(k=1\). The performance of this model is not great, and it should be easy for you to improve on it.
We can check the performance by calling the model's score method on the evaluation data, as follows:
percent_correct = model.score(data_eval['data'], data_eval['target']) * 100
print(f'The classifier is {percent_correct:.2f}% correct')
You should get a score of 56.3%. This is significantly above chance (50%) but there is plenty of room for improvement.
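The claim that 56.3% is significantly above chance can be checked with the same one-sided binomial test used in Section 4. This sketch treats each of the 1000 evaluation pairs as an independent trial with a 50% chance of being right under the null hypothesis, which is an assumption of the test.

```python
import scipy.stats as stats

# 56.3% of the 1000 evaluation pairs correct, as quoted for the baseline
n = 1000
k = 563
result = stats.binomtest(k, n, p=0.5, alternative='greater')
print(f"Binomial test p-value: {result.pvalue:.2e}")
```

The p-value is far below 0.05, so, unlike the 52.2% result discussed later, 56.3% is very unlikely to arise from guessing alone.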
3. Retraining your own KNN model
We should be able to replicate the baseline model score by training our own KNN.
3.1 Training a KNN
Below we make a classifier, train it and then evaluate it.
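# Import necessary libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Create a KNN classifier with k=1
knn = KNeighborsClassifier(n_neighbors=1)
# Train the model
knn.fit(data_train['data'], data_train['target'])
# Make predictions on the evaluation set
y_pred = knn.predict(data_eval['data'])
# Percentage of correct predictions (the same value as knn.score(...) * 100)
percent_correct = accuracy_score(data_eval['target'], y_pred) * 100
print(f'The classifier is {percent_correct:.2f}% correct')
# Per-class precision, recall and F1-score
print(classification_report(data_eval['target'], y_pred))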
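Accuracy on its own does not say which kind of mistake the classifier makes, for example whether it wrongly accepts different-person pairs or wrongly rejects same-person pairs. sklearn's confusion_matrix counts both. The sketch below uses ten hypothetical labels and predictions and assumes that 1 means "same person" and 0 "different people"; check the labelling used in the real target array. With the real data, the y_pred from the cell above can be passed in directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions for 10 pairs (1 = same person, 0 = different)
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# Rows are the true class and columns the predicted class, both in the order [0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# Accuracy is the diagonal (correct decisions) divided by the total
print(accuracy_score(y_true, y_pred))  # 0.6
print(np.trace(cm) / cm.sum())         # 0.6
```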
3.2 Repeating using a pipeline
We will now repeat the process using a pipeline. The pipeline will have just the classifier and no preprocessing steps. This is not very useful but it illustrates the idea.
from sklearn.pipeline import Pipeline
# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
    ('classifier', KNeighborsClassifier(n_neighbors=1))
])
# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])
# Evaluate the pipeline: percentage of evaluation pairs classified correctly
percent_correct = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(percent_correct)
We should get the same result as before.
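The claim that a one-step pipeline behaves exactly like the bare classifier can be checked directly. This sketch uses random synthetic data (an assumption, so that it runs without the assignment files) and compares the two sets of predictions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 200 training and 50 test samples with 20 features
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 20)), rng.integers(0, 2, 200)
X_test = rng.random((50, 20))

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
knn_pipeline = Pipeline([('classifier', KNeighborsClassifier(n_neighbors=1))]).fit(X_train, y_train)

# With no preprocessing steps, the pipeline predicts exactly what the bare classifier predicts
same = np.array_equal(knn.predict(X_test), knn_pipeline.predict(X_test))
print(same)
```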
But we can now add preprocessing steps to the pipeline with very little modification to the code. For example, to standardise each pixel feature to zero mean and unit variance, we can add a StandardScaler step before the classifier:
from sklearn.preprocessing import StandardScaler
# Create a pipeline with a scaler followed by the KNN classifier
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 2: Perform the classification
])
# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])
# Evaluate the pipeline
percent_correct = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(percent_correct)
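StandardScaler learns a mean and standard deviation for each feature during fit and applies them in transform. Inside a Pipeline, fit learns these statistics from the training data only, and score reuses them on the evaluation data. The sketch below shows this on synthetic three-feature data (hypothetical values), together with the equivalent calculation by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic 'pixel' features with different offsets and spreads per column
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[10.0, 100.0, 0.5], scale=[1.0, 20.0, 0.1], size=(500, 3))
X_test = rng.normal(loc=[10.0, 100.0, 0.5], scale=[1.0, 20.0, 0.1], size=(100, 3))

scaler = StandardScaler().fit(X_train)  # learns per-feature mean_ and scale_ from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # the test data reuses the training statistics

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 for every feature

# Equivalent by hand: subtract the training mean, divide by the training standard deviation
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_test_scaled))
```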
3.3 Writing your own pipeline processing steps
What happens if you want to add your own data-processing step to the pipeline, i.e., a function that is not already defined in sklearn?
For example, imagine that we write a function that downsamples the images by taking every nth pixel. The function takes the data as input, processes it and returns the processed data.
def transform(data, factor):
    """A crude downsampling of the images: keep every factor-th pixel value"""
    return data[:, ::factor]
This function is fine, but to make it compatible with the sklearn pipeline framework we need to wrap it in a 'custom transformer' class. A transformer is a class that provides a 'transform' method that applies the transform. It can also have some parameters (which might be learnable), and a 'fit' method that is used to learn them.
The class should inherit from a pair of sklearn classes: BaseEstimator, which provides get_params and set_params, and TransformerMixin, which provides fit_transform. The code looks like this:
from sklearn.base import BaseEstimator, TransformerMixin
class MyDownsample(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor  # The downsampling factor
    def fit(self, X, y=None):
        return self  # There is no fitting
    def transform(self, data, y=None):
        """Downsample the data"""
        return data[:, ::self.factor]
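A custom transformer can be tried out on its own before it goes into a pipeline. The sketch below repeats the MyDownsample class so that it is self-contained and applies it to a tiny toy array. It also shows what the two base classes contribute: fit_transform from TransformerMixin, and get_params (which sklearn's clone relies on) from BaseEstimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone

class MyDownsample(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor  # The downsampling factor

    def fit(self, X, y=None):
        return self  # Nothing to learn

    def transform(self, data, y=None):
        """Keep every factor-th feature (column)."""
        return data[:, ::self.factor]

X = np.arange(20).reshape(2, 10)  # two toy rows of 10 features

ds = MyDownsample(factor=3)
print(ds.fit_transform(X))  # fit_transform comes from TransformerMixin: keeps columns 0, 3, 6, 9
print(ds.get_params())      # get_params comes from BaseEstimator
print(clone(ds).factor)     # clone builds a fresh copy from get_params
```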
We can now use this in our pipeline as follows:
# Create a pipeline with a scaler, the custom downsampler and the KNN classifier
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('downsample', MyDownsample(factor=2)),  # Step 2: Downsample by a factor of 2
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 3: Perform the classification
])
# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])
# Evaluate the pipeline
percent_correct = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(percent_correct)
You can save this pipeline to a model file in the usual way. If someone wants to load and use your model file, they will be able to do so as long as:
- Dependencies are installed: the person receiving the model must have a Python environment with compatible versions of scikit-learn and any other libraries (e.g., numpy, joblib).
- Custom transformers are available: if your pipeline includes a custom transformer like MyDownsample, they need access to its code to deserialize and use the model. This is because joblib serializes a reference to the class, not the class definition itself.
This will be fine for the assignment because you have been asked to provide your train.py code. But you need to ensure that any custom transformer classes that you have written are included in the train.py file.
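The save-and-load round trip can be sketched as follows, assuming synthetic data and a hypothetical file name, my_model.joblib, written to a temporary directory. Loading works here because MyDownsample is defined in the same session. In a fresh Python session where the class has not been defined or imported, joblib.load typically fails with an AttributeError saying that it cannot find MyDownsample.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

class MyDownsample(BaseEstimator, TransformerMixin):
    """Same custom transformer as above; it must be defined before joblib.load is called."""
    def __init__(self, factor=2):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, data, y=None):
        return data[:, ::self.factor]

# Synthetic stand-in data
rng = np.random.default_rng(0)
X, y = rng.random((100, 30)), rng.integers(0, 2, 100)

pipeline = Pipeline([
    ('downsample', MyDownsample(factor=2)),
    ('classifier', KNeighborsClassifier(n_neighbors=1)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_model.joblib')  # hypothetical file name
    joblib.dump(pipeline, path)
    loaded = joblib.load(path)  # works because MyDownsample is defined in this session

# The reloaded pipeline makes the same predictions as the original
print(np.array_equal(loaded.predict(X), pipeline.predict(X)))
```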
In cases like the above, where the transformer has no parameters that need to be fitted, you can simplify the code by using sklearn's FunctionTransformer class, which turns a simple function into a pipeline transformer for you.
In this case you do not need to define a new class; you can just define the function and wrap it with FunctionTransformer, like this:
from sklearn.preprocessing import FunctionTransformer
# Below is the function that will be used in the pipeline...
def my_downsample(data, factor):
    return data[:, ::factor]
# ... and here is how it is added to the pipeline. Note how FunctionTransformer wraps around my_downsample
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})),  # Step 2: Downsample
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 3: Perform the classification
])
# Train the pipeline in the usual way
knn_pipeline.fit(data_train['data'], data_train['target'])
# Evaluate the pipeline
percent_correct = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(percent_correct)
Again, trained pipelines that use functions wrapped with FunctionTransformer can be saved to joblib files and shared with others, as long as you also provide your code.
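Because my_downsample works on the flattened row, a factor of 2 keeps every other value of the 5828-long vector. As each image row has an odd number of pixels (47), the kept pixels form a checkerboard pattern rather than a smaller regular image. If a genuine spatial downsample is wanted, one option (our own suggestion, not part of the assignment code) is to reshape to images first. The function spatial_downsample below is hypothetical and assumes the 62 x 47 image size; it is wrapped in FunctionTransformer in the same way as my_downsample.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def spatial_downsample(data, factor, height=62, width=47):
    """Hypothetical alternative to my_downsample: keep every factor-th row and column
    of each image, instead of every factor-th value of the flattened pair."""
    images = data.reshape((-1, 2, height, width))
    small = images[:, :, ::factor, ::factor]
    return small.reshape((small.shape[0], -1))  # flatten back to one row per pair

downsampler = FunctionTransformer(spatial_downsample, kw_args={'factor': 2})

# Synthetic stand-in for data_train['data'] (4 pairs)
fake_data = np.random.default_rng(0).random((4, 2 * 62 * 47))
print(downsampler.fit_transform(fake_data).shape)  # (4, 1488): 2 images x 31 rows x 24 columns
print(fake_data[:, ::2].shape)                     # (4, 2914) for the flat version
```

Both versions roughly halve or quarter the feature count; whether either helps accuracy is something to test on the evaluation data.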
4. Evaluating using an ROC curve
In the following, I have provided an example of how to generate an ROC curve.
First, KNNs are not naturally probabilistic classifiers, so they do not provide a very meaningful probability value when calling predict_proba. A more graded value can be obtained when a large value of K is used, but a large value of K gives poorer performance. So, to make a better illustration, I am switching to a LogisticRegression classifier.
The code below follows almost exactly the code in this week's tutorial.
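To see why KNN scores are of limited use for an ROC curve: with sklearn's default uniform weights, predict_proba returns the fraction of the k nearest neighbours that belong to each class. With k=1 every score is therefore 0 or 1, and the ROC curve has only one point between (0, 0) and (1, 1). The sketch below shows this on random synthetic data, an assumption made so that it runs without the assignment files.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 10)), rng.integers(0, 2, 200)
X_test = rng.random((100, 10))

for k in (1, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    proba = knn.predict_proba(X_test)[:, 1]  # estimated probability of class 1
    # With uniform weights the score is the fraction of the k neighbours in class 1,
    # so it can only take the k + 1 values 0, 1/k, ..., 1
    print(k, np.unique(proba))
```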
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
# Build a pipeline that downsamples the data and then applies logistic regression
lr_pipeline = Pipeline([
    ('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})),  # Step 1: Downsample
    ('classifier', LogisticRegression(C=0.01, solver='liblinear'))  # Step 2: Perform the classification
])
# Train the pipeline in the usual way
lr_pipeline.fit(data_train['data'], data_train['target'])
# Evaluate the pipeline
percent_correct = lr_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(percent_correct)
We can now (i) use predict_proba to get scores, (ii) calculate the FPR and TPR for different score thresholds, and (iii) compute the area under the curve (AUC).
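# (i) Column 1 of predict_proba holds the probability of the class lr_pipeline.classes_[1]
eval_scores = lr_pipeline.predict_proba(data_eval['data'])
eval_scores_positive_class = eval_scores[:, 1]
# (ii) FPR and TPR at every distinct score threshold
fpr, tpr, _ = roc_curve(data_eval['target'], eval_scores_positive_class)
# (iii) Area under the ROC curve
roc_auc = auc(fpr, tpr)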
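roc_curve performs the threshold sweep for us. To make steps (ii) and (iii) concrete, the sketch below sweeps the threshold by hand over four hypothetical scores and checks the area against sklearn's roc_auc_score. It also shows a useful interpretation: the AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score (ties, which this toy data does not have, would count as half). sklearn's roc_curve drops some collinear points by default (drop_intermediate=True), so its arrays can be shorter than these, but the area is the same.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score

# Hypothetical labels (1 = positive class) and classifier scores for 4 samples
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Sweep the threshold over every observed score, from highest to lowest;
# a score counts as a positive decision when it is >= the threshold
thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
tpr = np.array([np.mean(scores[y_true == 1] >= t) for t in thresholds])
fpr = np.array([np.mean(scores[y_true == 0] >= t) for t in thresholds])
print(fpr)  # values 0, 0, 0.5, 0.5, 1
print(tpr)  # values 0, 0.5, 0.5, 1, 1
print(auc(fpr, tpr))  # 0.75: area under these points by the trapezoidal rule

# The same number as the fraction of (positive, negative) pairs ranked correctly
pos, neg = scores[y_true == 1], scores[y_true == 0]
print(np.mean(pos[:, None] > neg[None, :]))  # 0.75
print(roc_auc_score(y_true, scores))         # 0.75
```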
The code below then plots the ROC curve in a similar style to the tutorial.
# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', linestyle='--', lw=2, label='Random Guess')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend(loc='lower right')
plt.grid()
plt.show()
From the above we can see that the AUC is very close to 0.5: the ROC curve lies very close to the diagonal that would be obtained if we were simply guessing. This is not surprising, given that the accuracy (52.2%) was also close to chance performance.
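You may ask whether 52.2% is significantly better than chance in a formal statistical sense. We can answer this with a one-sided binomial test, which gives the probability of getting at least 522 of the 1000 evaluation pairs right by guessing with a 50% chance of success on each pair.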
import scipy.stats as stats
# Parameters
n = 1000 # Total trials
k = 522 # Correct guesses
p_null = 0.5 # Null hypothesis: chance level
# Observed proportion
p_observed = k / n
# Binomial test
binom_result = stats.binomtest(k, n, p_null, alternative='greater')
p_value_binomial = binom_result.pvalue
print(f"Binomial test p-value: {p_value_binomial:.4f}")
Binomial test p-value: 0.0869
This p-value is above the 0.05 (5%) threshold conventionally used for statistical significance, i.e., if we were simply guessing, we would get a result at least as good as this 8.69% of the time. We therefore cannot call this result significant. Note that this does not mean that our classifier is no better than guessing, just that we do not have enough evidence that it is.
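For a sample this large, the binomial test is closely approximated by a z-score: under the null hypothesis the number of correct answers has mean \(np\) and standard deviation \(\sqrt{np(1-p)}\). The sketch below computes the z-score for the same numbers and, as a further illustration, searches for the smallest number of correct answers out of 1000 that the exact one-sided test would call significant at the 5% level.

```python
import math

import scipy.stats as stats

n, k, p_null = 1000, 522, 0.5

# Normal approximation: under the null the count has mean n*p and variance n*p*(1-p)
mean = n * p_null
sd = math.sqrt(n * p_null * (1 - p_null))
z = (k - mean) / sd
p_value_normal = stats.norm.sf(z)  # one-sided: P(Z >= z)
print(f"z = {z:.3f}, one-sided p = {p_value_normal:.4f}")

# Smallest number correct (out of n) that the exact one-sided test calls significant at 5%
k_min = next(j for j in range(n // 2, n + 1)
             if stats.binomtest(j, n, p_null, alternative='greater').pvalue < 0.05)
print(f"Need at least {k_min} correct ({100 * k_min / n:.1f}%)")
```

The normal approximation without a continuity correction gives a slightly smaller p-value than the exact test, but it leads to the same conclusion here.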
5. Running a Python script from the command line
As an exercise, extract the code from this notebook into a .py file. You can use the template below to help you get started.
"""Python script for evaluating a model"""
import joblib
# Add all the import statements that you need.
# Add all the function definitions that you need
def main():
"""Function to evaluate a model."""
# This is the first function that gets called. Start adding code here.
if __name__ == "__main__":
main()