100 Evaluating Classifiers

COM6018

1. Introduction#

This notebook has been written to accompany the Face Verification assignment. Unlike the previous lab notebooks, it is not a step-by-step guide to a solution. Instead, it contains notes and snippets of code that you may find useful. The lab uses ideas from the lecture on classifier evaluation and applies them to the assignment data.

2. Loading the data provided#

The assignment makes use of a number of data files that you will need to download from the following location:

https://drive.google.com/drive/folders/10y3e2zKkh0lVpRZ3WC21Uu-v-EcbBSYs?usp=sharing

As described in the assignment handout, you are provided with the following files:

train.joblib - the full training dataset
eval1.joblib - a dataset for evaluating your model
baseline_model.joblib - a pre-trained kNN model, i.e., a baseline solution

If you have not downloaded these already, do so now and store them in the same directory as this notebook.

We will then load the datasets and the baseline model.

2.1 The training data#

We will first load the training data.

import joblib
data_train = joblib.load('train.joblib')
print(data_train.keys())
print(data_train['data'].shape)
print(data_train['target'].shape)

The data is stored in a 2-D array of 2200 rows of 5828 values, where each row holds the pixels of a pair of images, each of shape 62 rows by 47 columns (2 x 62 x 47 = 5828). For convenience we will reshape this into a 4-D array of shape 2200 x 2 x 62 x 47, i.e., 2200 samples of 2 images of 62 by 47 pixels. We will then store the reshaped data back in the data_train dictionary under a key called ‘images’.

data_train['images'] = data_train['data'].reshape((2200, 2, 62, 47))
print(data_train['images'].shape)
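
As a quick sanity check on the reshape, the sketch below uses a small synthetic array (random values standing in for the real data, with an arbitrary 5 pairs rather than 2200). It confirms that the first 2914 values of each row become the first image of the pair and the remaining 2914 values become the second, which is what the row-major reshape above implies.

```python
import numpy as np

n_pairs, height, width = 5, 62, 47   # 5 pairs is an arbitrary choice for illustration
rng = np.random.default_rng(0)
flat = rng.random((n_pairs, 2 * height * width))   # stand-in for data_train['data']

images = flat.reshape((n_pairs, 2, height, width))

half = height * width   # 2914 pixels per image
# Compare one reshaped pair against manual slices of the flat row
first_ok = np.array_equal(images[3, 0], flat[3, :half].reshape(height, width))
second_ok = np.array_equal(images[3, 1], flat[3, half:].reshape(height, width))
print(images.shape, first_ok, second_ok)
```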

We can now write a function that takes the index of an image pair and displays the two images side by side:

import matplotlib.pyplot as plt

def display_image_pair(images, n):
    """Display the nth image pair"""
    plt.subplot(1, 2, 1)
    plt.imshow(images[n, 0], cmap='gray')
    plt.subplot(1, 2, 2)
    plt.imshow(images[n, 1], cmap='gray')
    plt.show()
    
# Show the image pair at index 100
display_image_pair(data_train['images'], 100)

2.2 Loading the evaluation data#

We will now load the evaluation data and reshape it in the same way that we reshaped the training data. Note that the evaluation data contains 1000 pairs.

data_eval = joblib.load('eval1.joblib')
data_eval['images'] = data_eval['data'].reshape((1000, 2, 62, 47))

To check that this has worked we will use our previous display function to show the image pair at index 100.

display_image_pair(data_eval['images'], 100)

Try calling the display function with different values for the index.

# SOLUTION

If your Jupyter environment supports widgets, the following code should allow you to browse through the training image pairs.

# First we need to pip install the ipywidgets package (-q suppresses the pip output)
!pip -q install ipywidgets

from ipywidgets import interact, IntSlider, Dropdown

index_slider = IntSlider(value=0, min=0, max=2199, description="Image Pair Index")


def display_image_pair_wrapper(n):
    """Provides a 1-parameter interface for display_image_pair"""
    display_image_pair(data_train['images'], n)

interact(display_image_pair_wrapper, n=index_slider);

2.3 The baseline model#

A baseline model has been trained for you and stored in the file baseline_model.joblib. This can be loaded using the joblib library. The code below loads the model.

import joblib

model = joblib.load('baseline_model.joblib')

print(model)

Note that the print statement provides a description of the model. You should see that it is a Pipeline with a single step, the KNN classifier, constructed with \(k=1\). The performance of this model is not great and it should be easy for you to improve on it.

We can check the performance by using the model’s score method and passing the evaluation data as follows,

percent_correct = model.score(data_eval['data'], data_eval['target']) * 100 
print(f'The classifier is {percent_correct:.2f}% correct')

You should get a score of 56.3%. This is significantly above chance (50%) but there is plenty of room for improvement.
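
For a classifier, `score` returns the mean accuracy, i.e., the fraction of samples whose predicted label matches the target. The sketch below illustrates this on synthetic data (random features and labels, not the assignment data), so the printed value will not match the baseline figure.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 20))          # synthetic stand-in for training features
y_train = rng.integers(0, 2, size=200)   # synthetic binary labels
X_eval = rng.random((100, 20))           # synthetic stand-in for evaluation features
y_eval = rng.integers(0, 2, size=100)

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

score = clf.score(X_eval, y_eval)                # built-in mean accuracy
manual = np.mean(clf.predict(X_eval) == y_eval)  # the same quantity computed by hand
print(score, manual, accuracy_score(y_eval, clf.predict(X_eval)))
```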

3. Retraining your own KNN model#

We should be able to replicate the baseline model score by training our own KNN.

3.1 Training a KNN#

Below we make a classifier, train it and then evaluate it.

# Import necessary libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Create a KNN classifier with k=1
knn = KNeighborsClassifier(n_neighbors=1)

# Train the model
knn.fit(data_train['data'], data_train['target'])

# Evaluate the model: score() returns the fraction of correct predictions
accuracy = knn.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)

3.2 Repeating using a pipeline#

We will now repeat the process using a pipeline. The pipeline will have just the classifier and no preprocessing steps. This is not very useful but it illustrates the idea.

from sklearn.pipeline import Pipeline

# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
    ('classifier', KNeighborsClassifier(n_neighbors=1))
])

# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])

# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)

We should get the same result.

We can now add preprocessing steps to the pipeline with very little modification to the code. For example, we can standardise the pixel features using StandardScaler:

from sklearn.preprocessing import StandardScaler

# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 2: Perform the classification
])

# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])

# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)
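
One advantage of placing the scaler inside the pipeline is that its statistics are learned only from the data passed to `fit`, and those same statistics are then reused when the evaluation data is processed. The sketch below shows this with synthetic data (the array sizes and values are assumptions made for illustration).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 10))   # synthetic training features
y_train = rng.integers(0, 2, size=200)                     # synthetic binary labels
X_eval = rng.normal(loc=5.0, scale=2.0, size=(50, 10))     # synthetic evaluation features

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', KNeighborsClassifier(n_neighbors=1)),
])
pipe.fit(X_train, y_train)

# The fitted scaler holds the per-feature mean of the training data only
scaler = pipe.named_steps['scaler']
means_match = np.allclose(scaler.mean_, X_train.mean(axis=0))

# Evaluation data is transformed using the training statistics
X_eval_scaled = scaler.transform(X_eval)
manual = (X_eval - X_train.mean(axis=0)) / X_train.std(axis=0)
print(means_match, np.allclose(X_eval_scaled, manual))
```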

3.3 Writing your own pipeline processing steps#

What happens if you want to add your own data processing step to the pipeline, i.e., a function that is not already defined in sklearn?

For example, let us imagine that we want to downsample the images by taking every nth pixel. We will write a function that takes the data as input, processes it and then returns the processed data.

def transform(data, factor):
    """A crude downsampling of the images: keep every factor-th value in each row"""
    return data[:, ::factor]

This function is fine but to make it compatible with the sklearn pipeline framework, we need to make it part of a ‘custom transformer’ class. A transformer is a class that provides a ‘transform’ method that applies the transform. It can also have some parameters (which might be learnable), and a ‘fit’ method that is used to learn the parameters.

The class has to inherit from a pair of sklearn classes: BaseEstimator and TransformerMixin. The code looks like this:

from sklearn.base import BaseEstimator, TransformerMixin

class MyDownsample(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor # The downsampling factor
        
    def fit(self, X, y=None):
        return self  # There is no fitting
    
    def transform(self, data, y=None):
        """Downsample the data"""
        return data[:,::self.factor]
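
To illustrate what the two base classes contribute, the sketch below repeats the class so that it runs on its own and applies it to a small synthetic array (4 rows of 5828 values, an assumption standing in for real image pairs). TransformerMixin supplies `fit_transform`, and BaseEstimator supplies `get_params`, which sklearn utilities such as `clone` rely on.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone

class MyDownsample(BaseEstimator, TransformerMixin):
    """Same transformer as above, repeated so this snippet runs on its own."""
    def __init__(self, factor=2):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, data, y=None):
        return data[:, ::self.factor]

X = np.arange(4 * 5828, dtype=float).reshape(4, 5828)   # 4 synthetic flattened image pairs

down = MyDownsample(factor=2)
X_small = down.fit_transform(X)     # fit_transform is supplied by TransformerMixin
print(X_small.shape)                # (4, 2914)

params = down.get_params()          # get_params is supplied by BaseEstimator
copy = clone(down)                  # clone uses get_params to build an unfitted copy
print(params, copy.factor)
```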

We can now use this in our pipeline as follows:

# Create a pipeline with the KNN classifier
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('downsample', MyDownsample(factor=2)), # Step 2: Downsample by a factor of 2
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 3: Perform the classification
])

# Train the pipeline
knn_pipeline.fit(data_train['data'], data_train['target'])

# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)

You can save this pipeline to a model file in the usual way. If someone wants to load and use your model file then they will be able to do so as long as:

Dependencies Are Installed: The person receiving the model must have the same Python environment with compatible versions of scikit-learn and any other libraries (e.g., numpy, joblib).
Custom Transformers Are Available: If your pipeline includes a custom transformer like MyDownsample, they need access to its code to deserialize and use the model. This is because joblib serializes the references to the class, not the class definition itself.

This will be fine for the assignment because you have been asked to provide your train.py code. But you need to ensure that any custom transformer classes that you have written are included in the train.py file.
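
Saving and reloading can be sketched as follows. The example uses a temporary file, a hypothetical file name, and a pipeline built only from sklearn classes trained on synthetic data. These are all assumptions made so that the snippet runs on its own. With a custom transformer, the class definition would additionally need to be importable at load time.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((100, 10))            # synthetic features
y = rng.integers(0, 2, size=100)     # synthetic binary labels

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', KNeighborsClassifier(n_neighbors=1)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_model.joblib')   # hypothetical file name
    joblib.dump(pipe, path)          # save the fitted pipeline
    loaded = joblib.load(path)       # reload it, as another user would

# The reloaded pipeline should make identical predictions
same = np.array_equal(pipe.predict(X), loaded.predict(X))
print(same)
```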

In cases like the above, where the transformer has no parameters that need to be fitted, you can simplify the code by using sklearn’s ‘FunctionTransformer’ class, that will turn a simple function into a pipeline transformer for you.

In this case you do not need to define a new class but can just define the function and use FunctionTransformer like this,

from sklearn.preprocessing import FunctionTransformer

# Below is the function that will be used in the pipeline...
def my_downsample(data, factor):
    return data[:, ::factor]

# ... and here is how it is added to the pipeline. Note how FunctionTransformer wraps around my_downsample
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Normalize the data
    ('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})), # Step 2: Downsample
    ('classifier', KNeighborsClassifier(n_neighbors=1))  # Step 3: Perform the classification
])

# Train the pipeline in the usual way
knn_pipeline.fit(data_train['data'], data_train['target'])

# Evaluate the pipeline
accuracy = knn_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)
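
To see what the wrapper does in isolation, the sketch below applies FunctionTransformer to a tiny synthetic array (2 samples of 6 values, chosen only for illustration) and compares the result with calling the function directly.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def my_downsample(data, factor):
    return data[:, ::factor]

X = np.arange(12).reshape(2, 6)   # tiny synthetic array: 2 samples, 6 features

transformer = FunctionTransformer(my_downsample, kw_args={'factor': 2})
via_transformer = transformer.fit_transform(X)   # kw_args are passed on to my_downsample
direct = my_downsample(X, factor=2)

print(via_transformer)
print(np.array_equal(via_transformer, direct))
```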

Again, trained pipelines using functions wrapped with FunctionTransformer like this can be saved to joblib files and shared with others, as long as you also provide your code.

4. Evaluating using an ROC curve#

In the following, I have provided an example of how to generate an ROC curve.

First, KNNs are not naturally probabilistic classifiers and so do not provide a very meaningful probability value when calling ‘predict_proba’. A more finely graded value can be obtained when a large value of K is used, but a large value of K gives poorer performance. So to make a better illustration I am swapping to using a LogisticRegression classifier.
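
To see why, note that with \(k=1\) and the default uniform weighting, the ‘probability’ is just the vote of a single neighbour, so it can only be 0 or 1. With larger \(k\) the values are multiples of \(1/k\). The sketch below illustrates this on synthetic data (random features and labels, not the assignment data).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 5))           # synthetic training features
y_train = rng.integers(0, 2, size=200)   # synthetic binary labels
X_eval = rng.random((50, 5))             # synthetic evaluation features

unique_scores = {}
for k in (1, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    proba = knn.predict_proba(X_eval)[:, 1]   # score for the positive class
    unique_scores[k] = np.unique(proba)       # distinct score values that occur
    print(k, unique_scores[k])
```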

The code below follows almost exactly the code in this week’s tutorial.

from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression

# Build a pipeline that downsamples the data and then applies logistic regression
lr_pipeline = Pipeline([
    ('downsample', FunctionTransformer(my_downsample, kw_args={'factor': 2})), # Step 1: Downsample
    ('classifier', LogisticRegression(C=0.01, solver='liblinear'))  # Step 2: Perform the classification
])

# Train the pipeline in the usual way
lr_pipeline.fit(data_train['data'], data_train['target'])

# Evaluate the pipeline
accuracy = lr_pipeline.score(data_eval['data'], data_eval['target']) * 100
print(accuracy)

We can now i) use predict_proba to get scores; ii) calculate the FPR and TPR for different score thresholds; and iii) compute the area under the curve (AUC).

eval_scores = lr_pipeline.predict_proba(data_eval['data'])
eval_scores_positive_class = eval_scores[:,1]
fpr, tpr, _ = roc_curve(data_eval['target'], eval_scores_positive_class)
roc_auc = auc(fpr, tpr)
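
To make the roles of the thresholds, FPR and TPR concrete, the sketch below runs the same two functions on a toy example of four samples (labels and scores invented for illustration). Of the four possible (positive, negative) pairs, three have the positive sample scored higher, so the AUC is 0.75.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

y_true = np.array([0, 0, 1, 1])             # toy labels
scores = np.array([0.1, 0.4, 0.35, 0.8])    # toy classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    # Each row is the (FPR, TPR) obtained by predicting positive when score >= threshold
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

toy_auc = auc(fpr, tpr)   # area under the piecewise-linear ROC curve
print(f"AUC = {toy_auc:.2f}")
```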

The code below then plots the ROC curve in a similar style to that used in the tutorial.

# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', linestyle='--', lw=2, label='Random Guess')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend(loc='lower right')
plt.grid()
plt.show()

From the above we can see that the AUC is very close to 0.5. The ROC curve lies very close to that which would be obtained if we were simply guessing. This is not surprising given that the accuracy (52.2%) was also close to chance performance.

You may ask whether 52.2% is significantly better than chance in a formal statistical sense. We can compute a p-value using a one-sided binomial test.

import scipy.stats as stats

# Parameters
n = 1000  # Total trials
k = 522   # Correct guesses
p_null = 0.5  # Null hypothesis: chance level

# Observed proportion
p_observed = k / n

# Binomial test
binom_result = stats.binomtest(k, n, p_null, alternative='greater')
p_value_binomial = binom_result.pvalue
print(f"Binomial test p-value: {p_value_binomial:.4f}")

Binomial test p-value: 0.0869

This value is above the standard 0.05 (i.e., 5%) threshold conventionally used for statistical significance. In other words, if we were simply guessing, we would get a result at least as good as this 8.69% of the time. We would not be allowed to call this result significant. Note that this does not mean that our classifier is not doing better than guessing, just that we do not have sufficient evidence that it is.
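
A z-score for the same comparison can be sketched using the normal approximation to the binomial distribution. This is an approximation chosen here for illustration rather than the exact test used above, so its p-value differs slightly from the binomial one.

```python
import math

n = 1000       # total trials
k = 522        # correct decisions
p_null = 0.5   # chance level under the null hypothesis

# Mean and standard deviation of the number correct under the null hypothesis
mean_null = n * p_null
std_null = math.sqrt(n * p_null * (1 - p_null))

# Number of standard deviations by which the observed count exceeds chance
z = (k - mean_null) / std_null

# One-sided p-value: upper tail of the standard normal distribution
p_value_normal = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z-score: {z:.2f}")
print(f"Normal-approximation p-value: {p_value_normal:.4f}")
```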

5. Running a Python script from the command line#

As an exercise, extract the code from this notebook into a .py file. You can use the template below to help you get started.

"""Python script for evaluating a model"""
import joblib
# Add all the import statements that you need.

# Add all the function definitions that you need

def main():
    """Function to evaluate a model."""

    # This is the first function that gets called. Start adding code here.


if __name__ == "__main__":
    main()
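
One possible way to fill in the template is sketched below. The argument names and the use of argparse are assumptions for illustration, not requirements of the assignment. The file names passed to the parser at the end are the ones used earlier in this notebook.

```python
import argparse

import joblib

def parse_args(argv=None):
    """Parse command-line arguments (the argument names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Evaluate a saved model")
    parser.add_argument("model_file", help="path to a .joblib model file")
    parser.add_argument("data_file", help="path to a .joblib evaluation dataset")
    return parser.parse_args(argv)

def main(argv=None):
    """Load a model and a dataset, then print the percentage correct."""
    args = parse_args(argv)
    model = joblib.load(args.model_file)
    data = joblib.load(args.data_file)
    percent_correct = model.score(data['data'], data['target']) * 100
    print(f"The classifier is {percent_correct:.2f}% correct")

# Show what the parser produces for a given command line (no files are loaded here)
args = parse_args(["baseline_model.joblib", "eval1.joblib"])
print(args.model_file, args.data_file)
```

Saved as, say, evaluate.py (a hypothetical name) with main() called from the template's `if __name__ == "__main__":` guard, this could be run as `python evaluate.py baseline_model.joblib eval1.joblib`.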