045 Common Pandas Mistakes

COM6018

Copyright © 2023–2025 Jon Barker, University of Sheffield. All rights reserved.

1. Introduction

This notebook considers some common issues that you may encounter when using Pandas. If you have any suggestions for other common problems to include, please let me know.

Below we will import NumPy and Pandas, which will be used in the examples that follow.

import pandas as pd
import numpy as np

2. The SettingWithCopyWarning

The ‘SettingWithCopyWarning’ is one of the problems most commonly encountered by new users of Pandas, and you will find many people asking about it online. Because it is only a warning, it is easy to ignore. However, it is important to understand what it means and how to fix your code when you see it, because it signals that you are doing something that is not guaranteed to work.

To explain the warning, we will first construct an example that generates it. We start with a small DataFrame containing age and height data for a few people.

data = {"name": ["Bill", "Jane", "Sue", "Xingyi", "Maryam"],
        "age": np.array([45, 98, 24, 11, 64]),
        "height": np.array([1.73, 1.62, 1.83, 1.11, 1.54])
}
df = pd.DataFrame(data)

Now, let us say that we want to change Bill’s age to be 100. We might try to do this as follows:

df[df.name=='Bill']["age"] = 100

print("\n\n Printing df after trying to set Bill's age to 100")
print(df.head())
 Printing df after trying to set Bill's age to 100
     name  age  height
0    Bill   45    1.73
1    Jane   98    1.62
2     Sue   24    1.83
3  Xingyi   11    1.11
4  Maryam   64    1.54
/tmp/ipykernel_2280/753014212.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df.name=='Bill']["age"] = 100

This seems very natural, but it (probably) hasn’t worked and it has generated a SettingWithCopyWarning.

To understand the origins of the problem, it is necessary to understand that Pandas operations can return either a copy of the data or a view of the data. A copy is a new DataFrame that contains a completely new copy of the original data (or some part of it). A view is a new DataFrame that contains a reference to the original data (or some part of it). If you change the data in a copy, then the original data remains unchanged. If you change the data in a view, then the original data is also changed. (This is somewhat analogous to the difference between references and copies in Python variables, although Pandas’ internal data structures make the distinction more complex.)
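The copy/view distinction can be seen in isolation with a plain NumPy array, where the rules are fixed: a basic slice gives a view and a boolean mask gives a copy. This is only an illustrative sketch. The variable names are made up for the example, and Pandas' own rules are less predictable than NumPy's, which is the source of the problem discussed in this section.

```python
import numpy as np

a = np.array([45, 98, 24, 11, 64])

view = a[1:3]        # basic slicing: a view that shares memory with a
copy = a[a > 50]     # boolean masking: a new array holding copied values

view[0] = 0          # writes through to a
copy[0] = -1         # changes only the copy

print(a)                            # element 1 is now 0; nothing is -1
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```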

In our code above, the expression df[df.name=='Bill'] has returned a copy of a slice of the DataFrame df and not a view. This means that when we use ["age"] = 100 to set the age of Bill to 100, we are actually setting the age of Bill to 100 in a copy, and not in the original DataFrame. The original DataFrame is unchanged.

The warning occurs because this code uses chained indexing, that is, it applies one index operation directly after another. Pandas cannot always tell whether the second operation modifies the original data or just a temporary object, so it issues a SettingWithCopyWarning to alert you.

The code above is actually equivalent to the following:

df_copy = df[df.name=='Bill']
df_copy["age"] = 100

print("\n\n Printing df after trying to set Bill's age to 100")
print(df.head())
 Printing df after trying to set Bill's age to 100
     name  age  height
0    Bill   45    1.73
1    Jane   98    1.62
2     Sue   24    1.83
3  Xingyi   11    1.11
4  Maryam   64    1.54
/tmp/ipykernel_2280/2079350034.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_copy["age"] = 100

Written in this form, it is no surprise that df has not been changed by df_copy["age"] = 100.

In fact, the situation is worse than this. Whether the operation df[df.name=='Bill'] returns a copy or a view is ‘undetermined’: in most current versions of Pandas it returns a copy, but this behaviour is not consistent across versions and is not guaranteed. So it is best not to ignore this warning, even if the code appears to work. It tells you that you are doing something that is not guaranteed to work.

As noted in the warning message, the correct way to set Bill’s age to 100 is to index the element that you want to change in a single step using loc as follows:

df.loc[df.name=='Bill', "age"] = 100

print("\n\n Printing df after trying to set Bill's age to 100")
print(df)
 Printing df after trying to set Bill's age to 100
     name  age  height
0    Bill  100    1.73
1    Jane   98    1.62
2     Sue   24    1.83
3  Xingyi   11    1.11
4  Maryam   64    1.54

Note that no warning has appeared and Bill’s age has successfully changed to 100.
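One way to see why the single-step form succeeds is to write out the method calls that Python makes for each version. The sketch below uses a cut-down two-row DataFrame and a helper variable named temp, both chosen only for illustration. The chained form makes two separate calls, and the assignment lands on the intermediate object. The .loc form makes a single assignment call on an indexer that is bound to df itself. Depending on your Pandas version, the chained part may print a warning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Bill", "Jane"], "age": np.array([45, 98])})
mask = df.name == "Bill"

# Chained form, df[mask]["age"] = 100, written as explicit calls:
temp = df.__getitem__(mask)       # call 1: builds a separate DataFrame
temp.__setitem__("age", 100)      # call 2: assigns into that separate object
age_after_chained = int(df.loc[0, "age"])

# Single-step form, df.loc[mask, "age"] = 100, written as an explicit call:
df.loc.__setitem__((mask, "age"), 100)   # one call, targeting df directly
age_after_loc = int(df.loc[0, "age"])

print(age_after_chained, age_after_loc)  # 45 100
```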

Caution

If you ever feel tempted to suppress the SettingWithCopyWarning, don’t! The correct solution is always to rewrite your code to use .loc or .copy() explicitly so that the behaviour is well-defined.

3. SettingWithCopyWarning - another example

Let us say that we want to make a new DataFrame containing just the people over 50 years old, and in this new DataFrame we want to change the height so that it is measured in centimetres rather than metres. We might try to do this as follows:
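The code cell for this step does not appear in this copy of the notebook. A plausible reconstruction, assuming it is the corrected version shown later in this section without the call to copy(), might look like the following. The DataFrame is rebuilt here, with Bill's age already set to 100, only so that the sketch runs on its own. Whether a warning is actually printed depends on the Pandas version and on whether Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so that this sketch is self-contained.
df = pd.DataFrame({
    "name": ["Bill", "Jane", "Sue", "Xingyi", "Maryam"],
    "age": np.array([100, 98, 24, 11, 64]),
    "height": np.array([1.73, 1.62, 1.83, 1.11, 1.54]),
})

# Assumed reconstruction: select the over-50s, then convert in a second step.
df_centimetre = df[df.age > 50]
df_centimetre["height"] *= 100   # the line that can trigger the warning

print(df)
print(df_centimetre)
```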

Again, we get the same warning. The problem is again that we cannot be sure whether df_centimetre is a copy or a view of df. So, whether the conversion to centimetres is also applied to the original df is undetermined. Even if it appears to work, it may not work in a different version of Pandas.

We need to rewrite the code so that it is guaranteed to work as expected. There are two cases: either we didn’t want the original DataFrame to be changed or we did.

If we want the original DataFrame to remain unchanged, then we need to explicitly state that df_centimetre is a copy of df. This is done using the copy() method as follows:

df_centimetre = df[df.age > 50].copy()
df_centimetre["height"] *= 100

print("\n\n Printing the original df")
print(df)
print("\n\n Printing the new df_centimetre")
print(df_centimetre)
 Printing the original df
     name  age  height
0    Bill  100    1.73
1    Jane   98    1.62
2     Sue   24    1.83
3  Xingyi   11    1.11
4  Maryam   64    1.54


 Printing the new df_centimetre
     name  age  height
0    Bill  100   173.0
1    Jane   98   162.0
4  Maryam   64   154.0

Alternatively, if we want the conversion to centimetres to be applied to the original DataFrame, then we need to use the .loc indexer as follows:

df.loc[df.age > 50, "height"] *= 100

print("\n\n Printing the original df")
print(df)
 Printing the original df
     name  age  height
0    Bill  100  173.00
1    Jane   98  162.00
2     Sue   24    1.83
3  Xingyi   11    1.11
4  Maryam   64  154.00

Note that neither of the above solutions will generate the SettingWithCopyWarning.

Caution

The above example is only for illustrative purposes. Making a DataFrame in which the height of people over 50 is measured in centimetres, but that of people under 50 is measured in metres is not a good idea!

Key Takeaways

  • The SettingWithCopyWarning warns that an assignment may not affect the original DataFrame.

  • It is usually triggered by chained indexing (e.g. df[mask]["col"] = value).

  • Use .loc[row_selector, col_selector] = value for safe, guaranteed assignments.

  • Use .copy() if you want to work with a separate DataFrame.

4. Submit your own

If you have any suggestions for other common Pandas pitfalls to include, please let me know.
