030 Numerical Computing with NumPy#

COM6018

Copyright © 2023, 2024 Jon Barker, University of Sheffield. All rights reserved.

1. Introducing NumPy#

1.1 What is NumPy?#

NumPy is a core Python package for scientific computing that

  • provides a powerful N-dimensional array object,

  • provides highly optimised linear algebra tools,

  • has tight integration with C/C++ and Fortran code,

  • is licensed under a BSD license, i.e., it is freely reusable.

1.2 Importing NumPy#

NumPy is conventionally imported using,

import numpy as np

The Python keyword as here allows us to use ‘np’ as a shorthand to refer to the ‘numpy’ module in our code. This is a common convention and so your code will be more readable if you follow it.

2. NumPy Arrays#

2.1 Generating a NumPy ndarray#

The NumPy package introduce a type called numpy.ndarray that is used for representing N-Dimensional arrays. An N-Dimensional array is a data structure that can be used to represent vectors, matrices, and higher-dimensional arrays. i.e., a vector is a 1-D array, a matrix is a 2-D array, etc.

Arrays can be generated from various sources. These include

  • from Python lists containing numeric data,

  • using NumPy array generating functions, or

  • by reading data from a file.

2.2 Generating arrays from lists#

In the example below, we generate a 1-D array from a Python list. We start with the Python list my_list and then use the NumPy function np.array to convert it to a NumPy array which we have stored as the varible my_array.

my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)  # create a simple 1-D array
print(my_list)
print(my_array)
print(type(my_list))
print(type(my_array))
[1, 2, 3, 4, 5]
[1 2 3 4 5]
<class 'list'>
<class 'numpy.ndarray'>

Note the slight differnce in appearance when we print a NumPy array versus printing a Python list.

We will now construct a 2-D array (i.e., a matrix) from a list of lists,

my_2d_array = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_2d_array)
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
print(type(my_2d_array))
<class 'numpy.ndarray'>

2.3 The ndarray object’s properties#

The ndarray has various properties that we can access. The most import are shape, size, and dtype.

The shape property is a tuple that gives the size of each dimension of the array.

print(my_2d_array.shape)
(3, 3)

The size property gives the total number of elements in the array.

print(my_2d_array.size)
9

The dtype property gives the data type of the array’s elements (e.g., are they integers, floats, etc.)

print(my_2d_array.dtype)
float64

Tip: when working with NumPy arrays it is often useful to check the shape and dtype properties to make sure you are working with the correct type of array. Printing these properties is also a useful debugging technique and is often more useful than printing the arrays contents.

2.4 N-dimensional arrays#

NumPy generalises arrays to be N-dimensional. Although we are often working with 1-d or 2-d data there are applications where higher dimensions are useful. For example, 3-d data appears often when video processing (x, y and time) and 4-d data appears in medical imaging (x, y, z and time). Note, in mathematics, N-dimensional arrays are often called tensors. (This is why Google’s deep learning library is called TensorFlow.)

In the example below, we first create a 2-D array (x2) of shape 2 by 2 and then we make a 3-D array (x3) using a list of 4 of these 2-D arrays.

x2 = np.array([[1, 2], [3, 4]])  # a matrix
x3 = np.array([x2, x2, x2, x2])  # stacking two matrices
print(x3.shape)
(4, 2, 2)

The x3 array ends up as a block of numbers of size \(2 \times 2 \times 4\).

We can then repeat this process to create a 4-D array (x4) by stacking 5 copies of the 3-D array to make 4-D structures…

x2 = np.array([[1, 2], [3, 4]])  # a matrix
x3 = np.array([x2, x2, x2, x2])  # stacking two matrices
x4 = np.array([x3, x3, x3, x3, x3])  # stacking 5 3-D structures
print(x4.shape)
(5, 4, 2, 2)

… and stacking 4-D arrays to make 5-D arrays.

x2 = np.array([[1, 2], [3, 4]])  # a matrix
x3 = np.array([x2, x2, x2, x2])  # stacking two matrices
x4 = np.array([x3, x3, x3, x3, x3])  # stacking 5 3-D structures
x5 = np.array([x4, x4])  # stacking 2 4-D structures!
print(x5.shape)
(2, 5, 4, 2, 2)

But in COM6018 we will mostly only use N=1 (vectors) and N=2 (matrices).

3 Generating NumPy arrays#

3.1 Basic array generating functions#

NumPy provides a number of functions for generating arrays of various kinds.

For example, generating a 1-D array of consecutive integers,

x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]

Or an array of evenly spaced numbers,

x = np.arange(100, 110, 2)  # start, stop, step
print(x)
[100 102 104 106 108]

The linspace function is similar to arange but allows you to specify the number of points rather than the step size,

x = np.linspace(10, 20, 5)  # start, stop, n-points
print(x)
[10.  12.5 15.  17.5 20. ]

We often need arrays full of zeros or ones. NumPy provides functions for this,

x = np.zeros( (3, 3, 3) )  # Note, argument is a tuple
print(x)
[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]

Note that the zeros() function has a single argument specifying the desired array shape. This argument is a tuple. For example, in the above, the argument has value (3,3,3) which means that we want a 3-D array with 3 elements in each dimension. I have placed spaces between the parentheses to make it clear that this is a tuple. However, the spaces are not necessary and you will see this more often written as np.zeros((3,3,3)). (When you see it written like this, do not be confused into thinking that the double brackets are redundant. You cannot rewrite this as np.zeros(3,3,3). This is computer programming, not mathematics. :smile: )

Similarly, to make an array full of ones we can use,

x = np.ones((2, 5))
print(x)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

3.2 More array generating functions#

The diag function can be used to generate diagonal matrices where we specify the numbers that we want to appear on the leading diagonal. For example a 3 by 3 matrix with 4, 5 and 3 along the diagonal can be generated with,

x = np.diag((4, 5, 3))
print(x)
[[4 0 0]
 [0 5 0]
 [0 0 3]]

There is an optional parameter k that allows us to instead specify a diagonal that is displaced from the leading diagonal. This is most easily explained with an example,

x = np.diag((2, 2), k=3)
print(x)
[[0 0 0 2 0]
 [0 0 0 0 2]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]

By summing matrices of this form we can make any banded-diagonal matrix, e.g.,

x = np.diag((1, 1, 1)) + np.diag((2, 2), k=1) + np.diag((2, 2), k=-1)
print(x)
[[1 2 0]
 [2 1 2]
 [0 2 1]]

Using diag we could make an identity matrix, which has 1’s along the leading diagonal. However, because the identity matrix is used so often, NumPy provides a dedicated function for generating it, eye that has a single parameter which determines the number of rows and columns,

x = np.eye(6)
print(x)
[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]

3.3 Arrays initialised with random numbers#

NumPy provides a number of functions for generating arrays of random numbers. The most useful are rand and randn which appear in the the submodule numpy.random. rand uses random numbers that are uniformly distributed between 0 and 1. randn uses random numbers that are normally distributed with mean 0 and standard deviation 1.

For example, to generate a 2 by 4 matrix of random numbers,

np.random.rand(2, 4)  # uniform distribution between 0 and 1
array([[0.11106326, 0.67891963, 0.19792918, 0.26436323],
       [0.87284111, 0.41436352, 0.44435961, 0.036551  ]])

or,

np.random.randn(2, 4)  # standard normal distribution
array([[-1.32667792,  0.74980168,  0.2277731 , -0.86582345],
       [ 0.30587597, -0.26918779,  1.4611421 , -0.32246338]])

3.4 Reading arrays from files#

Finally, we might want to generate arrays by reading data from a file. NumPy provides a number of functions for this,

  • genfromtxt and savetxt for reading and writing to text files.

  • load and save for reading and writing in NumPy’s native format.

For the example below, we will use data from a text file data/liver_data_20.txt which has 20 rows of data stored in CSV format,

cat data/liver_data_20.txt
92,45,27,31,0.0,1
64,59,32,23,0.0,2
54,33,16,54,0.0,2
78,34,24,36,0.0,2
70,12,28,10,0.0,2
55,13,17,17,0.0,2
62,20,17,9,0.5,1
67,21,11,11,0.5,1
54,22,20,7,0.5,1
60,25,19,5,0.5,1
52,13,24,15,0.5,1
62,17,17,15,0.5,1
64,61,32,13,0.5,1
77,25,19,18,0.5,1
67,29,20,11,0.5,1
78,20,31,18,0.5,1
67,23,16,10,0.5,1
79,17,17,16,0.5,1
107,20,20,56,0.5,1
116,11,33,11,0.5,1

To read this into a NumPy array we simply use,

x = np.genfromtxt("data/liver_data_20.txt", delimiter=",")  # for reading a csv file
print(x)
[[ 92.   45.   27.   31.    0.    1. ]
 [ 64.   59.   32.   23.    0.    2. ]
 [ 54.   33.   16.   54.    0.    2. ]
 [ 78.   34.   24.   36.    0.    2. ]
 [ 70.   12.   28.   10.    0.    2. ]
 [ 55.   13.   17.   17.    0.    2. ]
 [ 62.   20.   17.    9.    0.5   1. ]
 [ 67.   21.   11.   11.    0.5   1. ]
 [ 54.   22.   20.    7.    0.5   1. ]
 [ 60.   25.   19.    5.    0.5   1. ]
 [ 52.   13.   24.   15.    0.5   1. ]
 [ 62.   17.   17.   15.    0.5   1. ]
 [ 64.   61.   32.   13.    0.5   1. ]
 [ 77.   25.   19.   18.    0.5   1. ]
 [ 67.   29.   20.   11.    0.5   1. ]
 [ 78.   20.   31.   18.    0.5   1. ]
 [ 67.   23.   16.   10.    0.5   1. ]
 [ 79.   17.   17.   16.    0.5   1. ]
 [107.   20.   20.   56.    0.5   1. ]
 [116.   11.   33.   11.    0.5   1. ]]

The delimiter parameter specifies that the data are separated by commas. (The default is to assume that the data are separated by spaces.)

Saving a NumPy array to a file is also easy,

x = np.genfromtxt("data/liver_data_20.txt", delimiter=",")  # for reading a csv file
np.savetxt("data/matrix.tsv", x, delimiter="\t", fmt="%.5f")

Here we have saved the data in tab-separated format with 5 decimal places of precision.

cat data/matrix.tsv
92.00000	45.00000	27.00000	31.00000	0.00000	1.00000
64.00000	59.00000	32.00000	23.00000	0.00000	2.00000
54.00000	33.00000	16.00000	54.00000	0.00000	2.00000
78.00000	34.00000	24.00000	36.00000	0.00000	2.00000
70.00000	12.00000	28.00000	10.00000	0.00000	2.00000
55.00000	13.00000	17.00000	17.00000	0.00000	2.00000
62.00000	20.00000	17.00000	9.00000	0.50000	1.00000
67.00000	21.00000	11.00000	11.00000	0.50000	1.00000
54.00000	22.00000	20.00000	7.00000	0.50000	1.00000
60.00000	25.00000	19.00000	5.00000	0.50000	1.00000
52.00000	13.00000	24.00000	15.00000	0.50000	1.00000
62.00000	17.00000	17.00000	15.00000	0.50000	1.00000
64.00000	61.00000	32.00000	13.00000	0.50000	1.00000
77.00000	25.00000	19.00000	18.00000	0.50000	1.00000
67.00000	29.00000	20.00000	11.00000	0.50000	1.00000
78.00000	20.00000	31.00000	18.00000	0.50000	1.00000
67.00000	23.00000	16.00000	10.00000	0.50000	1.00000
79.00000	17.00000	17.00000	16.00000	0.50000	1.00000
107.00000	20.00000	20.00000	56.00000	0.50000	1.00000
116.00000	11.00000	33.00000	11.00000	0.50000	1.00000

4 Array manipulation#

4.1 Indexing and slicing#

indexing is similar to Python lists

x = np.array([1, 2, 3, 4, 5, 6, 7])
print(x[0])
print(x[2:5])
print(x[:4])
print(x[4:])
1
[3 4 5]
[1 2 3 4]
[5 6 7]

But it is generalised to n-dimensions

x = np.random.rand(5, 5)
print(x[2:4, :2])
[[0.99441504 0.17275828]
 [0.71553443 0.99882997]]

4.2 Extracting a row or column vector from a matrix#

A = np.genfromtxt("data/test_matrix.txt")
print(A)
[[ 0.  1.  2.  3.  4.]
 [10. 11. 12. 13. 14.]
 [20. 21. 22. 23. 24.]
 [30. 31. 32. 33. 34.]
 [40. 41. 42. 43. 44.]]
print(A[2, 1:4])  # extract row 2  (can also be written as A[2])
print(A[2, 1:4].shape)
[21. 22. 23.]
(3,)
print(A[:, 2])  # extract column 2
print(A[:, 2].shape)
[ 2. 12. 22. 32. 42.]
(5,)

4.3 Some basic operations#

The NumPy ndarray object has many methods.

e.g., min, max, sum, product, mean

x = np.array([1, 2, 3, 4, 5, 6])
print(x.min(), x.max())
1 6
print(x.sum(), x.prod())
21 720
print(x.mean(), x.var())
3.5 2.9166666666666665

These operations can be applied to arrays with more than one dimension.

A = np.genfromtxt("data/test_matrix.txt")
print(A)
[[ 0.  1.  2.  3.  4.]
 [10. 11. 12. 13. 14.]
 [20. 21. 22. 23. 24.]
 [30. 31. 32. 33. 34.]
 [40. 41. 42. 43. 44.]]
print(A.min(), A.max())
0.0 44.0
mean_values = A.mean(axis=0)
sum_values = A.sum(axis=0)
print(mean_values)
print(sum_values)
[20. 21. 22. 23. 24.]
[100. 105. 110. 115. 120.]

5 Working with NumPy arrays#

5.1 Reshaping and resizing#

It is sometimes necessary to wrap a vector into a matrix or unwrap a matrix into a vector

M = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3, 3)
print(M)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
v = M.reshape(9)
print(v)
[1 2 3 4 5 6 7 8 9]
# The following line will generate an error
# because reshape cannot change the number of elements.

v = M.reshape(8)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[44], line 4
      1 # The following line will generate an error
      2 # because reshape cannot change the number of elements.
----> 4 v = M.reshape(8)

ValueError: cannot reshape array of size 9 into shape (8,)

5.2 Adding a new dimension#

Can easily turn 1-D vectors in 2-D matrices

v = np.array([1, 2, 3, 4, 5])
print(v)
print(v.shape)
[1 2 3 4 5]
(5,)
v_row = v[np.newaxis, :]  # turn a vector into a 1-row matrix
print(v_row)
print(v_row.shape)
[[1 2 3 4 5]]
(1, 5)
v_col = v[:, np.newaxis]  # turn a vector into a 1-column matrix
print(v_col)
print(v_col.shape)
[[1]
 [2]
 [3]
 [4]
 [5]]
(5, 1)

5.3 Stacking arrays#

Arrays with compatible dimensions can be joined horizontally or vertically

x = np.ones((2, 3))
y = np.zeros((2, 2))
z = np.hstack((x, y, x))  # note, arrays passed as a tuple
print(x)
print(y)
print(z)
print(z.shape)
[[1. 1. 1.]
 [1. 1. 1.]]
[[0. 0.]
 [0. 0.]]
[[1. 1. 1. 0. 0. 1. 1. 1.]
 [1. 1. 1. 0. 0. 1. 1. 1.]]
(2, 8)
x = np.ones((2, 2))
y = np.zeros((1, 2))
z = np.vstack((x, y))
print(z)
print(z.shape)
[[1. 1.]
 [1. 1.]
 [0. 0.]]
(3, 2)

5.4 Tiling and repeating#

x = np.array([[1, 2], [3, 4]])
y = np.tile(x, 3)
print(y)
[[1 2 1 2 1 2]
 [3 4 3 4 3 4]]
y = np.tile(x, (2, 4))
print(y)
[[1 2 1 2 1 2 1 2]
 [3 4 3 4 3 4 3 4]
 [1 2 1 2 1 2 1 2]
 [3 4 3 4 3 4 3 4]]
y = np.repeat(x, 4)
print(y)
[1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
y = np.repeat(x, 4, axis=0)
print(y)
[[1 2]
 [1 2]
 [1 2]
 [1 2]
 [3 4]
 [3 4]
 [3 4]
 [3 4]]

6 Copying#

6.1 Shallow copy#

Arrays are handled by reference.

When you do A = B you are just copying a reference, not the data itself.

A = np.array([1, 2, 3, 4, 5, 6])
B = A
B[0] = 10
print(A)
[10  2  3  4  5  6]

Note that this is also true for Python lists and objects.

A = [1, 2, 3, 4, 5, 6]
B = A
B[0] = 10
print(A)
[10, 2, 3, 4, 5, 6]

So, how do we make a real copy?

6.2 Deep Copy#

To actually copy the data stored in the array we use the NumPy copy method,

A = np.array([1, 2, 3, 4, 5, 6])
B = A.copy()  # can also write, B = np.copy(A)
B[0] = 10
print(A)
[1 2 3 4 5 6]

Note, to copy Python lists, we first need to import the copy module,

import copy

A = [1, 2, 3, 4, 5, 6]
B = copy.deepcopy(A)
print(B)
[1, 2, 3, 4, 5, 6]

(Don’t confuse NumPy ndarrays and Python lists…)

7 Matrix operations#

NumPy implements all common array operations,

  • addition, subtraction,

  • transpose,

  • multiplication,

  • inverse

7.1 Array addition and subtraction#

X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.ones((2, 3))
print(X)
print(Y)
[[1 2 3]
 [4 5 6]]
[[1. 1. 1.]
 [1. 1. 1.]]
print(X + Y)
[[2. 3. 4.]
 [5. 6. 7.]]
Z = X - 2 * Y  # note, scalar multiplication
print(Z)
[[-1.  0.  1.]
 [ 2.  3.  4.]]
Z = X + np.array([[2, 2], [2, 2]])
print(Z)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[64], line 1
----> 1 Z = X + np.array([[2, 2], [2, 2]])
      2 print(Z)

ValueError: operands could not be broadcast together with shapes (2,3) (2,2) 

7.2 Broadcasting#

During operations, NumPy will try to repeat an array to make dimensions fit. This is called ‘broadcasting’. It is convenient but it can be confusing.

X = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([1, 1, 1])
print(X)
print(row)
[[1 2 3]
 [4 5 6]]
[1 1 1]
print(X + row)
[[2 3 4]
 [5 6 7]]
col = np.array([1, 2])
print(col)
print(X + col[:, np.newaxis])
[1 2]
[[2 3 4]
 [6 7 8]]

7.3 Transpose#

Transposing a matrix swaps the rows and columns.

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)
print(A.T)
[[1 2 3]
 [4 5 6]]
[[1 4]
 [2 5]
 [3 6]]
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.shape)
print(A.T.shape)
(2, 3)
(3, 2)
v = np.array([1, 2, 3, 4, 5])
print(v)
print(v.T)  # Vectors only have one dimension. Transpose does nothing.
[1 2 3 4 5]
[1 2 3 4 5]
v = np.array([1, 2, 3, 4, 5])
print(v.shape)
print(v.T.shape)
(5,)
(5,)

7.4 A note on vectors versus `skinny’ matrices#

When using NumPy, a vector is not the same as a matrix with one column.

v = np.array([1, 2, 3, 4, 5])  # A vector - has 1 dimension
print(v)
print(v.T)
print(v.shape)
[1 2 3 4 5]
[1 2 3 4 5]
(5,)
M_row = np.array([[1, 2, 3, 4, 5]])  # A matrix - has 2 dimensions
print(M_row)
print(M_row.shape)
[[1 2 3 4 5]]
(1, 5)
M_col = np.array([[1, 2, 3, 4, 5]]).T  # A matrix can be transposed
print(M_col)
print(M_col.shape)
[[1]
 [2]
 [3]
 [4]
 [5]]
(5, 1)

7.5 Multiplication#

The ‘*’ operator performs ‘elementwise’ multiplication

A = np.array([[1, 2, 3], [4, 5, 6]])
X = A * A
print(X)
[[ 1  4  9]
 [16 25 36]]

Standard ‘matrix multiplication’ is performed using the dot function

X = np.dot(A, A.T)  # Multipy 2x3 matrix A and 3x2 matrix A.T (AA')
print(X)
[[14 32]
 [32 77]]
v = np.array([1, 2, 3])
x = np.dot(A, v)  # Multiply 2x3 matrix A and 3-element vector (Av)
print(x)
[14 32]

7.6 Matrix inverse#

The matrix determinant and the inverse function are provided by the linalg submodule of NumPy.

A = np.array([[2, 1], [3, 2]])
print(A)
[[2 1]
 [3 2]]
det_A = np.linalg.det(A)
print(det_A)
0.9999999999999998
inv_A = np.linalg.inv(A)
print(inv_A)
[[ 2. -1.]
 [-3.  2.]]

8 Summary#

  • NumPy provides tools for numeric computing.

  • With NumPy, Python becomes a usable alternative to MATLAB.

  • NumPy’s basic type is the ndarray – it can represent vectors, matrices, etc.

  • Lots of tools for vector and matrix manipulation.

    • This lecture has only reviewed the most commonly used.

  • For the full documentation, see http://docs.scipy.org/doc/numpy/