030 Numerical Computing with NumPy#
COM6018
Copyright © 2023, 2024 Jon Barker, University of Sheffield. All rights reserved.
1. Introducing NumPy#
1.1 What is NumPy?#
NumPy is a core Python package for scientific computing that
provides a powerful N-dimensional array object,
provides highly optimised linear algebra tools,
has tight integration with C/C++ and Fortran code,
is licensed under a BSD license, i.e., it is freely reusable.
1.2 Importing NumPy#
NumPy is conventionally imported using,
import numpy as np
The Python keyword as
here allows us to use ‘np’ as a shorthand to refer to the ‘numpy’ module in our code. This is a common convention and so your code will be more readable if you follow it.
2. NumPy Arrays#
2.1 Generating a NumPy ndarray#
The NumPy package introduce a type called numpy.ndarray
that is used for representing N-Dimensional arrays. An N-Dimensional array is a data structure that can be used to represent vectors, matrices, and higher-dimensional arrays. i.e., a vector is a 1-D array, a matrix is a 2-D array, etc.
Arrays can be generated from various sources. These include
from Python lists containing numeric data,
using NumPy array generating functions, or
by reading data from a file.
2.2 Generating arrays from lists#
In the example below, we generate a 1-D array from a Python list. We start with the Python list my_list
and then use the NumPy function np.array
to convert it to a NumPy array which we have stored as the varible my_array
.
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list) # create a simple 1-D array
print(my_list)
print(my_array)
print(type(my_list))
print(type(my_array))
[1, 2, 3, 4, 5]
[1 2 3 4 5]
<class 'list'>
<class 'numpy.ndarray'>
Note the slight differnce in appearance when we print a NumPy array versus printing a Python list.
We will now construct a 2-D array (i.e., a matrix) from a list of lists,
my_2d_array = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_2d_array)
[[1. 2. 3.]
[4. 5. 6.]
[7. 8. 9.]]
print(type(my_2d_array))
<class 'numpy.ndarray'>
2.3 The ndarray object’s properties#
The ndarray
has various properties that we can access. The most import are shape
, size
, and dtype
.
The shape
property is a tuple that gives the size of each dimension of the array.
print(my_2d_array.shape)
(3, 3)
The size
property gives the total number of elements in the array.
print(my_2d_array.size)
9
The dtype
property gives the data type of the array’s elements (e.g., are they integers, floats, etc.)
print(my_2d_array.dtype)
float64
Tip: when working with NumPy arrays it is often useful to check the shape
and dtype
properties to make sure you are working with the correct type of array. Printing these properties is also a useful debugging technique and is often more useful than printing the arrays contents.
2.4 N-dimensional arrays#
NumPy generalises arrays to be N-dimensional. Although we are often working with 1-d or 2-d data there are applications where higher dimensions are useful. For example, 3-d data appears often when video processing (x, y and time) and 4-d data appears in medical imaging (x, y, z and time). Note, in mathematics, N-dimensional arrays are often called tensors. (This is why Google’s deep learning library is called TensorFlow.)
In the example below, we first create a 2-D array (x2
) of shape 2 by 2 and then we make a 3-D array (x3
) using a list of 4 of these 2-D arrays.
x2 = np.array([[1, 2], [3, 4]]) # a matrix
x3 = np.array([x2, x2, x2, x2]) # stacking two matrices
print(x3.shape)
(4, 2, 2)
The x3
array ends up as a block of numbers of size \(2 \times 2 \times 4\).
We can then repeat this process to create a 4-D array (x4
) by stacking 5 copies of the 3-D array to make 4-D structures…
x2 = np.array([[1, 2], [3, 4]]) # a matrix
x3 = np.array([x2, x2, x2, x2]) # stacking two matrices
x4 = np.array([x3, x3, x3, x3, x3]) # stacking 5 3-D structures
print(x4.shape)
(5, 4, 2, 2)
… and stacking 4-D arrays to make 5-D arrays.
x2 = np.array([[1, 2], [3, 4]]) # a matrix
x3 = np.array([x2, x2, x2, x2]) # stacking two matrices
x4 = np.array([x3, x3, x3, x3, x3]) # stacking 5 3-D structures
x5 = np.array([x4, x4]) # stacking 2 4-D structures!
print(x5.shape)
(2, 5, 4, 2, 2)
But in COM6018 we will mostly only use N=1 (vectors) and N=2 (matrices).
3 Generating NumPy arrays#
3.1 Basic array generating functions#
NumPy provides a number of functions for generating arrays of various kinds.
For example, generating a 1-D array of consecutive integers,
x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]
Or an array of evenly spaced numbers,
x = np.arange(100, 110, 2) # start, stop, step
print(x)
[100 102 104 106 108]
The linspace
function is similar to arange
but allows you to specify the number of points rather than the step size,
x = np.linspace(10, 20, 5) # start, stop, n-points
print(x)
[10. 12.5 15. 17.5 20. ]
We often need arrays full of zeros or ones. NumPy provides functions for this,
x = np.zeros( (3, 3, 3) ) # Note, argument is a tuple
print(x)
[[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]
Note that the zeros()
function has a single argument specifying the desired array shape. This argument is a tuple. For example, in the above, the argument has value (3,3,3) which means that we want a 3-D array with 3 elements in each dimension. I have placed spaces between the parentheses to make it clear that this is a tuple. However, the spaces are not necessary and you will see this more often written as np.zeros((3,3,3))
. (When you see it written like this, do not be confused into thinking that the double brackets are redundant. You cannot rewrite this as np.zeros(3,3,3)
. This is computer programming, not mathematics. :smile: )
Similarly, to make an array full of ones we can use,
x = np.ones((2, 5))
print(x)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
3.2 More array generating functions#
The diag
function can be used to generate diagonal matrices where we specify the numbers that we want to appear on the leading diagonal. For example a 3 by 3 matrix with 4, 5 and 3 along the diagonal can be generated with,
x = np.diag((4, 5, 3))
print(x)
[[4 0 0]
[0 5 0]
[0 0 3]]
There is an optional parameter k
that allows us to instead specify a diagonal that is displaced from the leading diagonal. This is most easily explained with an example,
x = np.diag((2, 2), k=3)
print(x)
[[0 0 0 2 0]
[0 0 0 0 2]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
By summing matrices of this form we can make any banded-diagonal matrix, e.g.,
x = np.diag((1, 1, 1)) + np.diag((2, 2), k=1) + np.diag((2, 2), k=-1)
print(x)
[[1 2 0]
[2 1 2]
[0 2 1]]
Using diag
we could make an identity matrix, which has 1’s along the leading diagonal. However, because the identity matrix is used so often, NumPy provides a dedicated function for generating it, eye
that has a single parameter which determines the number of rows and columns,
x = np.eye(6)
print(x)
[[1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 1.]]
3.3 Arrays initialised with random numbers#
NumPy provides a number of functions for generating arrays of random numbers. The most useful are rand
and randn
which appear in the the submodule numpy.random
. rand
uses random numbers that are uniformly distributed between 0 and 1. randn
uses random numbers that are normally distributed with mean 0 and standard deviation 1.
For example, to generate a 2 by 4 matrix of random numbers,
np.random.rand(2, 4) # uniform distribution between 0 and 1
array([[0.11106326, 0.67891963, 0.19792918, 0.26436323],
[0.87284111, 0.41436352, 0.44435961, 0.036551 ]])
or,
np.random.randn(2, 4) # standard normal distribution
array([[-1.32667792, 0.74980168, 0.2277731 , -0.86582345],
[ 0.30587597, -0.26918779, 1.4611421 , -0.32246338]])
3.4 Reading arrays from files#
Finally, we might want to generate arrays by reading data from a file. NumPy provides a number of functions for this,
genfromtxt
andsavetxt
for reading and writing to text files.load
andsave
for reading and writing in NumPy’s native format.
For the example below, we will use data from a text file data/liver_data_20.txt
which has 20 rows of data stored in CSV format,
cat data/liver_data_20.txt
92,45,27,31,0.0,1
64,59,32,23,0.0,2
54,33,16,54,0.0,2
78,34,24,36,0.0,2
70,12,28,10,0.0,2
55,13,17,17,0.0,2
62,20,17,9,0.5,1
67,21,11,11,0.5,1
54,22,20,7,0.5,1
60,25,19,5,0.5,1
52,13,24,15,0.5,1
62,17,17,15,0.5,1
64,61,32,13,0.5,1
77,25,19,18,0.5,1
67,29,20,11,0.5,1
78,20,31,18,0.5,1
67,23,16,10,0.5,1
79,17,17,16,0.5,1
107,20,20,56,0.5,1
116,11,33,11,0.5,1
To read this into a NumPy array we simply use,
x = np.genfromtxt("data/liver_data_20.txt", delimiter=",") # for reading a csv file
print(x)
[[ 92. 45. 27. 31. 0. 1. ]
[ 64. 59. 32. 23. 0. 2. ]
[ 54. 33. 16. 54. 0. 2. ]
[ 78. 34. 24. 36. 0. 2. ]
[ 70. 12. 28. 10. 0. 2. ]
[ 55. 13. 17. 17. 0. 2. ]
[ 62. 20. 17. 9. 0.5 1. ]
[ 67. 21. 11. 11. 0.5 1. ]
[ 54. 22. 20. 7. 0.5 1. ]
[ 60. 25. 19. 5. 0.5 1. ]
[ 52. 13. 24. 15. 0.5 1. ]
[ 62. 17. 17. 15. 0.5 1. ]
[ 64. 61. 32. 13. 0.5 1. ]
[ 77. 25. 19. 18. 0.5 1. ]
[ 67. 29. 20. 11. 0.5 1. ]
[ 78. 20. 31. 18. 0.5 1. ]
[ 67. 23. 16. 10. 0.5 1. ]
[ 79. 17. 17. 16. 0.5 1. ]
[107. 20. 20. 56. 0.5 1. ]
[116. 11. 33. 11. 0.5 1. ]]
The delimiter
parameter specifies that the data are separated by commas. (The default is to assume that the data are separated by spaces.)
Saving a NumPy array to a file is also easy,
x = np.genfromtxt("data/liver_data_20.txt", delimiter=",") # for reading a csv file
np.savetxt("data/matrix.tsv", x, delimiter="\t", fmt="%.5f")
Here we have saved the data in tab-separated format with 5 decimal places of precision.
cat data/matrix.tsv
92.00000 45.00000 27.00000 31.00000 0.00000 1.00000
64.00000 59.00000 32.00000 23.00000 0.00000 2.00000
54.00000 33.00000 16.00000 54.00000 0.00000 2.00000
78.00000 34.00000 24.00000 36.00000 0.00000 2.00000
70.00000 12.00000 28.00000 10.00000 0.00000 2.00000
55.00000 13.00000 17.00000 17.00000 0.00000 2.00000
62.00000 20.00000 17.00000 9.00000 0.50000 1.00000
67.00000 21.00000 11.00000 11.00000 0.50000 1.00000
54.00000 22.00000 20.00000 7.00000 0.50000 1.00000
60.00000 25.00000 19.00000 5.00000 0.50000 1.00000
52.00000 13.00000 24.00000 15.00000 0.50000 1.00000
62.00000 17.00000 17.00000 15.00000 0.50000 1.00000
64.00000 61.00000 32.00000 13.00000 0.50000 1.00000
77.00000 25.00000 19.00000 18.00000 0.50000 1.00000
67.00000 29.00000 20.00000 11.00000 0.50000 1.00000
78.00000 20.00000 31.00000 18.00000 0.50000 1.00000
67.00000 23.00000 16.00000 10.00000 0.50000 1.00000
79.00000 17.00000 17.00000 16.00000 0.50000 1.00000
107.00000 20.00000 20.00000 56.00000 0.50000 1.00000
116.00000 11.00000 33.00000 11.00000 0.50000 1.00000
4 Array manipulation#
4.1 Indexing and slicing#
indexing is similar to Python lists
x = np.array([1, 2, 3, 4, 5, 6, 7])
print(x[0])
print(x[2:5])
print(x[:4])
print(x[4:])
1
[3 4 5]
[1 2 3 4]
[5 6 7]
But it is generalised to n-dimensions
x = np.random.rand(5, 5)
print(x[2:4, :2])
[[0.99441504 0.17275828]
[0.71553443 0.99882997]]
4.2 Extracting a row or column vector from a matrix#
A = np.genfromtxt("data/test_matrix.txt")
print(A)
[[ 0. 1. 2. 3. 4.]
[10. 11. 12. 13. 14.]
[20. 21. 22. 23. 24.]
[30. 31. 32. 33. 34.]
[40. 41. 42. 43. 44.]]
print(A[2, 1:4]) # extract row 2 (can also be written as A[2])
print(A[2, 1:4].shape)
[21. 22. 23.]
(3,)
print(A[:, 2]) # extract column 2
print(A[:, 2].shape)
[ 2. 12. 22. 32. 42.]
(5,)
4.3 Some basic operations#
The NumPy ndarray object has many methods.
e.g., min
, max
, sum
, product
, mean
x = np.array([1, 2, 3, 4, 5, 6])
print(x.min(), x.max())
1 6
print(x.sum(), x.prod())
21 720
print(x.mean(), x.var())
3.5 2.9166666666666665
These operations can be applied to arrays with more than one dimension.
A = np.genfromtxt("data/test_matrix.txt")
print(A)
[[ 0. 1. 2. 3. 4.]
[10. 11. 12. 13. 14.]
[20. 21. 22. 23. 24.]
[30. 31. 32. 33. 34.]
[40. 41. 42. 43. 44.]]
print(A.min(), A.max())
0.0 44.0
mean_values = A.mean(axis=0)
sum_values = A.sum(axis=0)
print(mean_values)
print(sum_values)
[20. 21. 22. 23. 24.]
[100. 105. 110. 115. 120.]
5 Working with NumPy arrays#
5.1 Reshaping and resizing#
It is sometimes necessary to wrap a vector into a matrix or unwrap a matrix into a vector
M = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3, 3)
print(M)
[[1 2 3]
[4 5 6]
[7 8 9]]
v = M.reshape(9)
print(v)
[1 2 3 4 5 6 7 8 9]
# The following line will generate an error
# because reshape cannot change the number of elements.
v = M.reshape(8)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 4
1 # The following line will generate an error
2 # because reshape cannot change the number of elements.
----> 4 v = M.reshape(8)
ValueError: cannot reshape array of size 9 into shape (8,)
5.2 Adding a new dimension#
Can easily turn 1-D vectors in 2-D matrices
v = np.array([1, 2, 3, 4, 5])
print(v)
print(v.shape)
[1 2 3 4 5]
(5,)
v_row = v[np.newaxis, :] # turn a vector into a 1-row matrix
print(v_row)
print(v_row.shape)
[[1 2 3 4 5]]
(1, 5)
v_col = v[:, np.newaxis] # turn a vector into a 1-column matrix
print(v_col)
print(v_col.shape)
[[1]
[2]
[3]
[4]
[5]]
(5, 1)
5.3 Stacking arrays#
Arrays with compatible dimensions can be joined horizontally or vertically
x = np.ones((2, 3))
y = np.zeros((2, 2))
z = np.hstack((x, y, x)) # note, arrays passed as a tuple
print(x)
print(y)
print(z)
print(z.shape)
[[1. 1. 1.]
[1. 1. 1.]]
[[0. 0.]
[0. 0.]]
[[1. 1. 1. 0. 0. 1. 1. 1.]
[1. 1. 1. 0. 0. 1. 1. 1.]]
(2, 8)
x = np.ones((2, 2))
y = np.zeros((1, 2))
z = np.vstack((x, y))
print(z)
print(z.shape)
[[1. 1.]
[1. 1.]
[0. 0.]]
(3, 2)
5.4 Tiling and repeating#
x = np.array([[1, 2], [3, 4]])
y = np.tile(x, 3)
print(y)
[[1 2 1 2 1 2]
[3 4 3 4 3 4]]
y = np.tile(x, (2, 4))
print(y)
[[1 2 1 2 1 2 1 2]
[3 4 3 4 3 4 3 4]
[1 2 1 2 1 2 1 2]
[3 4 3 4 3 4 3 4]]
y = np.repeat(x, 4)
print(y)
[1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
y = np.repeat(x, 4, axis=0)
print(y)
[[1 2]
[1 2]
[1 2]
[1 2]
[3 4]
[3 4]
[3 4]
[3 4]]
6 Copying#
6.1 Shallow copy#
Arrays are handled by reference.
When you do A = B
you are just copying a reference, not the data itself.
A = np.array([1, 2, 3, 4, 5, 6])
B = A
B[0] = 10
print(A)
[10 2 3 4 5 6]
Note that this is also true for Python lists and objects.
A = [1, 2, 3, 4, 5, 6]
B = A
B[0] = 10
print(A)
[10, 2, 3, 4, 5, 6]
So, how do we make a real copy?
6.2 Deep Copy#
To actually copy the data stored in the array we use the NumPy copy method,
A = np.array([1, 2, 3, 4, 5, 6])
B = A.copy() # can also write, B = np.copy(A)
B[0] = 10
print(A)
[1 2 3 4 5 6]
Note, to copy Python lists, we first need to import the copy module,
import copy
A = [1, 2, 3, 4, 5, 6]
B = copy.deepcopy(A)
print(B)
[1, 2, 3, 4, 5, 6]
(Don’t confuse NumPy ndarrays and Python lists…)
7 Matrix operations#
NumPy implements all common array operations,
addition, subtraction,
transpose,
multiplication,
inverse
7.1 Array addition and subtraction#
X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.ones((2, 3))
print(X)
print(Y)
[[1 2 3]
[4 5 6]]
[[1. 1. 1.]
[1. 1. 1.]]
print(X + Y)
[[2. 3. 4.]
[5. 6. 7.]]
Z = X - 2 * Y # note, scalar multiplication
print(Z)
[[-1. 0. 1.]
[ 2. 3. 4.]]
Z = X + np.array([[2, 2], [2, 2]])
print(Z)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[64], line 1
----> 1 Z = X + np.array([[2, 2], [2, 2]])
2 print(Z)
ValueError: operands could not be broadcast together with shapes (2,3) (2,2)
7.2 Broadcasting#
During operations, NumPy will try to repeat an array to make dimensions fit. This is called ‘broadcasting’. It is convenient but it can be confusing.
X = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([1, 1, 1])
print(X)
print(row)
[[1 2 3]
[4 5 6]]
[1 1 1]
print(X + row)
[[2 3 4]
[5 6 7]]
col = np.array([1, 2])
print(col)
print(X + col[:, np.newaxis])
[1 2]
[[2 3 4]
[6 7 8]]
7.3 Transpose#
Transposing a matrix swaps the rows and columns.
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)
print(A.T)
[[1 2 3]
[4 5 6]]
[[1 4]
[2 5]
[3 6]]
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.shape)
print(A.T.shape)
(2, 3)
(3, 2)
v = np.array([1, 2, 3, 4, 5])
print(v)
print(v.T) # Vectors only have one dimension. Transpose does nothing.
[1 2 3 4 5]
[1 2 3 4 5]
v = np.array([1, 2, 3, 4, 5])
print(v.shape)
print(v.T.shape)
(5,)
(5,)
7.4 A note on vectors versus `skinny’ matrices#
When using NumPy, a vector is not the same as a matrix with one column.
v = np.array([1, 2, 3, 4, 5]) # A vector - has 1 dimension
print(v)
print(v.T)
print(v.shape)
[1 2 3 4 5]
[1 2 3 4 5]
(5,)
M_row = np.array([[1, 2, 3, 4, 5]]) # A matrix - has 2 dimensions
print(M_row)
print(M_row.shape)
[[1 2 3 4 5]]
(1, 5)
M_col = np.array([[1, 2, 3, 4, 5]]).T # A matrix can be transposed
print(M_col)
print(M_col.shape)
[[1]
[2]
[3]
[4]
[5]]
(5, 1)
7.5 Multiplication#
The ‘*’ operator performs ‘elementwise’ multiplication
A = np.array([[1, 2, 3], [4, 5, 6]])
X = A * A
print(X)
[[ 1 4 9]
[16 25 36]]
Standard ‘matrix multiplication’ is performed using the dot
function
X = np.dot(A, A.T) # Multipy 2x3 matrix A and 3x2 matrix A.T (AA')
print(X)
[[14 32]
[32 77]]
v = np.array([1, 2, 3])
x = np.dot(A, v) # Multiply 2x3 matrix A and 3-element vector (Av)
print(x)
[14 32]
7.6 Matrix inverse#
The matrix determinant and the inverse function are provided by the linalg
submodule of NumPy.
A = np.array([[2, 1], [3, 2]])
print(A)
[[2 1]
[3 2]]
det_A = np.linalg.det(A)
print(det_A)
0.9999999999999998
inv_A = np.linalg.inv(A)
print(inv_A)
[[ 2. -1.]
[-3. 2.]]
8 Summary#
NumPy provides tools for numeric computing.
With NumPy, Python becomes a usable alternative to MATLAB.
NumPy’s basic type is the
ndarray
– it can represent vectors, matrices, etc.Lots of tools for vector and matrix manipulation.
This lecture has only reviewed the most commonly used.
For the full documentation, see http://docs.scipy.org/doc/numpy/