{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 080 Curve Fitting with Scikit Learn\n", "\n", "> COM6018\n", "\n", "*Copyright © 2023, 2024 Jon Barker, University of Sheffield. All rights reserved*.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this lab we will use linear regression to fit a model to the atmospheric gas concentration data that we have been working with in this module.\n", "\n", "The lab assumes that you have read and understood the lecture notes *Curve Fitting with Scikit Learn*, because it builds on ideas from those notes.\n", "\n", "The cell below imports some of the libraries that we will use in the lab.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Load the data\n", "\n", "We will start by loading the data from the file `co2.csv` into a pandas DataFrame. The data is in the same format as the data we used in the previous lab.\n", "\n", "Use the pandas `read_csv` function. The file contains comment lines that start with '%', so you will need to use the `comment` parameter of `read_csv` to skip these lines.\n", "\n", "Read the data so that the DataFrame columns are called 'year', 'month', 'day', 'co2' and 'sta'. The CSV file also contains columns called 'NB' and 'scale', which we will not use. You can remove them from the DataFrame with the `drop` method. 
Store the DataFrame in a variable called `co2_df`.\n", "\n", "Write the code below.\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# SOLUTION\n", "\n", "# Skip lines starting with '%', name all seven columns and ignore spaces after each comma\n", "co2_df = pd.read_csv('data/co2.csv', comment='%', names=['year', 'month', 'day', 'co2', 'NB', 'scale', 'sta'], skipinitialspace=True)\n", "\n", "# Remove the columns that this lab does not use\n", "co2_df = co2_df.drop(['NB', 'scale'], axis=1)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|       | year | month | day | co2 | sta |
|-------|------|-------|-----|-----|-----|
| 0     | 1958 | 1     | 1   | NaN | mlo |
| 1     | 1958 | 1     | 2   | NaN | mlo |
| 2     | 1958 | 1     | 3   | NaN | mlo |
| 3     | 1958 | 1     | 4   | NaN | mlo |
| 4     | 1958 | 1     | 5   | NaN | mlo |
| ...   | ...  | ...   | ... | ... | ... |
| 24406 | 2024 | 10    | 27  | NaN | mlo |
| 24407 | 2024 | 10    | 28  | NaN | mlo |
| 24408 | 2024 | 10    | 29  | NaN | mlo |
| 24409 | 2024 | 10    | 30  | NaN | mlo |
| 24410 | 2024 | 10    | 31  | NaN | mlo |

24411 rows × 5 columns
\n", "LinearRegression()
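The DataFrame preview above shows NaN in the co2 column for the days listed. Rows without a measurement therefore have to be removed before a regression can be fitted. The block below is a hedged sketch of how loading, cleaning and fitting might fit together. It is not the lab's own solution. It reads a small made-up sample laid out like the solution's `read_csv` call, drops rows with a missing co2 value, and then fits a straight line against time with scikit-learn's `LinearRegression`. The sample values, the helper name `load_co2` and the decimal-year time axis are all assumptions introduced for illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for data/co2.csv: assumed column order, made-up values.
SAMPLE_CSV = '''% lines starting with % are comments
2000, 1, 1, 370.0, 1, 12, mlo
2000, 1, 2, NaN, 0, 12, mlo
2001, 1, 1, 372.0, 1, 12, mlo
2002, 1, 1, 374.0, 1, 12, mlo
'''

COLUMNS = ['year', 'month', 'day', 'co2', 'NB', 'scale', 'sta']


def load_co2(source):
    # Hypothetical helper: same read_csv call as the solution cell,
    # then drop the two columns the lab does not use.
    df = pd.read_csv(source, comment='%', names=COLUMNS, skipinitialspace=True)
    return df.drop(['NB', 'scale'], axis=1)


co2_df = load_co2(io.StringIO(SAMPLE_CSV))

# Days without a measurement are NaN; LinearRegression cannot fit on them.
clean = co2_df.dropna(subset=['co2'])

# Assumed time axis: decimal year = year + (day of year - 1) / 365.25.
dates = pd.to_datetime(clean[['year', 'month', 'day']])
t = clean['year'] + (dates.dt.dayofyear - 1) / 365.25

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = t.to_numpy().reshape(-1, 1)
y = clean['co2'].to_numpy()

model = LinearRegression()
model.fit(X, y)
print('slope per year:', model.coef_[0])
print('intercept:', model.intercept_)
```

To try this on the real file, pass the path `'data/co2.csv'` to `load_co2` in place of `io.StringIO(SAMPLE_CSV)`. The remaining steps stay the same.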