COM6018 Data Science with Python

Getting Set Up

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

Overview


Tools

We will use the following tools in this module:

  • Python - the programming language
  • Jupyter Notebooks - a way of writing and running Python code
  • Git - a version control system
  • GitHub - a website for hosting Git repositories
  • Visual Studio Code - a text editor (recommended)
  • Conda - a package and environment manager (used here for Python)
  • bash - a command line shell

These tools are all free to use and widely used in industry. Apart from GitHub, which is a hosted web service, they are all open source and run on Windows, macOS and Linux.


Python Versions

There are two versions of Python in common use: Python 2 and Python 3. We will be using Python 3 in this module. Python 2 reached end of life in January 2020. It is no longer supported and should not be used for new projects.
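The two versions are not interchangeable. The minimal Python 3 sketch below shows two well-known behaviours that changed between them: print is a function, and / performs true division. The helper name check_python3 is hypothetical and exists only for this example.

```python
import sys


def check_python3():
    """Raise an error if run under Python 2 (hypothetical helper)."""
    # sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial)
    if sys.version_info[0] < 3:
        raise RuntimeError("This code requires Python 3")


check_python3()

# In Python 3, print is a function, so it needs parentheses.
print("Hello from Python", sys.version_info[0])

# In Python 3, / is true division (Python 2 gave 3 for 7 / 2 with integers).
print(7 / 2)   # 3.5
# Use // when you want floor (integer) division.
print(7 // 2)  # 3
```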


Python Release Cycle

Python 3 is under continual development. A new minor version (e.g. 3.11, 3.12) is released roughly once a year.

We will be using Python 3.12. This is the latest version at the time of writing.
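You can ask the interpreter directly which version you are running. The minimal sketch below uses the standard library's sys and platform modules. The default minimum of (3, 11) is an assumption taken from the python=3.11 pin in the course environment file, and meets_minimum is a hypothetical helper name.

```python
import platform
import sys


def meets_minimum(minimum=(3, 11)):
    """Return True if the running interpreter is at least `minimum` (major, minor).

    The default (3, 11) is an assumption based on the course environment file.
    """
    # Tuples compare element by element, so (3, 12) >= (3, 11) is True.
    return sys.version_info[:2] >= minimum


# platform.python_version() returns a string such as "3.12.1".
print("Running Python", platform.python_version())
print("Meets minimum version:", meets_minimum())
```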


Getting the Materials

The materials for this module are hosted on GitHub.

Visit https://github.com/UOS-COM-6018/COM6018

You can clone the repository using git. Open a terminal and then type:

git clone https://github.com/UOS-COM-6018/COM6018.git

This creates a directory called COM6018 in your current directory. You can cd into this directory and then run ls to see its contents.

cd COM6018
ls

Contents

The materials are organised as follows:

.
└── materials
    ├── labs       # This will contain the lab materials
    ├── lectures   # These lecture notes
    ├── solutions  # Solutions to the lab classes
    └── tutorials  # Tutorials covering the lecture content
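You can also explore these folders from Python instead of with ls, using the standard library's pathlib module. The minimal sketch below assumes it is run from inside the cloned COM6018 directory. The helper name list_subdirs is hypothetical.

```python
from pathlib import Path


def list_subdirs(root):
    """Return the sorted names of the immediate subdirectories of `root`."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


# Assumes the current directory is the cloned COM6018 repository.
materials = Path("materials")
if materials.is_dir():
    for name in list_subdirs(materials):
        print(name)
else:
    print("No 'materials' folder here; cd into COM6018 first.")
```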

Lectures and tutorials will generally be released on Sunday. Labs will be released later in the week, and solutions will be released after the lab class.


Setting up a Conda Environment

We will be using conda to manage our Python environment. This will allow us to install the correct versions of Python and all the packages we need.

The materials contain a conda environment file com6018.environment.yml that specifies the packages we need.


The com6018.environment.yml file

name: com6018
channels:
  - conda-forge
  - defaults
dependencies:
  - numpy
  - scikit-learn
  - basemap-data-hires
  - xlrd
  - ipykernel
  - matplotlib
  - pandas
  - docopt
  - seaborn
  - python=3.11
  - notebook
  - nb_conda_kernels
  - jupyter_contrib_nbextensions
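Some conda package names differ from the names used in an import statement. For example, scikit-learn is imported as sklearn. Once the environment has been created and activated, the minimal sketch below can confirm that the main libraries are importable. The mapping covers only a subset of the file's dependencies, and missing_modules is a hypothetical helper name.

```python
import importlib.util

# Conda package name -> Python import name (only a subset of the environment file).
PACKAGES = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "xlrd": "xlrd",
    "docopt": "docopt",
}


def missing_modules(packages):
    """Return the conda package names whose import name cannot be found."""
    # find_spec returns None when a top-level module is not installed,
    # without actually importing it.
    return [pkg for pkg, module in packages.items()
            if importlib.util.find_spec(module) is None]


missing = missing_modules(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are available.")
```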

Creating the Conda environment

You will need to make a new environment called com6018 like this:

cd COM6018
conda env create -f com6018.environment.yml

If you have not used conda before, you will need to install it first, either by installing Anaconda or Miniconda. See the conda documentation, or the 'Getting Started with Python lab class' notes, for details.

We recommend using Miniconda as it is smaller and easier to manage.


Installing Visual Studio Code

I highly recommend using Visual Studio Code as your text editor. It is free and open source and has excellent support for Python.

https://code.visualstudio.com/download

It also provides support for running the Jupyter Notebooks that we use in the lab classes.


Jupyter Notebooks

Jupyter Notebooks are a way of writing and running Python code. They are very popular in data science and machine learning.

They allow you to integrate notes, Python code and the output of the code in a single document.
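Under the hood, an .ipynb file is a JSON document containing a list of cells. Markdown cells hold the notes, and code cells hold the code together with any saved outputs. The minimal sketch below builds such a structure with the standard json module. It follows a simplified form of the nbformat version 4 layout, and make_notebook is a hypothetical helper name. In practice, Jupyter or VS Code writes these files for you.

```python
import json


def make_notebook(note, code, output):
    """Build a minimal notebook (nbformat v4 layout) with one note and one code cell."""
    markdown_cell = {"cell_type": "markdown", "metadata": {}, "source": note}
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": code,
        # The saved output of the code is stored alongside the code itself.
        "outputs": [{"output_type": "stream", "name": "stdout", "text": output}],
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [markdown_cell, code_cell],
    }


nb = make_notebook("# My analysis\nA short note.", "print(1 + 1)", "2\n")
text = json.dumps(nb, indent=1)
# Round-trip through JSON and list the cell types: ['markdown', 'code']
print([cell["cell_type"] for cell in json.loads(text)["cells"]])
```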

They are ideal for exploratory data analysis and for sharing your work with others. They are also great for lab classes.

conda activate com6018
cd COM6018/materials/tutorials
jupyter notebook 010_Introducing_Python.ipynb
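Once the notebook opens, check that its kernel is running inside the com6018 environment rather than some other Python installation. The minimal sketch below could be pasted into the first cell. It assumes the usual conda layout, where an environment lives in a directory named after it (e.g. .../envs/com6018). The helper name in_env is hypothetical.

```python
import os
import sys


def in_env(name="com6018"):
    """Return True if the running interpreter belongs to the conda env `name`.

    Assumes the standard conda layout, where sys.prefix ends in .../envs/<name>.
    """
    # sys.prefix is the root directory of the running Python installation.
    return os.path.basename(os.path.normpath(sys.prefix)) == name


print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Using com6018 environment:", in_env())
```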

What Next?

Read the tutorials 010_Introducing_Python.ipynb and 015_Further_Python.ipynb to get started with Python.

Come to the Lab on Friday and complete the 010_python_intro.ipynb notebook.
