COM6018 Data Science with Python

Using Git and GitHub

Jon Barker

Copyright © Jon Barker, 2023, 2024 University of Sheffield. All rights reserved

What is Git?

  • Git is an open-source, free version control system
  • It allows you to track changes to files in a database called a repository
  • It allows you to revert to previous versions
  • It allows you to work on different versions of files at the same time (branches)

  • It is a distributed version control system, which means complete local copies of the repository can be kept on different computers.
  • You can work offline, then push your changes to a remote repository when you are ready, or pull changes that others have made.

What is GitHub?

  • GitHub is a hosting service for Git repositories that can make repositories available to others over the internet.
  • It is a commercial service, but it offers free plans, including for open-source projects
  • It provides a web interface for viewing and editing repositories
  • It also provides many other features such as issue tracking, wikis, project management, etc.

A very brief introduction to Git

  • Installing Git
  • Making a new repository
  • Saving file changes to a repository
  • Reverting to an earlier version
  • Saving your repository on GitHub

Installing Git

  • Git is freely available for Windows, Mac and Linux
  • To install visit https://git-scm.com/downloads
  • Latest version at time of writing is 2.46.2
  • (Git for Windows comes with Git Bash, a Unix-like terminal that can be used to run Git commands.)
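
Once Git is installed, a quick check from a terminal (Git Bash on Windows) confirms that the command is available. This is only a minimal sketch; the variable name version_output is an illustrative choice, and the version you see will depend on what you installed.

```shell
# Ask Git to report its version; the output starts with "git version"
version_output=$(git --version)
echo "$version_output"
```

If the command is not found, the installer probably did not add Git to your path, and reopening the terminal or reinstalling is the usual fix.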

Making a new Git repository

A Git repository can track files in a directory and its subdirectories.

To make a repository for your project, navigate to the directory and type:

cd my_project
git init

This will make an empty repository. You now need to add files to it: stage them with git add, then save them with git commit.

# Use `add .` to add all the files in the directory tree
git add .
# The -m flag allows us to provide a `commit message`
git commit -m "Initial commit"
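
The steps above can be tried end to end in a throwaway directory. The sketch below assumes a hypothetical file called analysis.py and placeholder author details; Git needs a name and email before it will commit, and here they are set for this one repository only.

```shell
# Work in a temporary directory so nothing real is touched
project=$(mktemp -d)
cd "$project"

git init

# Git records an author for every commit (placeholder values,
# stored in this repository's own configuration)
git config user.name "Your Name"
git config user.email "you@example.com"

# Create a file to track (hypothetical file name)
echo "print('hello')" > analysis.py

# Before staging, the new file is listed as untracked: "?? analysis.py"
git status --short

git add .
git commit -m "Initial commit"

# After the commit the working tree is clean, so this prints nothing
git status --short
```

Running git status between steps is a cheap way to see what Git thinks has changed before you commit it.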

Updating a Git repository

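After we've made changes to some files, or deleted files, we can save the new state to the repository. (Newly created files must first be added by name, or with git add . as before.)

# Use the -u flag to stage all modified or deleted files that Git already tracks
git add -u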

# Use a meaningful commit message
git commit -m "Added the results analysis"
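
A small experiment makes the behaviour of the -u flag concrete. This is a sketch in a temporary repository; the file names report.txt and notes.txt and the author details are placeholders chosen for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"        # placeholder identity
git config user.email "you@example.com"

echo "first draft" > report.txt
git add .
git commit -m "Initial commit"

# Modify a tracked file and create a brand-new one
echo "second draft" > report.txt
echo "scratch" > notes.txt

# -u stages only the change to report.txt; notes.txt stays untracked
git add -u
git status --short

# New files have to be added explicitly (or with `git add .`)
git add notes.txt
git commit -m "Revise report and add notes"
```

The status output after git add -u shows the modified file as staged and the new file still marked as untracked, which is why the second git add is needed.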

View the history of a Git repository

We can view the history of a repository using git log:

git log

This will show something like the following (most recent commit first):

commit 56e41739ec18f8b88b9dc7321147068d40244096 (HEAD)
Author: Jon Barker <j.p.barker@sheffield.ac.uk>
Date:   Sat Sep 30 18:37:10 2023 +0100

    Added the results analysis

commit e995371ea26d77006cbb2e33268ef2346a9e7c1a
Author: Jon Barker <j.p.barker@sheffield.ac.uk>
Date:   Sat Sep 30 18:27:33 2023 +0100

    Initial commit
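
For a long history the full log can be hard to scan. As a hedged sketch, the --oneline and -n options give a compact view; the repository below is a temporary one built only for the demonstration, with a hypothetical results.txt file and placeholder author details.

```shell
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"        # placeholder identity
git config user.email "you@example.com"

echo "v1" > results.txt
git add .
git commit -m "Initial commit"

echo "v2" > results.txt
git add -u
git commit -m "Added the results analysis"

# One line per commit: abbreviated hash followed by the commit message
git log --oneline

# Show only the most recent commit
git log --oneline -n 1
```

The abbreviated hashes printed by --oneline can be used anywhere a full commit hash is expected.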

Reverting to a previous version

We can revert to a previous state of the repository.

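We need to know the commit hash of the version we want to revert to. This is the long string of characters on the 'commit' line of each entry in the git log output, e.g. 'e995371...'. A unique prefix of the hash is enough to identify the commit.

Then we can temporarily revert to that version using git checkout. (Git will report a 'detached HEAD' state; the later commits are still stored.)

git checkout e995371

Or permanently revert to that version using git reset. Warning: --hard discards any uncommitted changes and removes the later commits from the branch history.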

git reset --hard e995371
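
The difference between the two commands is easiest to see by trying both. The sketch below builds a temporary repository with two commits and stores the first commit's hash in a shell variable called first (an illustrative name) instead of copying it from git log; the file name and author details are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"        # placeholder identity
git config user.email "you@example.com"

echo "v1" > results.txt
git add .
git commit -m "Initial commit"
first=$(git rev-parse HEAD)             # hash of the first commit

echo "v2" > results.txt
git add -u
git commit -m "Added the results analysis"

# Temporarily look at the first version: the file contains "v1" again
git checkout "$first"
cat results.txt

# Go back to the branch we were on: the file contains "v2" again
git checkout -
cat results.txt

# Permanently move the branch back to the first commit.
# The second commit no longer appears in `git log`.
git reset --hard "$first"
cat results.txt
```

Because git checkout leaves the branch itself untouched, it is the safer choice when you only want to inspect an old version.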

Cloning a repository

We can make a copy of an existing repository using git clone:

git clone <url>

This will make a copy of the repository in a new directory with the same name as the repository.

We can easily clone repositories from GitHub. For example, to clone the COM6018 repository:

git clone https://github.com/UOS-COM-6018/COM6018.git
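
git clone also accepts an optional second argument naming the directory to create. So that it runs without a network connection, the sketch below clones from a small local repository standing in for a GitHub URL; the names source_repo, workspace and my_copy are all illustrative.

```shell
# Build a small repository to clone from (stands in for a GitHub URL)
source_repo=$(mktemp -d)
git -C "$source_repo" init
git -C "$source_repo" config user.name "Your Name"      # placeholder
git -C "$source_repo" config user.email "you@example.com"
echo "hello" > "$source_repo/README.txt"
git -C "$source_repo" add .
git -C "$source_repo" commit -m "Initial commit"

# Clone it; the second argument chooses the name of the new directory
workspace=$(mktemp -d)
cd "$workspace"
git clone "$source_repo" my_copy

# The clone remembers where it came from under the name "origin"
git -C my_copy remote -v
```

The same origin remote is set up when you clone from GitHub, which is what lets git pull and git push work later without repeating the URL.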

Saving your own repository on GitHub

To save your own repository on GitHub you need to:

  1. Make yourself an account on GitHub (visit https://github.com).
  2. Create a new empty repository on GitHub.
  3. Push your local repository to GitHub.
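
The push step can be sketched as follows. To keep the example runnable offline, a local bare repository plays the role of the empty GitHub repository; with a real account you would replace "$remote" with the URL GitHub displays for your new repository. The branch name main, the variable names and the author details are assumptions made for illustration.

```shell
# A bare repository stands in for the empty repository created on GitHub
remote=$(mktemp -d)
git init --bare "$remote"

# A local repository with one commit
local_repo=$(mktemp -d)
cd "$local_repo"
git init
git config user.name "Your Name"        # placeholder identity
git config user.email "you@example.com"
echo "hello" > README.txt
git add .
git commit -m "Initial commit"

# Name the branch "main" (an assumption; any branch name works)
git branch -M main

# Register the remote under the conventional name "origin" and push.
# -u links the local branch to the remote one, so that a plain
# `git push` or `git pull` is enough afterwards.
git remote add origin "$remote"
git push -u origin main
```

When pushing to GitHub itself you will also be asked to authenticate, which this local stand-in does not show.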

Demo
