# tl;dr

This post is a basic and non-comprehensive introduction to Git and GitHub, with a focus on Mac users who are starting a project and working alone.

This is the second part of a series in using Git and Github. See the first part about basic use of the command line to interact with files and folders on your computer.

The post is quite lengthy and may change a bit over time as I realise the things I forgot to mention.

Here’s a quick reference for some of the commonly-used Git commands used in this post:

Command Short explanation
git init Put a folder under version control with Git
git status Show which files are new or changed
git add <file> Confirm the changes to be committed
git commit -m "<message>" Commit the changes with an explanation
git push Send your commits to GitHub

# Version control

It’s good to store intermediate versions of your files. The process of ‘version controlling’ means you can revert to prior states and prevents the confusion of having folders filled with things like file_FINAL.text, file_comments2 and file_comments_FINAL. Version control can be achieved in a number of ways, but one popular approach is to use Git software and the GitHub website.

## Tools

Git is an open-source version control system that tracks file changes on your computer, records why they were made and lets you revert to a past state if something goes wrong. Alternatives include Mercurial and Apache Subversion.

GitHub is a Microsoft-owned website where you can store the Git history of your files remotely on the web. It acts as a back-up for your work and its versions; lets you explore your code in a browser; lets other people see your code in the open, use it themselves and collaborate with you; lets you record issues and to-do items; and much, much more. Alternative products include GitLab and BitBucket.

## This post

The goal of this post is to put a new project folder under version control with Git and then upload the version history to GitHub. I assume you’re a beginner and are, for now, working alone on a project.1

We’ll be using the bash language at the command line to do all of this. In this post, lines should be typed into the command line when the example starts with a $ (you don’t need to type the dollar symbol yourself). The command line is the place where you write instructions directly to your computer and you can access it via software like ‘Terminal’ on a Mac. You may want to see my earlier post for how to get basic stuff done at the command line before you continue. It’s okay if you aren’t comfortable the command line. The bottom line is to get your work under some form of version control. You could always use GitHub’s drag and drop interface without needing to use the command line at all. There are guides available for using GitHub without using the command line. # Step-by-step Let’s break down the steps required to: 1. Set up Git and GitHub (do this once) 2. Put a project folder under version control with Git (once per project) 3. Save a version of your work and record it on GitHub (rinse and repeat during your project’s development) I’ve written a section for each of these things. Click a link to jump to that section. ## 1. One-time tasks You need to install Git on each computer you’ll be working from (feasibly only one computer), but only need to sign up once for GitHub. ### Install Git Git is basically a bunch of functions you can use via the command line, with each command preceded with the word git. You can check to see if Git is installed on your machine by asking for its version number from the command line: $ git --version
git version 2.20.1 (Apple Git-117)

This response shows that I already have Git installed. If the git command isn’t recognised then you will need to download Git and install it.

Once installed, Git needs to know who you are so that file changes can be ascribed to you. Introduce yourself by supplying your name and email address:

$git config --global user.name 'Your Name'$ git config --global user.email 'youremail@address.com'

These details are written to a special configuration file. You can print its contents to check it’s worked. Here’s what happens when I do that on my machine:

$git config --global --list user.name=Matt Dray user.email=mwdray@gmail.com ### Create a GitHub account GitHub is free to use but you need an account. Go to the GitHub website and click ‘Sign-up for GitHub’. You’ll need to provide a username, email address and password. When prompted for a subscription level, choose ‘free subscription’. You’ll be able to do everything you need to do with this type of account; a premium account is more suitable for teams. You’ll be sent an automated email to verify your email address. Click the button in the email to confirm, then check that your GitHub profile page is viewable at github.com/yourusername. ## 2. Project set-up The following subsections are about creating a project folder and putting it under version control by activating Git. ### Create your project folder It’s good practice for each project to exist in its own repository (folder), so all the code, data, tests and documentation are in the same place. Let’s create a project folder now. Using the command line, navigate with cd to where we want the folder to be, create it with mkdir and then navigate into it: $ cd ~/Documents
$mkdir demo-project$ cd demo-project
$pwd /Users/matt/Documents/demo-project You could also create a new folder by pointing and clicking; just make sure you navigate to the folder at the command line with cd. ### Initiate Git So we have our empty project folder, but it’s not yet under version control. How can we make Git aware of this folder and start tracking changes? We can initiate Git there: $ git init
Initialized empty Git repository in /Users/matt/Documents/demo-project/.git

What did this do? If you look inside the folder through your file explorer you’ll see nothing new. But look what happens when you ask at the command line to list (ls) all (the -A flag) the files and folders (marked by a / with the -p flag):

$open .gitignore Now to add things to the gitignore file. You could start by copying from a template, like the one for R. As an example, the first file in that template is .Rhistory – a log of all the commands you’ve typed – which you probably don’t need to track. You could add things to your gitignore like sensitive-data.csv to ignore a specific data set, or something like *.csv (meaning any file name that ends with .csv) to ignore all files of a particular type. Mac users will also want to add .DS_Store, which is a kind of Mac-specific metadata file. ### Create a remote GitHub repository Everything we’ve done so far is on your machine; it’s all ‘local’. All the files we add and change will be local until we choose to send a copy of the information to GitHub. Now we’re going to create a repository (folder) for the local information to be stored on GitHub. This location is ‘remote’. We’re doing this now as part of the project set-up steps, but you needn’t necessarily do it until later when you finally want to upload your version control history to GitHub. To begin, make sure you’re logged into GitHub and click the ‘plus’ symbol in the upper right corner and then ‘New repository’. You can then input a repository name (i.e. the name of the project folder you created) and a short description. You can choose to keep the project private, but far better for it to be public. You can also start this remote repository with a ‘readme’ file, ‘gitignore’ and a license if you want, but in this example we’re going to do these steps locally on our own machine. Click the green ‘create repository’ button to continue. This sets-up the remote repository on GitHub, but it’s empty right now. We haven’t sent any information yet. This will be the final step of the next section. For now, let’s associate our local repository with the remote repository: git remote add origin https://github.com/matt-dray/demo-project.git Remember to change the file path to match your the URL to your project. ## 3. A Git/GitHub workflow So far nothing has been put under version control; we’ve just been doing some set-up steps. Now we’ll create a file structure and start using Git and GitHub. ### Work on your files Start adding project files to your folder. Let’s say you’ve added a data set in raw-data/data.csv; an analysis.R script to analyse it; and a very important README.md file to to explain what your project is about. Let’s say the CSV file contains these data: number,position,name,nationality 1,"GK","Nicky Weaver","England" 9,"FW","Paul Dickov","Scotland" 10,"FW","Shaun Goater","Bermuda" 22,"DF","Richard Dunne","Ireland" 29,"MF","Shaun Wright-Phillips","England" And the R script looks like this: mancity <- read.csv("raw-data/data.csv") # read data mancity[mancity$position == "FW", ]  # filter for forwards

And the readme file, written here using Markdown, says:

# demo-project

This is a demonstration of how to use Git and GitHub. It accompanies a blog post on [rostrum.blog](https://rostrum.blog).

So the file tree ends up looking like this:

demo-project/
├── analysis.R
├── .git
├── .gitignore
├── raw-data/
│   └── data.csv
└── README.md

Great, now let’s capture this version with Git.

### Check file status

Run git status to see the files that have been created or modified.

$git status On branch master No commits yet Untracked files: (use "git add <file>..." to include in what will be committed) .gitignore README.md analysis.R raw-data/ nothing added to commit but untracked files present (use "git add" to track) All our new files and folders are listed under ‘Untracked files’. By ‘untracked’ Git is telling us that these are new files that haven aren’t yet under version control. All the files will be new the first time you run this process, of course. Note that any files that are in your .gitignore will not appear here, since they’ve been ignored. Helpfully, the output of git status also tells us what we need to do next: we need to use git add. ### Add changes Now you need to earmark the files whose changes you want to record. This process is called staging. The stage is a safe place where you can add files to be captured; they’re not yet confirmed. This is a preventative measure so you don’t accidentally record anything you didn’t mean to. As mentioned in the output of git status, we can type git add <file> to do this. We have three files to add, so we can name them all: $ git add .gitignore README.md analysis.R raw-data/

You could also use git add ., where the period means ‘all the files’. Be careful if you use this – you don’t want to end up capturing things you didn’t mean to.

You won’t get any output, so how do we know this worked? Check the status again.

$git status On branch master No commits yet Changes to be committed: (use "git rm --cached <file>..." to unstage) new file: .gitignore new file: README.md new file: analysis.R new file: raw-data/data.csv These files are now staged and marked as ‘changes to be committed’. If you made a mistake at this point and want to remove files from the stage, you can type git rm --cached <file>. This line is helpfully printed in the output from git status. ### Commit changes When you’re happy with the files that have been staged, you can now commit them. This confirms that you want Git to record the changes made to these files. You commit the staged changes using git commit. It’s good practice to leave a message explaining what changes were made and why. You can leave a simple message using the -m flag of git commit2. The message should be short (typically less than 50 characters) but informative (not just ‘changed stuff’). It’s convention to start with a verb, like ‘add this thing’ or ‘fix that thing’. $ git commit -m "Add readme, gitignore, script and data"
[master (root-commit) 133733b] Add readme, gitignore, script and data
4 files changed, 50 insertions(+)
create mode 100644 .gitignore
create mode 100644 analysis.R
create mode 100644 raw-data/data.csv

The output confirms the files that have been added. It also tells us how many insertions (characters added) and deletions (characters removed) there were. The create mode lines reference that a file has started to be tracked.

How often should you make commits? It’s up to you, but it’s probably a good idea to do it when you’ve completed a section of code that does something useful and that you might want to revert to if there’s a problem later.

Just to prove that we’re up to date, we can check the status of the repository once more.

$git status On branch master nothing to commit, working tree clean Cool. Everything’s up to date. ### Add and edit files (repeat as necessary) Continue to create and edit files, adding and committing them when necessary. For example, let’s add a row to the data set: 22,"FWD","George Weah","Liberia" And add this line to the bottom of our README file: The data are a selection of [Manchester City Football Club players from the 2000 to 2001 Premier League season](https://en.wikipedia.org/wiki/2000%E2%80%9301_Manchester_City_F.C._season). Given that we’ve changed our files and want to store this in our Git version history, let’s go through the status-add-commit workflow again. First, let’s check the status of the files: $ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified:   raw-data/data.csv

no changes added to commit (use "git add" and/or "git commit -a")

So the altered files are now marked as ‘modified’. Let’s add them both; I’ll use the period shortcut to add both without typing their full names:

$git add . One more look at the status to check it’s done what we wanted: $ git status
On branch master
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)

modified:   raw-data/data.csv

And now to commit with a message:

git commit -m "Add King George to the data, update readme with Wikipedia link"
[master 1359d3f] Add King George to the data, update readme with Wikipedia link
2 files changed, 3 insertions(+)

And lets check we’re up to date:

\$ git status
On branch master
nothing to commit, working tree clean

Boom. We’re up to date.

You can rinse and repeat this edit-status-add-commit to store versions of the files in your repository.

### Push changes to GitHub

Great, you’ve used Git to version control your files. But the commits are stored locally on your machine. How do you send them to GitHub so you have a remote copy of your version history?

Remember when we set up the GitHub repository for this project earlier on? Now we’re ready to send the version history from our local project folder to this remote repository.

You do this by ‘pushing’ the local information to GitHub.

git push -u origin master
Counting objects: 12, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 1.52 KiB | 388.00 KiB/s, done.
Total 12 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), done.
To https://github.com/matt-dray/demo-project.git
* [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

The output confirms that the push was successful.

If you go now to your repository on GitHub, you’ll see that your files have been added. For example, I would navigate to https://github.com/matt-dray/demo-project (this is a real link to the project used in this post!).

You can see the files at the top and our README.md has helpfully been rendered by GitHub so that people can see what the repository is all about.

There’s lots of stuff you can do from here. For example, click ‘Commits’ above and to the left of the list of your files. You can see each of the commits we made earlier.

You can do things like explore the repository at any given commit (click the <> button) or if you click the alphanumeric code (a unique code given to each commit) you can see the changes that were made with that commit. In this split view, the left panel is the state of each file before the commit, with changes on the right. Additions are green and start with a +; deletions are red with a — symbol.

So now you have a cloud backup of your work and can easily navigate the changes and explore the various snapshots of your work.

Now you can continue to work locally on your files and push the changes to GitHub when you’re ready.

# What next?

This post has been for beginners to get to grips with the fundamental Git commands init, status, add, commit and push.

Git is extremely powerful and does so much more that wasn’t covered here. A sensible next step would be to learn more about ‘branching’ in particular. This involves working on an isolated copy of the repo so you can work safely on new features without fear of ruining the main, or ‘master’ branch of work. If your development is successful, you can merge it back into the ‘master’ branch with a ‘pull request’.

You will also want to think about working collaboratively on a project. GitHub has lots of facilities for encouraging team work, like issue tracking and the ability to comment on changes. If you’re working on a new feature in a branch you can request a review from your colleagues to merge your work into the main ‘master’ branch; this is called a ‘pull request’.

The concepts in this blog post and more are explained well in other blog posts, articles and books. I particularly recommend Happy Git and GitHub for the useR by Jenny Bryan, the teaching assistants of STAT545 and Jim Hester, which focuses on R users in particular.

1. In fact, the real audience are the participants of the UK government’s Data Science Accelerator. Participants are analysts who may have little knowledge of programming.

2. You can also commit with both a short message and a longer description. To do this, run git commit without the -m flag. By default on macOS this will open Vim, a text editor, inside Terminal. Type in your comments and then type Esc : w q to ‘write-quit’ (save and exit) and return to the command line.