02| File Structure, the Terminal, and Git

Miles Robertson, 10.27.23 (edited 01.10.24)

Introduction

Computers are much faster and more accurate than humans. However, that speed and accuracy are ornamental unless the user is capable of giving the computer instructions to follow.

The earliest computers were made to run only a single program, like adding two large numbers together. Having to physically rewire computers to run different programs meant that computers were only narrowly applicable.

Later computers used central processing units, which are small components capable of executing a series of simple calculations, storing and accessing partial results along the way. To give instructions to these computers, users would feed punch cards into the computer, where metal pins would poke across the card: if a hole was present, the pin would touch a metal barrel beneath the card, completing a circuit and communicating a 1 to the computer. If there was no hole, the paper would block the circuit from being completed, communicating a 0 to the computer. This improvement meant that a single computer could make millions of types of calculations instead of just one. But further improvements were required to increase the speed and robustness of human-to-computer communication.

After several innovations and iterations, these improvements were achieved through the keyboard and terminal. A screen would display a blinking cursor, and a user could type commands for the computer to execute.

You know the rest of the story, where Microsoft and Apple later came out with operating systems that would show pictures and icons that could be navigated with a mouse. These advancements made computers approachable for novices, and made the computer a common household item. No longer did users need to remember the commands to type; they could simply click corresponding buttons on a screen. However, these graphical user interfaces (GUIs), as they were called, came with costs to speed and flexibility.

Counterintuitive to most modern computer users, your options for how to use your computer are limited by the buttons and icons on your screen, and your speed by how quickly you can move your mouse cursor from one side of the screen to the other. For this reason, the terminal still plays a critical role in the workflow for programmers today. In this chapter, you'll learn basic commands for the terminal, how to navigate folders, and how to use git to make the computer do more for you.

I feel that it is important to acknowledge that much of this may feel difficult and unintuitive at first. However, you will be surprised at how quickly this can become second nature. I encourage you to follow along with the instructions below, and to test out your curiosities as you go. Getting acquainted with it now will help you sort things out when you inevitably run into the need for these things later.

File Structure

You're certainly familiar with how to make a new folder on your desktop. I'm sure you could make a new folder within that one as well. However, you may not have been introduced to the idea that your desktop is itself a folder (aka directory), and it is inside its own folder, which is inside its own folder, etc. This doesn't go on for long, though. In a few steps, you'll hit the folder that contains everything: the root folder or root directory.

Called / on Mac and C:\ on Windows, the root folder holds all folders and files you use on your computer. That means that every folder and file can be uniquely described by what's called a path name. For example, on my Mac, a file on my desktop called notes.txt has the path name /Users/miles/Desktop/notes.txt. If I were on a Windows computer, the path name would be C:\Users\miles\Desktop\notes.txt. (For the rest of this chapter, I'll use the Mac path name, but the same principles apply to Windows. See the note in the box below for information about Windows path names.)

As you can see, the desktop is stored inside a folder called miles, which is the username for my computer. This folder, /Users/[your-username-here]/, is referred to as the home directory, and contains important folders like Desktop, Documents and Downloads. The home directory can be abbreviated to the symbol ~, so the path for the notes.txt file could also be expressed as ~/Desktop/notes.txt.

Path names that start from the root folder are called full path names. As is discussed in the next section, the terminal is always looking into one particular folder. Whatever folder the terminal is looking in is called the current working directory (CWD) or simply working directory. Since the terminal will know where it is pointed at a given time, you can give a path name that is relative to the CWD, i.e., a relative path name. For example, if the terminal is looking into the home directory, you can refer to the notes.txt file by Desktop/notes.txt. Since the Desktop folder is within the home directory, the relative path is sufficient to describe where the file is. Going even further, if the terminal is looking at the Desktop folder, the relative path for notes.txt is just that: notes.txt.
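
To make the difference between full and relative path names concrete, here is a minimal sketch you can paste into a terminal once you have one open (opening it is covered in the next section). It builds a throwaway folder with mktemp so none of your real files are touched; the sandbox variable and the stand-in Desktop folder are assumptions made for illustration, and the mkdir, echo, cd, and cat commands it uses are introduced later in this chapter.

```shell
# Create a temporary sandbox folder (a hypothetical stand-in for a home directory).
sandbox=$(mktemp -d)
mkdir "$sandbox/Desktop"
echo "some notes" >> "$sandbox/Desktop/notes.txt"

# Full path name: works no matter which folder the terminal is looking into.
cat "$sandbox/Desktop/notes.txt"

# Relative path name, starting from the stand-in home directory.
cd "$sandbox"
cat Desktop/notes.txt

# Relative path name from inside Desktop: just the file name.
cd Desktop
cat notes.txt
```

All three cat commands print the same line, because all three path names point at the same file.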

In R programming, one of the most common uses of path names is to import data from a file. The following box gives instructions for how to quickly get the full path of a file on your computer.

TODO: get the full path of a file on your computer

  • Windows users: Locate the file you want the path name of in File Explorer. Hold shift as you right-click on the file. Select Copy as Path. It is now in your clipboard and can be pasted anywhere.
    • Note: Windows uses backslashes (i.e., \) in its path names by default. This is dumb because backslashes have a special use in programming, so leaving them in path names can break your code. To avoid this, replace every backslash with a forward slash (i.e., /) when you paste the path name into your code, which also turns C:\ into C:/ in a full path. E.g., C:\Users\miles\wow.txt becomes C:/Users/miles/wow.txt, and miles\wow.txt becomes miles/wow.txt. Windows will still be able to handle it.
  • Mac users: Locate the file you want the path name of in Finder. Hold option as you right-click on the file. Select Copy "[filename]" as Pathname. It is now in your clipboard and can be pasted anywhere.
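
For Windows users, the backslash replacement described in the note above can also be done by the terminal itself. This is only a sketch: the win_path and fixed_path variable names are made up for illustration, and it relies on the tr command, which is not covered in this chapter, to translate one character into another.

```shell
# A Windows-style path as copied from File Explorer.
# Single quotes keep the backslashes exactly as typed.
win_path='C:\Users\miles\wow.txt'

# Translate every backslash into a forward slash.
fixed_path=$(printf '%s' "$win_path" | tr '\\' '/')

printf '%s\n' "$fixed_path"   # prints C:/Users/miles/wow.txt
```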

The Terminal

The terminal is a program that allows you to give instructions to your computer by typing commands. The terminal is also called the command line or command line interface (CLI). The terminal is a powerful tool that allows you to do things that would be impossible or very difficult to do with a graphical user interface (GUI). For example, imagine that you have a folder of thousands of pictures that are named for the city and day they were taken, e.g., Chicago_Apr-12-23.png, Ontario_Jun-08-03.png, etc., and you want to copy all the pictures taken in Miami in 2009 to a new folder. With a GUI, you would have to move each file individually. With the terminal, you can do this in one line of code:

cp Miami*09.png new_folder

This command copies all files whose names start with Miami and end in 09.png and have whatever in between (the * is a wildcard token to represent any characters, not discussed below but mentioned here if you are curious) to the folder new_folder. Of course, the tradeoff is that you have to learn the commands to type, and then have to remember them when you need them. But the more you use the terminal, the more you'll find that it is faster and more flexible than a GUI.
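
If you would like to see the wildcard at work before trusting it with real pictures, the sketch below sets up a few empty stand-in files in a temporary folder and runs the same cp command. The file names and the sandbox variable are invented for the example, and the touch and mkdir commands are explained later in this chapter.

```shell
# Work in a throwaway folder so no real files are involved.
sandbox=$(mktemp -d)
cd "$sandbox"

# Three empty stand-in pictures: only the first is from Miami AND from 2009.
touch Miami_Jan-05-09.png Miami_Mar-12-10.png Chicago_Apr-12-09.png
mkdir new_folder

# Copy every file that starts with Miami and ends in 09.png.
cp Miami*09.png new_folder

ls new_folder   # prints Miami_Jan-05-09.png
```

The originals stay where they were; cp only places copies in new_folder.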

The terminal is a program that runs on your computer. On Mac, the terminal is called Terminal. On Windows, the old version is called Command Prompt, whereas the new one is called PowerShell. To complicate matters, each of these terminals has its own set of commands, and that includes the two Windows terminals. For this reason, I recommend that Windows users use Git Bash, which was installed along with Git in the first chapter. Git Bash is a special terminal for Windows that uses the same commands as the Mac terminal, whose syntax I find preferable. This means that everyone can learn the same commands, and Windows users can follow along with the same instructions as Mac users. See the box below for instructions on how to open the terminal.

TODO: open the terminal

  • Mac users: Open the Terminal program. It can be quickly opened by pressing Cmd + Space, typing Terminal and pressing Enter.
    • Note: Older Macs run a program called bash inside the terminal, whereas newer Macs run zsh. The commands in this chapter are the same in both, and effectively the only difference you'll notice is the title of the window saying bash or zsh.
  • Windows users: Open the Git Bash program. It can be quickly opened by pressing Win, typing Git Bash and pressing Enter.
    • Note: Git Bash has the quirk that it will not let you copy/paste into the terminal with Ctrl + C/V. There are other keyboard shortcuts, but they're dumb. Instead, you'll probably do best by right-clicking and selecting Copy/Paste.

Follow the instructions below in your open terminal.

When you open the terminal, you'll see a screen that is mostly blank, except for some text that follows something similar to one of these patterns: username@computername working_directory %, or username@computername:working_directory$. For example, mine looks like miles@Miless-MacBook-Pro ~ %. This is called the command prompt, and it is where you type commands for the computer to execute. Regardless of the exact pattern yours follows, you'll likely see a ~ in the command prompt, which is a shorthand for the home directory, as discussed above. For me, that means my terminal is looking into /Users/miles, my home directory. You'll see this command prompt every time you hit enter.

Below, I'll introduce the most common commands for the terminal. Each demonstration builds upon the one previous. You can follow along by pulling up this chapter and the terminal at the same time on your screen. I encourage you to try out the commands yourself, and to experiment with them along the way. I'll warn you about anything "dangerous", and you'll learn through practice. The most important commands to learn are the first four, so prioritize understanding and remembering those.

A few tips for ease: if you're part-way through typing a file name, you can hit tab to autocomplete the rest of the name, given that no other files/folders in the CWD start with the same characters. Also, you can hit the up arrow on your keyboard to cycle through your previous commands. The down arrow does the same in reverse.

  1. pwd, or print working directory: To start, check the full path name of the folder you're in by typing pwd and hitting enter. This stands for print working directory, referencing the working directory discussed above. If you're at your home directory, you should see something like /Users/[your-username-here].
  2. cd, or change directory: Next, let's change the working directory to be the desktop. To do this, type cd Desktop and hit enter. After doing so, run pwd again. You should see that the working directory is now /Users/[your-username-here]/Desktop.
  3. ls, or list: Next, let's see what is on your desktop. To do this, type ls and hit enter. This command will list all the files and folders in the working directory. You should see a list of all the files and folders on your desktop. Now, try typing ls -a and hitting enter. The -a is called a flag. Flags are used to modify the behavior of commands. In this case, the -a flag stands for all, and it modifies the ls command so that it lists all files and folders, including hidden files. Hidden files are usually configuration files that you don't need to worry about.

    You'll see that there are a few more files and folders listed, including . (one dot) and .. (two dots). These are special "folders" (quotations here because they're more like links to folders) that are always present in every folder. The single dot . is the CWD, and the double dot .. is the folder that contains the CWD (aka the parent directory of the CWD). You can use these in path names to refer to the CWD and the parent directory. For example, you can go up one folder from your CWD by running the command cd .. (cd followed by two dots).
  4. mkdir, or make directory: Now, let's make a new folder on your desktop. To do this, run the command mkdir new_folder. This stands for make directory, and it creates a new folder in the CWD. You can check that it was created by running ls again. You should see new_folder listed.
  5. touch, the file creation command: Now, let's create a new file in the new folder. To do this, first change directory into that new folder: cd new_folder. Then, run the command touch new_file.txt. The touch command creates a new, totally blank file with the name you give it. You can check that it was created by running ls again.
  6. mv, or move: Now, let's move the file to the desktop. To do this, first change directory back to the desktop by running cd .. (two dots). Run pwd again to make sure you were successful. Then, run the command mv new_folder/new_file.txt . (the command ends with a space and a single dot). The mv command stands for move, and it moves the file from the first path name to the second path name. In this case, the first path name is new_folder/new_file.txt, and the second path name is the single dot. As discussed above, . is the CWD, so this command moves the file from the new folder to the CWD, which is the desktop. You can check that it was moved by running ls again.
  7. mv again, but used for renaming: Now, let's rename the file. To do this, run the command mv new_file.txt renamed.txt. This command moves the file from the first path name to the second path name, but since the second path name is in the same folder as the first, it effectively renames the file. You can check that it was renamed by running ls again.
  8. cp, or copy: Now, let's copy the file. To do this, run the command cp renamed.txt renamed_copy.txt. This command copies the file from the first path name to the second path name. You can check that it was copied by running ls again.
  9. rm, or remove: This is the only command so far that can be considered "dangerous" in that you can use this command to delete things permanently. However, if you follow my instructions here, you'll be fine and sufficiently introduced to do your own research. Now, let's remove a file. To do this, run the command rm renamed_copy.txt. This command removes the file at the path name you give it. You can check that it was removed by running ls again.
  10. echo, a command to put text to the screen: This one is simple: simply type echo "Hello, world!" and hit enter, and you'll see that the terminal prints Hello, world! to the screen. This command is most helpful in conjunction with the next command.
  11. >>, an operator to append content to a file: If you've followed along so far, you should have an empty file named renamed.txt on your desktop. Say you want to add some text to it. You can do this by running echo "Hello, world!" >> renamed.txt and hitting enter. This command appends the text Hello, world! to the file renamed.txt. That is, it adds the text to the end of the file. You can check that it was appended with the next command.
  12. cat, or concatenate: This command lets you see the contents of a file. To do this, run the command cat renamed.txt. You should see the text Hello, world! printed to the screen. To play around with this a little more, try running echo "How are the wife and kids?" >> renamed.txt, and then again running cat renamed.txt (don't forget you can pull up previous commands using the up arrow key!). You'll see that using >> added the second line in addition to the first.
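
The single dot and double dot from step 3 are easy to mix up, so here is a small sketch that isolates them. It assumes a throwaway folder made with mktemp; the names sandbox, parent, and child are made up for the example.

```shell
# Build a tiny folder tree: sandbox/parent/child
sandbox=$(mktemp -d)
mkdir "$sandbox/parent"
mkdir "$sandbox/parent/child"

cd "$sandbox/parent/child"
pwd        # path ends in /parent/child

cd .       # one dot is the CWD, so nothing changes
pwd        # path still ends in /parent/child

cd ..      # two dots: go up to the parent directory
pwd        # path now ends in /parent

ls .       # contents of the CWD: child
ls ..      # contents of the folder above the CWD: parent
```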

Although those were a lot of commands, I hope you realize that they're all relatively simple to use. Being even a little familiar with these commands will help you be a lot more comfortable when you see them out in the wild. Below is a brief summary of the commands you learned, and their syntax.
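
As a compact recap, the whole walk-through can be replayed in one go, with each command's general syntax noted in a comment. This is a sketch rather than an official reference: it runs inside a temporary folder (the sandbox variable is an assumption that stands in for your desktop), and the angle-bracket placeholders in the comments are just my shorthand for "put a name here".

```shell
sandbox=$(mktemp -d)                  # throwaway folder standing in for the desktop
cd "$sandbox"                         # cd <folder>: change the working directory
pwd                                   # pwd: print the working directory
ls -a                                 # ls [-a]: list files (-a includes hidden ones)
mkdir new_folder                      # mkdir <folder>: make a new folder
cd new_folder
touch new_file.txt                    # touch <file>: create an empty file
cd ..                                 # .. is the parent directory
mv new_folder/new_file.txt .          # mv <from> <to>: move (. is the CWD)
mv new_file.txt renamed.txt           # mv <old> <new>: rename
cp renamed.txt renamed_copy.txt       # cp <from> <to>: copy
rm renamed_copy.txt                   # rm <file>: delete permanently
echo "Hello, world!"                  # echo <text>: print text to the screen
echo "Hello, world!" >> renamed.txt   # >> <file>: append the output to a file
cat renamed.txt                       # cat <file>: print a file's contents
```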

Error Messages in the Terminal

The terminal will tell you if you have asked it to do something it cannot do. For example, if you accidentally type la instead of ls, you will see an error message that says zsh: command not found: la or something like it. This of course means that the command la is not a valid command. When you see a message in the terminal after running a command, read it! Oftentimes, we as computer users are conditioned to ignore error messages, but such messages in the terminal are often helpful, and if they are confusing, they can be googled to help you understand what went wrong.

Git and GitHub

To begin, I will give some disambiguation: Git is a version control system (i.e., it helps keep track of past versions of files), and is open-source. Git Bash is a terminal program that gets installed with Git. GitHub is cloud storage for your folders of code, and is run by a private company.

As stated above, Git is a version control system. This means that it keeps track of the changes you make to your files, and allows you to revert back to previous versions of your files. For example, say you're working on a project that has a data file (extension .csv) and an R script (extension .R). As you edit the data file and add lines of code to your R script, you might make a change that you later regret. Git allows you to revert both files back to previous versions. In reality, the biggest way that Git is used in our field is to allow multiple people to contribute to the same coding project through GitHub. We'll start by talking about how Git works, and doing some practice along the way. Then, we'll talk about GitHub, and how it can be used to store your code in the cloud.

Git

The first step with using Git is to initialize a Git repository. A Git repository is simply a folder that is being tracked by Git. Let's use some of the commands from the previous section in the terminal to make a practice example.

First, make a new folder on your desktop called practice_repo or something of the sort. Then, change directory into that folder: cd practice_repo. Then, run the command git status. This command checks to see if there is already a git repository in the CWD. Since we just made this folder, there is not, so you should see a message that says fatal: not a git repository (or any of the parent directories): .git.

Next, run the command git init. This command initializes a new git repository in the CWD. You should see a message that says Initialized empty Git repository in /Users/[your-user-name]/Desktop/practice_repo/.git. From that message, note that there is now a new folder in practice_repo called .git. This folder is where Git stores all the information about the repository. You can see this folder inside practice_repo by running ls -a, which lists all files and folders, including hidden ones.

Now, run git status again. You should see a message that says On branch main (or On branch master, depending on how your copy of Git is configured), and that there are no commits yet. A commit is a "snapshot" of the files in the repository at a given time. You can think of it as a save point. You can make as many commits as you want, and you can revert back to any commit at any time. We'll now work on making our first commit.

First, let's make a new file in the repository. Run the command touch new_file.txt. Then, run git status again. You should see a message that says Untracked files:, and that new_file.txt is untracked. This means that Git is aware that there is a new file, but it is not tracking it yet. Note that git status is just a way for you to check the status of your repository, and is not required to do anything. It is just a helpful tool.

To start tracking that file, run the command git add new_file.txt. Then, run git status again. You should see a message that says Changes to be committed:, and that new_file.txt is new. The effect that git add has is that it tells Git to start tracking the file. This is sometimes called staging the file. In reality, you will almost never use git add on a single file like we have here, but rather on all files in the repository. To do this, you'll run git add . (ending in a space and a single dot), where the . represents the CWD as discussed in previous sections.

Now, let's make our first commit. Run the command git commit -m "my first commit". This command makes a commit, and using the -m flag (for message), attaches the message my first commit. Every commit needs a message; if you leave out -m, Git opens a text editor and waits for you to type one there. You can see all the commits you've made by running the command git log. This command lists all the commits you've made, and the messages you gave them. You should see a commit with the message my first commit.
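
The steps from git init through git log can be replayed in one pass as sketched below. Two things here are my own assumptions rather than part of the walk-through: the repository is created inside a temporary folder instead of on your desktop, and a placeholder name and email are stored for this repository only (without --global), so that the commit succeeds even on a computer where Git has not been told who you are.

```shell
# Work in a throwaway folder so nothing on your desktop is touched.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir practice_repo
cd practice_repo

git init                          # creates the hidden .git folder
ls -a                             # .git now shows up in the listing

# Placeholder identity, saved for this repository only (hypothetical values).
git config user.name "Your Name"
git config user.email "you@example.com"

touch new_file.txt
git status                        # new_file.txt is listed under "Untracked files:"
git add new_file.txt              # stage the file
git status                        # now listed under "Changes to be committed:"
git commit -m "my first commit"   # take the snapshot
git log                           # shows the commit and its message
```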

Everything we've done so far has been local to your computer. That is, you've been making commits to your local repository, which is just a folder on your computer. No internet connection is required to do what we've done so far. However, the real power of Git comes from being able to push your local repository to a remote repository, or in other words, being able to save to the cloud. This is where GitHub comes in. GitHub is a website that allows you to store your repositories in the cloud. It is free to use, and is the most popular website for storing code.

In the subsection below, I'll walk you through how to create a GitHub account, and how to push your local repository to GitHub.

GitHub

To start, go to github.com and create an account. Once you've done that, you'll be taken to your dashboard. Click on your account icon in the top right corner, and select Settings. On the left hand side, click SSH and GPG keys. Here, we need to tell your GitHub account to trust your computer. It will do this by using a public key that is unique to your computer. Follow the instructions in the box below to get your public key.

TODO: add an SSH key to your GitHub account

  • Open your terminal. Make sure you're in your home directory by checking the command prompt as described above. If you're not, run cd ~ to get there.
  • Run the command ls -a. If you see a folder called .ssh, you likely already have an SSH key and can skip the following step (the step after that will show you whether the key files are really there). If you don't see that folder, continue to the next step.
  • Run the command ssh-keygen. This command will create a new SSH key for you. It will ask you where you want to save the key. The default location is fine, so just hit enter. It will then ask you to enter a passphrase. This is optional, so you can just hit enter again. It will then ask you to confirm the passphrase. Again, just hit enter. You should see a message that includes some weird symbols in a box that looks something like this:

    The key's randomart image is:
    +---[RSA 2048]----+
    |             00b*|
    |              +oo|
    |              o. |
    |        .     .. |
    |       S+     ...|
    |     0.=o    ....|
    |     o+SS=...o00 |
    |      +***=ooo.  |
    |       EEo00=+   |
    +----[SHA256]-----+

    If you see something like this, your key was created successfully.
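  • Now, run the command cd .ssh, and then ls. You should see two files: id_rsa and id_rsa.pub. These are your private and public keys, respectively. You should not share your private key, but can share your public key. To do so, run the command cat id_rsa.pub. You should see a long string of characters that starts with ssh-rsa. This is your public key. Select it and copy it to your clipboard. (Newer versions of ssh-keygen create a different type of key by default. In that case the files are named id_ed25519 and id_ed25519.pub, the public key starts with ssh-ed25519, and you would run cat id_ed25519.pub instead. Everything else is the same.)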
  • In your GitHub account, under Settings > SSH and GPG keys, click the green New SSH key button. Give it a title (e.g., "MyMacbook"), and paste your public key into the textbox. Click the green Add SSH key button.
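
The key pair from the steps above can be explored safely with a throwaway key. This sketch is a variation on those steps, not a replacement for them: it writes a key named demo_key (a hypothetical name) into a temporary folder instead of ~/.ssh, picks the Ed25519 key type explicitly with -t, and passes -f, -N, and -q so that ssh-keygen asks no questions and skips the randomart picture. Do not add this demo key to GitHub.

```shell
# Throwaway folder, so your real keys in ~/.ssh are left alone.
sandbox=$(mktemp -d)

# -t: key type, -f: where to save the key, -N "": empty passphrase, -q: quiet
ssh-keygen -t ed25519 -f "$sandbox/demo_key" -N "" -q

ls "$sandbox"                 # demo_key (private) and demo_key.pub (public)
cat "$sandbox/demo_key.pub"   # one long line that starts with ssh-ed25519
```

The .pub file is the half that is safe to paste into a website; the file without an extension is the private half and should stay on your computer.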

Now your GitHub account will trust your computer. This means that you can push (i.e., save) your local repositories to GitHub. The first time you do this after adding your SSH key, you will be asked to confirm that you trust GitHub. When that happens, type yes and hit enter. You won't have to do it again.

Now you'll be able to connect your local repositories to your GitHub account. Create a new repository by clicking your profile icon in the top right corner, and selecting Your repositories. Then, click the green New button. You'll be taken to a page where you can name your repository, and give it a description if you wish. For this case, you might name it practice_repo. Click the green Create repository button.

Once you've done that, you'll be taken to a page that gives you instructions for how to connect your local repository to your GitHub repository. However, we have already completed some of those steps (e.g., initializing the local repository with git init, adding with git add, and committing with git commit). So, you'll only need to do the last three steps:

git branch -M main
git remote add origin git@github.com:[your-username]/[repo-name].git
git push -u origin main

Note that the second line uses the SSH link (i.e., starts with git@github.com), not the HTTPS link (i.e., starts with https://). Although each line here has a purpose, you only have to run these commands when you create a new repository, so I'll leave further research to you if you're curious. If Git does not yet know who you are, it may stop and ask for your email and name with the prompt below, typically the first time you commit. Run both commands, filling in your name and the email address of your GitHub account, and then repeat the command that was interrupted.

*** Please tell me who you are.

Run

    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"

to set your account's default identity.
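
Before pushing, it can help to check that the connection commands above did what you expect. The sketch below is self-contained so it can run anywhere: it builds a fresh repository in a temporary folder, uses a placeholder identity, and points origin at a made-up address (your-username is hypothetical). Adding a remote only records the address, so no internet connection is needed until you actually push, which is why git push is left out here.

```shell
# A fresh practice repository with one commit, in a throwaway folder.
sandbox=$(mktemp -d)
cd "$sandbox"
git init
git config user.name "Your Name"          # placeholder identity, this repository only
git config user.email "you@example.com"
touch new_file.txt
git add .
git commit -m "my first commit"

git branch -M main                        # make sure the current branch is named main
git remote add origin git@github.com:your-username/practice_repo.git
git remote -v                             # lists origin and its address, for fetch and push
```

If the address printed by git remote -v has a typo, nothing is broken yet; the mistake would only surface as an error when you push.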

If this new repository that you have created were something that you were actually working on, here's what your workflow would look like after this setup:
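
A minimal sketch of that loop is: change some files, stage them, commit them, and push. So that the example can run without a GitHub account, it makes one loud assumption: an empty local folder created with git init --bare plays the role of the GitHub repository. On your own project, origin would be the git@github.com address from above, and only the last five commands would be part of your routine. The names sandbox and remote.git and the identity values are placeholders.

```shell
# --- One-time setup, standing in for the steps described above ---
sandbox=$(mktemp -d)
git init --bare "$sandbox/remote.git"     # local stand-in for the GitHub repository
mkdir "$sandbox/practice_repo"
cd "$sandbox/practice_repo"
git init
git config user.name "Your Name"          # placeholder identity, this repository only
git config user.email "you@example.com"
touch new_file.txt
git add .
git commit -m "my first commit"
git branch -M main
git remote add origin "$sandbox/remote.git"
git push -u origin main

# --- The everyday loop: edit, check, stage, commit, push ---
echo "Hello, world!" >> new_file.txt      # 1. change a file
git status                                # 2. (optional) see what changed
git add .                                 # 3. stage all changes
git commit -m "add a greeting"            # 4. commit with a short message
git push                                  # 5. send the new commit to the remote
```

Because the first push used -u, later pushes need no extra arguments: Git remembers that main goes to origin.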

Conclusion

In this chapter, you were exposed to a lot of information about the terminal, Git, and GitHub. As stated before, this information can be overwhelming at first. However, the more you use these tools, the more comfortable you'll become with them, and the more you'll get done while you code. Below, I've given you some practice exercises to help you get more comfortable with these tools.


Practice

Use Basic Terminal Commands

In this section, you'll practice using the terminal. Use the commands you learned above to follow the sequential instructions below.

  1. Open your terminal.
  2. Change the directory to your desktop.
  3. Make a new folder called test.
  4. Change the directory to test.
  5. Make a file called test_file.R.
  6. Append the line "print(1)" to this file.
  7. Make a second file in this folder called test_file.txt.
  8. Append the line "Hello, world!" to this file.
  9. Delete the file test_file.R.
  10. Print the contents of test_file.txt to the screen.
  11. Make another file in this folder called test_file_2.txt.
  12. List all of the files in this folder on the screen.

Use Git in the Terminal

Here, you'll use the results of the previous practice exercise to practice using Git in the terminal.

  1. Initialize a local Git repository in test.
  2. Stage and commit all the files in test.
  3. Create a remote GitHub repository called test on the GitHub website with your account.
  4. Connect your local repository to the remote repository.
  5. Add a file to your local repository called README.md.
  6. Append the line "#Hello my friends" to this file.
  7. Stage and commit all these files again.
  8. Push the changes to the remote repository.