Reproducible research with RMarkdown and Github

This first lab will serve as a general introduction to the major tools and platforms that we will be using throughout the semester, which are R, RStudio, and Git/Github. R is the name of the programming language itself and RStudio is a convenient interface, and both are installed on the lab computers. For this first lab, we request that you login to the RStudio Server https://rstudio.cos.gmu.edu, as installing and setting up everything takes some time.

RStudio

The lab instructor will guide you through how to launch RStudio for the first time. When you launch RStudio, you will see an interface that looks something like this:

plot of chunk rstudio-interface

The panel in the upper right contains your workspace as well as a history of the commands that you’ve previously entered. Any plots that you generate will show up in the panel in the lower right corner.

The panel on the lower left is where you can enter R commands. It’s called the console. Everytime you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running. Below that information is the prompt. As its name suggests, this prompt is really a request, a request for a command. When you want to test out an R command and not necessarily save it to a file, this is the place to do it.

You will need to install some packages before going further. Enter the following command at the R prompt (i.e. right after > on the console) by typing it in:

install.packages("tidyverse")

You will see some output pass by. Just let it run, and when it’s completed, move on to the next section.

Getting started with Github

Please sign up for an account on Github using your Mason email address if you do not have one already. If you already have a Github account, please update your profile to use your Mason email address. If you would prefer to not do this, then create a second account using your Mason email address instead.

Obtain the Lab01 repository on GitHub through the link in Slack. Click “Clone or Download”, and make sure it says “Clone with SSH” in bold in the top left of the pop-up box. If not, click on the blue “use SSH” button on the top right of the pop-up box. Now copy the link in the box to your clipboard.

In RStudio, go to File -> New Project. Click Version Control, then Git. Paste the link you just copied into the Repository URL box. Leave the Project directory name blank. Click Browse and create a directory where you store your labs, for example as Documents/102labs, and select it. An RStudio project should now open up, which will allow you to start working on your homework assignment. You will probably see a blank console screen. However, in RStudio you should also see a list of all of the files available. Click on the lab01.Rmd file and edit away. If you save and close R Studio and want to go back to editing your project, open up R Studio, then go to File -> Open Project. Navigate to the project directory and double click on the .Rproj file.

Note that if you received an error in the above steps, you may have to clone with HTTPS instead of SSH. You can do this by again clicking on the “Clone or Download” button in the repository page, then clicking “Use HTTPS” in the top right of the pop-up box. Now copy the link and repeat this step.

Getting familiar with RMarkdown

Now that you’ve obtained your personal lab template from Github and opened it in RStudio, we can start learning about RMarkdown. The instructor will play this video after you’ve had time to sign up for Github to introduce what RMarkdown is:

As you can see, the combination of RMarkdown and RStudio combines text processing with the ability to execute code written in R. However, unlike software like Microsoft Word, there aren’t any formatting buttons for doing things like creating bold or italicized text. So how can you format your text? RMarkdown does this using symbols that stand for different types of text markups. For example, typing **bold text** in RMarkdown signals you want the words “bold text” to be bolded. Paragraphs are separated by a blank line, for example:

This is the end of the first paragraph.

This starts the next paragraph.
However, putting a line right below it will not start a new paragraph.

Let’s get some practice by typing in some examples!

There is an official cheatsheet available for you to download. It’s recommended that you have this available to you at all times. You may even want to print it out before next class so that you can easily reference it.

  1. Fill in your name and other information in the spaces at the top of the file if you haven’t already. Then, click the the “knit” icon just above the editing window (if a drop-down menu appears, choose “knit to HTML”). Based on the output that you see, explain what the ## and #### symbols are doing to the text.

Let’s write some more Markdown and see what it does.

  1. Type the code in the box below into your lab report file exactly you see it. Then, knit the file again and look at the output. Write a short explanation of what each markup symbol does.
*What happens when you surround text with one-star pairs?*

**What happens when you surround text with two-star pairs?**

***What happens when you surround text with three-star pairs?***

1.  Start typing this list. Note there are two spaces between the period and the word "Start".
2.  Type the second line of the list
1.  What happens if I type step 3 as another step 1?

*   What does this star at the beginning do?
*   Visually, it's similar to the numbered list.

1.  What happens if we nest a list?
    1.  Type four spaces, then type the number
    2.  Did this do what you expected?
2.  What if we contine the numbers this way?
    *   What happens if we indent using stars?
    *   Let's add another one for good measure.
        *   Can we get another level of nesting?
        
[What does this do?](https://google.com)
        
![How is this different from the above?](test-image.jpeg)

Interlude: How to save your work back to Github

Now that you’ve typed some material into your RMarkdown document, let’s take a snapshot of our work using what is called a “commit”. First, save your work in the traditional way by going to File -> Save or clicking the floppy disk icon just above the editing window. Next, click on the GIT toolbar dropdown menu and select “Commit”. You will see a list of one or more filenames. Click on one of them. You will see some text appear in the lower area. Text highlighted in green has been added to the file since the last snapshot, while text highlighted in red has been deleted. Next, to the left of the filenames is a checkbox marked “Staged”. Staging just means selecting the files that we want to use when taking a snapshot. Stage the file lab01.Rmd. Then, to the right, write a brief, informative message in the text box labeled “Commit message”. When you are done, click the commit button just below this text box.

Two things about committing. One, you should commit somewhat frequently. At minimum, you should try and make a commit each time that you’ve finished a lab exercise. Two, leave informative commit messages. “Added stuff” will not help you if you’re looking at your commit history in a year. A message like “Typed in Lab 1 RMarkdown examples” will be more useful.

Important note! The above process does all this saving and snapshoting on the local computer that you’re currently using. So far we have not synchronized anything back to the Github website. Without synchronizing back, you will not be able to ultimately submit the lab or have the lab instructor read and grade it. Also, and this is important, if you don’t synchronize, you will not be able to work on your lab report from another computer.

In the world of Github, synchronizing from our local computer to the Github website is called a “Push”. So let’s do that now. In the interactive commit window we opened earlier, in the upper right-hand corner you should see a button labeled “Push”, click that now. Alternatively, if the interactive commit window isn’t open, you can just click on the GIT toolbar dropdown menu and select “Push branch”.

The synchronizing can go in the opposite direction as well, which is called a “Pull”. One reason you would use “Pull” is if you are working on the lab report from more than one computer throughout the day. You would push your commits from the first computer, move to the second computer, and then use pull to get all the changes you made and synchronized.

More Markdown

Now that we’ve saved and synchronized our work on Github, let’s return to RMarkdown. During the rest of the lab, you are encouraged to commit early and commit often.

  1. You can create tables using Markdown, type in the following and knit to see it looks like. Do both tables look the same after being rendered? What are the snippets below each table doing?
| Column 1 | Column 2 | Column 3 | Column 4 |
| --- | ---: | :---: | :--- |
| Notice | what | the | colons |
| are | doing? | | |

Table: The table with poor spacing

| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------: | :------: | :------- |
| Notice   | what     | the      | colons   |
| are      | doing?   |          |          |

Table: The table with good spacing
  1. Copy-and-paste one of the above tables and then add a fifth column. There should be text in both rows of the fifth column.

Of course, there isn’t much point to this if we don’t include code snippets. Let’s try a couple of different commands, shall we?

  1. Type the following code block into your file, then click the green “play” icon that appears to the right of your code block. Explain what you’re getting as output.
```{r}
qplot(x = displ, y = hwy, data = mpg)
```
  1. What output do you get if you type the following in a code block? Do you see a connection between certain parts of this output and the one from the previous question? If so, what are they? If not, why isn’t there a connection?
```{r}
print(mpg)
```

Knitting your document

We’ve done enough demonstration for now. First, check that there are no errors when you click “knit”. Try knitting your lab report using both the Knit to HTML and Knit to PDF options and inspect the resulting documents. If the documents look good, then save your files and then stage, commit, and push your changes to the Github site. Do not try to commit and push the HTML and PDF documents, those will get uploaded elsewhere.

How to submit

When you are ready to submit, be sure to save, commit, and push your final result so that everything is synchronized to Github. Then, navigate to your copy of the Github repository you used for this assignment. You should see your repository, along with the updated files that you just synchronized to Github. Confirm that your files are up-to-date, and then do the following steps:

  • Click the Pull Requests tab near the top of the page.

  • Click the green button that says “New pull request”.

  • Click the dropdown menu button labeled “base:”, and select the option starting.

  • Confirm that the dropdown menu button labeled “compare:” is set to master.

  • Click the green button that says “Create pull request”.

  • Give the pull request the following title: Submission: Lab 1, FirstName LastName, replacing FirstName and LastName with your actual first and last name.

  • In the messagebox, write: My lab report is ready for grading @jkglasbrenner.

  • Click “Create pull request” to lock in your submission.

Credits

This lab is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Adapted by James Glasbrenner for CDS-102 from Lab 0 - Introduction to R and RStudio by Mine Çetinkaya-Rundel and the Github Classroom Guide for Students by Jacob Fiksel.