This tutorial walks you through the steps of creating a reproducible
project with the worcs package. The learning goals are:
Checking the Installation
Open RStudio. Load the worcs package, and run the
installation check:
You should see all green checkmarks, possibly accompanied by some “information” messages. If any tests fail, instructions on how to remedy the issue should be printed; please follow them. If you are in a “worcshop” with a live instructor, ask for help after you’ve tried to remedy the issues.
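The check described above can be run from the RStudio console. This is a minimal sketch, assuming the installation check is the `check_worcs_installation()` function exported by worcs; the function name is not given in the text above, so confirm it against the package help if your version differs.

```r
# Attach the worcs package for this session
library(worcs)

# Run the installation check (function name assumed, see the note above);
# it prints one line per test, plus instructions for any test that fails
check_worcs_installation()
```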
Creating a New worcs Project
In RStudio, click
File > New Project > New directory > WORCS Project Template
Type an appropriate name for the remote Repository in its textbox. This name will be used to create a new GitHub repository on your account. For example, you could name it “demo_worcs_project”.
Keep the checkbox for renv checked if you want to use
dependency management (recommended).
For this tutorial, select “none” in the preregistration template
dropdown menu. You can always add a preregistration later using
add_preregistration().
For this tutorial, select the manuscript template “github_document”, which has few dependencies. Optionally, you can choose a different template.
Select a license for your project (we recommend a CC-BY license, which allows free use of the licensed material as long as the creator is credited).
When you click “Create Project”, the new project should open in RStudio (either in a new session or in the current one).
Verify that you see a README.md file, which is the
welcoming page for users of your repository. Edit this template to
explain how users should interact with the project.
Prepare a dataset using prepare_data.R
The data preparation script should turn your source data into an
analysis-ready data.frame (or other data object, but you
will need to specify custom functions for reading and loading the data
in that case).
Two important steps usually occur before data is added to a repository:
- Removing any and all potentially identifying information in the case of sensitive data
- Minimal data cleaning required to store the data to a file. The remainder of the data cleaning will be done reproducibly.
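The first of these steps can be sketched in base R. The data frame `raw`, its columns, and the choice of which columns count as identifying are all hypothetical and made up for illustration; adapt them to your own data.

```r
# Hypothetical raw data containing direct identifiers (all values are made up)
raw <- data.frame(
  name  = c("Participant A", "Participant B", "Participant C"),
  email = c("a@example.com", "b@example.com", "c@example.com"),
  score = c(12, 15, 9),
  stringsAsFactors = FALSE
)

# Columns treated as identifying in this sketch (an assumption)
id_cols <- c("name", "email")

# Keep only the non-identifying columns
df <- raw[, setdiff(names(raw), id_cols), drop = FALSE]

# Add an arbitrary participant number in place of the removed identifiers
df$id <- seq_len(nrow(df))
```

Whether dropping columns is sufficient depends on the dataset; combinations of seemingly harmless variables can also identify people, so treat this only as a starting point.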
You can use your own data. If you don’t have your own data, you can use some demo data:
- Allison Horst’s Penguin data, available in different file types at https://cjvanlissa.github.io/worcshop/
- Data on the effect of different methods of drying the hands on hang time in rock climbing.
Below is a minimal prepare_data.R script. Adapt it for
your own data (which will require you to copy the data file to your
worcs project directory and load it into memory).
# Inside prepare_data.R:
library(worcs)
# Example methods of loading a data file
# df <- readxl::read_xlsx("penguins.xlsx", 1)
# df <- foreign::read.spss("penguins.sav", to.data.frame = TRUE)
# df <- read.csv("penguins.csv", stringsAsFactors = FALSE)
# Example data
df <- iris
# Remove a column containing "potentially identifying information"
df[["Species"]] <- NULL
# Inspect the prepared data
descriptives(df)
Add the Dataset to the Repository
End the file prepare_data.R with the following command
to save the prepared dataset and publish it on GitHub. If you do not
want to publish your data on GitHub, use closed_data()
instead. This tutorial assumes you use open_data().
open_data(df)
To confirm that the project now knows how to load the dataset, remove
df from the environment, then run load_data()
in the console:
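A minimal sketch of that check, to be run in the console of the worcs project created above, after `open_data(df)` has been executed:

```r
library(worcs)

# Remove the prepared data from the current environment
rm(df)

# Ask worcs to load the data files registered for this project
load_data()

# The object should be back in the environment as a data.frame
is.data.frame(df)
```

If loading did not work, `df` would refer to the F-distribution density function from the stats package rather than to your data, so `is.data.frame(df)` would return `FALSE`.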
Add some demo analyses
Open your manuscript.Rmd file. There, edit an existing
code chunk, or remove the existing chunks and create a new one. First, load
worcs and the data we just created:
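A minimal sketch of the contents of such a chunk, assuming the manuscript is knitted inside the worcs project so that `load_data()` can find the dataset registered earlier with `open_data(df)`:

```r
# Attach worcs so that load_data(), descriptives(), etc. are available
library(worcs)

# Load the dataset(s) registered for this project; in this tutorial
# that makes the prepared data available as `df`
load_data()
```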
Now, add some mock analyses.
You can insert your own analysis code, or play around with the following functions:
# Descriptive statistics
res_desc <- descriptives(df)
write.csv(res_desc, "res_desc.csv", row.names = FALSE)
# Simple model
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)
res_mod <- summary(mod)
write.csv(res_mod$coefficients, "res_coef.csv", row.names = FALSE)
Note that, in this code, we write the results to spreadsheet files. You can also print them in the document, for example using:
knitr::kable(res_mod$coefficients, caption = paste0("My regression model coefficients, for a model with $R^2 ", report(res_mod[["r.squared"]]), "$."))
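One caveat about the spreadsheet output above: the coefficient matrix stores the term labels as row names, so writing it with `row.names = FALSE` drops them from the file. Below is a minimal base R sketch that keeps them by moving the labels into a regular column first. The column name `term` is a hypothetical choice, and the model is refitted here on the built-in iris data only to keep the example self-contained.

```r
# Refit the same simple model as in the tutorial, using the built-in iris data
df <- iris
df[["Species"]] <- NULL
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)
res_mod <- summary(mod)

# Move the row names (term labels) into a regular column;
# check.names = FALSE keeps headers such as "Std. Error" unchanged
coef_df <- data.frame(
  term = rownames(res_mod$coefficients),
  res_mod$coefficients,
  check.names = FALSE
)
rownames(coef_df) <- NULL

# Write the table, now including the term labels, to a spreadsheet file
write.csv(coef_df, "res_coef.csv", row.names = FALSE)
```

Reading the file back with `read.csv("res_coef.csv", check.names = FALSE)` then returns the term labels in the first column alongside the estimates.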