A Workflow for Open Reproducible Computational Social Science

Caspar J. Van Lissa, Brandmaier, Brinkman, Lamprecht, Peikert, Struiksma, & Vreede (2021),
Data Science, DOI: 10.3233/DS-210031

Today

Start	End	Dur.	Topic
9H 0M	9H 20M	0:20	Pres.: Workflow for Open Reproducible Computational Social Science
9H 20M	9H 30M	0:10	Discussion and Q&A
9H 30M	10H 0M	0:30	DIY: Setting up a WORCS project with a FAIR Theory
10H 0M	10H 20M	0:20	Pres.: WORCS with targets: Sustainable reproducibility
10H 20M	10H 30M	0:10	Discussion and Q&A
11H 00M	11H 30M	0:30	DIY: Using `worcs` with `targets`
11H 3M	11H 50M	0:20	Pres.: Integration testing, or: catching errors early
11H 50M	12H 00M	0:10	Discussion and Q&A
12H 00M	12H 30M	0:30	DIY: Integration Testing OR Parallel Computing

Defining Open Science

“Open science is just good science” (Jonathan Tennant, 2018)

Formal definitions:

TOP guidelines (Nosek et al., 2015)
FAIR principles (Wilkinson et al., 2016)

Meeting TOP guidelines…

Relevant to openness and reproducibility:

Citation of literature, data, materials, and methods;
Sharing data;
Sharing the code required to reproduce analyses;
Sharing new research materials;
Sharing details of the design and analysis;
Pre-registration of studies before data collection;
Pre-registration of the analysis plan;
Replication of published results.

…in a FAIR manner

Findable
- Via standardized repositories and search engines
- With Digital Object Identifier (DOI)
Accessible online for humans and machines
- Long-term storage
Interoperable
- Open file type
Reusability
- License data, code, and materials for reuse

Why open science?

Sterling, 1959:

Why open science?

Scientiﬁc fraud (Levelt, Noort, & Drenth, 2012)
Questionable research practices (John, Loewenstein, & Prelec, 2012)
P-hacking / p-ritual (Gigerenzer, 2018)
Replication crisis (Shrout & Rodgers, 2018)

So: Open science as punishment for bad science?

Open science as a paradigm shift

Open Science creates opportunities to make science more

reliable,
cumulative,
collaborative,
inclusive

Why Reproducibility?

Every analysis has “inductive bias” (Sterkenburg and Grünwald 2021)
- What we learn from the data depends, in part, on how we analyze it
Implicit steps in the analysis make inductive bias intractable (Peikert 2023)
- We don’t know how, or how much, they influenced our results
Reproducibility makes inductive bias tractable
Reproducible code enables quantifying and studying inductive bias
Ideally, your whole analysis is a sealed “pipeline”: data in, results out
Multiverse analysis: conducting a study of the impact of all “reasonable” analysis decisions on the estimand/conclusion (Steegen et al. 2016)
Reproducibility -> Scalability
- Apply same method in new study
- Redo analysis when new data come in
- Incorporate analysis into application for stakeholders, etc

Reproducibility is Challenging

Where do you start?

What tools do you need to learn?

What workflow is right for you?

Introducing WORCS

Workflow for Open Reproducible Code in Science

Standardized workflow
Low threshold, high ceiling
Conceptual platform-independent principles
Van Lissa et al. (2021)
“One-click” solution for R-users:
https://cran.r-project.org/package=worcs
Defaults based on best practices (several experts contributed)
Compatible with journal/university requirements and other workflows
Pulling down the learning curve!

The tools

1. Dynamic document generation

2. Version control

3. Dependency management

1. Dynamic document generation

Paper consists of text and code
Results, figures, and tables automatically generated
Formatted as APA paper (including citations!)

Important because:

Save time from copy-pasting output and formatting paper
Eliminate human error in copying results;
When revising the paper, all results are automatically updated;
Reproducible by default: Just generate the document

R Markdown example

R Markdown example rendered

data-analysis citation

2. Version control (using Git)

Why version control?

NO MORE manuscript_final_final_SERIOUSLYFINAL.doc
“Track Changes” on steroids: record entire project history
If something breaks, you can figure out what happened.
Facilitates collaboration and experimentation!

2. Version control (using Git)

Tracks changes to (text-based) files line by line:

add files to your repository
commit changes to these files
push all commits to remote repository (private backup or public online supplement)

One command in worcs: git_update("Describe your changes")

Image credit: Software Carpentries

Introducing GitHub

worcs repository is backed up in a remote repository like GitHub;
GitHub is a “cloud backup” with “social networking” features
- Clone other people’s repository to reproduce or build upon them
- Open Issues with questions or comments about the work
- Send suggested changes as a “Pull request”
GitHub can be used to ‘tag’ specific states of the repository, e.g. a preregistration.

Important because:

Complete backup of entire project history
- Go back to previous version if you want
- Try new things, don’t worry about losing work
- Prove that you preregistered your plans and followed them
Easy collaboration online (even with strangers)
- People can copy your project and build on it
GitHub can be your preregistration, your research archive, supplementary materials, comments section, etc.
Connects to OSF.io project page
- Improves Findability
- Get DOI for project and/or specific resources
Connects to Zenodo
- Get DOI for project and/or specific resources
- Store project snapshot

3. Dependency management

To make project reproducible, people must have access to your (exact) software dependencies
- For R-users, these are R-packages
Difficult trade-off:

Dependency management in WORCS

Maintains text-based list of packages, their version,
and origin (e.g., “CRAN”, “Bioconductor”, “GitHub”)
This list can be version-controlled with Git;
When a user loads the project,
renv installs all dependencies from the list

Important because:

Essential for reproducibility
Good for collaboration (everybody has same versions)
Nice to your “future self”: Your code will work in the future

Unique features in `worcs`

RStudio template
Automatic installation check: check_worcs_installation()
Easy GitHub integration
- Add URL during project creation
- git_update("Commit message")
- Automatically reproduce results in the cloud!
Manuscript and preregistration templates
- From rticles, papaja, and prereg
- Original templates for secondary- and longitudinal data
Data sharing solutions
Cite @essential and @@nonessential
Integration with targets
WORCS checklist and badge

Reproducing WORCS Project

Create entry point (e.g., manuscript.Rmd)
Define recipe (e.g., rmarkdown::render("manuscript.Rmd"))
Snapshot endpoints recipe (e.g., manuscript.pdf, table1.csv)

worcs::reproduce() generates the endpoints from the entry point via the recipe

worcs::check_endpoints() verifies that the results are identical

For non-R-users

WORCS-paper addresses the conceptual workflow
Covers issues/decisions you have to consider for Open Science, regardless of software
worcs is a good starting point for new R-users
- Setup Tutorial to help install everything
- Tricky issues (like project management and using Git) are ~automatic when using the WORCS template
- Automatic check in case you get stuck: check_worcs_installation()
Learn good habits from the start; don’t reinvent the wheel

Find out more:

cjvanlissa.github.io/worcshop
cjvanlissa.github.io/worcs

`worcs` with `targets`

Redundant Computation

Making projects reproducible often involves frequently re-running code to ensure results are still valid.

Issue: Re-running unchanged code is redundant and time-consuming
Environmental Impact: Unnecessary computation increases carbon footprint (Gupta et al. 2021)
Some analyses take minutes, hours, or days
When you make a small formatting change to your Rmarkdown, you don’t want to re-run all analyses

What is `targets`

A “pipeline tool” for (computationally demanding) R-projects

Each step is only re-run if:
- The step changed
- Its inputs changed
Utilizes implicit parallel computing.
Tracks both R-objects and output files

This does not overlap with, but perfectly complements, worcs workflows

Pipeline tools

Any pipeline tools does the following:

Monitor dependencies of code chunks/functions
- Generates a so-called “dependency graph”, which shows the flow of information through processing steps (functions) in your project
Skip chunks whose code or inputs remain changed

The most famous pipeline tool is the general purpose GNU Make

targets is an R-exclusive pipeline tool

What Changes with Targets?

Every step in your pipeline becomes a function
The arguments of that function indicate what inputs the function listens to
- The package looks for changes to the function and its inputs to determine if it should be run
Encourages a clean, function-oriented programming style.
Helps researchers manage complex data analyses
Makes code chunks more portable (i.e., reusable in other projects)

Example Pipeline

Targets in `worcs`

To use targets in a WORCS project:

Select “Use Targets” in the project creation dialog.
- Define the pipeline in the _targets.R script.
Select “Target Markdown” as output format in project creation dialog
- Define the pipeline in the manuscript.rmd file
Use worcs::add_targets()

A targets workflow is executed by running targets::tar_make()

worcs sets the recipe to targets::tar_make(), so worcs::reproduce() also executes the pipeline
worcs makes sure that the last step of the pipeline is to render an Rmarkdown to report the results

Loading Results in Rmarkdown

Results from the pipeline can be loaded into an Rmarkdown document using:

targets::tar_load(result_name)
targets::tar_load_everything()

This integrates the pipeline results into your dynamic document.

Often, rendering the Rmarkdown document will be the final step of your pipeline

Alernatively: `targets` Markdown

You can run targets directly within an Rmarkdown file by:

Selecting “target_markdown” during project creation.
Manually incorporating a pipeline in other Rmarkdown templates

Warning: Running code interactively in combination with tar_make() may introduce bugs. It is safer to only use tar_make().

Integration testing

What is Integration Testing?

Definition: Software engineering practice where (new) code is subjected to tests to ensure correct functioning and catch mistakes.

Integration Testing Research Code

We can apply integration testing to ensure that:

Functions work as expected
Analyses are executed as intended
Results are reproducible upon repeated execution
- And on different systems

What to Test?

Individual Functions: Make sure that each function works as expected
- E.g., test that adding(2, 2) == 4 before adding() two unknown numbers
Data (after processing): Sanity checks / ensure proper loading and processing
- E.g., test that a known entry in the dataset is correctly retrieved; test range of scales, test for outliers
Analysis results: Test that models provide correct results (on dummy data/on real data)
Tables and Figures: Ensure that output (tables, figures) are correct
Final Document: If the final document is exactly reproducible, then all preceding steps should be too

Good Practices for Integration Testing

Test incrementally: Avoid only testing at the end; test every small step.
Automate tests: To run them frequently (e.g., after Git commit).
Use real data: Apply your test to messy real data
Use fake data: Apply your test to predictable fake data, to check that the outcome is as expected (e.g., when writing functions, during preregistration)

Integration Testing and Reproducibility

WORCS facilitates making analyses reproducible; integration tests verify reproducibility

Why: Increase trust in scientific findings by verifying that the study yields the reported results

Some journals now have “reproducibility editors” who perform these checks (e.g., Research Synthesis Methods)

Integration Testing Functions

worcs provides functionality for integration testing research code

Based on testthat: Test that functions behave according to your expectations

testthat easily integrates into existing workflows
But primarily targeted towards package design

Functions for Integration Tests

worcs::add_testthat(): Set Up testthat
usethis::use_test(): initialize a basic test file and open it for editing.
worcs::github_action_testthat(): add a GitHub action that evaluates the integration tests.
worcs::test_worcs(): Run all tests in worcs project
testthat::test_file(): Test a single file
- The output should tell you which tests passed or failed, and why they failed.

Integration Testing Reproducibility

Define an entry point, endpoint(s), and a recipe to get from entry- to endpoint(s).

Basic worcs:

Entry point: manuscript.Rmd
Endpoint: manuscript.pdf
Recipe: render("manuscript.Rmd")

With targets:

Entry point: targets::tar_make()
Endpoint: the named results of each step in the targets pipeline
Recipe: specified in _targets.R

Functions for Reproducibility Integration Tests

Entry point is set by default when project is created
add_recipe() to customize recipe
add_endpoint() to start tracking files as endpoints
snapshot_endpoints() to update state of endpoints
- Uses checksums
reproduce() runs the recipe to reproduce the project
check_endpoints() verifies that endpoints remain unchanged after reproduce()

GitHub Actions: Run your Tests Online

Avoid This:

It works on my machine ¯\_(ツ)_/¯

Introduction to GitHub Actions

‘GitHub’ allows you to run code in the cloud (= on their servers) to:

Automatically perform actions when your repository changes
For software: Build and test every commit
For research: Reproduce and check endpoints!

Components of GitHub Actions

Virtual Machines: A user-configurable “computer” that runs the action (Linux, Windows, macOS)
Workflows: The actions to be automated (e.g., check reproducibility)
Events trigger workflows (e.g., a commit)

Default Specifications

Virtual Machine: Latest Ubuntu Linux machine
Workflow: Install pandoc (for PDF), R, and the Renv package dependencies, run reproduce() or check_endpoints()
Events: Push to repository, and pull requests

You can add a reproducibility status badge to your README.md

Benefits of Automating Reproducibility

Breaking changes/unexpected behavior is detected and flagged
Functions and analyses work on another system
Increased rigor: systematically try to catch your mistakes
Sh#t filter: automatically check other people’s contributions
Reproducibility badge signals good practices

`worcs` GitHub Actions

github_action_testthat():
- Run a worcs project’s testthat integration test suite
github_action_reproduce():
- Reproducing the entire analysis remotely, then checking the endpoints
github_action_check_endpoints()
- Just checking that the endpoints remain unchanged

GitHub Actions

worcs::reproduce() on GitHub via GitHub Actions:

What if my Test Fails?

Sometimes it SHOULD fail: if your function/analysis pipeline has changed

Sometimes it should NOT fail

But you forgot to set a random seed
But you performed a step interactively that is not part of your analysis pipeline (e.g., non-linearly step through code)
Data has changed
Analysis interacts differently with different collaborators’ systems (Linux vs Windows)
Package versions are not correct (renv::snapshot() and renv::restore())
Etc.

References

Gupta, Udit, Young Geun Kim, Sylvia Lee, Jordan Tse, Hsien-Hsin S. Lee, Gu-Yeon Wei, David Brooks, and Carole-Jean Wu. 2021. “Chasing Carbon: The Elusive Environmental Footprint of Computing.” In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 854–67. https://doi.org/10.1109/HPCA51647.2021.00076.

Peikert, Aaron. 2023. “Towards Transparency and Open Science.” doctoralThesis, Humboldt-Universität zu Berlin. https://doi.org/10.18452/27056.

Steegen, Sara, Francis Tuerlinckx, Andrew Gelman, and Wolf Vanpaemel. 2016. “Increasing Transparency Through a Multiverse Analysis.” Perspectives on Psychological Science 11 (5): 702–12. https://doi.org/10.1177/1745691616658637.

Sterkenburg, Tom F., and Peter D. Grünwald. 2021. “The No-Free-Lunch Theorems of Supervised Learning.” Synthese 199 (3-4): 9979–10015. https://doi.org/10.1007/s11229-021-03233-1.

Van Lissa, Caspar J., Andreas M. Brandmaier, Loek Brinkman, Anna-Lena Lamprecht, Aaron Peikert, Marijn E. Struiksma, and Barbara M. I. Vreede. 2021. “WORCS: A Workflow for Open Reproducible Code in Science.” Data Science 4 (1): 29–49. https://doi.org/10.3233/DS-210031.

A Workflow for Open Reproducible Computational Social Science

Today

Defining Open Science

Meeting TOP guidelines…

…in a FAIR manner

Why open science?

Why open science?

So: Open science as punishment for bad science?

Open science as a paradigm shift

Open Science creates opportunities to make science more

Why Reproducibility?

Reproducibility is Challenging

Where do you start?

What tools do you need to learn?

What workflow is right for you?

Introducing WORCS

Workflow for Open Reproducible Code in Science

The tools

1. Dynamic document generation

2. Version control

3. Dependency management

1. Dynamic document generation

Important because:

R Markdown example

R Markdown example rendered

2. Version control (using Git)

Why version control?

2. Version control (using Git)

Introducing GitHub

Important because:

3. Dependency management

Dependency management in WORCS

Important because:

Unique features in worcs

Sharing data in WORCS

Use open_data():

Use closed_data():

Sharing data in WORCS

Loading data load_data():

Reproducing WORCS Project

For non-R-users

Find out more:

worcs with targets

Redundant Computation

What is targets

Pipeline tools

What Changes with Targets?

Example Pipeline

Targets in worcs

Loading Results in Rmarkdown

Alernatively: targets Markdown

Integration testing

What is Integration Testing?

Integration Testing Research Code

What to Test?

Good Practices for Integration Testing

Integration Testing and Reproducibility

Integration Testing Functions

Functions for Integration Tests

Integration Testing Reproducibility

Functions for Reproducibility Integration Tests

GitHub Actions: Run your Tests Online

Avoid This:

Introduction to GitHub Actions

Components of GitHub Actions

Default Specifications

Benefits of Automating Reproducibility

worcs GitHub Actions

GitHub Actions

What if my Test Fails?

References

Unique features in `worcs`

Use `open_data()`:

Use `closed_data()`:

Loading data `load_data()`:

`worcs` with `targets`

What is `targets`

Targets in `worcs`

Alernatively: `targets` Markdown

`worcs` GitHub Actions