“Open science is just good science” (Jonathan Tennant, 2018)
Formal definitions:
Relevant to openness and reproducibility:
Sterling, 1959:
NO MORE manuscript_final_final_SERIOUSLYFINAL.doc
“Track Changes” on steroids: record entire project history
If something breaks, you can figure out what happened.
Facilitates collaboration and experimentation!
Tracks changes to (text-based) files line by line:
One command in worcs: git_update("Describe your changes")
Image credit: Software Carpentries
A worcs repository is backed up in a remote repository like GitHub;
GitHub is a “cloud backup” with “social networking” features
GitHub can be used to ‘tag’ specific states of the repository, e.g. a preregistration.
R-packages
renv installs all dependencies from the list
worcs: check_worcs_installation(), git_update("Commit message")
rticles, papaja, and prereg
@essential and @@nonessential
targets
open_data(): .csv (text based, human / machine readable)
closed_data(): synthetic() (.csv)
load_data(): .worcs file; default read.csv()
Entry point (manuscript.Rmd)
Recipe (rmarkdown::render("manuscript.Rmd"))
Endpoints (manuscript.pdf, table1.csv)
worcs::reproduce() generates the endpoints from the entry point via the recipe
worcs::check_endpoints() verifies that the results are identical
worcs is a good starting point for new R-users
check_worcs_installation()
Reproducing in the cloud
Definition: Software engineering practice where (new) code is subjected to tests to ensure correct functioning and catch mistakes.
We can apply integration testing to ensure that:
WORCS facilitates making analyses reproducible; integration tests verify reproducibility
Why: Increase trust in scientific findings by verifying that the study yields the reported results
Some journals now have “reproducibility editors” who perform these checks (e.g., Research Synthesis Methods)
worcs provides functionality for integration testing research code
Key Concepts: Define an entry point, endpoint(s), and a recipe to get from the entry point to the endpoint(s).
Entry point: manuscript.Rmd
Endpoint: manuscript.pdf
Recipe: rmarkdown::render("manuscript.Rmd")
add_recipe() to customize the recipe
add_endpoint() to start tracking files as endpoints
snapshot_endpoints() to update the state of endpoints
reproduce() runs the recipe to reproduce the project
check_endpoints() verifies that endpoints remain unchanged after reproduce()
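The idea behind snapshotting and checking endpoints can be illustrated in base R. This is a conceptual sketch only: the `recipe()` helper, the file name, and the use of MD5 checksums are assumptions made for illustration, not a description of how worcs implements these functions.

```r
# Conceptual sketch only: hypothetical helpers, not the worcs implementation.
endpoint <- file.path(tempdir(), "table1.csv")

# Hypothetical "recipe": deterministically regenerates the endpoint file
recipe <- function() {
  results <- data.frame(term = c("intercept", "slope"),
                        estimate = c(1.5, 0.25))
  write.csv(results, endpoint, row.names = FALSE)
}

# "Snapshot": run the recipe once and record the endpoint's checksum
recipe()
snapshot <- unname(tools::md5sum(endpoint))

# "Reproduce": remove the endpoint and regenerate it from the recipe
unlink(endpoint)
recipe()

# "Check": the endpoint counts as reproduced if its checksum is unchanged
reproduced <- identical(unname(tools::md5sum(endpoint)), snapshot)
print(reproduced)
```

If the recipe or its inputs change so that the endpoint differs, the comparison fails, which is the signal to either fix the analysis or take a new snapshot.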
GitHub Actions is a continuous integration platform that allows you to:
Run reproduce() or check_endpoints()
You can add a reproducibility status badge to your README.md
¯\_(ツ)_/¯
"It works on my machine"
GitHub Actions can automate reproducibility checks by:
github_action_reproduce()
github_action_check_endpoints()
worcs::reproduce() on GitHub via GitHub Actions:
Sometimes it SHOULD fail: if your analysis has changed
Sometimes it should NOT fail
(renv::snapshot() and renv::restore())
worcs with targets
Making projects reproducible often involves frequently re-running code to ensure results are still valid.
targets
A “pipeline tool” for (computationally demanding) R-projects
This does not overlap with, but perfectly complements, worcs workflows
Any pipeline tool does the following:
The most famous pipeline tool is the general purpose GNU Make
targets is an R-exclusive pipeline tool
worcs
To use targets in a WORCS project:
_targets.R script
manuscript.rmd file
worcs::add_targets()
A targets workflow is executed by running targets::tar_make()
worcs sets the recipe to targets::tar_make(), so worcs::reproduce() also executes the pipeline
worcs makes sure that the last step of the pipeline is to render an Rmarkdown to report the results
Results from the pipeline can be loaded into an Rmarkdown document using:
- targets::tar_load(result_name)
- targets::tar_load_everything()
This integrates the pipeline results into your dynamic document.
Often, rendering the Rmarkdown document will be the final step of your pipeline
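As an illustration of the workflow described above, a minimal _targets.R script might look like the sketch below. The target names (`raw_data`, `model`, `slope`) and the helper functions `make_data()` and `fit_model()` are hypothetical and chosen only for this example; the file that worcs::add_targets() generates may look different.

```r
# _targets.R: a minimal, hypothetical pipeline (names are illustrative only)
library(targets)

# Hypothetical helper: create a small example dataset
make_data <- function() {
  data.frame(x = 1:10, y = 2 * (1:10) + 1)
}

# Hypothetical helper: fit a simple linear model to the data
fit_model <- function(df) {
  lm(y ~ x, data = df)
}

# The script must end with a list of targets; targets infers the
# dependencies between them from the names used in each command.
list(
  tar_target(raw_data, make_data()),
  tar_target(model, fit_model(raw_data)),
  tar_target(slope, unname(coef(model)["x"]))
)
```

Running targets::tar_make() from the project directory builds these targets and skips any that are already up to date. Inside the Rmarkdown document, targets::tar_load(slope) would then make the stored result available as the object `slope`.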
targets Markdown
You can run targets directly within an Rmarkdown file by:
Warning: Running code interactively in combination with tar_make() may introduce bugs. It is safer to only use tar_make().