Creating a FAIR Theory
This tutorial takes the user through distinct steps involved in
making a theory FAIR. It uses the R-package theorytools
and
other specific software and platforms. As open science infrastructure is
an area of active development, the approach proposed here should not be
considered definitive, but rather, as one proposal for a FAIR-compliant
implementation of theory using infrastructure available at the time of
writing. The steps described in this tutorial are largely automated by
the function theorytools::create_fair_theory()
; expert
users might use this function directly.
Running Example: the Empirical Cycle
Given that we based our argument for the importance of FAIR theory on the empirical cycle, we use it as an example for this tutorial. The empirical cycle is a model of cumulative knowledge production through scientific research, described in De Groot and Spiekerman (1969) (p. 28):
Phase 1: ‘Observation’: collection and grouping of empirical materials; (tentative) formation of hypotheses.
Phase 2: ‘Induction’: formulation of hypotheses.
Phase 3: ‘Deduction’: derivation of specific consequences from the hypotheses, in the form of testable predictions.
Phase 4: ‘Testing’: of the hypotheses against new empirical materials, by way of checking whether or not the predictions are fulfilled.
Phase 5: ‘Evaluation’: of the outcome of the testing procedure with respect to the hypotheses or theories stated, as well as with a view to subsequent, continued or related, investigations.
Completing All Steps Manually
Creating a Project Folder
In the spirit of modular publishing, this tutorial assumes that
you’re creating your FAIR theory as a standalone project. While it is
possible for a theory to be implemented in a programming language like
R, it often is not - the empirical cycle described above is implemented
in plain text. Therefore, we will not create an R project (with an
.Rproj
file et cetera), but just a regular nondescript
project. This starts with creating an empty project folder. If we want
to create a new folder called empirical_cycle
in the
existing folder c:/theories/
, we can call:
project_path <- file.path("c:/theories", "empirical_cycle")
dir.create(project_path)
Version Controlling The Project Folder
We use Git to version control the project folder. If you do not already have Git installed on your computer, install it now. You can verify that Git is installed and working by running:
worcs::check_git()
If this function shows a green checkmark, you can initialize version control in your project repository by running:
gert::git_init(path = project_path)
Connecting to a Remote (GitHub) Repository
To make your FAIR theory accessible to collaborators and discoverable by the wider community, you must connect your local Git repository to a remote repository on a platform like GitHub.
Before proceeding, ensure you have a
GitHub account. Academics may qualify for a free upgrade. To
authorize R to interact with your GitHub account, run
usethis::create_github_token()
, which takes you to a
website to create a personal access token (PAT). Copy it, then run
gitcreds::gitcreds_set()
and paste the PAT when asked. If
you still experience problems try usethis::gh_token_help()
for help.
To check that you are ready to proceed, run:
worcs::check_github()
If you see a green checkmark, you can create a new repository on GitHub directly from R:
worcs::git_remote_create("empirical_cycle", private = FALSE)
This command will create a new public repository on GitHub and link
it to your local repository. The private = FALSE
argument
ensures the repository is public by default.
Alternatively, you may have already created a remote repository on
the GitHub website. Either way, assuming the name of that repository is
empirical_cycle
, you can connect it to your project folder
as follows:
worcs::git_remote_connect(project_path, remote_repo = "empirical_cycle")
Adding a Shareable Theory File to the Repository
Your theory should be represented as a digital artifact, such as a structured plain-text document or a machine-readable file (e.g., DOT, JSON, YAML, R code). At this point, we offer two alternatives.
Creating a Plain-Text Theory
You could simply copy De Groot’s implementation of the empirical cycle into a plain text file, like so:
writeLines(
c("*Phase 1:* 'Observation': collection and grouping of empirical materials;
(tentative) formation of hypotheses.",
"*Phase 2:* 'Induction': formulation of hypotheses.",
"*Phase 3:* 'Deduction': derivation of specific consequences
from the hypotheses, in the form of testable predictions.",
"*Phase 4:* 'Testing': of the hypotheses against new empirical materials,
by way of checking whether or not the predictions are fulfilled.",
"*Phase 5:* 'Evaluation': of the outcome of the testing procedure
with respect to the hypotheses or theories stated, as well as
with a view to subsequent, continued or related, investigations."
), file.path(project_path, "theory.txt"))
Further Formalizing the Empirical Cycle
If we compare it to the levels of theory formalization (Guest and Martin 2021), De Groot’s theory is either at the “theory” or “specification” level. It consists of a series of natural language statements. We can increase the level of formalization, and present an “implementation” in the human- and machine-readable DOT language:
theory <-
"digraph {
observation;
induction;
deduction;
test;
evaluation;
observation -> induction;
induction -> deduction;
deduction -> test;
test -> evaluation;
evaluation -> observation;
}"
This language describes the model as a directed graph. Note that the code has been organized so that the first half describes an ontology of the entities the theory postulates, and the second half describes their proposed interrelations. This follows the first two properties of good theory according to Meehl (Meehl 1990).
We can now write this implementation of the empirical cycle to a text
file, say empirical_cycle.dot
.
Adding a LICENSE File to the Repository
A license ensures that others know how they can legally reuse your work. For FAIR theory, we recommend using the CC0 (Creative Commons Zero) license, which places your work in the public domain. Add a license file to your repository:
worcs::add_license_file(path = project_path, license = "cc0")
#> ℹ Writing license file✔ Writing license file ... done
Adding a README File to the Repository
A README file describes the repository’s contents and purpose, making
it easier for others to understand and reuse your theory. The
theorytools
package contains a function to generate a
README file with appropriate sections for FAIR theory, which can be used
like so:
theorytools::add_readme_fair_theory(title = "The Empirical Cycle",
path = project_path)
#> ℹ Creating README.md✔ Creating README.md ... done
We encourage users to edit the resulting README.md
file,
in particular, to add relevant information about X-interoperability.
Adding Zenodo Metadata to the Repository
Zenodo uses metadata files to archive and index repositories. Create
a .zenodo.json
file with metadata about your theory so that
it is indexed appropriately:
add_zenodo_json_theory(
path = project_path,
title = "The Empirical Cycle",
keywords = c("philosophy of science", "methodology")
)
#> ℹ Add Zenodo metadata✔ Add Zenodo metadata ... done
Pushing These Changes to the Remote Repository
Version control requires adding files to be tracked to the repository
(gert::git_add()
), committing changes to those files
(gert::git_commit()
), and pushing them to the remote
repository (gert::git_push()
). The worcs
function worcs::git_update()
combines these three actions,
acting like a kind of “quick-save” function:
worcs::git_update("First commit of my theory", repo = project_path)
#> ℹ Identify local 'Git' repository at "C:\\Users\\vanlissa\\AppData\\Local\\Temp…✔ Identify local 'Git' repository at "C:\\Users\\vanlissa\\AppData\\Local\\Temp…
#> ℹ Adding files to staging area of 'Git' repository.✔ Adding files to staging area of 'Git' repository. ... done
#> ℹ Committed staged files to 'Git' repository.✔ Committed staged files to 'Git' repository. ... done
#> ℹ Push local commits to remote repository.✖ Push local commits to remote repository. ... failed
Check Your GitHub Repository
Navigate to your repository on GitHub and check that all committed files, including the theory file, license, README, and Zenodo metadata, are now visible in the remote repository (green box in the image below).
Furthermore, the repository visibility must be set to “Public” to ensure that Zenodo can discover and archive it. If you created the repository programmatically as shown above, it should already be public (see red box in the image above). If necessary, change the visibility setting to Public by clicking on “Settings” > “General” > “Change repository visibility.”
Login to Zenodo
Head over to zenodo.org. Zenodo is a platform where you can permanently archive your code and other project elements. Zenodo does this by assigning projects a Digital Object Identifier (DOI), which also helps to make the work more citable. This is different to GitHub, which acts as a place where the actual work on a project takes place, rather than long-term archiving of it. At GitHub, content can be modified, deleted, rewritten, and irreversibly changed, which makes it a bit concerning to be used for longer lasting referencing purposes. Zenodo offers more security and permanence for research outputs.
If you already have a Zenodo account, this is easy. If not, follow the steps to create one — you can login using your GitHub account.
Authorise GitHub to connect with Zenodo
On the Zenodo website authorise it to connect to your GitHub account in the ‘Using GitHub’ section. Here, Zenodo will redirect you to GitHub to ask for permissions to use ‘webhooks’ on your repositories. You want to authorise Zenodo here with the permissions it needs to form those links.
Select the Repository to Archive
If you have got this far, this means that Zenodo is now authorised to configure the repository webhooks that it needs to archive the repository and issue it a DOI. To do this, on the Zenodo website navigate to the GitHub repository listing page and simply “flip the switch” next to your repository. If your repository does not show up in the list, you may need to press the ‘Syncronize now’ button. At the time of writing, we noticed that it can take quite a while (hours?) for Zenodo to detect new GitHub repositories. If so, take a break or come back to this last step tomorrow!
Optional: Check repository settings
If you were successful, you have now set up a new webhook between Zenodo and your repository.
Optionally, you can verify this. In GitHub, click on the settings for your repository, and the Webhooks tab on the left hand side menu. This should display the new Zenodo webhook configured to Zenodo. Note, it may take a little time for the webhook listing to show up.
Create a New Release
To archive a repository on Zenodo, you must create a new release. You can do this using the following code:
worcs::git_release_publish(repo = project_path)
If you have not previously published any releases, this function will assume that you want to use semantic versioning for both the release tag and the release title. This means that the first release will be labeled with version number “0.1.0”. Each subsequent release will automatically increment the trailing digit, i.e.: “0.1.1”, “0.1.2”. If you make a major change to the theory, you may want to manually increment the middle digit like so:
worcs::git_release_publish(repo = project_path,
tag_name = "0.2.0",
release_name = "0.2.0")
Verify on Zenodo
To verify that your release was archived on Zenodo and assigned a DOI, you need to visit the Uploads tab.
Entering Meta-Data
We can further document our Zenodo archive as a FAIR theory by adding some extra information on Zenodo. On Zenodo click the Upload tab in the main menu, where you should find your newly uploaded repository.
Click the orange Edit
button, and verify/supply the
following information:
-
-
Is documented by
a theory paper you wrote, in which you introduce this FAIR theory -
Is derived from
an existing theory, which was published in print (paper, book chapter) but not made FAIR
-
To save these changes, click ‘Publish’.
Verifying That Zenodo Mints a DOI for Your Theory
After publishing a release, Zenodo will archive the repository and mint a DOI. Verify this by checking the Zenodo entry for your repository, where the DOI will be displayed. Include this DOI in any citations or references to your theory to enhance its discoverability and reusability.
The GitHub/Zenodo integration will assign one “mother-DOI” to the project, as well as a unique DOI to each version/release of the FAIR theory. This enables users to refer to and cite specific versions of the theory. The list of authors for the citation is automatically determined by the GitHub user account names used by the repository - this can be edited on Zenodo, as explained above. DOIs used in Zenodo are registered through the DataCite service.
Pro-tip: Check the
Citation
field on the Zenodo page, and copy-paste it into the README file of your GitHub repo to make cross-linking even easier (or refer users to the Zenodo page to find the citation, which obviates the need to manually update this information). Click the DOI bage in theDetails
field to get instructions on how to add a clear highlighted DOI badge to your GitHub repository, for users to see and make use of your DOI:
CONGRATULATIONS!
Your FAIR theory is now archived in Zenodo, and with a DOI that can be versioned to reflect updates to the repository version through time. You should be able to see details of this on the GitHub Zenodo page for your repository. This also means that your archived projects can get picked up by other indexing services and search engines that use DOIs too.
Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, as this provides basic citation metadata. For Open Science, it is important to be able to comprehensively cite the resources that you use in your research, including theory, and this workflow enables that to happen, in line with best practices. Making theory FAIR also helps elevate the standard of theory to that of the standard of other research outputs, like papers and software.
Pro-tip: Is your research funded by an EU grant? Now you can directly connect your FAIR theory to your grant by updating the grant section of the metadata on the project’s Zenodo record. This massively helps to increase its discoverability!
Optional: Everything in One Step
The function theorytools::create_fair_theory()
automates
most of the preceding steps, up to step 2.8. Assuming you have already
created a shareable theory file called theory.txt
which
resides in the currently active directory (getwd()
), you
can create your FAIR theory as follows:
create_fair_theory(
path = file.path("c:/theories", "empirical_cycle"),
title = "The Empirical Cycle, Again",
theory_file = "theory.txt",
remote_repo = "empirical_cycle2",
add_license = "cc0")
You should still complete steps 2.12 - 2.17 manually.
References
This tutorial is partly adapted from Module 5, Task 2 of Tennant et al. (2018).