2.4 Data manipulation (optional)
Now that we have the meta-analysis data in RStudio, let's try a few basic data manipulations. These functions may come in handy when we're conducting analyses later on.
Going back to the output of the str() function, we see that it also tells us the data type of each column in our data. Different abbreviations signify different types of data:
Abbreviation | Type | Description |
---|---|---|
num | Numeric | Data stored as numbers (e.g., 1.02) |
chr | Character | Data stored as text (strings), such as words |
logi | Logical | Binary variables signifying that a condition is either TRUE or FALSE |
Factor | Factor | Categorical variables stored internally as integer codes, with each number signifying a different level of the variable. For example, a factor could have the levels 1 = low, 2 = medium, 3 = high |
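To see these abbreviations in practice, here is a minimal sketch using a small hypothetical data frame. The column names effect, author, rct, and dose are made up for illustration and are not part of the meta-analysis data. str() prints one line per column with its type abbreviation, while sapply(toy, class) returns the full class name of each column.

```r
# Hypothetical toy data frame; column names are illustrative only
toy <- data.frame(
  effect = c(0.21, 0.35, 0.12),                        # shown as num by str()
  author = c("Smith", "Jones", "Lee"),                 # shown as chr
  rct    = c(TRUE, FALSE, TRUE),                       # shown as logi
  dose   = factor(c("low", "high", "medium"),
                  levels = c("low", "medium", "high")),  # shown as Factor
  stringsAsFactors = FALSE
)

str(toy)            # one line per column, including its type abbreviation
sapply(toy, class)  # full class names: numeric, character, logical, factor
```

Note that the factor column is stored as integer codes (here 1, 3, 2), which str() also displays after the level labels.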
2.4.1 Converting to factors
Let's look at the variable df$VIEWCAT. This is a categorical variable that is coded numerically. We can inspect it by typing the name of our dataset, followed by the $ selector and the name of the variable we want to look at.
This variable is currently a numeric vector. We want it to be a factor, which is R's data type for categorical variables.
To convert it to a factor, we use the factor() function:
df$VIEWCAT <- factor(df$VIEWCAT)
The variable has now been converted to a factor with the levels "1", "2", "3", and "4". We can assign value labels to these levels as follows:
df$VIEWCAT <- factor(df$VIEWCAT, labels = c("Rarely", "Sometimes", "Regularly", "Often"))
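The labels above are matched to the levels in sorted order, so this only works as intended if all four codes actually occur in the data (otherwise the number of labels would not match the number of levels and factor() would raise an error). The sketch below uses a hypothetical vector of codes standing in for VIEWCAT. It shows that passing levels explicitly pins each code to its label, and it illustrates a common pitfall: as.numeric() on a factor returns the internal level index, not the original value.

```r
# Hypothetical numeric codes standing in for a variable such as VIEWCAT
codes <- c(2, 4, 1, 3, 4, 2)

# Without labels, the levels are simply the sorted codes as text
f_plain <- factor(codes)
levels(f_plain)                      # "1" "2" "3" "4"

# Passing levels explicitly pairs each code with its label,
# even if some code does not occur in the data
f_labelled <- factor(codes, levels = 1:4,
                     labels = c("Rarely", "Sometimes", "Regularly", "Often"))
table(f_labelled)

# Pitfall: as.numeric() on a factor returns the level index, not the value;
# converting through as.character() first recovers the original numbers
f_odd <- factor(c(10, 30, 20))
as.numeric(f_odd)                    # 1 3 2
as.numeric(as.character(f_odd))      # 10 30 20
```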
2.4.2 Selecting specific cases
It is often useful to select certain cases for further analyses, or to exclude some studies from further analyses (e.g., if they are outliers).
To do this, we can use the [] operator to index our data.
Let’s say we want to get only the first 5 cases. We can select them like so:
df[1:5, ]
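Positional indexing can also exclude rows, which is handy when dropping an outlying study. The following is a minimal sketch on a hypothetical data frame of eight studies (the column names study and g and all values are invented for illustration). A negative index removes the listed rows instead of keeping them.

```r
# Hypothetical toy data frame with 8 studies
toy <- data.frame(study = paste0("Study ", 1:8),
                  g     = c(0.31, 0.12, 2.40, 0.25, 0.18, 0.40, 0.05, 0.22),
                  stringsAsFactors = FALSE)

first5      <- toy[1:5, ]          # rows 1 to 5, all columns
head(toy, 5)                       # the same rows via head()
no_outlier  <- toy[-3, ]           # negative index: drop row 3
subset_rows <- toy[c(2, 4, 6), ]   # any selection of rows
```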
Or let's say we only want the cases in which the children were younger than 36 months. In this case, we can use logical (Boolean) indexing: we write a condition that is either TRUE or FALSE for each row, and keep the rows for which it is TRUE:
df[df$AGE < 36, ]
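One thing to watch for with logical indexing is missing values. If AGE is NA for some study, the condition is NA rather than TRUE or FALSE, and plain [ ] indexing returns a row filled with NAs for it. The sketch below uses a hypothetical four-study data frame to show the difference; which() and subset() both keep only the rows where the condition is TRUE.

```r
# Hypothetical toy data; AGE is missing for study C
toy <- data.frame(study = c("A", "B", "C", "D"),
                  AGE   = c(24, 48, NA, 30),
                  stringsAsFactors = FALSE)

toy[toy$AGE < 36, ]          # returns A, D, and an all-NA row for C
toy[which(toy$AGE < 36), ]   # which() keeps only rows where the test is TRUE
subset(toy, AGE < 36)        # subset() also drops rows where the test is NA
```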
Note that this approach works for any other type of data and variable. For example, we can use it to select only the studies in which VIEWCAT was equal to "Often":
df[df$VIEWCAT == "Often", ]
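Conditions can also be combined. As a sketch, the hypothetical data frame below mirrors the AGE and VIEWCAT variables used above (all values are invented). & keeps rows meeting both conditions, | keeps rows meeting either, %in% matches any of several values, and != excludes a value.

```r
# Hypothetical toy data mirroring the variables used above
toy <- data.frame(
  study   = c("A", "B", "C", "D", "E"),
  AGE     = c(24, 48, 40, 60, 18),
  VIEWCAT = factor(c("Often", "Rarely", "Often", "Sometimes", "Regularly"),
                   levels = c("Rarely", "Sometimes", "Regularly", "Often")),
  stringsAsFactors = FALSE
)

toy[toy$VIEWCAT == "Often", ]                     # one condition: A, C
toy[toy$VIEWCAT == "Often" & toy$AGE < 36, ]      # both conditions: A
toy[toy$VIEWCAT == "Often" | toy$AGE < 36, ]      # either condition: A, C, E
toy[toy$VIEWCAT %in% c("Regularly", "Often"), ]   # any listed level: A, C, E
toy[toy$VIEWCAT != "Often", ]                     # exclude a level: B, D, E
```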
2.4.3 Changing cell values
Even if you prepared your data carefully in Excel, you might sometimes want to change values in RStudio after importing them.
To do this, we have to select a cell in our data frame. This can be done by adding [x, y] to the name of our dataset, where x is the number of the row and y is the number of the column we want to select.
To see how this works, let's first select a single cell with this command:
df[8, 1]
## [1] 8
This displays the value of the 8th study in our data frame for column 1 (the participant ID). Let's say this ID contains a typo and we want to change it. To do so, we assign this exact cell a new value:
df[8, 1] <- 1001
Let’s check if the value has changed.
df[8, 1]
## [1] 1001
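Row and column numbers work, but they may point at a different study if the rows are later re-sorted or filtered. As a hedged alternative, the sketch below (a hypothetical data frame with made-up columns ID and g) selects the column by name, or selects the row by a condition on an identifying column.

```r
# Hypothetical toy data frame; "ID" and "g" are illustrative column names
toy <- data.frame(ID = c(101, 102, 103, 104),
                  g  = c(0.31, 0.12, 0.25, 0.40))

# Position-based: row 2, column 1
toy[2, 1] <- 1002

# Name-based: the same cell, with the column selected by its name
toy[2, "ID"] <- 1002

# Condition-based: change g for whichever row has ID 103
toy[toy$ID == 103, "g"] <- 0.52
```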
You can use the same approach to change any other type of data, including numeric and logical values. Only for character values do you have to put the new value in quotation marks ("").
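The sketch below shows the same assignment for different column types on a hypothetical two-study data frame (the columns author, year, and rct are invented). It also shows why the quotation marks matter: quoting a number assigned into a numeric column turns the whole column into character.

```r
# Hypothetical toy data frame with a typo in the second author name
toy <- data.frame(author = c("Smith", "Jnoes"),
                  year   = c(2010, 2015),
                  rct    = c(TRUE, TRUE),
                  stringsAsFactors = FALSE)

toy[2, "author"] <- "Jones"   # character: value in quotes
toy[2, "year"]   <- 2016      # numeric: no quotes
toy[2, "rct"]    <- FALSE     # logical: TRUE/FALSE without quotes

# Quoting a number in a numeric column coerces the whole column to character
check <- toy
check[1, "year"] <- "2011"
class(check$year)             # "character"
```

If the column is a factor, as VIEWCAT is after conversion, the new value must be one of its existing levels; otherwise R warns about an invalid factor level and stores NA.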