13.1 Understanding random forests

MetaForest is an adaptation of the random forests algorithm (see Strobl, Malley, and Tutz 2009) for meta-analysis. Random forests are a powerful machine learning technique with several advantages. Firstly, random forests are robust to overfitting. Secondly, they are a non-parametric technique, which means that they can easily capture non-linear relationships between moderators and the effect size, or even complex, higher-order interactions between moderators. Thirdly, random forests perform variable selection, identifying which moderators contribute most strongly to predicting the effect size.

The random forest algorithm combines many tree models. A tree model can be conceptualized as a decision tree, or a flowchart: The model recursively splits the data into groups with maximally similar values on the outcome variable, which in meta-analysis is the study effect size. The splitting decisions are based on the moderator variables. Starting with the full dataset, the model first finds the moderator variable, and the value on that variable, along which to split the dataset. It chooses the moderator and value that result in the most homogeneous post-split groups possible. This process is repeated for each post-split group, over and over again, until a stopping criterion is reached. Usually, the algorithm is stopped when the post-split groups contain a minimum number of cases.
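To make this concrete, the sketch below grows a single regression tree with the rpart package (which is not part of MetaForest itself). The data frame dat, its effect size column yi, and the moderator names are hypothetical placeholders; the minbucket setting implements the stopping criterion described above.

```r
# A minimal sketch of a single regression tree, assuming a data frame `dat`
# with an effect size column `yi` and moderator columns `mod1`, `mod2`, `mod3`.
library(rpart)

tree <- rpart(
  yi ~ mod1 + mod2 + mod3,   # predict effect size from moderators
  data = dat,
  control = rpart.control(
    minbucket = 5,           # stop splitting when a group would contain < 5 studies
    cp = 0                   # no complexity-based pruning, to mirror a fully grown tree
  )
)

# Inspect the sequence of splits the tree has chosen
print(tree)
```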

One advantage of regression trees is that it does not matter if the number of moderators is large relative to the number of studies, or even exceeds it. Another is that trees are non-parametric: they do not assume normally distributed residuals or linearity, and they intrinsically capture non-linear effects and interactions. These are substantial advantages when performing meta-analysis on a heterogeneous body of literature. Single regression trees also have an important limitation, however: they are extremely prone to overfitting. They simply capture all patterns in the data, both genuine effects and random noise (Hastie, Tibshirani, and Friedman 2009).

Random forests overcome this limitation of single regression trees. First, many different bootstrap samples are drawn (say, 1,000). Then, a single tree is grown on each bootstrap sample. To ensure that each tree learns something unique from the data, only a small random selection of moderators is made available to choose from at each splitting point. Finally, the predictions of all tree models are averaged. This renders random forests robust to overfitting: Each tree captures some of the true patterns in the data and overfits some random noise that is only present in its own bootstrap sample. Because this noise differs from tree to tree, it cancels out on aggregate, while the true patterns are retained. Random forests also make better predictions: Where single trees predict a fixed value for each “group” they identify in the data, random forests average the predictions of many trees, which leads to smoother prediction curves.
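For illustration, the sketch below grows a conventional (unweighted) random forest with the ranger package, using the same hypothetical data frame dat as above. The num.trees and mtry arguments correspond to the number of bootstrapped trees and the number of candidate moderators per split.

```r
# A minimal sketch of a conventional random forest, assuming the same
# hypothetical data frame `dat` with `yi` and moderator columns as above.
library(ranger)

rf <- ranger(
  yi ~ mod1 + mod2 + mod3,   # predict effect size from moderators
  data = dat,
  num.trees = 1000,          # number of bootstrapped trees
  mtry = 2,                  # moderators considered at each split
  min.node.size = 5          # minimum number of cases per terminal group
)

# Out-of-bag prediction error and averaged predictions across trees
rf$prediction.error
predict(rf, data = dat)$predictions
```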

13.1.1 Meta-analytic random forests

To render random forests suitable for meta-analysis, a weighting scheme is applied to the bootstrap sampling, so that more precise studies exert greater influence in the model building stage (Van Lissa 2017). These weights can be uniform (each study has an equal probability of being selected into the bootstrap sample), fixed-effects based (studies with smaller sampling variance have a larger probability of being selected), or random-effects based (studies with smaller sampling variance have a larger probability of being selected, but this advantage is diminished as the amount of between-studies heterogeneity increases). Internally, metaforest relies on the ranger R-package, a fast implementation of random forests in C++.
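As a sketch of how this looks in practice, the call below fits a random-effects weighted MetaForest model. The data frame dat, its effect size column yi, sampling variance column vi, and moderator names are hypothetical; argument names follow the metaforest package, but consult its documentation for the definitive interface.

```r
# A minimal sketch of a weighted meta-analytic random forest, assuming a data
# frame `dat` with effect sizes `yi`, sampling variances `vi`, and moderators.
library(metaforest)

mf <- MetaForest(
  yi ~ mod1 + mod2 + mod3,   # moderators used to predict the effect size
  data = dat,
  vi = "vi",                 # column holding the sampling variances
  whichweights = "random",   # "unif", "fixed", or "random" (random-effects weights)
  num.trees = 10000
)

# Inspect the fitted model
mf
```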

13.1.2 Tuning parameters

Like many machine learning algorithms, random forests have several “tuning parameters”: Settings that might influence the results of the analysis, and whose optimal values must be determined empirically. The first is the number of candidate variables considered at each split of each tree. The second is the minimum number of cases that must remain in a post-split group within each tree. The third is unique to MetaForest, namely, the type of weights (uniform, fixed-, or random-effects). The optimal values for these tuning parameters are commonly determined using cross-validation (Hastie, Tibshirani, and Friedman 2009). Cross-validation means splitting the dataset many times, for example, into 10 equal parts. Predictions are then made for each part of the data, using a model estimated on all of the other parts. This process is repeated for all possible combinations of tuning parameters, and the values that result in the lowest cross-validated prediction error are used for the final model. For cross-validation, metaforest relies on the well-known machine learning R-package caret.
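The sketch below shows what such a tuning run could look like with caret, using the ModelInfo_mf() model definition shipped with metaforest. The data frame dat and its columns are the same hypothetical ones as above, and the candidate values in the tuning grid are chosen purely for illustration; check the caret and metaforest documentation for the exact interface before running.

```r
# A minimal sketch of tuning MetaForest with caret, again assuming a
# hypothetical data frame `dat` with columns `yi`, `vi`, and moderators.
library(caret)
library(metaforest)

# 10-fold cross-validation
cv_control <- trainControl(method = "cv", number = 10)

# Candidate values for the three tuning parameters
tuning_grid <- expand.grid(
  whichweights  = c("unif", "fixed", "random"),  # type of weights
  mtry          = 2:4,                           # candidate moderators per split
  min.node.size = 2:6                            # minimum cases per post-split group
)

mf_cv <- train(
  y = dat$yi,
  x = dat[, c("vi", "mod1", "mod2", "mod3")],    # `vi` is passed through to MetaForest
  method = ModelInfo_mf(),                       # MetaForest model definition for caret
  trControl = cv_control,
  tuneGrid = tuning_grid,
  num.trees = 10000
)

# Tuning parameters with the lowest cross-validated prediction error
mf_cv$bestTune
```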




References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second. New York: Springer.

Strobl, Carolin, James Malley, and Gerhard Tutz. 2009. “An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests.” Psychological Methods 14 (4): 323–48. https://doi.org/10.1037/a0016973.

Van Lissa, Caspar J. 2017. “MetaForest: Exploring Heterogeneity in Meta-Analysis Using Random Forests.” Open Science Framework, September. https://doi.org/10.17605/OSF.IO/KHJGB.