MetaForest uses a weighted random forest to explore heterogeneity in meta-analytic data. MetaForest is a wrapper for ranger (Wright & Ziegler, 2015). As input, MetaForest takes the study effect sizes and their variances (these can be computed, for example, using the metafor package), as well as the moderators that are to be included in the model. By default, MetaForest uses random-effects weights, and estimates the between-studies variance using a restricted maximum-likelihood estimator. However, it may be beneficial to first conduct an unweighted MetaForest, and then use the estimated residual heterogeneity from this model as the estimate of tau2 for a random-effects weighted MetaForest.

MetaForest(
  formula,
  data,
  vi = "vi",
  study = NULL,
  whichweights = "random",
  num.trees = 500,
  mtry = NULL,
  method = "REML",
  tau2 = NULL,
  ...
)

Arguments

formula

Formula. Specify a formula for the MetaForest model, for example, yi ~ . to predict the outcome yi from all moderators in the data. Only additive formulas are allowed (i.e., x1+x2+x3). Interaction terms and non-linear terms are not required, as the random forests algorithm inherently captures these associations.

data

A data.frame containing the effect size, moderators, and the variance of the effect size.

vi

Character. Specify the name of the column in the data that contains the variances of the effect sizes. This column will be removed from the data prior to analysis. Defaults to "vi".

study

Character. Optionally, specify the name of the column in the data that contains the study id. Use this when the data includes multiple effect sizes per study. This column can be a vector of integers, or a factor. This column will be removed from the data prior to analysis. See Details for more information about analyzing dependent data.

whichweights

Character. Indicate what time of weights are required. A random-effects MetaForest is grown by specifying whichweights = "random". A fixed-effects MetaForest is grown by specifying whichweights = "fixed". An unweighted MetaForest is grown by specifying whichweights = "unif". Defaults to "random".

num.trees

Atomic integer. Specify the number of trees in the forest. Defaults to 500.

mtry

Atomic integer. Number of candidate moderators available for each split. Defaults to the square root of the number moderators (rounded down).

method

Character. Specify the method by which to estimate the residual variance. Can be set to one of the following: "DL", "HE", "SJ", "ML", "REML", "EB", "HS", or "GENQ". Default is "REML". See the metafor package for more information about these estimators.

tau2

Numeric. Specify a predetermined value for the residual heterogeneity. Entering a value here supersedes the estimated tau2 value. Defaults to NULL.

...

Additional arguments are passed directly to ranger. It is recommended not to use additional arguments.

Value

List of length 3. The "forest" element of this list is an object of class "ranger", containing the results of the random forests analysis. The "rma_before" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the raw data, without moderators. The "rma_after" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the residual heterogeneity, or the difference between the effect sizes predicted by MetaForest and the observed effect sizes.

Details

For dependent data, a clustered MetaForest analysis is more appropriate. This is because the predictive performance of a MetaForest analysis is evaluated on out-of-bootstrap cases, and when cases out of the bootstrap sample originate from the same study, the model will be overly confident in its ability to predict their value. When the MetaForest is clustered by the study variable, the dataset is first split into two cross-validation samples by study. All dependent effect sizes from each study are thus included in the same cross-validation sample. Then, two random forests are grown on these cross-validation samples, and for each random forest, the other sample is used to calculate prediction error and variable importance, see doi:10.1007/s11634-016-0276-4 .

Examples

#Example 1:
#Simulate data with a univariate linear model
set.seed(42)
data <- SimulateSMD()
#Conduct unweighted MetaForest analysis
mf.unif <- MetaForest(formula = yi ~ ., data = data$training,
                      whichweights = "unif", method = "DL")
#Print model
mf.unif
#> Call:
#> MetaForest(formula = yi ~ ., data = data$training, whichweights = "unif", 
#>     method = "DL")
#> 
#> R squared (OOB):                  0.2768 
#> Residual heterogeneity (tau2):    0.0120 
#Conduct random-effects weighted MetaForest analysis
mf.random <- MetaForest(formula = yi ~ ., data = data$training,
                        whichweights = "random", method = "DL",
                        tau2 = 0.0116)
#Print summary
summary(mf.random)
#> MetaForest results
#>                                          
#> Type of analysis:              MetaForest
#> Number of studies:             20        
#> Number of moderators:          5         
#> Number of trees in forest:     500       
#> Candidate variables per split: 2         
#> Minimum terminal node size:    5         
#> OOB prediction error (MSE):    0.0925    
#> R squared (OOB):               0.3576    
#> 
#> Tests for Heterogeneity: 
#>                                tau2   tau2_SE I^2     H^2    Q-test  df Q_p   
#> Raw effect sizes:              0.0553 0.0486  37.2642 1.5940 30.2857 19 0.0483
#> Residuals (after MetaForest):  0.0000 0.0302  0.0000  1.0000 18.9321 19 0.4612
#> 
#> 
#> Random intercept meta-analyses:
#>                                Intercept se     ci.lb   ci.ub   p     
#> Raw effect sizes:              -0.2136   0.0875 -0.3851 -0.0421 0.0147
#> Residuals (after MetaForest):  0.0327    0.0680 -0.1005 0.1660  0.6300

#Example 2: Real data from metafor
#Load and clean data
data <- dat.bangertdrowns2004
data[, c(4:12)] <- apply(data[ , c(4:12)], 2, function(x){
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x})
data$subject <- factor(data$subject)
data$yi <- as.numeric(data$yi)
#Conduct MetaForest analysis
mf.bd2004 <- MetaForest(formula = yi~ grade + length + minutes + wic+
                               meta, data, whichweights = "unif")
#Print MetaForest object
mf.bd2004
#> Call:
#> MetaForest(formula = yi ~ grade + length + minutes + wic + meta, 
#>     data = data, whichweights = "unif")
#> 
#> R squared (OOB):                  0.0887 
#> Residual heterogeneity (tau2):    0.0672 
#Check convergence plot
plot(mf.bd2004)

#Check summary
summary(mf.bd2004, digits = 4)
#> MetaForest results
#>                                          
#> Type of analysis:              MetaForest
#> Number of studies:             48        
#> Number of moderators:          5         
#> Number of trees in forest:     500       
#> Candidate variables per split: 2         
#> Minimum terminal node size:    5         
#> OOB prediction error (MSE):    0.1426    
#> R squared (OOB):               0.0887    
#> 
#> Tests for Heterogeneity: 
#>                                tau2   tau2_SE I^2     H^2    Q-test   df Q_p   
#> Raw effect sizes:              0.0499 0.0197  58.3740 2.4023 107.1061 47 0.0000
#> Residuals (after MetaForest):  0.0672 0.0238  65.3486 2.8859 129.8299 47 0.0000
#> 
#> 
#> Random intercept meta-analyses:
#>                                Intercept se     ci.lb   ci.ub  p     
#> Raw effect sizes:              0.2219    0.0460 0.1317  0.3122 0.0000
#> Residuals (after MetaForest):  -0.0361   0.0502 -0.1345 0.0624 0.4726
#Examine variable importance plot
VarImpPlot(mf.bd2004)