7.1 Heterogeneity statistics

Three measures are commonly used to assess the degree of heterogeneity. In the formulas that follow, \(k\) indexes an individual study, \(K\) is the total number of studies in our meta-analysis, \(\hat \theta_k\) is the estimated effect of study \(k\) with a variance of \(\hat \sigma^{2}_k\), and \(w_k\) is the weight of that study (i.e., its inverse variance: \(w_k = \frac{1}{\hat \sigma^{2}_k}\); see infobox in Chapter 5.1.1 for more details).

1. Cochran’s Q

Cochran’s Q-statistic is based on the difference between each observed effect size and the fixed-effect model estimate of the effect size. These differences are squared, weighted, and summed, which makes \(Q\) a weighted sum of squared deviations around the fixed-effect summary effect.

\[ Q = \sum\limits_{k=1}^K w_k \left(\hat\theta_k - \frac{\sum\limits_{j=1}^K w_j \hat\theta_j}{\sum\limits_{j=1}^K w_j}\right)^{2}\]
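To make the formula concrete, here is a minimal Python sketch that computes \(Q\) directly from effect sizes and their variances. The function name `cochran_q` and the three example studies are hypothetical and only serve as an illustration.

```python
def cochran_q(effects, variances):
    """Return Cochran's Q and the fixed-effect pooled estimate."""
    # Inverse-variance weights: w_k = 1 / sigma_k^2
    weights = [1.0 / v for v in variances]
    # Fixed-effect summary effect (weighted mean of the study effects)
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled estimate
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))
    return q, pooled


# Hypothetical data: three studies with equal sampling variance
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.04, 0.04]

q, pooled = cochran_q(effects, variances)
print(f"pooled={pooled:.3f}, Q={q:.3f}")
```

With equal variances, every study receives the same weight (here 25), so the pooled estimate is simply the mean of the three effects.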

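2. Higgins & Thompson’s \(I^{2}\)

\(I^{2}\) (Higgins and Thompson 2002) is the percentage of variability in the effect sizes which is not caused by sampling error. It is derived from \(Q\) and its degrees of freedom, \(K-1\). The formula gives a proportion, which is multiplied by 100 to obtain a percentage: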

\[I^{2} = \max \left\{0, \frac{Q-(K-1)}{Q} \right\}\]
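The formula translates into a few lines of Python. This is only a sketch: the function name `i_squared` is hypothetical, and the guard for \(Q \le 0\) is our own addition to avoid a division by zero.

```python
def i_squared(q, n_studies):
    """Return I^2 as a proportion between 0 and 1."""
    df = n_studies - 1          # degrees of freedom, K - 1
    if q <= 0:                  # guard added here: avoids dividing by zero
        return 0.0
    # Share of Q that exceeds its expected value under homogeneity,
    # truncated at zero
    return max(0.0, (q - df) / q)


# Hypothetical example: Q = 4.5 from K = 3 studies
i2 = i_squared(4.5, 3)
print(f"I2 = {100 * i2:.1f}%")
```

When \(Q\) is smaller than its degrees of freedom, the function returns 0, which mirrors the truncation in the formula.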

3. Tau-squared

\(\tau^{2}\) is the between-study variance in our meta-analysis. It is an estimate of the variance of the underlying distribution of true effect sizes. As we show in Chapter 5.2.1, there are various proposed ways to calculate \(\tau^{2}\).
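As one illustration, the sketch below implements the DerSimonian-Laird estimator, a widely used method-of-moments approach. We picked it here only as an example; it is one of several estimators and not necessarily the one you will use in practice. The function name and the data are hypothetical.

```python
def tau2_dersimonian_laird(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance."""
    k = len(effects)
    if k < 2:                   # tau^2 cannot be estimated from one study
        return 0.0
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum_w
    # Cochran's Q around the fixed-effect estimate
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))
    # Scaling constant C = sum(w) - sum(w^2) / sum(w)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Negative estimates are truncated at zero
    return max(0.0, (q - (k - 1)) / c)


# Hypothetical data: three studies with equal sampling variance
tau2 = tau2_dersimonian_laird([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(f"tau^2 = {tau2:.3f}")
```

Note that the estimate is expressed on the squared scale of the effect size metric, which is part of the reason it is less intuitive to read than a percentage.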

Which measure should I use?

Generally, when we assess and report heterogeneity in a meta-analysis, we need a measure which is robust and not too easily influenced by statistical power.

Cochran’s Q increases both when the number of studies (\(K\)) increases and when the precision of the studies (i.e., the sample size \(N\) of a study) increases. Therefore, \(Q\), and whether it is significant, depends heavily on the size of your meta-analysis, and thus on its statistical power. We should therefore not rely on \(Q\) alone when assessing heterogeneity.

\(I^{2}\), on the other hand, is not sensitive to changes in the number of studies in the analysis. It is therefore used extensively in medical and psychological research, especially since there is a “rule of thumb” to interpret it (Higgins et al. 2003):

  • \(I^{2}\) = 25%: low heterogeneity
  • \(I^{2}\) = 50%: moderate heterogeneity
  • \(I^{2}\) = 75%: substantial heterogeneity

Despite its common use in the literature, \(I^{2}\) is not always an adequate measure of heterogeneity either, because it still depends heavily on the precision of the included studies (Rücker et al. 2008; Borenstein et al. 2017). As stated above, \(I^{2}\) is simply the share of variability not caused by sampling error. If our studies become increasingly large, the sampling error tends to zero, and \(I^{2}\) tends to 100% simply because the individual studies have a greater \(N\). Relying only on \(I^{2}\) is therefore not a good option either.
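A small numerical experiment illustrates this dependence on precision. In the sketch below, the observed effect sizes are held fixed while the sampling variance of every study shrinks, as it would with larger samples. All numbers and the helper name `q_and_i2` are hypothetical.

```python
def q_and_i2(effects, variances):
    """Return Cochran's Q and I^2 (as a proportion)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


effects = [0.2, 0.5, 0.8]       # hypothetical effects, held fixed
results = []
for v in (0.16, 0.04, 0.01):    # smaller variance = more precise studies
    q, i2 = q_and_i2(effects, [v] * len(effects))
    results.append((v, q, i2))
    print(f"variance={v:.2f}  Q={q:.2f}  I2={100 * i2:.1f}%")
```

Although the spread of the effect sizes is identical in all three runs, \(I^{2}\) climbs from 0% towards 100% as the studies become more precise.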

Tau-squared, on the other hand, is insensitive to both the number of studies and their precision. Yet, it is often hard to interpret how relevant our tau-squared is from a practical standpoint.




References

Borenstein, Michael, Julian PT Higgins, Larry V Hedges, and Hannah R Rothstein. 2017. “Basics of Meta-Analysis: I² Is Not an Absolute Measure of Heterogeneity.” Research Synthesis Methods 8 (1). Wiley Online Library: 5–18.

Higgins, Julian PT, and Simon G Thompson. 2002. “Quantifying Heterogeneity in a Meta-Analysis.” Statistics in Medicine 21 (11). Wiley Online Library: 1539–58.

Higgins, Julian PT, Simon G Thompson, Jonathan J Deeks, and Douglas G Altman. 2003. “Measuring Inconsistency in Meta-Analyses.” BMJ: British Medical Journal 327 (7414). BMJ Publishing Group: 557.

Rücker, Gerta, Guido Schwarzer, James R Carpenter, and Martin Schumacher. 2008. “Undue Reliance on I² in Assessing Heterogeneity May Mislead.” BMC Medical Research Methodology 8 (1). BioMed Central: 79.