Utrecht University, dept. Methodology & Statistics

Opening

In my Stats 1 class, many exercises describe situations like:

Grades are normally distributed, with \(\mu = 6, \sigma = 2\)

A student asked: “This is a stupid question, but are grades normally distributed?”

Distribution of grades

Actual grades of the 347 students:

[Figure: “camel plot” showing the bimodal distribution of grades]

Types of latent variable analyses

                    Observed variables
Latent variable     Continuous         Categorical
Continuous          Factor analysis    IRT
Categorical         Mixture model      Latent class analysis

Basic idea behind mixture modeling

  • Assume that the population consists of K subpopulations
  • Model the data as a function of (unknown) class membership
  • The simplest “model” is just the class mean
    • Optionally estimate variances / covariances
    • Allow a different model for each class (see the density formula below)
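
Formally, this is the standard finite mixture density (notation added here for clarity): \(f(x) = \sum_{k=1}^{K} \pi_k \, f_k(x \mid \theta_k)\), where \(\pi_k\) are the class proportions and \(f_k\) the class-specific models, e.g., normal densities with class-specific \(\mu_k\) and \(\sigma_k^2\).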

Applications

Analysis goals:

  • Test theory about categorical latent variable (e.g., identity status)
  • Classify individuals
  • Identify number of classes
  • Identify most discriminant items
  • Predict class membership from covariates
  • Predict outcomes from class membership

Challenges: “correct” number of classes?

Approach:

  • Run the same model for a range of possible numbers of classes, e.g., 1:6
  • Examine fit statistics

Pitfalls:

  • Not estimating a one-class model
  • Using entropy as a fit index
    • Entropy quantifies class separation, not model fit
  • Cherry-picking fit statistics
    • Report all of them; a lack of consensus can itself indicate a problem
    • Or make a weighted decision using the AHP

Introducing tidyLPA

  • User-friendly software that implements best practices
  • Freely estimate (or constrain) means, variances, and covariances
  • Estimate and compare solutions for the number of classes
  • Beautiful visualizations

Sample workflow, tidyverse-style

library(tidyLPA)
# Estimate latent profile models with 1 and 2 classes
grades %>%
  estimate_profiles(1:2)
## tidyLPA analysis using mclust: 
## 
##   Model Classes     AIC     BIC Entropy prob_min prob_max n_min n_max BLRT_p
## 1     1       1 1585.41 1593.11    1.00     1.00     1.00  1.00  1.00       
## 2     1       2 1545.82 1561.22    0.74     0.88     0.96  0.49  0.51   0.01

Sample workflow, tidyverse-style

# Compare the fit of the 1- and 2-class solutions
grades %>%
  estimate_profiles(1:2) %>%
  compare_solutions()
## Warning: The solution with the minimum number of classes under consideration
## was considered to be the best solution according to one or more fit indices.
## Examine your results with care; consider adding a smaller number of classes.
## Warning: The solution with the maximum number of classes under consideration
## was considered to be the best solution according to one or more fit indices.
## Examine your results with care and consider estimating more classes.
## Compare tidyLPA solutions:
## 
##  Model Classes BIC     
##  1     1       1593.111
##  1     2       1561.222
## 
## Best model according to BIC is Model 1 with 2 classes.
## 
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 2 classes.

Sample workflow, a bit more complex

grades %>%
  estimate_profiles(1:2, 
                    variances = c("equal", "varying"),
                    covariances = c("zero", "zero"))
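
This estimates both model specifications (equal vs. varying variances, both with covariances fixed to zero) for one and two classes, yielding four solutions to compare.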

For advanced users…

estimate_profiles() calls Mclust() or mplusModeler(), depending on the chosen backend.

Additional arguments are passed to these functions.
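
For example, a minimal sketch selecting the backend explicitly (the package argument defaults to "mclust"; "MplusAutomation" requires an Mplus installation):

# Estimate a two-class model with the mclust backend
grades %>%
  estimate_profiles(2, package = "mclust")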

To compare solutions

# Restrict the comparison to selected fit statistics
grades %>%
  estimate_profiles(1:2) %>%
  compare_solutions(statistics = c("AIC", "BIC", "KIC"))

AHP

Akogul & Erisoglu, 2017

Applied Saaty’s (1978) AHP to a range of model fit indices

  • AIC, AWE, BIC, CLC, KIC
  1. Decision matrix: n (classes) x m (fit indices)
  2. Pairwise comparison matrix for each of the m fit indices: degree of preference for i classes over j classes
  3. Determine the relative importance vector (RIV) for each fit index (based on eigenvector decomposition)
  4. Determine the composite relative importance vector (C-RIV) across indices, weighted by the relative importance of each index

The relative importance of each index is based on its performance in recovering the true number of classes in simulations (see the sketch below).
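
To make the composite-weighting idea concrete, a minimal R sketch (fit values and weights are illustrative, not those of Akogul & Erisoglu, 2017; true AHP derives priorities from pairwise-comparison matrices via eigenvector decomposition, approximated here by simple inversion):

# Hypothetical decision matrix: fit indices for 1-4 classes
fit <- data.frame(classes = 1:4,
                  AIC = c(1585, 1546, 1540, 1542),
                  BIC = c(1593, 1561, 1563, 1572))
# Assumed relative importance of the indices (sums to 1)
weights <- c(AIC = 0.5, BIC = 0.5)

# Lower values are better, so invert and normalize each index
# (a stand-in for the eigenvector-based RIVs of true AHP)
priority <- sapply(names(weights), function(i) {
  v <- 1 / fit[[i]]
  v / sum(v)
})
# Composite relative importance vector (C-RIV)
composite <- priority %*% weights
fit$classes[which.max(composite)]  # preferred number of classes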

Good practices for visualization

Show model parameters

  • Estimated mean
  • Estimated variance (if parameter)

Show parameter uncertainty

  • Confidence interval/band

Show classification uncertainty

  • Show how well classes capture raw data

If the number of classes is exploratory…

  • Compare several class solutions

Some examples

[Figures: example mixture-model visualizations from the literature]

What stands out?

These visualizations…

  • Show only the estimated mean (sometimes even the unweighted mean of the subgroup)
  • \(\sigma\) is not visualized, even when it is a model parameter
  • No standard errors
  • No classification uncertainty; class membership is treated as known
  • Hide the raw data, making it impossible to see whether the classes are distinct

Show model parameters

  • Estimated class mean (or trajectory)
  • Estimated variance (if parameter)

Show parameter uncertainty

  • Confidence interval/band

Show classification uncertainty

  • Show how well classes capture raw data
  • Show classification / transition probabilities

If the number of classes is exploratory…

  • Compare several class solutions

Better profile plot

grades %>%
  estimate_profiles(1:2) %>%
  plot_profiles()

Mixture densities plot

grades %>%
  estimate_profiles(1:2) %>%
  plot_density

Better profile plot

id_edu[, c("com3", "exp3")] %>%
  estimate_profiles(1:4) %>%
  plot_profiles()

Mixture densities plot

id_edu[, c("com3", "exp3")] %>%
  estimate_profiles(1:4) %>%
  plot_density()

Longitudinal extensions

Growth mixture models

Describe heterogeneity in developmental trajectories

  • Capture individual trajectories with a latent growth model
  • Each individual follows a trajectory over time, described by a latent intercept, slope, etc. (see the equation below)
  • Apply LCA to the latent growth variables
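
For example, a quadratic trajectory in standard latent growth notation (added here for clarity): \(y_{it} = \eta_{0i} + \eta_{1i} t + \eta_{2i} t^2 + \varepsilon_{it}\), where the growth factors \(\eta\) get class-specific means in the mixture model.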

Analysis goals:

  • Identify classes that are characterized by different developmental trajectories

Example

# Load MplusAutomation
library(MplusAutomation)

# Latent class growth analysis: quadratic growth model over six occasions;
# i@0; s@0; q@0 fixes the growth factor variances to zero within classes
createMixtures(classes = 1:4, 
               filename_stem = "growth", 
               model_overall = "i s q | ec1@0 ec2@1 ec3@2 ec4@3 ec5@4 ec6@5;
                                i@0;  s@0;  q@0",
               rdata = empathy[, 1:6],
               ANALYSIS = "PROCESSORS = 2;")

# Run the generated Mplus input files
runModels(filefilter = "growth")

# Read the results back into R
results_growth <- readModels(filefilter = "growth")

Summary table

mixtureSummaryTable(results_growth)

Published example: not good

Visualization

# rawdata = TRUE overlays the observed trajectories on the estimated ones
plotGrowthMixtures(results_growth, rawdata = TRUE)

Latent transition analysis

Describe state changes

  • Individuals are in a hidden state at each time point (class membership)
    • Identity Status Theory: Achievement, Moratorium, Diffusion, Foreclosure
  • Capture these states with a mixture model
  • Model the transition matrix over time (see the example below)
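
For example, with two states, a hypothetical transition matrix, where entry (i, j) is the probability of being in state j at time 2 given state i at time 1: \(\begin{pmatrix} .8 & .2 \\ .3 & .7 \end{pmatrix}\)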

Example: Latent transition analysis, identity

results_id <- estimate_profiles(id_edu[, c("com3", "exp3")], 
                                n_profiles = 1:4)
results_id

Example: Latent transition analysis, identity

results_id2 <- estimate_profiles(id_edu[, c("com5", "exp5")], 
                                 n_profiles = 1:4)
results_id2

Do latent transition analysis

createMixtures(
  classes = 4,
  filename_stem = "lta",
  # Regress the time-2 latent class on the time-1 latent class
  # to estimate the transition matrix
  model_overall = "c2 ON c1;",
  # Class-specific means per occasion; {C} is replaced by the class number
  model_class_specific = c(
    "[com3] (com{C});  [exp3] (exp{C});",
    "[com5] (com{C});  [exp5] (exp{C});"
  ),
  rdata = id_edu[, c("com3", "exp3", "com5", "exp5")]
)

Do latent transition analysis

runModels(filefilter = "lta")
lta_id <- readModels("lta_4_class.out")

Some examples

[Figures: example latent transition visualizations]

Plot the latent transition matrix

plotLTA(lta_id)

Take home message

Good practices:

  • Compare several class solutions
  • Show estimated points/trajectory
  • Show confidence interval/band
  • Show how well classes capture raw data
  • Show classification / transition probabilities

Estimation

Mixture models are estimated using the EM (expectation-maximization) algorithm

Two things must be estimated:

  1. \(\theta\): Model parameters for each class
  2. \(P(Z_i = k|X_i, \theta)\): The probability that individual i belongs to each of the classes k, given their observed data X and the model parameters

The problem is that these two are interdependent:

  • Calculations for \(\hat{\theta}\) (means, variances, covariances in each class) are weighted by \(P(Z_i = k|X_i,\theta)\)
  • The probability of class membership depends on that class’s parameters

Estimation 2

So, we go back and forth:

  1. E-step:
    Compute \(P(Z_i = k|X_i,\theta)\), starting from initial values \(\theta^0\)
  2. M-step:
    Estimate \(\hat{\theta}\), using the estimate of \(P(Z_i = k|X_i,\theta)\) from the E-step
  3. Plug the estimate of \(\hat{\theta}\) into the E-step; repeat until convergence (see the sketch below).
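
A minimal EM sketch in R for a univariate two-class Gaussian mixture (illustrative only; tidyLPA and mclust handle estimation internally, and real implementations add numerical safeguards and convergence checks):

set.seed(1)
x <- c(rnorm(100, 4, 1), rnorm(100, 8, 1))  # simulated bimodal "grades"

K <- 2
pi_k <- c(.5, .5); mu <- c(3, 9); sigma <- c(1, 1)  # starting values, theta^0

for (iter in 1:100) {
  # E-step: posterior probability of class membership, P(Z_i = k | X_i, theta)
  dens <- sapply(1:K, function(k) pi_k[k] * dnorm(x, mu[k], sigma[k]))
  post <- dens / rowSums(dens)

  # M-step: re-estimate theta, weighting each observation by its posterior
  pi_k  <- colMeans(post)
  mu    <- colSums(post * x) / colSums(post)
  sigma <- sqrt(colSums(post * sapply(1:K, function(k) (x - mu[k])^2)) /
                  colSums(post))
}
round(cbind(pi_k, mu, sigma), 2)  # recovered proportions and parameters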

Back to presentation