13 GLM-III: Binary Predictors – Statistics 1 and 2

13.1 Lecture

13.2 Formative Test

A formative test helps you assess your progress in the course, and helps you address any blind spots in your understanding of the material. If you get a question wrong, you will receive a hint on how to improve your understanding of the material.

Ideally, complete the formative test after you’ve seen the lecture, but before the lecture meeting, in which we can discuss any topics that need more attention.

Question 1

Dummy coding allows regression to include binary predictors by assigning numerical values to each category, estimating the mean of the reference category, and testing the difference between categories.

Question 2

The slope coefficient (b) in regression with a binary predictor represents the difference in means between the two categories, indicating how much the dependent variable changes when the binary predictor changes from 0 to 1.
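As a minimal illustration of this point (sketched in Python rather than SPSS, with made-up scores and a hypothetical helper `simple_ols`), fitting a least-squares line to a 0/1 dummy recovers the mean of the reference group as the intercept and the difference in group means as the slope:

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical scores, made up for illustration only.
reference = [4.0, 6.0, 5.0, 7.0, 8.0]   # dummy = 0
other = [9.0, 11.0, 10.0, 12.0, 8.0]    # dummy = 1

d = [0] * len(reference) + [1] * len(other)
y = reference + other

intercept, slope = simple_ols(d, y)
print(intercept, mean(reference))            # intercept vs. mean of reference group
print(slope, mean(other) - mean(reference))  # slope vs. difference in group means
```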

Question 3

The assumption of linearity is not relevant, because the difference between two binary values of the predictor is linear by definition.

Question 4

Levene’s test checks the assumption of equality of variances in both groups for the independent samples t-test.

Question 5

The independent samples t-test and the t-test of the slope in regression with a binary predictor are equivalent tests that compare means between two independent groups.
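The equivalence can be checked numerically. The sketch below (Python, made-up scores, hypothetical function names) computes the equal-variances t statistic from the group means and pooled variance, and separately the t statistic of the slope from a regression on a 0/1 dummy. Under these assumptions the two values coincide; the sign only depends on which group is subtracted from which.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(g0, g1):
    """Independent samples t statistic, assuming equal variances."""
    n0, n1 = len(g0), len(g1)
    pooled_var = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    se = sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(g1) - mean(g0)) / se

def slope_t(x, y):
    """t statistic of the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2) / sxx)  # residual variance divided by Sxx
    return b / se_b

# Hypothetical scores, made up for illustration only.
group0 = [4.0, 6.0, 5.0, 7.0, 8.0]
group1 = [9.0, 11.0, 10.0, 12.0, 8.0]

t_ttest = pooled_t(group0, group1)
t_slope = slope_t([0] * len(group0) + [1] * len(group1), group0 + group1)
print(t_ttest, t_slope)
```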

Question 6

The p-value indicates the probability of observing a group difference at least as extreme as the one observed, assuming that the null hypothesis is true.

Question 7

Cohen’s d is an effect size that standardizes the difference between group means by the (pooled) standard deviation, making it interpretable on a meaningful scale.

Question 8

Assumption checks can alert you that important assumptions of the test are violated, but you should not blindly adapt analyses based on their results either, particularly in confirmatory research. You can always perform a sensitivity analysis in which you report both the planned analysis and the robust version.

Question 9

Cohen’s d = mean difference / pooled SD.
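A minimal sketch of this formula in Python, assuming the pooled standard deviation is the square root of the degrees-of-freedom-weighted average of the two group variances (the scores and the function name `cohens_d` are made up for illustration):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(g0, g1):
    """Mean difference divided by the pooled standard deviation."""
    n0, n1 = len(g0), len(g1)
    pooled_var = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    return (mean(g1) - mean(g0)) / sqrt(pooled_var)

# Hypothetical scores, made up for illustration only.
group0 = [4.0, 6.0, 5.0, 7.0, 8.0]
group1 = [9.0, 11.0, 10.0, 12.0, 8.0]

d_value = cohens_d(group0, group1)
print(d_value)  # mean difference expressed in pooled-SD units
```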

Question 10

The slope tells you how much the predicted value goes up for a 1-unit increase in the predictor D. Since D is coded 0 and 2, a 1-unit increase only gets you halfway!
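The arithmetic behind this answer can be written out as a short worked derivation. The symbols \(b_0\) (intercept), \(b_1\) (slope), and \(\bar{y}_{D=0}\), \(\bar{y}_{D=2}\) (the two group means) are introduced here for illustration only:

\(\hat{y} = b_0 + b_1 D\)

\(D = 0: \quad \hat{y} = b_0 = \bar{y}_{D=0}\)

\(D = 2: \quad \hat{y} = b_0 + 2 b_1 = \bar{y}_{D=2}\)

\(b_1 = (\bar{y}_{D=2} - \bar{y}_{D=0}) / 2\)

So with 0/2 coding the slope equals half the difference between the group means, whereas with 0/1 coding it equals the full difference.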

13.3 In SPSS

13.3.1 Independent Samples t-test

As a t-test and as regression with a dummy predictor:

13.4 Tutorial

13.4.1 Independent Samples T-Test

In this assignment we will use the data file 5groups.sav. Download the file and open it in SPSS.

This time, we will compare the means of the variable y of two specific groups: group 1 and group 4. To test the difference between two sample means, we will use the t-test for independent samples.

What is the null hypothesis of this test? And what is the alternative hypothesis?

\(H_0: \mu_1=\mu_4\), against \(H_1: \mu_1\neq \mu_4\)

Create the necessary syntax for the t-test that compares the means of group 1 and group 4.

You can find the dialog for the two-sample t-test under Analyze > Compare Means > Independent Samples T Test.

In the SPSS dialog you have to specify which two groups you want to compare. In our case, it’s group 1 and group 4. After placing the variable in the box named “Grouping Variable”, click the button named “Define Groups” to define the groups.

Compare your syntax to the correct syntax:

T-TEST GROUPS=group(1 4)
  /MISSING=ANALYSIS
  /VARIABLES=y
  /CRITERIA=CI(.95).

One of the assumptions of the independent samples t-test is homoscedasticity (equal variances for all levels of the predictor). We can compare the sizes of the variances of the two groups with a simple F-test, which we call Levene’s test.

Have a look at Levene’s test and try to interpret it. Discuss with your group what null-hypothesis is being tested here.

What is the p-value of the Levene’s test?

What do you conclude from this? What’s the practical use of the outcome of this test?

Levene’s test is not significant. Remember that the null hypothesis of Levene’s test is that the population variances of the groups are equal. Because the test is not significant, we cannot reject this null hypothesis. Consequently, there is no evidence that the population variances of the two groups are unequal, and thus no reason to question the assumption.

Now you will have to decide on the outcome of the actual t-test. SPSS reports two versions: one that assumes equal variances (top row) and one that relaxes this assumption (bottom row).

You should pick one of these. In principle, you should decide which one you will use before seeing the results - although if there is clear evidence of violation of assumptions, you might want to discuss in your report whether the results change if you use the robust version (bottom row).

For now remember: we assume equal variances.

What is the two-sided p-value?

Do you reject the null hypothesis of this t-test at alpha 0.05?

13.4.2 Regression with dummies

We will now perform the exact same analysis, but with regression and dummies.

To test the difference between group 1 and group 4, we first create a dummy variable to distinguish these two groups. Use group 1 as reference category. You can use either Transform -> Recode into different variables, or syntax:

RECODE group (1=0) (4=1) INTO dgroup4.
EXECUTE.

Note that all other groups are coded as missing on this variable, which is exactly what we want!
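The effect of this recode can be mimicked in a few lines of Python (an illustration only, with a made-up list of group codes and a hypothetical helper `recode_dummy`): groups 1 and 4 receive a dummy value, every other group becomes missing, and an analysis that drops missing cases keeps only the two groups of interest.

```python
def recode_dummy(group):
    """Mimic RECODE group (1=0) (4=1): any other value becomes missing (None)."""
    return {1: 0, 4: 1}.get(group)

# Hypothetical group codes, made up for illustration only.
groups = [1, 2, 3, 4, 5, 4, 1]
dgroup4 = [recode_dummy(g) for g in groups]
print(dgroup4)  # [0, None, None, 1, None, 1, 0]

# Dropping cases with a missing dummy leaves only groups 1 and 4.
complete = [(g, d) for g, d in zip(groups, dgroup4) if d is not None]
print(complete)
```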

We will use regression to perform our t-test. The hypothesis is the same as in the previous assignment, but you could also rewrite it in terms of regression coefficient(s). What is the null hypothesis of this test in terms of regression coefficient(s)? And what is the alternative hypothesis?

\(H_0: \beta_{group1 vs group4}=0\), which is the same as \(H_0: \mu_{group1} = \mu_{group4}\), versus \(H_1: \beta_{group1 vs group4} \neq 0\), which is the same as \(H_1: \mu_{group1} \neq \mu_{group4}\)

Create the necessary syntax for a regression with the dummy variable that compares the means of group 1 and group 4.

You can find the dialog under Analyze > Regression > Linear.

In the SPSS dialog you have to specify the Dependent and Independent variable. In our case, the independent variable is the dummy we created.

Compare your syntax to the correct syntax:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN 
  /DEPENDENT y
  /METHOD=ENTER dgroup4.

Note that, unlike the t-test interface, the regression interface does not provide a Levene’s test. This is one reason you might want to use the t-test interface. The regression interface provides a more generic way to test the assumption of homoscedasticity: a residual plot.
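With a single dummy predictor, such a residual plot has a simple structure: the predicted value is just each case's own group mean, so the plot consists of two vertical columns of dots, one per group. The Python sketch below uses made-up scores and raw rather than standardized residuals. It compares the spread of the residuals in the two columns, which is what you judge by eye in the plot.

```python
from statistics import mean, stdev

# Hypothetical scores, made up for illustration only.
group_a = [4.0, 16.0, 10.0, 2.0, 18.0]    # dummy = 0
group_b = [19.0, 20.0, 21.0, 20.0, 20.0]  # dummy = 1

# With one dummy predictor, the predicted value is the group's own mean.
pred_a, pred_b = mean(group_a), mean(group_b)

# Residual = observed score minus predicted value.
resid_a = [y - pred_a for y in group_a]
resid_b = [y - pred_b for y in group_b]

# Under homoscedasticity these two spreads should be similar.
spread_a, spread_b = stdev(resid_a), stdev(resid_b)
print(pred_a, spread_a)
print(pred_b, spread_b)
```

In this made-up example, the column at the higher predicted value is much narrower than the other one, which is the kind of pattern that would cast doubt on the assumption.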

Go back through the regression interface, but this time click the Plots button and plot the predicted value (X = ZPRED) against the residual value (Y = ZRESID).

Your syntax will now say:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN 
  /DEPENDENT y
  /METHOD=ENTER dgroup4
  /SCATTERPLOT=(*ZRESID ,*ZPRED).

If the assumption of homoscedasticity is met, we should see that the dots in this plot are equally distributed around the zero line for all values on the X-axis. In this case, we see much narrower spread on the right side than on the left side.

What can you conclude from this, and does it match your conclusion from Levene’s test?

Now you will have to decide on the outcome of the actual t-test.

Remember that the t-test of the dummy variable should be the same as the t-test we conducted before. Verify that this is true.

What is the two-sided p-value?

We see one more t-test: for the “(Constant)” or intercept. How do we interpret this?

The mean difference between both Groups is 7, and this differs significantly from zero.

The mean value in Group 4 is 7, and this differs significantly from zero.

The mean value across both Groups is 7, and this differs significantly from zero.

The mean value in Group 1 is 7, and this differs significantly from zero.