8 Philosophy of Science

Students of statistics often have the impression that this course is all about cold, hard facts. Nothing could be further from the truth: statistics is a direct extension of philosophy of science. The numbers we calculate here only have relevance to real-world research questions thanks to some complex philosophical arguments. While most of these are outside the scope of the present course, we want you to be familiar with the main concepts before moving on.

8.0.1 Deduction and Induction in Hypothesis Testing

Hypothesis testing has roots in the philosophy of logic, or correct reasoning. In logic, arguments consist of a set of premises, which can be true or false, that together lead to a conclusion, which can also be true or false. The most famous example is:

Premise: All men are mortal

Premise: Socrates is a man

Conclusion: Therefore, Socrates is mortal.

This is a deductive argument, which has the property that if the premises are true, then the conclusion must also be true. Deductive arguments are also often described as arguments from the general to the specific. In this case, the general rule “all men are mortal” gives rise to a claim about one particular man: Socrates is mortal.

The counterpart to deductive reasoning is inductive reasoning, which proceeds from specific observations or claims to general rules. The most famous example is:

Premise: All swans I have ever seen are white

Conclusion: Therefore, all swans are white.

Unlike deductive reasoning, however, inductive reasoning does not guarantee that true premises produce a true conclusion. Even if it is true that I have only ever seen white swans, the conclusion is false: black swans exist. David Hume introduced another famous example:

Premise: The sun has risen in the east every morning up until now.

Conclusion: The sun will also rise in the east tomorrow.

Here, too, the conclusion does not logically follow from the premise. This might make us uncomfortable: what sane person would reject the conclusion that the sun will rise in the east tomorrow? This discomfort illustrates that people are naturally inclined to reason inductively. As scientists, however, we should remember that this way of reasoning is not guaranteed to produce true conclusions.

Inductive and deductive reasoning are both used in statistical hypothesis testing. Deduction is used when we derive a specific prediction (hypothesis) from a general theory. For example, we could say that:

Premise: More time spent studying causes better grades.

Conclusion: In my dataset, time spent studying should correlate positively with grades.

If the premise is true, then we would expect to observe the corresponding pattern in our data.

Induction comes into play when we draw general conclusions from observed data.

Premise: I observed a positive correlation between time spent studying and grades in my dataset

Conclusion: Therefore, time spent studying causes better grades.

This conclusion does not logically follow from the premise. The problem is not resolved by removing the word “causes”:

Premise: I observed a positive correlation between time spent studying and grades in my dataset

Conclusion: Therefore, time spent studying correlates positively with grades in the general population of students.

It follows that we can never conclusively “prove” general conclusions from specific observations. The philosopher David Hume wrote extensively about this “problem of induction”: How can we justify the assumption that unobserved cases will follow the same patterns as observed ones? Hume argued that this assumption cannot be logically justified. Our sense that generalization is justified might be based on intuition, but not logic. The problem of induction challenges the very foundation of science. We cannot escape the use of induction when seeking to learn general insights from specific observations, but Hume showed that induction lacks a purely rational justification.

8.0.2 Falsificationism

It follows from the problem of induction that it is impossible to definitively prove a theory to be true. No matter how much evidence I have observed that supports a theory (white swans), all it takes is one refuting observation (a black swan) to reject it. Karl Popper sought to avoid the problem of induction by devising a scientific method that relies exclusively on deduction: falsificationism. Popper drew the line between science and “pseudo-science” by arguing that “[scientific] statements […] must be capable of conflicting with […] observations” (Popper 1962, 39). The core business of science, according to Popper, should be to try to reject theories. Note that while Popper’s work has been heavily criticized (for good reasons), it remains very influential in the social sciences. Popper is therefore a good starting point for our course, even though he probably should not be the endpoint for students with a genuine interest in philosophy of science.

8.0.3 Falsificationism and Hypothesis Testing

The idea of falsificationism has been very influential in applied statistics in the social sciences, particularly in the practice of “Null-Hypothesis Significance Testing” (NHST). In NHST, researchers proceed as follows:

1. Develop a testable proposition about a population parameter; for example:
   - “On average, my students understand the course material. Their mean grade is \(\mu \geq 6\)”.
   - “On average, there is a positive association between hours studied and grade, \(\rho > 0\)”.
2. Develop a second hypothesis whose sole purpose is to be rejected, to pay lip service to falsificationism. Call this the “null hypothesis”. The null hypothesis is often taken to be the exact opposite of the researcher’s true belief, or “alternative hypothesis”:
   - “On average, my students DO NOT understand the course material. Their mean grade is \(\mu < 6\)”.
   - “On average, there is NO positive association between hours studied and grade, \(\rho \leq 0\)”.
3. Execute a procedure to decide whether or not to reject (falsify) the null hypothesis (see the next chapters).
4. If the null hypothesis is rejected, act as if this finding supports the alternative hypothesis.
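
The decision procedure itself is covered in the next chapters. As a preview, the Python sketch below runs the correlation example through this recipe, with alternative hypothesis \(\rho > 0\) and null hypothesis \(\rho \leq 0\). Several choices are our own assumptions rather than the chapter's: the data are simulated (hypothetical hours studied and grades for 200 students), the p-value uses the Fisher z approximation (statistical software often uses a t-test on r instead), the significance level is \(\alpha = .05\), and all function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sided_p_positive(r, n):
    """Approximate p-value for H0: rho <= 0 versus H1: rho > 0.

    Fisher z-transformation (an assumption of this sketch):
    atanh(r) * sqrt(n - 3) is approximately standard normal
    when rho = 0, the boundary of H0.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1 - NormalDist().cdf(z)


def nhst_decision(x, y, alpha=0.05):
    """Return r, the one-sided p-value, and whether H0 is rejected."""
    r = pearson_r(x, y)
    p = one_sided_p_positive(r, len(x))
    return r, p, p < alpha


# Hypothetical simulated data: 200 students whose grades rise slightly with hours studied
rng = random.Random(42)
hours = [rng.uniform(0, 20) for _ in range(200)]
grades = [5 + 0.1 * h + rng.gauss(0, 1) for h in hours]

r, p, reject = nhst_decision(hours, grades)
print(f"r = {r:.2f}, one-sided p = {p:.4g}, reject H0: {reject}")
```

Rejecting H0 here only says that the simulated data would be unlikely if \(\rho \leq 0\); the step from there to “support” for the alternative hypothesis is not something the code, or the test, provides.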

As Andrew Gelman has argued in a blog post, this approach only pays lip service to falsificationism. A true falsificationist would put their true belief (the alternative hypothesis) to the test. Fake falsificationism consists of creating a meaningless “straw man” null hypothesis whose sole purpose is to be rejected.

8.0.4 Moving Forward

Where does this leave us? The most important point is that commonly used methods, including null-hypothesis significance testing, have important limitations. There is no fully satisfying solution. Just keep in mind that no statistical test can give evidence in support of a theory or hypothesis, whether a null or an alternative hypothesis.

8.0.5 Causality

Another crucial philosophical issue relevant to statistics is causality. Scientists are often interested in causal questions. We assume causality, for instance, any time we want to act on knowledge derived from scientific research. Say that I find a strong correlation between hours studied and grade obtained. If, based on this finding, you increase your study hours in order to improve your grade, then you are assuming causality. The same applies to governments making evidence-based policy, companies adjusting sales strategies based on customer analytics, or researchers developing drugs that replenish a particular neurotransmitter that has been found lacking in patients with a specific diagnosis.

Despite the fact that causality is so important in scientific research, it is rarely defined. Contemporary definitions are typically based on counterfactuals: A is a cause of B if B would not have happened if A had not happened. This definition has important limitations, but it is sufficient for our course (Halpern, 2015).

Most people have heard the phrase “correlation does not imply causation”. What does this mean? One common misunderstanding is that the correlation coefficient is an inappropriate statistic for investigating causal research questions. This is not the case. The phrase warns us that just because we observe a statistical association between variables X and Y (for example, a correlation of \(r = .43\)), that does not mean that X caused Y. Importantly, observing this correlation is consistent with a causal effect of X on Y, but also with other explanations. For example, maybe Y caused X, or maybe a third variable caused both X and Y, and that is why they are correlated.
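
The “third variable” explanation can be made concrete with a small simulation. The Python sketch below is a toy example under assumed conditions: a hypothetical variable Z (say, motivation) causes both X and Y, while X has no effect on Y at all. X and Y nevertheless end up correlated, at \(\rho = .50\) in the population, because \(\mathrm{cov}(X, Y) = \mathrm{var}(Z) = 1\) and \(\mathrm{var}(X) = \mathrm{var}(Y) = 2\).

```python
import math
import random
from statistics import fmean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 5000

# Hypothetical confounder Z causes both X and Y
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # X depends on Z plus noise
y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z plus noise; X plays no role

r_xy = pearson_r(x, y)
print(f"r(X, Y) = {r_xy:.2f}")  # clearly positive, although X does not cause Y
```

A researcher who observes only X and Y cannot tell this situation apart from one in which X causes Y, whatever statistic they use to describe the pattern.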

The problem is related to the previous section: observing a pattern in data that is consistent with a causal association between X and Y cannot conclusively prove that X caused Y, no matter what statistic you use to describe the pattern.

So where does causality come from? The short answer is: from theory or methodology. Causality can be assumed on theoretical grounds, or established using a randomized controlled experiment. In such an experiment, researchers randomly assign participants to either an experimental condition (e.g., receiving a drug, instruction, treatment, et cetera) or a control condition (e.g., receiving a placebo, no instruction, a non-effective treatment, et cetera). In theory, random assignment results in two groups with no systematic differences that could explain between-group differences in the outcome of interest. Of course, by pure chance, the groups could still differ (more men in one group, taller people in one group, et cetera). But no procedure has a better chance of producing comparable groups than random assignment.
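
Both halves of this argument, balance on average and imbalance by chance, show up in a quick simulation. The Python sketch below assumes 40 hypothetical participants with simulated heights (in cm); it repeats a random 20/20 split many times and records the difference in mean height between the experimental and control groups. The function names are illustrative.

```python
import random
from statistics import fmean


def random_assignment(n, rng):
    """Randomly assign exactly half of n participants to the experimental group (True)."""
    labels = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(labels)
    return labels


def covariate_gap(covariate, labels):
    """Difference in covariate means: experimental group minus control group."""
    treated = [c for c, t in zip(covariate, labels) if t]
    control = [c for c, t in zip(covariate, labels) if not t]
    return fmean(treated) - fmean(control)


rng = random.Random(2024)
# Hypothetical covariate: simulated heights (cm) of 40 participants
heights = [rng.gauss(175, 10) for _ in range(40)]

# Repeat the random split many times and record the height gap each time
gaps = [covariate_gap(heights, random_assignment(40, rng)) for _ in range(10_000)]
print(f"average gap over 10,000 assignments: {fmean(gaps):.2f} cm")
print(f"largest single gap: {max(abs(g) for g in gaps):.2f} cm")
```

Averaged over many assignments, the gap is close to zero, so there is no systematic difference; any single assignment, however, can still produce a gap of several centimeters.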

In the social sciences, experiments are not always feasible or ethical, so researchers often use observational data. Does this preclude all causal claims? It does not. It is perfectly legitimate to present a theory that predicts a causal effect in the Introduction section of your scientific writing, and then present empirical data that show a pattern consistent with that effect. For example, if I assume that hours studied causes improved grades, a correlation of \(r = .43\) is consistent with that assumption. The same correlation is, however, also consistent with alternative explanations; for example, having received high grades in the past might have motivated some students to study harder. Just make sure not to take the observed data as evidence for a causal effect.

What is required to argue that X has a causal effect on Y? Several philosophers have addressed this issue, most notably Hume and Mill (see Morabia, 2013). The necessary conditions for causality are sometimes summarized as association, temporal precedence, and non-spuriousness. Below, each is illustrated with quotes from Hume (1902), selected by Aaron Peikert:

Association: Cause and effect must be associated (this could be statistical association)
- “When one particular species of event has always […] been conjoined with another, we […] call the one object, Cause; the other, Effect.” (VII, II, 59)
- “familiar objects or events to be constantly conjoined together” (V, I, 35) There must be a “constant union” between cause and effect, and they must be “contiguous in space and time” (Hume 1739,16, pp. 173–175).
Temporal precedence: The cause must occur before the effect.
- observe a continual succession of objects, and one event following another (V, I, 35)
- “[when] the same object is always followed by the same event; we then begin to entertain the notion of cause and connexion.” (VII, II, 61)
Non-spuriousness: All alternative explanations of the effect are excluded.
- “Their conjunction may be arbitrary and casual. There may be no reason to infer the existence of one from the appearance of the other.” (describing spuriousness; V, I, 35)
- “we may define a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second. Or in other words where, if the first object had not been, the second never had existed.” (VII, II, 61)

This latter definition resembles the aforementioned counterfactual definition of causality. Note that this definition does not require randomized experimentation, but randomized experiments do help us meet all three criteria. The field of causal inference focuses on developing methods that can estimate causal effects (like you would get from a randomized controlled experiment) from observational data (Pearl, 2009).
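
One of the simplest ideas in this field is adjusting for a measured confounder. The Python sketch below illustrates stratification on simulated data; every ingredient is a hypothetical assumption for illustration: a binary confounder Z that makes a binary “treatment” X more likely, and an outcome Y on which X has a true effect of 2. The naive comparison of means mixes the effect of X with that of Z (in expectation it equals \(2 + 3 \times 0.6 = 3.8\)); comparing X = 1 with X = 0 within each level of Z, and averaging those differences weighted by stratum size, recovers roughly 2. This works only because Z is the sole confounder and is measured, which observational research can seldom guarantee.

```python
import random
from statistics import fmean


def mean_difference(y, x):
    """Mean outcome among X = 1 minus mean outcome among X = 0."""
    treated = fmean(yi for yi, xi in zip(y, x) if xi)
    untreated = fmean(yi for yi, xi in zip(y, x) if not xi)
    return treated - untreated


def stratified_effect(y, x, z):
    """Within-stratum mean differences, averaged with weights equal to stratum size."""
    total = 0.0
    for level in set(z):
        idx = [i for i, zi in enumerate(z) if zi == level]
        ys = [y[i] for i in idx]
        xs = [x[i] for i in idx]
        total += len(idx) / len(z) * mean_difference(ys, xs)
    return total


rng = random.Random(7)
n = 20_000
z = [rng.random() < 0.5 for _ in range(n)]                 # hypothetical confounder
x = [rng.random() < (0.8 if zi else 0.2) for zi in z]      # Z makes X more likely
y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]  # true effect of X is 2

naive = mean_difference(y, x)
adjusted = stratified_effect(y, x, z)
print(f"naive difference:      {naive:.2f}")
print(f"stratified difference: {adjusted:.2f}")
```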

8.1 Lecture

TO DO

8.3 Tutorial

8.3.1 Assignment 1: Induction and Deduction

Below are two texts such as you might find in a social science publication. Read both carefully, and highlight examples of inductive and deductive reasoning. Then discuss with your group mates:

Did you all highlight the same examples? Did you disagree about any of them?
Did you find examples of induction and deduction in both texts? Is one mode of reasoning more prominent in one text than the other?
Are all inductive and deductive inferences warranted? Do the fragments show sufficient awareness of the potential limitations of these modes of reasoning? Or conversely, are they too careful?

Fragment A:

Using a cross‑sectional survey of adolescents from six urban schools (N = 2,184), we examined associations between evening screen time (self‑reported minutes after 8 p.m.) and sleep quality (Pittsburgh Sleep Quality Index). Across schools and controlling for grade and gender, greater evening screen time was consistently associated with poorer sleep quality, r = .30, 95% CI [.25, .33]. Subgroup analyses by device type (phone vs. tablet) and by extracurricular workload yielded similar patterns. While these observational data cannot establish causation, finding a moderate effect across all schools suggests that higher evening screen exposure is related to adolescents’ diminished sleep quality. Reducing evening device use could help improve adolescents’ sleep outcomes.

Fragment B:

Social norms theory posits that behavior is shaped by perceptions of typical peer behavior. We thus hypothesized that providing households with descriptive norm feedback about neighborhood electricity use would reduce individual electricity consumption, relative to a neutral informational control. We preregistered H1: households receiving monthly comparative reports will exhibit lower kWh usage over three billing cycles than controls. The null hypothesis H0 was that there was no difference between the experimental and control conditions. In a randomized field experiment (N = 3,042 households), treatment households received reports comparing their usage to that of “similar homes”, alongside efficiency tips; control households received tips only. Mixed-effects models with random intercepts for household indicated a 2.4% reduction in usage for households in the experimental condition, \(\beta\) = −0.024, SE = 0.007, p < .001. We rejected the null hypothesis of no difference. These results are consistent with H1, indicating that social norms theory is a relevant framework for understanding household electricity consumption.

8.3.2 Assignment X: Causality

Below are five examples of statements from social scientific papers, courtesy of Calvin Isch.

First, read all the examples, and sort them into causal and non-causal claims. Discuss with your group mates: did you classify each claim the same way? What makes the difference for you?
With your group, choose one of these claims and examine the associated paper. Discuss:
1. Do you still think the claim is causal/noncausal?
2. Would a causal claim be justified in this case?
3. Why/why not?

Example 1:

Violence exposure hampers compromise among Israelis, emphasizing the importance of abstaining from violence for conflict resolution.

https://doi.org/10.1093/pnasnexus/pgae581

Example 2:

We investigate both the role of gender and feminism in friends-with-benefits (FWB) relationships at a United States college, and ask whether identification with feminist ideology impacts students’ motivations and assessments of their relationships.

https://link.springer.com/article/10.1007/s12119-014-9252-3

Example 3:

This distrust of atheists is driven by religious predictors, social location, and broader value orientations.

https://doi.org/10.1177/000312240607100203

Example 4:

we show that being in a dual-career household increases one’s willingness and lowers the perceived risk of leaving their job and joining a startup venture—especially if the household prioritizes their spouse’s career.

https://doi.org/10.1002/smj.3481

Example 5:

We find that those who live in regions with a greater share of migrants from Eastern Europe have more positive attitudes towards the EU but that this positive influence diminishes in highly segregated areas.

https://doi.org/10.1080/13501763.2023.2271504