Unmasking the Truth: Diving into Hypothesis Testing
How to verify your friend’s claim?
Red Marble Proportion Estimation
library(tidyverse)
source("../utils.R")
red_marble = "🔴"
blue_marble = "🔵"
prob_red = 0.64
num_marbles = 5000
set.seed(42)
marbles = sample(c(red_marble, blue_marble), size = num_marbles,
                 replace = TRUE, prob = c(prob_red, 1 - prob_red))
set.seed(42)
n = 100
one_sample = sample(marbles, n)
p_hat = mean(one_sample == red_marble)
print(str_glue("Percentage of red marbles: {p_hat * 100}%"))
Percentage of red marbles: 69%
- “Half of the marbles in the bag are red!”
Null and Alternative Hypothesis
- Null Hypothesis \(H_0\): the parameter is equal to a specific value, \(p = 0.5\)
- Alternative Hypothesis \(H_a\)/\(H_1\): the parameter differs from the value specified by \(H_0\)
  - Left-tailed: the parameter is less than the value specified by \(H_0\), \(p < 0.5\)
  - Right-tailed: the parameter is greater than the value specified by \(H_0\), \(p > 0.5\)
  - Two-tailed: the parameter is not equal to the value specified by \(H_0\), \(p \neq 0.5\)
Understanding Hypothesis Testing
- Assume the defendant is innocent.
- The evidence is presented.
- If the evidence strongly indicates guilt, we abandon the presumption of innocence and declare the defendant guilty.
- If not, we maintain the presumption of innocence.
Assuming Null Hypothesis Is True

68-95-99.7 Rule
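The rule can be checked numerically with base R's `pnorm`. The sketch below also applies it to the sampling distribution of the sample proportion under \(H_0\), assuming \(p = 0.5\) and \(n = 100\) as in the marble example. The variable names are illustrative and not part of the original slides.

```r
# Probability that a normal variable falls within k standard deviations of its mean
k = 1:3
coverage = pnorm(k) - pnorm(-k)
print(round(coverage, 4))

# Apply the rule to the sample proportion under H0 (assumed: p0 = 0.5, n = 100)
p0 = 0.5
n = 100
se = sqrt(p0 * (1 - p0) / n)   # standard error of the sample proportion
intervals = data.frame(
  k = k,
  lower = p0 - k * se,
  upper = p0 + k * se,
  coverage = round(coverage, 4)
)
print(intervals)
```

With a standard error of 0.05, about 95% of sample proportions would fall between 0.40 and 0.60 if the null hypothesis were true, so an observed 0.69 lies well outside that range.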

Methods
- Critical value method
- \(p\)-value method
Significance Level
- The significance level (\(\alpha\)) is the probability threshold we use to decide whether an event is unusual, that is, statistically significant.
- The most common significance levels are \(0.05\) and \(0.01\).
- If our test result has a probability less than or equal to this level, under the assumption that the null hypothesis is true, then the result is sufficiently unusual for us to doubt the null hypothesis.
Critical Value Method

Test Statistic Calculation
\[z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} = \frac{0.69 - 0.5}{\sqrt{0.5 \times 0.5 / 100}} = 3.8\]
alpha = 0.05
cv = qnorm(1 - alpha / 2)
print(str_glue("Critical Values are {-round(cv, 3)} and {round(cv, 3)}."))
Critical Values are -1.96 and 1.96.
Since \(3.8 > 1.96\), we reject the null hypothesis at the 0.05 level and conclude that the proportion of red marbles is not 50%.
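The comparison above can be reproduced with a short sketch. It assumes the observed proportion of 0.69 from the sample of 100 marbles and the null value 0.5; `p0`, `se`, and `z` are names introduced here for illustration.

```r
p_hat = 0.69   # observed sample proportion (from the sample above)
p0 = 0.5       # proportion claimed under H0
n = 100        # sample size
alpha = 0.05

# Standard error of the sample proportion, computed under H0
se = sqrt(p0 * (1 - p0) / n)
# z test statistic: how many standard errors p_hat lies from p0
z = (p_hat - p0) / se
# Two-tailed critical value
cv = qnorm(1 - alpha / 2)

reject = abs(z) > cv
cat(sprintf("z = %.2f, critical value = %.2f, reject H0: %s\n", z, cv, reject))
```

The standard error uses \(p_0\) rather than \(\hat{p}\) because the test statistic is computed under the assumption that the null hypothesis is true.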
\(p\)-value Method
When using the p-value method in hypothesis testing, we calculate the p-value, which is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from our sample data, assuming that the null hypothesis is true.
\(p\)-value Calculation
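Under \(H_0\), the sample proportion \(X\) is approximately normal with mean \(0.5\) and variance \(0.5 \times 0.5 / 100 = 0.0025\):
\[X \sim N(0.5, 0.0025)\]
\[Prob(X \geq 0.69) = ?\]
The upper-tail probability is about \(0.00007\), so the two-tailed \(p\)-value is about \(0.00014\). Since \(0.00014 < 0.05\), we reject the null hypothesis at the 0.05 level and conclude that the proportion of red marbles is not 50%.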
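A minimal sketch of this calculation in base R follows. Note that `pnorm` takes the standard deviation, so the variance 0.0025 enters as 0.05. The variable names are illustrative.

```r
p_hat = 0.69
p0 = 0.5
n = 100
se = sqrt(p0 * (1 - p0) / n)   # sqrt(0.0025) = 0.05

# Upper-tail probability Prob(X >= 0.69) under H0
upper_tail = pnorm(p_hat, mean = p0, sd = se, lower.tail = FALSE)
# Two-tailed p-value: outcomes at least as far from 0.5 in either direction
p_value = 2 * upper_tail
cat(sprintf("Upper tail: %.5f, two-tailed p-value: %.5f\n", upper_tail, p_value))
```

Doubling the upper tail is valid here because the normal distribution is symmetric around 0.5.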
Making Mistakes
set.seed(7595)
n = 100
one_sample = sample(marbles, n)
p_hat = mean(one_sample == red_marble)
print(str_glue("Percentage of red marbles: {p_hat * 100}%"))
Percentage of red marbles: 58%
Since \(-1.96 < 1.6 < 1.96\), or equivalently \(0.11 > 0.05\), we fail to reject the null hypothesis at the 0.05 level: this sample does not give us enough evidence to doubt that the proportion of red marbles is 50%.
Because the true proportion in the bag is 0.64, this is a Type II error: we fail to reject the null hypothesis when it is false.
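The figures for this second sample can be checked with base R's `prop.test`. This is a sketch assuming 58 red marbles out of 100; with the continuity correction turned off, the reported chi-squared statistic equals \(z^2\) for a one-sample test.

```r
# Second sample: 58 red marbles out of 100 (from the output above)
result = prop.test(x = 58, n = 100, p = 0.5,
                   alternative = "two.sided", correct = FALSE)

# X-squared equals z^2, so its square root gives |z|
z = sqrt(unname(result$statistic))
p_value = result$p.value
cat(sprintf("|z| = %.2f, p-value = %.4f\n", z, p_value))
```

Leaving `correct = TRUE` (the default) would apply Yates' continuity correction and give a slightly larger \(p\)-value than the plain z-test used in these slides.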
Type I and II Errors
| | \(H_0\) is true | \(H_0\) is false |
|---|---|---|
| Reject \(H_0\) | Type I error | Correct decision |
| Fail to reject \(H_0\) | Correct decision | Type II error |
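To put numbers on the two error types, the sketch below computes both rates for the marble test with \(n = 100\) and \(\alpha = 0.05\). It assumes the count of red marbles is binomial, which is an approximation because the slides sample without replacement from a bag of 5000, and it takes 0.64 as the true proportion.

```r
n = 100
p0 = 0.5
alpha = 0.05
cv = qnorm(1 - alpha / 2)

# Every possible count of red marbles and the resulting z statistic
counts = 0:n
z = (counts / n - p0) / sqrt(p0 * (1 - p0) / n)
reject = abs(z) > cv

# Type I error rate: probability of rejecting when H0 (p = 0.5) is true
type1 = sum(dbinom(counts[reject], n, p0))
# Type II error rate: probability of failing to reject when the true p is 0.64
# (binomial model is an assumption; see the note above)
p_true = 0.64
type2 = sum(dbinom(counts[!reject], n, p_true))

cat(sprintf("Type I error rate: %.3f, Type II error rate: %.3f\n", type1, type2))
```

The Type I rate is close to, but not exactly, \(\alpha\) because the count of red marbles is discrete. The Type II rate depends on the true proportion: the closer it is to 0.5, the more often the test fails to reject.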
Interview Questions
Facebook: Say you flip a coin 10 times and observe only one head. What would be your null hypothesis and \(p\)-value for testing whether the coin is fair?
Let \(p\) be the probability of getting a head.
\(H_0\): \(p = 0.5\); \(H_a\): \(p \neq 0.5\)
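The normal approximation from the CLT is unreliable here because the sample is small (\(n = 10\), below the usual rule of thumb of 30), so we use the exact binomial distribution instead.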
\(p\)-value: probability of observing an outcome as extreme as or more extreme than the observed data, assuming the null hypothesis is true.
\(X\sim \text{Binomial}(n=10, p=0.5)\)
\[Prob(x = 0) + Prob(x = 1) + Prob(x = 9) + Prob(x = 10) = \frac{22}{1024} \approx 0.021\]
Since 0.021 < 0.05, we reject the null hypothesis at the 0.05 level and conclude that the coin is not fair.
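The exact \(p\)-value can be computed directly from the binomial probabilities. The sketch below sums them by hand and then cross-checks against base R's `binom.test`; the variable names are illustrative.

```r
n = 10
p0 = 0.5
# Outcomes at least as extreme as one head, in both directions
extreme = c(0, 1, 9, 10)
p_value = sum(dbinom(extreme, size = n, prob = p0))
cat(sprintf("Exact two-tailed p-value: %.4f\n", p_value))

# binom.test performs the same exact test
check = binom.test(x = 1, n = n, p = p0, alternative = "two.sided")
cat(sprintf("binom.test p-value: %.4f\n", check$p.value))
```

Both approaches agree because, for a fair coin, the binomial distribution is symmetric and the outcomes 9 and 10 are exactly as unlikely as 1 and 0.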
Interview Questions
D.E. Shaw: A coin was flipped 1,000 times, and 550 times it showed heads. Do you think the coin is biased? Why or why not?
Let \(p\) be the probability of getting a head.
\(H_0\): \(p = 0.5\); \(H_a\): \(p \neq 0.5\)
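The test statistic is \(z = (0.55 - 0.5) / \sqrt{0.5 \times 0.5 / 1000} \approx 3.16\), and the two-tailed \(p\)-value is about \(0.0016\).
Because \(3.16 > 1.96\) and \(0.0016 < 0.05\), we reject the null hypothesis at the 0.05 level and conclude that the coin is not fair.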
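A sketch of the calculation for this question, using the same z-test as in the marble example. With \(n = 1000\) the normal approximation is reasonable; the variable names are illustrative.

```r
heads = 550
n = 1000
p0 = 0.5
alpha = 0.05

p_hat = heads / n
# z test statistic with the standard error computed under H0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
cv = qnorm(1 - alpha / 2)
# Two-tailed p-value, matching the two-tailed alternative p != 0.5
p_value = 2 * pnorm(abs(z), lower.tail = FALSE)
cat(sprintf("z = %.3f, critical value = %.2f, p-value = %.4f\n", z, cv, p_value))
```

The one-tailed probability is half the value reported here; it would only be the right \(p\)-value if the alternative were \(p > 0.5\).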