Unmasking the Truth: Diving into Hypothesis Testing

How to verify your friend’s claim?

Red Marble Proportion Estimation

library(tidyverse)
source("../utils.R")

red_marble = "🔴"
blue_marble = "🔵"

prob_red = 0.64
num_marbles = 5000

set.seed(42)
marbles = sample(c(red_marble, blue_marble), size = num_marbles, 
                 replace = TRUE, prob = c(prob_red, 1 - prob_red))

set.seed(42)
n = 100
one_sample = sample(marbles, n)
p_hat = mean(one_sample == red_marble)
print(str_glue("Percentage of red marbles: {p_hat * 100}%"))

Percentage of red marbles: 69%

“Half of the marbles in the bag are red!”

Null and Alternative Hypothesis

Null Hypothesis \(H_0\): the parameter is equal to a specific value, \(p = 0.5\)
Alternative Hypothesis \(H_a\)/\(H_1\): the parameter differs from the value specified by \(H_0\)
- Left-tailed: the parameter is less than the value specified by \(H_0\), \(p < 0.5\)
- Right-tailed: the parameter is greater than the value specified by \(H_0\), \(p > 0.5\)
- Two-tailed: the parameter is not equal to the value specified by \(H_0\), \(p \neq 0.5\)
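
To make the three forms of the alternative concrete, here is a minimal sketch using base R's `binom.test`, whose `alternative` argument accepts `"less"`, `"greater"`, and `"two.sided"`. The counts (69 red out of 100) mirror the sample drawn earlier; the variable names are illustrative, and the \(p\)-values it prints are explained later in the deck.

```r
# Illustrative counts, mirroring the sample of 100 marbles with 69 red
x_red = 69
n_draws = 100

# One exact binomial test per form of the alternative hypothesis
p_left  = binom.test(x_red, n_draws, p = 0.5, alternative = "less")$p.value
p_right = binom.test(x_red, n_draws, p = 0.5, alternative = "greater")$p.value
p_two   = binom.test(x_red, n_draws, p = 0.5, alternative = "two.sided")$p.value

print(c(left = p_left, right = p_right, two_sided = p_two))
```

The same data can look unremarkable under one alternative and very unusual under another, which is why the direction of \(H_a\) should be chosen before looking at the sample.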

Understanding Hypothesis Testing

Assume the defendant is innocent.
The evidence is presented.
If the evidence strongly indicates guilt, we abandon the presumption of innocence and declare the defendant guilty.
If not, we maintain the presumption of innocence.

Assuming Null Hypothesis Is True

p_red = 0.5
n_marbles = 5000

set.seed(42)
m = sample(c(red_marble, blue_marble), size = n_marbles, 
            replace = TRUE, prob = c(p_red, 1 - p_red))

samples = get_samples(m, n, 1000, is_red)
plot_sample_means(samples, n, p_red)
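
The helpers `get_samples` and `plot_sample_means` live in `utils.R`, which is not shown here. As a rough, self-contained stand-in, the sketch below simulates the same idea with `rbinom`. It assumes each sample is drawn from an effectively infinite bag (binomial draws) rather than from the finite vector `m`, and the names `n_draws`, `n_samples`, and `p_null` are illustrative.

```r
set.seed(42)

n_draws = 100      # marbles per sample
n_samples = 1000   # number of repeated samples
p_null = 0.5       # proportion of red marbles claimed by H0

# Each rbinom() draw is the number of red marbles in one sample of size n_draws
red_counts = rbinom(n_samples, size = n_draws, prob = p_null)
sample_props = red_counts / n_draws

# Compare the simulated centre and spread with the theoretical standard error
theory_sd = sqrt(p_null * (1 - p_null) / n_draws)
print(c(sim_mean = mean(sample_props), sim_sd = sd(sample_props), theory_sd = theory_sd))
```

If the null hypothesis is true, the sample proportions should cluster around 0.5 with a spread close to 0.05.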

68-95-99.7 Rule

normal_plot(mean = p_red, sd = sqrt(p_red * (1 - p_red) / n))
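
The rule can be checked numerically with `pnorm`. The sketch below, with illustrative variable names, computes the probability of landing within 1, 2, and 3 standard errors of the mean, together with the matching ranges for the sample proportion when \(p = 0.5\) and \(n = 100\).

```r
p_null = 0.5
n_draws = 100
se = sqrt(p_null * (1 - p_null) / n_draws)  # standard error of the sample proportion

k = 1:3
coverage = pnorm(k) - pnorm(-k)   # probability within k standard errors of the mean
lower = p_null - k * se
upper = p_null + k * se

print(data.frame(k, lower, upper, coverage = round(coverage, 4)))
```

A sample proportion outside the \(k = 2\) range (roughly 0.40 to 0.60) would therefore be expected only about 5% of the time if the null hypothesis were true.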

Methods

Critical value method
\(p\)-value method

Significance Level

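The significance level (\(\alpha\)) is the probability threshold we use to decide whether a result is unusual enough to be called statistically significant. It is also the probability of rejecting the null hypothesis when it is actually true.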
The most common significance levels are \(0.05\) and \(0.01\).
If our test result has a probability less than or equal to this level, under the assumption that the null hypothesis is true, then the result is sufficiently unusual for us to doubt the null hypothesis.

Critical Value Method

Test Statistic Calculation

alpha = 0.05
cv = qnorm(1 - alpha / 2)
print(str_glue("Critical Values are {-round(cv, 3)} and {round(cv, 3)}."))

Critical Values are -1.96 and 1.96.

z_score = (p_hat - p_red) / sqrt(p_red * (1 - p_red) / n)
print(str_glue("z-score: {round(z_score, 3)}"))

z-score: 3.8

Since \(3.8 > 1.96\), we reject the null hypothesis at the 0.05 level and conclude that the proportion of red marbles is not 50%.
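
The steps above can be bundled into one small function. This is a minimal sketch for the two-tailed case only; `z_test_critical` and its argument names are hypothetical and not part of `utils.R`.

```r
# Two-tailed one-proportion z-test via the critical value method.
# p_hat: sample proportion, p0: proportion under H0, n: sample size
z_test_critical = function(p_hat, p0, n, alpha = 0.05) {
  se = sqrt(p0 * (1 - p0) / n)      # standard error assuming H0 is true
  z = (p_hat - p0) / se             # test statistic
  cv = qnorm(1 - alpha / 2)         # positive critical value
  list(z = z, critical_value = cv, reject = abs(z) > cv)
}

result = z_test_critical(p_hat = 0.69, p0 = 0.5, n = 100)
str(result)
```

For a one-tailed test, the critical value would instead be `qnorm(alpha)` (left-tailed) or `qnorm(1 - alpha)` (right-tailed), and the comparison would use the signed z-score rather than its absolute value.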

\(p\)-value Method

In the \(p\)-value method, we compute the \(p\)-value: the probability of observing a test statistic at least as extreme as the one calculated from our sample, assuming that the null hypothesis is true. We reject the null hypothesis when the \(p\)-value is less than or equal to \(\alpha\).

\(p\)-value Method

\(p\)-value Calculation

\[X \sim N(0.5, 0.05^2)\]

\[2 \cdot Prob(X \geq 0.69) = ?\]

p_val = (1 - pnorm(p_hat, mean = 0.5, sd = 0.05)) * 2
print(str_glue("p-value: {p_val}"))

p-value: 0.000144696087850171

Since \(0.00014 < 0.05\), we reject the null hypothesis at the 0.05 level and conclude that the proportion of red marbles is not 50%.
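
As a cross-check, the same two-tailed \(p\)-value can be obtained from the z-score and from base R's `prop.test`. The sketch assumes 69 red marbles out of 100, as in the sample above, and turns off the continuity correction (`correct = FALSE`) so that `prop.test` matches the plain z-test.

```r
x_red = 69      # red marbles observed in the sample
n_draws = 100
p_null = 0.5

# Manual two-tailed p-value from the z-score
z = (x_red / n_draws - p_null) / sqrt(p_null * (1 - p_null) / n_draws)
p_manual = 2 * pnorm(-abs(z))

# Same test via prop.test(); correct = FALSE disables the continuity correction
p_builtin = prop.test(x_red, n_draws, p = p_null, correct = FALSE)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

With the default `correct = TRUE`, `prop.test` reports a slightly larger \(p\)-value, so the two numbers would no longer agree exactly.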

Making Mistakes

set.seed(7595)
n = 100
one_sample = sample(marbles, n)
p_hat = mean(one_sample == red_marble)
print(str_glue("Percentage of red marbles: {p_hat * 100}%"))

Percentage of red marbles: 58%

z_score = (p_hat - p_red) / sqrt(p_red * (1 - p_red) / n)
print(str_glue("z-score: {round(z_score, 3)}"))

z-score: 1.6

p_val = (1 - pnorm(p_hat, mean = 0.5, sd = 0.05)) * 2
print(str_glue("p-value: {round(p_val, 3)}"))

p-value: 0.11

Since \(-1.96 < 1.6 < 1.96\), or equivalently \(0.11 > 0.05\), we fail to reject the null hypothesis at the 0.05 level: this sample does not provide enough evidence that the proportion of red marbles differs from 50%.

This is a Type II error: we fail to reject the null hypothesis even though it is false (the bag was generated with a red-marble probability of 0.64).

Type I and II Errors

	\(H_0\) is true	\(H_0\) is false
Reject \(H_0\)	Type I error	Correct decision
Fail to reject \(H_0\)	Correct decision	Type II error
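
How often each kind of error occurs can be estimated by simulation. The sketch below is an illustration under stated assumptions: samples are binomial draws rather than draws from the finite bag, the true proportion in the Type II case is taken to be 0.64, and `rejection_rate` is a hypothetical helper.

```r
set.seed(1)

# Fraction of simulated samples in which the two-tailed z-test rejects H0: p = p0.
# true_p is the (normally unknown) real proportion used to generate the data.
rejection_rate = function(true_p, p0 = 0.5, n = 100, alpha = 0.05, n_sims = 10000) {
  p_hats = rbinom(n_sims, size = n, prob = true_p) / n
  z = (p_hats - p0) / sqrt(p0 * (1 - p0) / n)
  mean(abs(z) > qnorm(1 - alpha / 2))
}

type_1_rate = rejection_rate(true_p = 0.5)        # H0 true, yet rejected
type_2_rate = 1 - rejection_rate(true_p = 0.64)   # H0 false, yet not rejected

print(c(type_1 = type_1_rate, type_2 = type_2_rate))
```

The Type I rate should land near \(\alpha\), though not exactly on it because counts are discrete. The Type II rate depends on how far the true proportion is from 0.5 and on the sample size.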

Interview Questions

Facebook: Say you flip a coin 10 times and observe only one head. What would be your null hypothesis and \(p\)-value for testing whether the coin is fair or not?

Let \(p\) be the probability of getting a head.

\(H_0\): \(p = 0.5\); \(H_a\): \(p \neq 0.5\)

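The normal approximation (CLT) is unreliable here because the sample size is only 10, well below the usual rule of thumb of 30, so we use the exact binomial distribution instead.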

\(p\)-value: probability of observing an outcome as extreme as or more extreme than the observed data, assuming the null hypothesis is true.

\(X\sim \text{Binomial}(n=10, p=0.5)\)

\[Prob(X = 0) + Prob(X = 1) + Prob(X = 9) + Prob(X = 10)\]

p_val = (choose(10, 1) * 0.5^1 * 0.5^9 + choose(10, 0) * 0.5^0 * 0.5^10) * 2
print(str_glue("p-value: {p_val}"))

p-value: 0.021484375

Since 0.021 < 0.05, we reject the null hypothesis at the 0.05 level and conclude that the coin is not fair.
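
The hand calculation can be checked against base R. In this sketch `dbinom` sums the four extreme outcomes directly, and `binom.test` runs the exact two-sided test; the variable names are illustrative.

```r
n_flips = 10
heads = 1

# Outcomes at least as far from the expected 5 heads as the observed 1 head
extreme = c(0, 1, 9, 10)
p_manual = sum(dbinom(extreme, size = n_flips, prob = 0.5))

# Exact two-sided binomial test from base R
p_builtin = binom.test(heads, n_flips, p = 0.5)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

Both approaches agree here because the Binomial(10, 0.5) distribution is symmetric, so doubling one tail and summing both tails give the same answer.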

Interview Questions

D.E. Shaw: A coin was flipped 1,000 times, and 550 times it showed heads. Do you think the coin is biased? Why or why not?

Let \(p\) be the probability of getting a head.

\(H_0\): \(p = 0.5\); \(H_a\): \(p \neq 0.5\)

z_score = (0.55 - 0.5) / sqrt(0.5 * (1 - 0.5) / 1000)
print(str_glue("z-score: {round(z_score, 3)}"))

z-score: 3.162

p_val = 2 * (1 - pnorm(0.55, 0.5, sqrt(0.5 * (1 - 0.5) / 1000)))
print(str_glue("p-value: {round(p_val, 3)}"))

p-value: 0.002

Because \(3.162 > 1.96\) and \(0.0016 < 0.05\), we reject the null hypothesis at the 0.05 level and conclude that the coin is not fair.
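
With 1,000 flips the normal approximation is reasonable, but an exact test is a cheap sanity check. The sketch below compares base R's `binom.test` with the two-sided z-test \(p\)-value; the variable names are illustrative.

```r
heads = 550
n_flips = 1000

# Exact two-sided binomial test (no normal approximation)
p_exact = binom.test(heads, n_flips, p = 0.5)$p.value

# Two-sided z-test p-value for comparison
z = (heads / n_flips - 0.5) / sqrt(0.5 * 0.5 / n_flips)
p_normal = 2 * pnorm(-abs(z))

print(c(exact = p_exact, normal = p_normal))
```

Both values fall well below 0.05, so the conclusion does not depend on which test is used.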