Lecture 8
NC State University
ST 511 - Fall 2024
2024-09-11
– Keep up with Slack
– AE Solutions Posted
– Quiz 2 Solutions Posted
– HW-2 Posted Today ~ 5:00 (due Monday 23rd)
– HW-1 Solutions (today; grades soon)
Quiz 3 - Question 3
The Central Limit Theorem states that we always need a sample size of at least n = 30 for the sampling distribution of a sample statistic to be approximately normal.
\(\geq 30\) vs \(> 30\)
The literature is sloppy (and apparently so am I)
Both are fine. I’m going to be more consistent with \(> 30\)
Notation check
\(\hat{p}\)
\(\mu\)
\(\bar{x}\)
\(\pi\)
Notation check
\(\hat{p}\) - sample proportion
\(\mu\) - population mean
\(\bar{x}\) - sample mean
\(\pi\) - population proportion
Suppose you are a researcher interested in studying food habits for NC State students. You take a random sample of 100 students, and are interested in whether or not they had Howling Cow ice cream with their dinner. We observed that 37 students had ice cream and 63 students did not.
– What is our random variable (X)?
– What type of variable are we working with?
We know that the Central Limit Theorem works with means (under certain conditions). Can X be represented as a mean?
\[ X = \begin{cases} 1 & \text{if yes}\\ 0 & \text{if no} \end{cases} \]
1, 0, 1, 1, 0, 1, 0, 1, …..
The mean of a binary outcome is just the sample proportion
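As a quick check in R, here is a minimal sketch that codes the 37 "yes" and 63 "no" responses as 1s and 0s. The vector name `x` and the ordering of the responses are illustrative assumptions; only the counts come from the example.

```r
# Code each student's response: 1 = had ice cream, 0 = did not
# (the order of responses is made up; only the counts 37 and 63 are from the example)
x <- c(rep(1, 37), rep(0, 63))

# The mean of the 0/1 values is the sample proportion
p_hat <- mean(x)
p_hat
```

The result matches 37/100 = 0.37, the proportion of students in the sample who had ice cream.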
– Set up your null and alternative hypotheses
– Collect data
– Quantify evidence
– Articulate your results
Suppose the Howling Cow company claims that 50% of NC State students eat Howling Cow at dinner. We think that less than 50% of students eat Howling Cow at dinner.
\(H_o\) - The assumption that is made
\(H_a\) - The research question
\(H_o\): \(\pi = 0.5\)
\(H_a\): \(\pi < 0.5\)
Depending on the research question, the alternative could instead use \(>\) or \(\neq\).
You take a random sample of 100 students, and are interested in whether or not they had Howling Cow ice cream with their dinner. We observed that 37 students had ice cream and 63 students did not.
What is the proper notation for our sample statistic?
\(\hat{p} = \frac{37}{100} = 0.37\)
A null distribution is the sampling distribution of a statistic under the assumption of the null hypothesis. It describes how the statistic would vary from sample to sample if the null hypothesis were true.
We can use theory or simulation techniques to model this distribution if our assumptions hold!
Assumptions for single proportion
– Independence (this never goes away)
– Sample size (a little different than with means)
– Need to think about the number of successes + failures
– Need to think about creating a sampling distribution under the assumption of the null hypothesis
– \(\pi = 0.5\)
So we ask ourselves, if the null is true, how many successes and failures would we expect to see out of our sample size of 100?
n = 100
Expected successes: \(n \pi_o = 100 \times 0.5 = 50\)
Expected failures: \(n (1 - \pi_o) = 100 \times 0.5 = 50\)
The null value will be the center
We can compute the standard error of the null sampling distribution as follows:
\[ SE = \sqrt{\frac{\pi_o (1-\pi_o)}{n}} \]
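Plugging in the values from our example, a minimal R sketch of this calculation looks like the following. The variable names `pi_0`, `n`, and `se_null` are illustrative choices (`pi_0` is used because `pi` is already a built-in constant in R).

```r
pi_0 <- 0.5   # null value of the population proportion
n    <- 100   # sample size

# Standard error of the sample proportion under the null hypothesis
se_null <- sqrt(pi_0 * (1 - pi_0) / n)
se_null
```

This gives a standard error of 0.05 for the null distribution.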
Often, with theory-based methods, we use a standardized statistic. A standardized statistic is computed by subtracting the mean of the null distribution from the original statistic and dividing by the standard error.
\[ \text{standardized statistic} = \frac{\text{statistic} - \text{null value}}{SE(\text{null})} \]
– The standardized statistic has a nice definition
– Helps compare across multiple studies with different units
\[ Z = \frac{0.37 - 0.5}{0.05} = -2.6 \]
Interpretation - Our sample proportion of 0.37 is 2.6 standard errors below the null value of 0.50.
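The same standardized statistic can be reproduced in R. This is a minimal sketch using the numbers from our example; the variable names are illustrative assumptions.

```r
p_hat <- 37 / 100   # observed sample proportion
pi_0  <- 0.5        # null value
n     <- 100        # sample size

# Standard error under the null, then the standardized (z) statistic
se_null <- sqrt(pi_0 * (1 - pi_0) / n)
z <- (p_hat - pi_0) / se_null
z
```

The negative sign tells us the sample proportion falls below the null value.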
We can use a z-test with proportions because the standard deviation of a sample proportion is a function of the proportion itself. Once the null hypothesis fixes \(\pi_o\), the standard error is known exactly rather than estimated from the data.
Under an assumption of some null hypothesis, would you expect to observe this statistic?
A p-value is the probability of observing our sample statistic, or something more extreme, given that our null hypothesis is true.
Let’s fill in the bold with our context!
A p-value is the probability of observing 0.37, or something smaller, given that the true proportion of all NC State students who eat Howling Cow ice cream at dinner is 50%.
pnorm() is used to calculate probabilities from a Normal distribution
arguments:
– quantile (z-statistic of -2.6)
– mean
– sd
– lower.tail = [TRUE/FALSE]
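Putting these arguments together for our example, a minimal sketch of the call might look like this. We use the standard Normal (mean 0, sd 1) because the quantile is already a standardized statistic, and `lower.tail = TRUE` because our alternative is \(\pi < 0.5\). The name `p_value` is an illustrative choice.

```r
# Area to the left of z = -2.6 under the standard Normal curve
p_value <- pnorm(q = -2.6, mean = 0, sd = 1, lower.tail = TRUE)
p_value
```

The result is roughly 0.0047. For an alternative with \(>\), we would set `lower.tail = FALSE` to get the area to the right instead.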
\(\alpha\) (also called the significance level) acts as a “cut-off” for how much evidence we need to reject the null hypothesis.
– In practice, this value is the proportion of the time we would make the incorrect decision of rejecting the null hypothesis when it is actually true
– This is decided before the study
– Typically 0.1, 0.05, or 0.01
We will use 0.05 for our study.
– Decision: Reject / Fail to reject our null hypothesis
– Conclusion: Weak / Strong evidence to conclude our alternative hypothesis
– Interpretation: Use the definition
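The decision step can be sketched in R by comparing the p-value to \(\alpha\). This assumes the z-statistic of -2.6 and the significance level of 0.05 from our example; the variable names are illustrative.

```r
alpha   <- 0.05                                              # chosen before the study
p_value <- pnorm(-2.6, mean = 0, sd = 1, lower.tail = TRUE)  # about 0.0047

# Reject the null when the p-value falls below the significance level
decision <- if (p_value < alpha) {
  "Reject the null hypothesis"
} else {
  "Fail to reject the null hypothesis"
}
decision
```

Because the p-value is smaller than 0.05, the sketch returns "Reject the null hypothesis". The conclusion and interpretation still need to be written in the context of the study.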
The null distribution can be estimated through simulation (simulation-based methods), or can be modeled by a mathematical function (theory-based methods).
Let’s now talk about simulation (and practice with it in R)
– Simulation studies started to become popular as technology advanced
– Still need an independence assumption
– Less strict sample size assumption (no exact cut-off values)
- Typically > 5 successes and failures (categorical)
- Sample size > 10-15 (quantitative)
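As a preview, here is a minimal sketch of how a simulated null distribution could be built in R for our ice cream example. It is one possible approach, not necessarily the one we will use in class: it assumes each simulated sample is 100 independent students with a true proportion of 0.5, and the seed, the number of simulations, and the variable names are all arbitrary choices.

```r
set.seed(511)     # arbitrary seed so the results are reproducible

n      <- 100     # sample size
pi_0   <- 0.5     # null value
n_sims <- 10000   # number of simulated samples (arbitrary choice)

# Each draw is the number of "successes" in one simulated sample under the null
sim_counts <- rbinom(n_sims, size = n, prob = pi_0)
sim_p_hats <- sim_counts / n   # simulated sample proportions

# Simulated p-value: share of simulated samples with 37 or fewer successes
p_value_sim <- mean(sim_counts <= 37)
p_value_sim
```

Calling `hist(sim_p_hats)` shows the simulated null distribution, which should be centered near the null value of 0.5. The simulated p-value will vary a little with the seed and the number of simulations.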