Lab 3D: Are You Sure about That?
Lab 3D - Are you sure about that?
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Confidence and intervals
-
Throughout the year, we've seen that:
– Means are used for describing the typical value in a sample or population, but we usually don't know what they are, because we can't see the entire population.
– Means of samples can be used to estimate means of populations.
– By including a margin of error with our estimate, we create an interval that increases our confidence that we've located the correct value of the population mean.
-
Today, we'll learn how we can calculate margins of error by using a method called the bootstrap.
– Which comes from the phrase, Picking yourself up by your own bootstraps.
In this lab
-
Load the built-in
atus
(American Time Use Survey) dataset, which is a survey of how a sample of Americans spent their day.– The United States has an estimated population of 327,350,075. How many people were surveyed for this particular dataset?
-
The statistical question we wish to investigate is:
What is the mean age of people older than 15 living in the United States?
-
Why is it important that the ATUS is a random sample?
-
Use our
atus
data to calculate an estimate for the average age of people older than 15 living in the U.S.
One bootstrap
-
A bootstrapped sample is when we take a random
sample()
of our original data (atus
) WITH replacement.– The
size
of the sample should be the same size as the original data. -
We can create a single bootstrapped sample for the
mean
in three steps:`1. Sample the number of the rows to use in our bootstrap.
`2.
slice
those rows from our original data into our bootstrap data.`3. Calculate the mean of our bootstrapped data.
Our first bootstrap
-
Fill in the blanks to
sample
the row numbers we'll use in our bootstrapped sample.– Be sure to re-read what a bootstrapped sample is from the previous slide to help you fill in the blanks.
– Use
set.seed(123)
before taking the sample.bs_rows <- ____(1:____, size = ____, replace = ____)
-
Use the
slice
function to create a new dataset that includes each row from oursample
.bs_atus <- slice(atus, bs_rows)
Take a look
-
Look at the values of
bs_rows
andbs_atus
. -
Write a paragraph that explains to someone that's not familiar with
R
how you createdbs_rows
andbs_atus
. Be sure to include an explanation of what the values ofbs_rows
mean and how those values are used to createbs_atus
. Also, be sure to explain what each argument of each function does.
One strap, two strap
-
Calculate the
mean
of theage
variable in yourbootstrapped
data, then use a different value ofset.seed()
to create your own, personal bootstrapped sample. Then calculate itsmean
. -
Compare this second bootstrapped sample with three other classmates and write a sentence about how similar or different the bootstrapped sample means were.
Many bootstraps
-
To use bootstrapped samples to create confidence intervals, we need to create many bootstrapped samples.
– Normally, the more bootstrapped samples we use, the better the confidence interval.
– In this lab, we'll
do()
500 bootstrapped samples. -
To make
do()
-ing 500 bootstraps easier, we'll code our 3-step bootstrap method into a function.– Open a new R script (File -> New File -> R Script) to write your function into.
Bootstrap function
-
Fill in the blank space below with the 3 steps needed to create a bootstrapped sample
mean
for ouratus
data.– Each step should be written on its own line between the curly braces.
bs_func <- function() { }
-
Highlight and Run the code you write.
Visualizing our bootstraps
-
Once your function is created, fill in the blanks to create 500 bootstrapped sample means:
bs_means <- do(____) * bs_func()
-
Create a
histogram
for your bootstrapped samples and describe the center, shape and spread of its distribution.– These bootstrapped estimates no longer estimate the average age of people in the U.S.
– Instead, they estimate how much the estimate of the average age of people in the U.S. varies.
-
In the next slide, we'll look at how we can use these bootstrapped means to create 90% confidence intervals.
Bootstrapped confidence intervals
-
To create a 90% confidence interval, we need to decide between which two ages the middle 90% of our bootstrapped estimates are contained.
-
Using your
histogram
, fill in the statement below:The lowest 5% of our estimates are below years and the highest 5% of our estimates are above years.
-
Use the
quantile()
function to check your estimates. -
Based on your bootstrapped estimates, between which two ages are we 90% confident the actual
mean
age of people living in the U.S. is contained?
On your own
-
Using your bootstrapped sample means, create a 95% confidence interval for the
mean
age of people living in the U.S. -
Why is the 95% confidence interval wider than the 90% interval?
-
Write down how you would explain what a 95% confidence interval means to someone not taking Introduction to Data Science.