Lab 3D: Are You Sure about That?
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Confidence and intervals

Throughout the year, we've seen that:
– Means describe the typical value in a sample or population, but we usually don't know the population mean, because we can't see the entire population.
– Means of samples can be used to estimate means of populations.
– By adding a margin of error to our estimate, we create an interval, which makes us more confident that we've captured the true value of the population mean.
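In symbols, the interval described above can be written as follows. Here $\bar{x}$ is the sample mean and $\text{ME}$ is the margin of error; this notation is introduced only for illustration. The bootstrap percentile intervals later in this lab do not have to be symmetric around $\bar{x}$ like this one.

```latex
\left[\, \bar{x} - \text{ME},\ \bar{x} + \text{ME} \,\right]
```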

Today, we'll learn how to calculate margins of error using a method called the bootstrap.
– The name comes from the phrase "pulling yourself up by your own bootstraps."
In this lab

Load the built-in atus (American Time Use Survey) dataset, which records how a sample of Americans spent their day.
– The United States has an estimated population of 327,350,075. How many people were surveyed for this particular dataset?

The statistical question we wish to investigate is:
What is the mean age of people older than 15 living in the United States?

Why is it important that the ATUS is a random sample?

Use our atus data to calculate an estimate for the average age of people older than 15 living in the U.S.
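Here is a minimal base-R sketch of this calculation. The real atus data isn't reproduced here, so the code uses a hypothetical eight-person data frame, `toy_atus`, with made-up ages. With the actual data, you would use `atus` and its `age` variable instead.

```r
# Hypothetical stand-in for atus: eight made-up ages (NOT real survey data)
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))

# Number of people "surveyed" in the toy data (nrow(atus) for the real data)
n_people <- nrow(toy_atus)

# The sample mean of age is our estimate of the population mean age
est_mean <- mean(toy_atus$age)
est_mean  # 39.375
```

If your class loads the mosaic package, the formula-style call `mean(~age, data = atus)` should give the same kind of estimate for the real data.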
One bootstrap

A bootstrapped sample is a random sample() of our original data (atus) taken WITH replacement.
– The size of the sample should be the same as the size of the original data.
We can create a single bootstrapped sample for the mean in three steps:
1. Sample the row numbers to use in our bootstrap.
2. Use slice() to copy those rows from our original data into our bootstrap data.
3. Calculate the mean of our bootstrapped data.
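The three steps can be sketched in base R as follows, again on the hypothetical `toy_atus` data with made-up ages. Base-R row indexing (`toy_atus[bs_rows, ]`) stands in for `slice()` so the sketch runs without extra packages. Both pick out the rows whose numbers appear in `bs_rows`, in that order, repeats included.

```r
# Hypothetical toy data standing in for atus (ages are made up)
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))

set.seed(123)
n <- nrow(toy_atus)

# Step 1: draw n row numbers WITH replacement (some repeat, some are left out)
bs_rows <- sample(1:n, size = n, replace = TRUE)

# Step 2: copy those rows into the bootstrap data
# (base-R indexing; slice(toy_atus, bs_rows) in dplyr selects the same rows)
bs_toy <- toy_atus[bs_rows, , drop = FALSE]

# Step 3: the bootstrapped sample mean
bs_mean <- mean(bs_toy$age)
bs_mean
```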
Our first bootstrap

Fill in the blanks to sample the row numbers we'll use in our bootstrapped sample.
– Be sure to reread the definition of a bootstrapped sample on the previous slide to help you fill in the blanks.
– Use set.seed(123) before taking the sample.
bs_rows <- ____(1:____, size = ____, replace = ____)

Use the slice function to create a new dataset that includes each row listed in bs_rows.
bs_atus <- slice(atus, bs_rows)
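This tiny base-R illustration shows why repeated row numbers matter. It uses hand-picked row numbers (not a real random draw) on a hypothetical five-row data frame. Row 2 is picked twice, so it appears twice in the result. Row 4 is never picked, so it is left out entirely. `slice()` treats repeated row numbers the same way.

```r
# Hypothetical five-person data frame (made-up ages)
toy_atus <- data.frame(id = 1:5, age = c(23, 35, 41, 67, 52))

# Hand-picked row numbers with a repeat, mimicking what a bootstrap draw can produce
picked <- c(2, 2, 5, 1, 3)

# Base-R equivalent of slice(toy_atus, picked)
bs_toy <- toy_atus[picked, , drop = FALSE]
bs_toy$id   # 2 2 5 1 3  (person 2 appears twice, person 4 is missing)
bs_toy$age  # 35 35 52 23 41
```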
Take a look

Look at the values of bs_rows and bs_atus.
Write a paragraph that explains to someone who's not familiar with R how you created bs_rows and bs_atus. Be sure to explain what the values of bs_rows mean and how those values are used to create bs_atus. Also, be sure to explain what each argument of each function does.
One strap, two strap

Calculate the mean of the age variable in your bootstrapped data. Then use a different value of set.seed() to create your own, personal bootstrapped sample, and calculate its mean.
Compare this second bootstrapped sample mean with those of three classmates, and write a sentence about how similar or different the bootstrapped sample means were.
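This base-R sketch shows why classmates get different answers. Each seed produces a different set of row numbers, while reusing the same seed reproduces the same bootstrapped sample exactly. `one_bs_mean()` is a hypothetical helper written only for this illustration, and `toy_atus` is made-up data.

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))
n <- nrow(toy_atus)

# Hypothetical helper: one bootstrapped mean drawn after setting the given seed
one_bs_mean <- function(seed) {
  set.seed(seed)
  rows <- sample(1:n, size = n, replace = TRUE)
  mean(toy_atus$age[rows])
}

mean_a <- one_bs_mean(123)
mean_b <- one_bs_mean(2024)
c(mean_a, mean_b)            # typically two different bootstrapped means
one_bs_mean(123) == mean_a   # TRUE: same seed, same sample, same mean
```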
Many bootstraps

To use bootstrapped samples to create confidence intervals, we need to create many bootstrapped samples.
– Generally, the more bootstrapped samples we use, the less our confidence interval depends on which particular samples we happened to draw.
– In this lab, we'll use do() to create 500 bootstrapped samples.
To make running 500 bootstraps with do() easier, we'll write our 3-step bootstrap method as a function.
– Open a new R script (File > New File > R Script) to write your function in.
Bootstrap function

Fill in the blank space below with the 3 steps needed to create a bootstrapped sample mean for our atus data.
– Each step should be written on its own line between the curly braces.
bs_func <- function() {

}

Highlight and run the code you wrote.
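One way the function body could look is sketched below on the hypothetical `toy_atus` data, with base-R indexing in place of `slice()`. The name `bs_func_toy` is illustrative only. The key design point is that the function takes no arguments and returns a single number, so every call produces a fresh bootstrapped mean.

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))

bs_func_toy <- function() {
  # Step 1: n row numbers drawn with replacement
  bs_rows <- sample(1:nrow(toy_atus), size = nrow(toy_atus), replace = TRUE)
  # Step 2: copy those rows into the bootstrap data
  bs_data <- toy_atus[bs_rows, , drop = FALSE]
  # Step 3: return the bootstrapped mean (R returns the last value computed)
  mean(bs_data$age)
}

set.seed(42)
bs_func_toy()
```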
Visualizing our bootstraps

Once your function is created, fill in the blanks to create 500 bootstrapped sample means:
bs_means <- do(____) * bs_func()
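Without the mosaic package, base R's `replicate()` does the same job as `do(500) * bs_func()`: it calls the function 500 times and collects the results. This sketch uses the hypothetical `toy_atus` data and an illustrative `bs_func_toy()`. Note that `replicate()` returns a plain numeric vector, whereas `do()` returns a data frame.

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))

bs_func_toy <- function() {
  bs_rows <- sample(1:nrow(toy_atus), size = nrow(toy_atus), replace = TRUE)
  mean(toy_atus[bs_rows, "age"])
}

set.seed(123)
# Base-R stand-in for do(500) * bs_func(): a vector of 500 bootstrapped means
bs_means_toy <- replicate(500, bs_func_toy())
length(bs_means_toy)  # 500
```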

Create a histogram of your bootstrapped sample means and describe the center, shape, and spread of its distribution.
– These bootstrapped means are no longer estimates of the average age of people in the U.S.
– Instead, their distribution estimates how much our estimate of the average age of people in the U.S. varies from sample to sample.
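Here is a base-R sketch of this step on the hypothetical toy data. `hist()` shows the shape of the distribution, and `mean()` and `sd()` give numerical summaries of its center and spread. The standard deviation of the bootstrapped means is a rough measure of how much the estimate varies.

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))
bs_func_toy <- function() {
  bs_rows <- sample(1:nrow(toy_atus), size = nrow(toy_atus), replace = TRUE)
  mean(toy_atus[bs_rows, "age"])
}
set.seed(123)
bs_means_toy <- replicate(500, bs_func_toy())

# Shape: look at the histogram
hist(bs_means_toy, main = "500 bootstrapped means (toy data)",
     xlab = "Bootstrapped mean age")

# Center: close to the original toy sample mean (39.375)
mean(bs_means_toy)
# Spread: how much the estimate varies across bootstrapped samples
sd(bs_means_toy)
```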

In the next slide, we'll look at how we can use these bootstrapped means to create 90% confidence intervals.
Bootstrapped confidence intervals

To create a 90% confidence interval, we need to find the two ages between which the middle 90% of our bootstrapped estimates fall.

Using your histogram, fill in the statement below:
The lowest 5% of our estimates are below ____ years and the highest 5% of our estimates are above ____ years.

Use the quantile() function to check your estimates.
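The middle 90% leaves 5% in each tail, so its bounds are the 5th and 95th percentiles of the bootstrapped means. Here is a base-R sketch on the hypothetical toy data:

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))
bs_func_toy <- function() {
  bs_rows <- sample(1:nrow(toy_atus), size = nrow(toy_atus), replace = TRUE)
  mean(toy_atus[bs_rows, "age"])
}
set.seed(123)
bs_means_toy <- replicate(500, bs_func_toy())

# 5th and 95th percentiles: the bounds of the middle 90%
ci_90 <- quantile(bs_means_toy, probs = c(0.05, 0.95))
ci_90
```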
Based on your bootstrapped estimates, between which two ages are we 90% confident that the actual mean age of people older than 15 living in the U.S. lies?
On your own

Using your bootstrapped sample means, create a 95% confidence interval for the mean age of people older than 15 living in the U.S.
Why is the 95% confidence interval wider than the 90% interval?
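A 95% interval keeps the middle 95%, leaving 2.5% in each tail, so it uses the 2.5th and 97.5th percentiles. Those percentiles sit farther out in the tails, so the 95% interval always contains the 90% interval. The base-R sketch below compares the two widths on the hypothetical toy data.

```r
toy_atus <- data.frame(age = c(23, 35, 41, 67, 52, 19, 30, 48))
bs_func_toy <- function() {
  bs_rows <- sample(1:nrow(toy_atus), size = nrow(toy_atus), replace = TRUE)
  mean(toy_atus[bs_rows, "age"])
}
set.seed(123)
bs_means_toy <- replicate(500, bs_func_toy())

ci_90 <- quantile(bs_means_toy, probs = c(0.05, 0.95))
ci_95 <- quantile(bs_means_toy, probs = c(0.025, 0.975))

# Width = upper bound minus lower bound
width_90 <- unname(diff(ci_90))
width_95 <- unname(diff(ci_95))
c(width_90 = width_90, width_95 = width_95)  # the 95% width is at least as large
```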

Write down how you would explain what a 95% confidence interval means to someone not taking Introduction to Data Science.