Lab 3D: Are You Sure about That?
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Confidence and intervals
- Throughout the year, we've seen that:
  – Means are used for describing the typical value in a sample or population, but we usually don't know the population mean, because we can't see the entire population.
  – Means of samples can be used to estimate means of populations.
  – By including a margin of error with our estimate, we create an interval that increases our confidence that we've located the correct value of the population mean.
- Today, we'll learn how we can calculate margins of error by using a method called the bootstrap.
  – The name comes from the phrase "pulling yourself up by your own bootstraps."
In this lab
- Load the built-in atus (American Time Use Survey) dataset, which is a survey of how a sample of Americans spent their day.
  – The United States has an estimated population of 327,350,075. How many people were surveyed for this particular dataset?
- The statistical question we wish to investigate is: What is the mean age of people older than 15 living in the United States?
- Why is it important that the ATUS is a random sample?
- Use our atus data to calculate an estimate for the average age of people older than 15 living in the U.S.
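As a rough illustration of what "calculating an estimate" looks like, the sketch below uses a small made-up data frame called toy (a hypothetical stand-in, not the real atus data) and base R only. In the lab itself you would work with atus and whichever mean syntax your class has been using.

```r
# Hypothetical stand-in for the survey: 200 made-up ages (NOT real ATUS values).
set.seed(1)
toy <- data.frame(age = sample(16:85, size = 200, replace = TRUE))

# How many people are in our (toy) sample?
n_surveyed <- nrow(toy)
n_surveyed

# The sample mean is our single best guess (point estimate) of the population mean.
est <- mean(toy$age)
est
```

On its own, this single number says nothing about how far off it might be, which is the gap the rest of the lab fills.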
One bootstrap
- A bootstrapped sample is when we take a random sample() of our original data (atus) WITH replacement.
  – The size of the sample should be the same size as the original data.
- We can create a single bootstrapped sample for the mean in three steps:
  1. Sample the row numbers to use in our bootstrap.
  2. slice those rows from our original data into our bootstrap data.
  3. Calculate the mean of our bootstrapped data.
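To see the three steps on something small enough to inspect by eye, here is a minimal sketch on a six-row made-up data frame. The names toy, rows and boot are hypothetical, and toy[rows, ] is base R row indexing used in place of slice() so the sketch runs without extra packages.

```r
# Hypothetical six-person data set (made-up ages, not ATUS values).
toy <- data.frame(id = 1:6, age = c(19, 23, 31, 44, 58, 72))

set.seed(42)  # any seed works; it only makes the result repeatable

# Step 1: sample row numbers WITH replacement, as many as the data has rows.
rows <- sample(1:nrow(toy), size = nrow(toy), replace = TRUE)
rows

# Step 2: pull those rows out of the original data (some ids may repeat, some may be missing).
boot <- toy[rows, ]
boot

# Step 3: compute the mean of the bootstrapped data.
mean(boot$age)
```

Printing rows and boot side by side shows why replacement matters: without it, the bootstrapped data would just be the original rows in a different order, and its mean would never change.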
Our first bootstrap
- Fill in the blanks to sample the row numbers we'll use in our bootstrapped sample.
  – Be sure to re-read what a bootstrapped sample is from the previous slide to help you fill in the blanks.
  – Use set.seed(123) before taking the sample.

    bs_rows <- ____(1:____, size = ____, replace = ____)
- Use the slice function to create a new dataset that includes each row from our sample.

    bs_atus <- slice(atus, bs_rows)
Take a look
- Look at the values of bs_rows and bs_atus.
- Write a paragraph that explains to someone who's not familiar with R how you created bs_rows and bs_atus. Be sure to include an explanation of what the values of bs_rows mean and how those values are used to create bs_atus. Also, be sure to explain what each argument of each function does.
One strap, two strap
- Calculate the mean of the age variable in your bootstrapped data, then use a different value of set.seed() to create your own, personal bootstrapped sample. Then calculate its mean.
- Compare the mean of this second bootstrapped sample with those of three other classmates and write a sentence about how similar or different the bootstrapped sample means were.
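The effect of changing the seed can be previewed with a short sketch. It assumes a made-up data frame toy and a hypothetical helper boot_mean_with_seed(); the four seeds stand in for you and three classmates.

```r
# Hypothetical stand-in data: 300 made-up ages.
set.seed(10)
toy <- data.frame(age = sample(16:85, size = 300, replace = TRUE))

# Hypothetical helper: one bootstrapped mean for a given seed.
boot_mean_with_seed <- function(seed) {
  set.seed(seed)
  rows <- sample(1:nrow(toy), size = nrow(toy), replace = TRUE)
  mean(toy$age[rows])
}

# Four different seeds, one per "classmate".
seeds <- c(123, 2024, 7, 99)
means <- sapply(seeds, boot_mean_with_seed)
round(means, 2)

# How far apart are the smallest and largest bootstrapped means?
range(means)
```

Reusing a seed reproduces exactly the same bootstrapped sample, while a new seed gives a new one, so the spread in means comes from the resampling and not from the data changing.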
Many bootstraps
- To use bootstrapped samples to create confidence intervals, we need to create many bootstrapped samples.
  – Normally, the more bootstrapped samples we use, the better the confidence interval.
  – In this lab, we'll do() 500 bootstrapped samples.
- To make do()-ing 500 bootstraps easier, we'll code our 3-step bootstrap method into a function.
  – Open a new R script (File -> New File -> R Script) to write your function into.
Bootstrap function
- Fill in the blank space below with the 3 steps needed to create a bootstrapped sample mean for our atus data.
  – Each step should be written on its own line between the curly braces.

    bs_func <- function() {

    }
- Highlight and Run the code you write.
Visualizing our bootstraps
- Once your function is created, fill in the blanks to create 500 bootstrapped sample means:

    bs_means <- do(____) * bs_func()
- Create a histogram for your bootstrapped samples and describe the center, shape and spread of its distribution.
  – These bootstrapped estimates no longer estimate the average age of people in the U.S.
  – Instead, they estimate how much our estimate of the average age of people in the U.S. varies from sample to sample.
- In the next slide, we'll look at how we can use these bootstrapped means to create 90% confidence intervals.
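The idea of repeating a bootstrap many times can also be sketched in base R. This version swaps in replicate() for do(), so it returns a plain numeric vector rather than the data frame that do() produces. The data frame toy and the helper one_boot_mean() are hypothetical names on made-up data.

```r
# Hypothetical stand-in data: 300 made-up ages.
set.seed(2)
toy <- data.frame(age = sample(16:85, size = 300, replace = TRUE))

# Hypothetical helper: one bootstrapped mean, no arguments needed.
one_boot_mean <- function() {
  rows <- sample(1:nrow(toy), size = nrow(toy), replace = TRUE)
  mean(toy$age[rows])
}

# Repeat the helper 500 times and collect the 500 means.
boot_means <- replicate(500, one_boot_mean())
length(boot_means)

# Center and spread of the bootstrapped means.
mean(boot_means)
sd(boot_means)

# Draw the histogram only when running interactively (e.g. in RStudio).
if (interactive()) {
  hist(boot_means, main = "500 bootstrapped means (toy data)", xlab = "mean age")
}
```

The center of these means sits close to the original sample mean, while their spread is much narrower than the spread of the individual ages. That narrow spread is what the margin of error is built from.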
Bootstrapped confidence intervals
- To create a 90% confidence interval, we need to decide between which two ages the middle 90% of our bootstrapped estimates are contained.
- Using your histogram, fill in the statement below:

    The lowest 5% of our estimates are below ____ years and the highest 5% of our estimates are above ____ years.
- Use the quantile() function to check your estimates.
- Based on your bootstrapped estimates, between which two ages are we 90% confident the actual mean age of people living in the U.S. is contained?
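One way to read the middle 90% off a set of bootstrapped means is with base R's quantile() and its probs argument. The sketch below assumes made-up ages in a hypothetical vector toy_ages and builds its own bootstrapped means so it can run on its own.

```r
# Hypothetical stand-in data: 300 made-up ages.
set.seed(3)
toy_ages <- sample(16:85, size = 300, replace = TRUE)

# 500 bootstrapped means: resample the ages with replacement, then average.
boot_means <- replicate(
  500,
  mean(sample(toy_ages, size = length(toy_ages), replace = TRUE))
)

# The 5th and 95th percentiles cut off the lowest 5% and highest 5% of the means.
ci90 <- quantile(boot_means, probs = c(0.05, 0.95))
ci90

# Check: what share of the bootstrapped means falls between the two cut-offs?
inside <- mean(boot_means >= ci90[1] & boot_means <= ci90[2])
inside
```

The share printed at the end should come out at roughly 0.90, which is a quick way to confirm that the two cut-offs really bracket the middle 90% of the estimates.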
On your own
- Using your bootstrapped sample means, create a 95% confidence interval for the mean age of people living in the U.S.
- Why is the 95% confidence interval wider than the 90% interval?
- Write down how you would explain what a 95% confidence interval means to someone not taking Introduction to Data Science.
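If you want to check the widths of a 90% and a 95% interval numerically before writing your explanation, a sketch along these lines may help. It again uses made-up ages (toy_ages is a hypothetical name) and base R only; the numbers it prints are for the toy data, not for atus.

```r
# Hypothetical stand-in data and 500 bootstrapped means.
set.seed(4)
toy_ages <- sample(16:85, size = 300, replace = TRUE)
boot_means <- replicate(
  500,
  mean(sample(toy_ages, size = length(toy_ages), replace = TRUE))
)

# Middle 90%: cut 5% from each tail. Middle 95%: cut only 2.5% from each tail.
ci90 <- quantile(boot_means, probs = c(0.05, 0.95))
ci95 <- quantile(boot_means, probs = c(0.025, 0.975))

# Width of each interval (upper cut-off minus lower cut-off).
width90 <- unname(diff(ci90))
width95 <- unname(diff(ci95))
c(width90 = width90, width95 = width95)
```

Comparing which tail percentages each interval trims is a useful starting point for explaining the difference in width in your own words.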