Lab 2F - The Titanic Shuffle
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Previously ...
- In the previous lab, we learned that by using a `do`-loop and the `shuffle` function, we could simulate randomly shuffling our data many times.
  – This helps us determine how likely it is that a difference between groups is due to chance.
- For this lab, we will extend these ideas to numerical variables by using random shuffling and numerical summaries.
- The question we will investigate in this lab is:
  Is there any evidence to suggest that those who survived paid a higher fare than those who died?
- We will consider wealthier passengers to be those who paid a higher `fare` for their ticket.
The Titanic
- The Titanic was a ship that sank en route to the U.S.A. from England after hitting an iceberg in 1912.
  – At the time, it was claimed that the Titanic was unsinkable ... it wasn't ... because it sank.
- Use the `data` function to load the `titanic` passenger and survival data.
- Create a boxplot of the `fare`s paid by passengers and facet the plot based on whether the passenger survived or not.
  – Based on the plot, do you believe that passengers who paid a higher fare on the Titanic were more likely to survive? Explain why and describe how certain you are of being correct.
The search begins!
- Start your analysis by calculating how much more the typical survivor paid than the typical non-survivor in our data.
- Based on the distributions of fares paid, which numerical summary that describes the typical value might be preferred?
- What was the typical fare paid by survivors? Non-survivors? How much more did the typical survivor pay?
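As a rough illustration of a per-group summary, the sketch below computes the median of a numerical variable separately for each group and then takes the difference. It runs on a small made-up data frame, not on `titanic`: the name `toy`, its ten fares, the column name `survived`, and the values "Yes"/"No" are all assumptions for illustration. It also assumes the `mosaic` package is installed, since that is where the formula version of `median` comes from. The median is used here only because it is the summary the shuffle section of this lab uses.

```r
library(mosaic)  # formula-aware median()

# Hypothetical stand-in for the real data: 10 made-up passengers.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 15, 210, 8, 7, 13, 9, 30, 8)
)

# Formula syntax: summarize fare separately for each value of survived.
typical <- median(fare ~ survived, data = toy)
typical

# How much more the typical survivor paid than the typical non-survivor.
observed_diff <- typical[["Yes"]] - typical[["No"]]
observed_diff
```

The same pattern should carry over to the real data once you have checked what the survival column in `titanic` is called and which values it takes.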
Do the shuffle!
- Use the `do` and `shuffle` functions to shuffle the passengers' survival status 500 times.
  – Use the previous lab if you need some help on how to do this.
  – For each shuffle, compute each group's `median` fare paid.
  – Assign your shuffled data the name `shuffled_survival`.
- After shuffling your data, use the `mutate` function to create a variable called `diff`, which is the `median` fare of survivors minus the `median` fare of non-survivors.
  – Assign your mutated data the name `shuffled_survival` again.
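The shuffle-then-summarize steps above can be sketched as follows. This is a minimal example on a made-up data frame (`toy`, with invented fares and an assumed `survived` column holding "Yes"/"No"), not the lab's solution on `titanic`. It assumes the `mosaic` package (for `do` and `shuffle`) and the `dplyr` package (for `mutate`) are installed. The column names `Yes` and `No` in the result come from the group labels in the toy data, so they would differ if your data labels the groups differently.

```r
library(dplyr)   # mutate(); attached first so that mosaic's do() is the one found
library(mosaic)  # do(), shuffle(), and the formula-aware median()

# Hypothetical stand-in for the real data: 10 made-up passengers.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 15, 210, 8, 7, 13, 9, 30, 8)
)

set.seed(2)  # only so the sketch produces the same shuffles on every run

# Each of the 500 repetitions shuffles the survived labels,
# then computes the median fare within each shuffled group.
shuffled_survival <- do(500) * median(fare ~ shuffle(survived), data = toy)

# One row per shuffle, one column per group; add the difference of the two.
shuffled_survival <- mutate(shuffled_survival, diff = Yes - No)
head(shuffled_survival)
```

Shuffling only the labels keeps every fare in the data but breaks any link between fare and survival, so the `diff` column shows the differences that chance alone tends to produce.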
Put your simulations to use
- By using your shuffled data, answer the research question we posed at the beginning of the lab:
  Is there any evidence to suggest that those who survived paid a higher fare than those who died?
- Write up your answer as a statistical analysis. Create a plot and explain how the plot supports your conclusion. Be sure to also explain why shuffling your data is important.
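One possible way to set the observed difference against the shuffled ones is sketched below. It uses a made-up data frame (`toy`, invented fares, assumed `survived` values "Yes"/"No") in place of `titanic`. The names `observed_diff`, `shuffle_plot`, and `share_as_large` are hypothetical, and the plot is drawn with the `lattice` package as one option among several. With only ten made-up rows the numbers mean nothing; the point is the shape of the comparison.

```r
library(lattice) # panel.histogram() and panel.abline()
library(dplyr)   # mutate(); attached before mosaic so mosaic's do() is found
library(mosaic)  # do(), shuffle(), and the formula-aware median()

# Hypothetical stand-in for the real data: 10 made-up passengers.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 15, 210, 8, 7, 13, 9, 30, 8)
)

set.seed(3)  # only so the sketch produces the same shuffles on every run

# Difference in median fare in the unshuffled data.
observed <- median(fare ~ survived, data = toy)
observed_diff <- observed[["Yes"]] - observed[["No"]]

# Differences in median fare across 500 shuffles of the survival labels.
shuffled_survival <- do(500) * median(fare ~ shuffle(survived), data = toy)
shuffled_survival <- mutate(shuffled_survival, diff = Yes - No)

# Histogram of the shuffled differences with a vertical line at the observed one.
shuffle_plot <- lattice::histogram(
  ~diff, data = shuffled_survival,
  panel = function(x, ...) {
    panel.histogram(x, ...)
    panel.abline(v = observed_diff)
  }
)
print(shuffle_plot)

# Share of shuffles whose difference is at least as large as the observed one.
share_as_large <- sum(shuffled_survival$diff >= observed_diff) / nrow(shuffled_survival)
share_as_large
```

If the observed difference sits far out in the tail of the shuffled differences, chance alone rarely produces a gap that large. That is the kind of evidence the write-up should discuss.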
Comparing Mean Fares
- What if, instead of calculating the median fare for each group after a shuffle, we calculated the mean fare and took the difference (mean_survivor – mean_victim)?
- If we did this 500 times, what do you predict the distribution of differences will look like?
- Use the `do` and `shuffle` functions to shuffle the passengers' survival status 500 times.
  – For each shuffle, compute each group's mean fare paid.
  – After shuffling your data, use the `mutate` function to create a variable called `diff`, which is the mean fare of survivors minus the mean fare of non-survivors.
- What does the shuffled data reveal? Does the answer to the research question below change when using the mean fares instead of the median fares?
  Is there any evidence to suggest that those who survived paid a higher fare than those who died?