Lab 2F - The Titanic Shuffle

Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.

Previously ...

In the previous lab, we learned that by using a do-loop and the shuffle function, we could simulate randomly shuffling our data many times.
- This helps us determine how likely it is that a difference between groups is due to chance.
For this lab, we will extend these ideas to numerical variables by using random shuffling and numerical summaries.
The question we will investigate in this lab is:

Is there any evidence to suggest that those who survived paid a higher fare than those who died?
We will consider wealthier passengers to be those who paid a higher fare for their ticket.

The Titanic

The Titanic was a ship that sank en route to the U.S.A. from England after hitting an iceberg in 1912.

- At the time, it was claimed that the Titanic was unsinkable ... it wasn't ... because it did.
Use the data function to load the titanic passenger and survival data.
Create a boxplot of the fares paid by passengers and facet the plot based on whether the passenger survived or not.

- Based on the plot, do you believe that passengers who paid a higher fare on the Titanic were more likely to survive? Explain why and describe how certain you are of being correct.
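As a rough illustration of a boxplot split by group, here is a minimal base R sketch. It assumes a small made-up data frame named toy (hypothetical, not the real titanic data) with columns named survived and fare; those column names are assumptions, and base R's boxplot stands in for whatever plotting function your class uses.

```r
# Hypothetical toy data; these are NOT the real titanic values.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(71.3, 26.0, 13.0, 151.6, 7.9, 8.1, 7.3, 21.1, 10.5, 26.6)
)

# One box per survival group, drawn side by side on a shared fare axis.
# boxplot() also returns the numbers behind each box, saved here as bp.
bp <- boxplot(fare ~ survived, data = toy,
              xlab = "survived", ylab = "fare")
```

Comparing where each box sits, and how long its whiskers are, is the kind of evidence the question above asks you to describe.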

The search begins!

Start your analysis by calculating how much more the typical survivor paid than the typical non-survivor in our data.
Based on the distributions of fares paid, which numerical summary that describes the typical value might be preferred?
What was the typical fare paid by survivors? Non-survivors? How much more did the typical survivor pay?
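One possible way to line up two candidate "typical value" summaries for each group is sketched below in base R. The data frame toy and its columns are hypothetical stand-ins for the real data, so the numbers it produces say nothing about the Titanic.

```r
# Hypothetical toy data; these are NOT the real titanic values.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(71.3, 26.0, 13.0, 151.6, 7.9, 8.1, 7.3, 21.1, 10.5, 26.6)
)

# Split the fares into one vector per survival group.
by_group <- split(toy$fare, toy$survived)

# Compute the mean and the median for each group.
typical <- data.frame(
  summary = c("mean", "median"),
  No      = c(mean(by_group$No),  median(by_group$No)),
  Yes     = c(mean(by_group$Yes), median(by_group$Yes))
)

# How much more did the typical survivor pay under each summary?
typical$diff <- typical$Yes - typical$No
print(typical)
```

Seeing both summaries next to each other can help you decide which one better describes a skewed distribution.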

Do the shuffle!

Use the do and the shuffle functions to shuffle the passengers' survival status 500 times.
- Use the previous lab if you need some help on how to do this.
- For each shuffle, compute each group's median fare paid.
- Assign your shuffled data the name shuffled_survival.
After shuffling your data, use the mutate function to create a variable called diff which is the median fare of survivors minus the median fare of non-survivors.
- Assign your mutated data the name shuffled_survival again.
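The steps above can be sketched in base R as follows. This is not the lab's do and shuffle code: replicate plays the role of do, sample plays the role of shuffle, and a plain column assignment plays the role of mutate. The data frame toy and the helper one_shuffle are hypothetical names invented for this illustration.

```r
# Hypothetical toy data; these are NOT the real titanic values.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(71.3, 26.0, 13.0, 151.6, 7.9, 8.1, 7.3, 21.1, 10.5, 26.6)
)

# One shuffle: scramble the survival labels, then take each group's median fare.
one_shuffle <- function(data) {
  shuffled <- sample(data$survived)  # same labels, random order
  c(No  = median(data$fare[shuffled == "No"]),
    Yes = median(data$fare[shuffled == "Yes"]))
}

set.seed(1)  # makes the random shuffles repeatable

# Repeat the shuffle 500 times; each row holds one shuffle's two medians.
shuffled_survival <- as.data.frame(t(replicate(500, one_shuffle(toy))))

# Add the difference: median of "survivors" minus median of "non-survivors".
shuffled_survival$diff <- shuffled_survival$Yes - shuffled_survival$No
head(shuffled_survival)
```

Because the labels are scrambled, any difference in a row comes from chance alone, which is what makes these rows a useful comparison for the real difference.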

Put your simulations to use

Using your shuffled data, answer the research question we posed at the beginning of the lab.

Is there any evidence to suggest that those who survived paid a higher fare than those who died?
Write up your answer as a statistical analysis. Create a plot and explain how the plot supports your conclusion. Be sure to also explain why shuffling your data is important.
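One way such a plot might be put together is sketched below in base R, again with a hypothetical toy data frame rather than the real data. It draws the shuffled differences as a histogram, marks the actual difference with a vertical line, and counts how often chance alone produced a difference at least that large. The names toy, diffs, observed_diff and prop_at_least are assumptions made for this sketch.

```r
# Hypothetical toy data; these are NOT the real titanic values.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(71.3, 26.0, 13.0, 151.6, 7.9, 8.1, 7.3, 21.1, 10.5, 26.6)
)

# The difference in medians actually seen in the (toy) data.
observed_diff <- median(toy$fare[toy$survived == "Yes"]) -
  median(toy$fare[toy$survived == "No"])

set.seed(1)  # makes the random shuffles repeatable

# 500 differences in medians, each computed after scrambling the labels.
diffs <- replicate(500, {
  s <- sample(toy$survived)
  median(toy$fare[s == "Yes"]) - median(toy$fare[s == "No"])
})

# Histogram of the chance differences, with the actual difference marked.
hist(diffs, main = "Shuffled differences in median fare", xlab = "diff")
abline(v = observed_diff, lwd = 2)

# Proportion of shuffles with a difference at least as large as the actual one.
prop_at_least <- mean(diffs >= observed_diff)
prop_at_least
```

If the vertical line falls far out in the tail of the histogram, differences that large rarely happen by chance; if it falls near the middle, chance is a plausible explanation.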

Comparing Mean Fares

What if, instead of calculating the median fare for each group after a shuffle, we calculated the mean fare and took the difference (mean_survivor - mean_victim)?
If we did this 500 times, what do you predict the distribution of differences will look like?
Use the do and the shuffle functions to shuffle the passengers' survival status 500 times.
- For each shuffle, compute each group's mean fare paid.
- After shuffling your data, use the mutate function to create a variable called diff which is the mean fare of survivors minus the mean fare of non-survivors.
What does the shuffled data reveal? Does the answer to the research question below change when using the mean fares instead of the median fares?

Is there any evidence to suggest that those who survived paid a higher fare than those who died?
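To compare the two approaches side by side, a small helper can take the summary function as an input. The base R sketch below assumes a hypothetical toy data frame and a made-up helper named shuffled_diffs; neither comes from the lab's own tools.

```r
# Hypothetical toy data; these are NOT the real titanic values.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(71.3, 26.0, 13.0, 151.6, 7.9, 8.1, 7.3, 21.1, 10.5, 26.6)
)

# Shuffle the labels `times` times; after each shuffle, apply `stat`
# (for example mean or median) to each group and return Yes minus No.
shuffled_diffs <- function(data, stat, times = 500) {
  replicate(times, {
    s <- sample(data$survived)
    stat(data$fare[s == "Yes"]) - stat(data$fare[s == "No"])
  })
}

set.seed(1)  # makes the random shuffles repeatable
mean_diffs   <- shuffled_diffs(toy, mean)
median_diffs <- shuffled_diffs(toy, median)

# Compare the center and spread of the two sets of chance differences.
summary(mean_diffs)
summary(median_diffs)
```

Plotting mean_diffs and median_diffs in separate histograms makes it easier to check your prediction about the shape of each distribution.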