Lesson 14: The Titanic Shuffle
Objective:
Students will continue to understand that, just by chance, we will see differences between two groups. They will understand that these differences are usually small.
Materials:

Advanced preparation required (see Step 8 of lesson)

Poster paper

Markers
Essential Concepts:
We can also "shuffle" data based on numerical variables. The statistic we use is the difference in medians. The distribution we form by this form of shuffling still represents what happens if chance were the only factor at play. When differences are small, we suspect that they might be due to chance. When differences are big, we suspect they might be 'real.'
Lesson:

Remind students that they previously learned how to determine if a difference is due to chance by shuffling based on categorical variables (gender and survival).

Display the dotplot created during Lesson 13 of the difference in proportions between female and male survivors of horror movies. Remind the students that, "by chance," the differences were typically zero. Most of the time they were pretty small. Sometimes they were bigger, but that was rare. This tells us that if we see "small" differences, we might think they are due to chance, but if we see "big" differences, we suspect they are not.

Lead a short discussion about what students think small and big differences mean. Make sure they answer in units (which are percentage points for the horror movie data). So, for example, a "big" difference might be 5 percentage points (but don't let them just say "5").

Inform students that, during today’s lesson, they will learn how to determine if there is a difference between groups when a numerical variable is involved.

In particular, they will assume the roles of passengers on the Titanic for today’s lesson. In case some students do not know about the Titanic, ask a volunteer to share what he/she knows.

Explain that, in its time, the Titanic was the largest passenger ship ever built and was declared to be unsinkable. However, it sank on its first voyage, in one of the worst maritime disasters in history. About 40% of passengers survived; however, your chances of survival depended very much on your age, gender, and wealth.

Inform the students that we are going to look at whether the amount of money a passenger paid for his/her cabin (the fare price) had anything to do with whether or not he/she survived.

Each student will need a strip from the LMR_Titanic Strips file—see below for instructions.
Advanced preparation required:
The Titanic Strips LMR contains data from 40 actual passengers on the Titanic. Each strip represents the data from one passenger: the left-hand side shows the fare paid and the right-hand side shows whether that passenger survived the collision. Cut the LMR into strips such that the fare price is attached to the survivor status for each of the 40 observations.

40 strips were created for large classes. If your class has fewer than 40 students, assign the students to two groups such that roughly 40% of them are in the survivor group (15/40 = 37.5% ≈ 40%), and the rest are in the victim group. If your class is small (fewer than 10 students), then put the students in two equal-sized groups. The split does not have to be exactly 40%.

Inform the smaller group that they are the survivors and distribute a survivor strip with its corresponding fare to each student. Set aside any leftover strips. Tell them that the price on the strip represents the amount of money paid for their ticket to board the Titanic. Notify them that $20 in 1912 is worth about $500 today.

Divulge to the larger group that they, unfortunately, are the victims and distribute a victim strip with its corresponding fare to each student. Set aside any leftover strips.

Ask each group to create a dotplot of their fare prices on a poster. Lead a quick discussion comparing the two dotplots visually. Then, ask each group to calculate the median fare for their group.

As a class, find the difference between the median fares for the two groups.
median of “Survivor” fares – median of “Victim” fares
For example:
If all 15 survivor cards and all 25 victim cards are used, the difference in medians would be:
$26.00 – $13.00 = $13.00
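
For teachers who want to check the arithmetic on a computer, the calculation can be sketched in a few lines of Python. This is only an illustration: the individual fares below are invented (they are not the values printed on the Titanic strips) and were chosen so that the group medians match the $26.00 and $13.00 of the example. The function name difference_in_medians is also an assumption, not part of the lesson materials.

```python
from statistics import median

# Hypothetical fares in 1912 dollars. These are NOT the strip values;
# they were picked so the medians match the lesson's example.
survivor_fares = [8.0, 13.0, 26.0, 30.0, 80.0]
victim_fares = [7.0, 7.5, 8.0, 13.0, 15.0, 26.0, 52.0]

def difference_in_medians(survivors, victims):
    """Median survivor fare minus median victim fare."""
    return median(survivors) - median(victims)

observed = difference_in_medians(survivor_fares, victim_fares)
print(f"Difference in medians: ${observed:.2f}")
```

A positive result means the survivors' median fare was higher; a negative result means the victims' median fare was higher.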

Explain that one of the controversies of the Titanic disaster was that some people felt that the rich people were given better access to the lifeboats than were the poor, so rich people were more likely to survive. Note that the data represented on the fare cards are only a subset of the actual Titanic data, which had over 800 passengers. However, the data were randomly selected from the real data and are considered representative of the 800 passengers.

In pairs, ask students to discuss the following:
Based on the data from our dotplots, do you think rich people were more likely to survive? In other words, did passengers who paid more for their tickets have a better chance of survival? Yes, there is evidence that rich passengers survived more often than poorer passengers. The difference between the median fare prices of the survivors and the victims is $13.00 (see Step 13). Most survivors had higher fare prices than the victims, so the distribution of survivor fares is shifted to the right and is more right-skewed.

Share out a couple of responses with the whole class.

Have students tear their strip such that they separate the fare from the outcome (survivor or victim). Collect only the outcomes and randomly shuffle them. Students will keep their fare.
Distribute the shuffled outcome strips face down to the students. Once everyone has a new outcome strip, ask students to turn their outcome strip over and regroup based on their new survival status.

Ask the students:

Why do we shuffle the survivor/victim strips and not the fare strips? We want to know if the price someone paid for his/her ticket affects whether or not he/she survived. So, when we shuffle, we assume that fare price has nothing to do with survival, so the prices should be irrelevant.

What do you think the median fare difference of our shuffled groups will be? The median fare difference of the shuffled groups should be close to 0, meaning that there should be NO difference in fare price for the survivors and the victims. Everyone would have the same chances of surviving, regardless of their ticket price.
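
A single round of the classroom shuffle can be mimicked in Python, as a rough sketch. The ten fares, the 4/6 split, the seed, and the function name shuffle_once are all hypothetical choices made for illustration. The key point mirrors the activity: each fare stays with its student, and only the survivor/victim labels are shuffled.

```python
import random
from statistics import median

# Hypothetical class of 10 students (invented fares, not the strip values).
fares = [7.0, 8.0, 13.0, 26.0, 30.0, 7.5, 15.0, 52.0, 80.0, 13.0]
outcomes = ["survivor"] * 4 + ["victim"] * 6

def shuffle_once(fares, outcomes, rng):
    """Shuffle only the outcome labels, regroup the fares by their new
    label, and return median survivor fare minus median victim fare."""
    shuffled = outcomes[:]   # copy, so the original labels are untouched
    rng.shuffle(shuffled)    # fares keep their order (they stay with students)
    survivors = [f for f, o in zip(fares, shuffled) if o == "survivor"]
    victims = [f for f, o in zip(fares, shuffled) if o == "victim"]
    return median(survivors) - median(victims)

rng = random.Random(14)  # arbitrary seed, used only for repeatability
print(shuffle_once(fares, outcomes, rng))
```

Because the labels are dealt out at random, the result says nothing about the real passengers; it only shows the kind of difference chance alone can produce.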


Have each group calculate the median fare price for their new groups. Then, ask:
 Do you think this difference, of ___ dollars, is real or due to chance? Answers will vary by class. Since the data were shuffled, any difference should be due to chance.

On the board, create a table to display the median fare prices for each group, and include a column for the difference (median “Survivor” fare – median “Victim” fare). Fill in the table with the values the students found in Step 13. Note: The first row has been filled in with the example data from above BEFORE the shuffles have taken place.
Median Fare Price of Survivors | Median Fare Price of Victims | Difference in Medians (Survivors – Victims)
$26.00 | $13.00 | $26.00 – $13.00 = $13.00
? | ? | ?
Note that values in the “Difference in Medians” column can be positive or negative because sometimes the survivors will pay more for their tickets, and other times the victims will pay more for their tickets.

Draw a dotplot on the board labeled “Difference in Medians.” Include a vertical line at $13.00 (or whatever value was calculated in Step 13 by the class) to represent the actual difference in the median fare prices between the survivors and the victims (see example below).

Using the information from Steps 19 and 20, place a dot at the corresponding value for the shuffled data’s difference in medians. Ask the students:
 How does this difference compare to the actual difference of $13.00 (from Step 13)? Answers will vary by class. Most likely, the difference in medians will be much smaller than $13.00. In fact, the difference in medians will be centered around 0.

Remind students that small differences might be due to chance and big differences typically mean that there is a “real” difference between groups. In this case, a big difference might mean that the rich passengers were more likely to survive. And a small difference might mean that survival was just a matter of plain luck.

Repeat Steps 17 – 23 a few more times (depending on how much class time you have available).
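
Class time limits how many shuffles can be done by hand, but a computer can repeat the process hundreds of times. The sketch below is one possible way to do that in Python, with invented fares, an arbitrary group size of 4 survivors, and an arbitrary 200 repetitions; the function name shuffled_differences is hypothetical. It prints a crude text version of the "Difference in Medians" dotplot, with differences rounded to the nearest dollar.

```python
import random
from collections import Counter
from statistics import median

# Hypothetical fares (not the strip values).
fares = [7.0, 8.0, 13.0, 26.0, 30.0, 7.5, 15.0, 52.0, 80.0, 13.0]

def shuffled_differences(fares, n_survivors, repetitions, seed=0):
    """Repeat the label shuffle and collect each difference in medians."""
    rng = random.Random(seed)  # seed is arbitrary, for repeatability only
    labels = ["survivor"] * n_survivors + ["victim"] * (len(fares) - n_survivors)
    diffs = []
    for _ in range(repetitions):
        rng.shuffle(labels)
        survivors = [f for f, lab in zip(fares, labels) if lab == "survivor"]
        victims = [f for f, lab in zip(fares, labels) if lab == "victim"]
        diffs.append(median(survivors) - median(victims))
    return diffs

diffs = shuffled_differences(fares, n_survivors=4, repetitions=200)

# Text dotplot: one row per rounded dollar value, one dot per shuffle.
counts = Counter(round(d) for d in diffs)
for value in sorted(counts):
    print(f"{value:>5} | {'.' * counts[value]}")
```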

In pairs, ask students to discuss whether they think the real difference in median fare prices they calculated in Step 13 ($13.00 if all cards were used) is small or large. Answers will vary by class. Guide students to look at the MAD value of the distribution of differences in median fares.

Explain that one way that we can decide what is “large” or “small” is by creating cutoff values that we think are too far away from the center of the distribution of differences. In general, we can use a rule that states that any difference in median fare prices that is more than 2 MAD values above or below the median of the shuffled differences is considered unusual. This means that an actual difference falling in the outer edges of the plot would suggest that a passenger’s ticket price was related to his/her chances of survival.
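
The 2-MAD rule can be sketched in Python as follows. Two assumptions are made here and should be adjusted to match your class: MAD is taken to mean the mean absolute deviation from the mean, and the ten shuffled differences are invented numbers rather than class results. The function names mad and is_unusual are likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical differences in medians from ten shuffles (invented values).
shuffled_diffs = [-4.0, -2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0, 4.0]
observed_diff = 13.0  # the lesson's example value when all cards are used

def mad(values):
    """Mean absolute deviation from the mean (assumed definition of MAD)."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

def is_unusual(value, diffs):
    """True if value is more than 2 MADs above or below the median of diffs."""
    return abs(value - median(diffs)) > 2 * mad(diffs)

low = median(shuffled_diffs) - 2 * mad(shuffled_diffs)
high = median(shuffled_diffs) + 2 * mad(shuffled_diffs)
print(f"Cutoffs: {low:.2f} and {high:.2f}")
print("Observed difference unusual?", is_unusual(observed_diff, shuffled_diffs))
```

With these made-up numbers the cutoffs are at –$3.00 and $3.00, so a difference of $13.00 would fall well outside them.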

Inform students that they will use RStudio to shuffle the actual Titanic data of all 800 passengers during the next class and can decide if the difference in survival rates of rich passengers and poor passengers was real, or just due to chance.
Class Scribes:
One team of students will give a brief talk to discuss what they think the 3 most important topics of the day were.
Homework & Next Day
For the next 3 days, students will collect data for the Stress/Chill campaign either through the IDS UCLA App or via web browser at https://portal.idsucla.org