# Essential Concepts

**IDS Unit 2: Essential Concepts**

__Lesson 1: What Is Your True Color?__

Students will understand that the 'typical' value is a value that can represent the entire group, even though we know that not all members of the group share the same value.

__Lesson 2: What Does Mean Mean?__

The center of a distribution is the 'typical' value. One way of measuring the center is with the mean, which finds the balancing point of the distribution. The mean gives us the typical value, but does not tell the whole story. We need a way to measure the variability to understand how observations might differ from the typical value.
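A minimal sketch of the "balancing point" idea in Python, using a made-up sample of sleep hours (the data values are hypothetical):

```python
import statistics

# Hypothetical sample: hours of sleep reported by five students
hours = [6, 7, 7, 8, 10]

mean_hours = statistics.mean(hours)  # the balancing point of the distribution

# Deviations from the mean always sum to (approximately) zero,
# which is exactly what "balancing point" means.
deviations = [x - mean_hours for x in hours]
print(mean_hours)       # 7.6
print(sum(deviations))  # ~0 (up to floating-point rounding)
```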

__Lesson 3: Median In the Middle__

Another measure of center is the median, which can also be used to represent the typical value of a distribution. The median is preferred for skewed distributions or when there are outliers, because it better matches what we think of as 'typical.'
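A small illustration of why the median is preferred when there are outliers, using hypothetical income data:

```python
import statistics

# Hypothetical incomes (in $1000s); one outlier drags the mean upward
incomes = [40, 45, 50, 55, 300]

mean_income = statistics.mean(incomes)      # 98.0, pulled up by the outlier
median_income = statistics.median(incomes)  # 50, closer to what we mean by "typical"
print(mean_income, median_income)
```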

__Lesson 4: How Far Is It from Typical?__

The mean absolute deviation (MAD) measures the variability in a sample of data: the larger the value, the greater the variability. More precisely, the MAD is the typical distance of observations from the mean. There are other measures of spread as well, notably the standard deviation and the interquartile range (IQR).
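The MAD can be computed directly from its definition; a sketch with a hypothetical data set:

```python
import statistics

data = [2, 4, 6, 8, 10]  # hypothetical sample
mean = statistics.mean(data)  # 6

# MAD: the mean (typical) distance of observations from the mean
mad = sum(abs(x - mean) for x in data) / len(data)
print(mad)  # 2.4
```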

__Lesson 5: Human Boxplots__

A common statistical question is “How does this group compare to that group?” This is a hard question to answer when the groups have lots of variability. One approach is to compare the centers, spreads, and shapes of the distributions. Boxplots are a useful way of comparing distributions from different groups when all of the distributions are unimodal (one hump).
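A boxplot is drawn from a five-number summary (minimum, Q1, median, Q3, maximum), so comparing two groups can start from comparing their summaries. A sketch with made-up exercise data (note that `statistics.quantiles` uses its default "exclusive" interpolation here):

```python
import statistics

# Hypothetical minutes of exercise per day for two groups
group_a = [10, 20, 30, 40, 50, 60, 70, 80]  # spread out
group_b = [35, 40, 40, 45, 45, 50, 50, 55]  # tightly clustered

def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values a boxplot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return [min(data), q1, q2, q3, max(data)]

print(five_number_summary(group_a))
print(five_number_summary(group_b))
```

Similar medians with very different IQRs, as here, show why comparing centers alone is not enough.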

__Lesson 6: Face Off__

Writing (and saying) precise comparisons between groups in which variability is present, based on (a) center, (b) spread, (c) shape, and (d) unusual outcomes, helps us make statements in the context of the data. Actual comparison statements should use terms such as "less than," "about the same as," etc.

__Lesson 7: Plot Match__

Boxplots are an alternative visualization of histograms or dot plots. They capture most, but not all, of the features we can see in a dot plot or histogram.

__Lesson 8: How Likely Is It?__

Probability is an area about which we humans have poor intuition. Probability measures a long-run proportion: 50% chance means the event happens 50% of the time if you repeated it forever. When we don't repeat forever, we see variability.
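The long-run idea can be simulated; a sketch that flips a fair coin (the seed is arbitrary, chosen only to make the demo reproducible):

```python
import random

random.seed(1)  # arbitrary seed, for a reproducible demo

# Short runs vary a lot; long runs settle near the true probability of 0.5
for n in [10, 100, 10000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)
```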

__Lesson 9: Bias Detective__

In the short-term, actual outcomes of chance experiments vary from what is 'ideal.' An ideal die has equally likely outcomes. But that does not mean we will see exactly the same number of one dots, two dots, etc.
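A quick simulation of this: rolling an ideal die 60 times "should" give 10 of each face, but the actual counts vary (the seed is arbitrary):

```python
import random
from collections import Counter

random.seed(2)  # arbitrary seed, for reproducibility

# Roll an ideal die 60 times; each face has probability 1/6,
# but the counts will not be exactly 10 each
rolls = [random.randint(1, 6) for _ in range(60)]
counts = Counter(rolls)
print(dict(sorted(counts.items())))
```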

__Lesson 10: Marbles, Marbles…__

There are two ways of sampling data that model real-life sampling situations: with and without replacement. Larger samples tend to be closer to the "true" probability.
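A sketch of the two sampling schemes with a hypothetical bag of marbles; `random.choices` samples with replacement, `random.sample` without:

```python
import random

random.seed(3)  # arbitrary seed, for reproducibility

# A hypothetical bag of marbles: 3 red, 2 blue
bag = ["red", "red", "red", "blue", "blue"]

with_replacement = random.choices(bag, k=5)    # the same marble can repeat
without_replacement = random.sample(bag, k=5)  # each marble drawn at most once

print(with_replacement)
print(without_replacement)
```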

__Lesson 11: This AND/OR That__

What does "A or B" mean, as opposed to "A and B"? These are compound events, and two-way tables can be used to calculate their probabilities.
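A sketch of AND/OR probabilities read off a hypothetical two-way table (all counts are made up):

```python
# Hypothetical two-way table of 100 students:
#                 likes horror   dislikes horror
# plays sports         20              30
# no sports            10              40
total = 100
sports_and_horror = 20
sports = 20 + 30  # row total
horror = 20 + 10  # column total

p_and = sports_and_horror / total                      # P(sports AND horror)
p_or = (sports + horror - sports_and_horror) / total   # P(sports OR horror)
print(p_and)  # 0.2
print(p_or)   # 0.6
```

Note that "OR" subtracts the overlap so the 20 students in both groups are not counted twice.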

__Lesson 12: Don’t Take My Stress Away!__

Generating statistical questions is the first step in a Participatory Sensing campaign. Research and observations help create applicable campaign questions.

__Lesson 13: The Horror Movie Shuffle__

We can "shuffle" data based on categorical variables. The statistic we use is the difference in proportions. The distribution we form by shuffling represents what happens if chance were the only factor at play. If the actual observed difference in proportions is near the center of this shuffling distribution, then we would conclude that chance is a good explanation for the difference. But if it is extreme (in the tails or off the charts), then we should conclude that chance is NOT to blame. Sometimes, the apparent difference between groups is caused by chance.
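A minimal sketch of this shuffling procedure in Python, with made-up "liked the movie" data (1 = yes, 0 = no); the group sizes, values, and seed are all hypothetical:

```python
import random

random.seed(4)  # arbitrary seed, for reproducibility

# Hypothetical data: did each student like the movie?
group_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 8 of 10 liked it
group_b = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # 4 of 10 liked it

def diff_in_props(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = diff_in_props(group_a, group_b)  # 0.8 - 0.4 = 0.4

# Shuffle: pool the data, deal it back out at random, many times.
# The resulting distribution shows what chance alone would produce.
pooled = group_a + group_b
shuffled_diffs = []
for _ in range(1000):
    random.shuffle(pooled)
    shuffled_diffs.append(diff_in_props(pooled[:10], pooled[10:]))

# If the observed difference sits out in the tails,
# chance is a poor explanation for it
prop_as_extreme = sum(abs(d) >= abs(observed) for d in shuffled_diffs) / 1000
print(observed, prop_as_extreme)
```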

__Lesson 14: The Titanic Shuffle__

We can also "shuffle" data based on numerical variables. The statistic we use is the difference in means. The distribution we form by this form of shuffling still represents what happens if chance were the only factor at play. When differences are small, we suspect that they might be due to chance. When differences are big, we suspect they might be 'real.'
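The same shuffling idea with a numerical variable, using the difference in means as the statistic; the sleep-hours data and seed are hypothetical:

```python
import random
import statistics

random.seed(5)  # arbitrary seed, for reproducibility

# Hypothetical hours of sleep for two groups of students
group_a = [6.0, 7.0, 6.5, 8.0, 7.5]
group_b = [7.5, 8.0, 9.0, 8.5, 7.0]

observed = statistics.mean(group_a) - statistics.mean(group_b)  # 7.0 - 8.0 = -1.0

# Shuffle the pooled values to see what chance alone would produce
pooled = group_a + group_b
shuffled_diffs = []
for _ in range(1000):
    random.shuffle(pooled)
    shuffled_diffs.append(statistics.mean(pooled[:5]) - statistics.mean(pooled[5:]))

print(observed)
```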

__Lesson 15: Tangible Data Merging__

We can enhance the context of a statistical problem by merging related data sets together. To merge data, each data set must have a "unique identifier" that tells us how to match up the lines of the data.
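A sketch of merging on a unique identifier, with two made-up data sets keyed by hypothetical student IDs; this keeps only the IDs present in both sets (one of several reasonable merge rules):

```python
# Two hypothetical data sets keyed by a unique student ID
sleep = {"s01": 7.5, "s02": 6.0, "s03": 8.0}
stress = {"s01": 3, "s02": 8, "s04": 5}

# Merge on the shared identifier: keep only IDs that appear in both sets
merged = {
    sid: {"sleep": sleep[sid], "stress": stress[sid]}
    for sid in sleep.keys() & stress.keys()
}
print(merged)
```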

__Lesson 16: What Is Normal?__

The Normal curve, also called the Gaussian distribution or the "bell curve," is a model that describes many real-life distributions; in this course it is usually called the Normal Model.

__Lesson 17: A Normal Measure of Spread__

The standard deviation is another measure of spread. This is commonly used by statisticians because of its role in common models and distributions, such as the Normal Model.

__Lesson 18: Shuffling with Normal__

Z-scores give us a way to measure how extreme a value is, regardless of the units of measurement. Z-scores usually range between -3 and +3, so values at or beyond -3 or +3 standard deviations from the mean are considered extreme.
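A sketch of computing a z-score with hypothetical test scores, using the sample standard deviation:

```python
import statistics

# Hypothetical test scores
scores = [70, 75, 80, 85, 90]
mean = statistics.mean(scores)  # 80
sd = statistics.stdev(scores)   # sample standard deviation

# A z-score measures distance from the mean in standard deviations,
# so it is unit-free: a score of 90 is about 1.26 SDs above the mean
z = (90 - mean) / sd
print(round(z, 2))
```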