# Essential Concepts

**IDS Unit 4: Essential Concepts**

__Lesson 1: Trash__

Exploring different datasets can give us insight into the same processes. Data from our Participatory Sensing campaigns rely on human sensors, which limits our ability to generalize to the greater population.

__Lesson 2: Drought__

Data can be used to make predictions. Official datasets rely on censuses or random samples and can be used to make generalizations.

__Lesson 3: Community Connection__

Data collected through Participatory Sensing campaigns will be used to create models that answer real-world problems related to our community.

__Lesson 4: Evaluate and Implement the Campaign__

Statistical investigative questions guide a Participatory Sensing campaign so that we can learn about a community or ourselves. These campaigns should be evaluated before they are implemented to make sure they are reasonable and ethically sound.

__Lesson 5: Refine and Create the Campaign__

Statistical investigative questions guide a Participatory Sensing campaign so that we can learn about a community or ourselves. These campaigns should be tried out before they are implemented to make sure they collect the data they are meant to collect, and then refined accordingly.

__Lesson 6: Statistical Predictions using One Variable__

Anyone can make a prediction. But statisticians measure the success of their predictions. This lesson encourages the classroom to consider different measures of success.

__Lesson 7: Statistical Predictions by Applying the Rule__

If we use the mean squared errors rule, then the mean of our current data is the best prediction of future values. If we use the mean absolute errors rule, then the median of the current data is the best prediction of future values.
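One way to check this claim is to try many candidate predictions and score each one under both rules. The sketch below uses made-up numbers, and the function names are chosen here for illustration only.

```python
from statistics import mean, median

# Made-up observations, for illustration only.
data = [2, 3, 3, 4, 5, 7, 18]

def mean_squared_error(prediction, values):
    """Average of the squared differences between one prediction and each value."""
    return mean((v - prediction) ** 2 for v in values)

def mean_absolute_error(prediction, values):
    """Average of the absolute differences between one prediction and each value."""
    return mean(abs(v - prediction) for v in values)

# Candidate predictions from the smallest to the largest value, in steps of 0.1.
candidates = [c / 10 for c in range(min(data) * 10, max(data) * 10 + 1)]

# Pick the candidate with the smallest score under each rule.
best_for_mse = min(candidates, key=lambda c: mean_squared_error(c, data))
best_for_mae = min(candidates, key=lambda c: mean_absolute_error(c, data))

print("mean:", mean(data), "| best under mean squared error:", best_for_mse)
print("median:", median(data), "| best under mean absolute error:", best_for_mae)
```

In this example the large value 18 pulls the mean (6) well above the median (4), so the two rules recommend different predictions.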

__Lesson 8: Statistical Predictions Using Two Variables__

When predicting values of a variable *y*, if *y* is linearly associated with *x*, we can improve our predictions by using what we know about *x*. For every value of *x*, find the mean of the *y* values at that value of *x*. If the resulting means follow a trend, we can model this trend to generalize to unseen values of *x*.
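The "mean of *y* for each value of *x*" step can be sketched as follows. The data points are made up for illustration, and the variable names are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import mean

# Made-up (x, y) pairs, for illustration only.
points = [(1, 2), (1, 4), (2, 5), (2, 7), (3, 8), (3, 10), (4, 11), (4, 13)]

# Group the y values by their x value.
y_values_by_x = defaultdict(list)
for x, y in points:
    y_values_by_x[x].append(y)

# The mean of y at each x is our prediction for that x.
mean_y_by_x = {x: mean(ys) for x, ys in sorted(y_values_by_x.items())}
print(mean_y_by_x)
```

Here the means rise by the same amount each time *x* increases by 1, which is the kind of trend a straight line can summarize.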

__Lesson 9: Spaghetti Line__

We can often use a straight line to summarize a trend. One way to do this is to “eyeball” it: draw the straight line that looks like the best fit to the points in a scatterplot.

__Lesson 10: What’s the Best Line?__

The regression line can be used to make good predictions about values of *y* for any given value of *x*. This works for exactly the same reason the mean works well for one variable: its predictions make the mean squared error as small as possible.
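As a minimal sketch of this idea, the code below computes the least-squares regression line with the standard slope and intercept formulas and compares its mean squared error to that of an "eyeballed" line. The data and the eyeballed slope and intercept are made up for illustration.

```python
from statistics import mean

# Made-up (x, y) pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the mean squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def mean_squared_error(slope, intercept, xs, ys):
    """Average squared difference between the actual y values and the line's predictions."""
    return mean((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = least_squares_line(xs, ys)
regression_mse = mean_squared_error(slope, intercept, xs, ys)

# A hypothetical eyeballed line: y = 1 * x + 1.
eyeballed_mse = mean_squared_error(1, 1, xs, ys)

print("regression line: slope", slope, "intercept", intercept)
print("regression MSE:", regression_mse, "| eyeballed MSE:", eyeballed_mse)
```

Any other line you try on this data will score at least as high on mean squared error as the regression line does.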

__Lesson 11: What’s the Trend?__

A positive association between two variables indicates an increasing trend, and a negative association indicates a decreasing trend. Understanding the direction of an association helps us anticipate future outcomes or behaviors more accurately.

__Lesson 12: How Strong Is It?__

A correlation with an absolute value close to 1 means a strong linear trend. A value close to 0 means a weak linear trend.
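To make this concrete, here is a small sketch that computes the correlation coefficient from its usual formula for two made-up datasets, one with a clear linear trend and one without. The function and variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
strong_ys = [2, 4, 5, 8, 9, 12]  # rises steadily as x increases
weak_ys = [5, 1, 6, 2, 7, 3]     # jumps around with no clear linear pattern

strong_r = correlation(xs, strong_ys)
weak_r = correlation(xs, weak_ys)
print("strong trend:", round(strong_r, 2))
print("weak trend:", round(weak_r, 2))
```

The sign of the correlation gives the direction of the trend, and its absolute value gives the strength.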

__Lesson 13: Improving your Model__

If a linear model is fit to a non-linear trend, it will not do a good job of predicting. For this reason, we need to identify non-linear trends by looking at a scatterplot, so that we can choose a model that matches the trend.
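The sketch below illustrates the problem with made-up data that follows a curve (*y* = *x* squared). It fits a least-squares line to the data and compares its mean squared error with that of a model that matches the curve. Both the data and the choice of a squared model are assumptions made for this example.

```python
from statistics import mean

# Made-up data that follows a curve (y = x squared), for illustration only.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the mean squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares_line(xs, ys)

# Mean squared error of the best straight line.
line_mse = mean((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Mean squared error of a model that matches the curved trend.
curve_mse = mean((y - x ** 2) ** 2 for x, y in zip(xs, ys))

print("line: slope", slope, "intercept", intercept, "MSE", line_mse)
print("curve MSE:", curve_mse)
```

The best straight line here is flat, so it predicts the same value of *y* for every *x* and misses the curve entirely.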

__Lesson 14: More Variables to Make Better Predictions__

We can use scatterplots to assess which variables might lead to strong predictive models. Sometimes using several predictors in one model can produce stronger models.

__Lesson 15: Combination of Variables__

If multiple predictors are associated with the response variable, combining them in one model will generally produce a better predictive model, as measured by the mean squared error.
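As a rough sketch of this comparison, the code below fits a least-squares model with one predictor and then with two, and reports the mean squared error of each. The data are made up, and the helper functions (`solve`, `fit`, `mse`) are written here for illustration using only the standard library.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column (partial pivoting).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(columns, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] for the predictor columns."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def mse(coefs, columns, ys):
    """Mean squared error of the fitted model on the given data."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    preds = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Made-up data, for illustration only: y depends on both x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 2]
y = [11.5, 6.5, 18.0, 11.5, 24.5, 18.0]

mse_one = mse(fit([x1], y), [x1], y)
mse_both = mse(fit([x1, x2], y), [x1, x2], y)
print("MSE with x1 only:", round(mse_one, 2))
print("MSE with x1 and x2:", round(mse_both, 2))
```

On the data used to fit the model, adding a predictor never increases the mean squared error, so the more useful question is how much it decreases.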

__Lesson 16: Football or Futbol?__

Some trends are not linear, so the approaches we’ve used so far won’t be helpful. We need to model such trends differently. Decision trees are a non-linear tool for classifying observations into groups.

__Lesson 17: Grow Your Own Decision Tree__

We can determine the usefulness of decision trees by comparing the number of misclassifications each one makes.
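A minimal sketch of this comparison is shown below. The player records and both one-split trees are made up for illustration; a real tree would be grown from data rather than written by hand.

```python
# Made-up players, for illustration only:
# (weight in pounds, height in inches, actual sport).
players = [
    (250, 75, "football"), (310, 77, "football"),
    (225, 72, "football"), (195, 71, "football"),
    (165, 70, "soccer"), (180, 73, "soccer"),
    (155, 68, "soccer"), (205, 74, "soccer"),
]

def tree_a(weight, height):
    """Hypothetical one-split tree: heavier players are classified as football players."""
    return "football" if weight >= 200 else "soccer"

def tree_b(weight, height):
    """Hypothetical one-split tree: taller players are classified as football players."""
    return "football" if height >= 72 else "soccer"

def misclassifications(tree, records):
    """Count the records whose predicted sport differs from the actual sport."""
    return sum(1 for weight, height, sport in records if tree(weight, height) != sport)

errors_a = misclassifications(tree_a, players)
errors_b = misclassifications(tree_b, players)
print("tree A misclassifies", errors_a, "of", len(players))
print("tree B misclassifies", errors_b, "of", len(players))
```

On this made-up data, the tree that splits on weight makes fewer mistakes, so it is the more useful of the two.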

__Lesson 18: Where Do I Belong?__

We can identify groups, or “clusters”, in data based on a few characteristics. For example, it is easy to classify a group of people into football players and swimmers, but what if you only knew each person’s arm span? How well could you classify them into football players and swimmers now?
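One common way to find two clusters from a single characteristic is the k-means approach, sketched below for arm span alone. The choice of k-means, the arm span values, and the function name `two_means` are all assumptions made for this example.

```python
from statistics import mean

# Made-up arm spans in inches, for illustration only.
arm_spans = [66, 67, 68, 69, 70, 76, 77, 78, 79, 80]

def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters with a basic k-means loop."""
    centers = [min(values), max(values)]  # start from the two extremes
    clusters = [[], []]
    for _ in range(iterations):
        # Assign each value to the nearest center.
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster.
        centers = [mean(c) for c in clusters]
    return centers, clusters

centers, clusters = two_means(arm_spans)
print("cluster centers:", centers)
print("shorter arm spans:", clusters[0])
print("longer arm spans:", clusters[1])
```

These made-up values separate cleanly into two groups. When the arm spans of the two groups overlap, some people will land in the wrong cluster, which is why a single characteristic may not classify them well.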

__Lesson 19: Our Class Network__

Networks are made when observations are interconnected. In a social setting, we can examine how different people are connected by tracing the relationships between people in the network.
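A small sketch of a social network is shown below. It stores who is connected to whom and then finds people who are linked only through a shared connection. The names and friendships are made up for illustration.

```python
# Made-up friendships, for illustration only. Each pair is a connection between two people.
friendships = [("Ana", "Ben"), ("Ben", "Cam"), ("Cam", "Dee"), ("Ana", "Eli")]

# Build the network: each person maps to the set of people they are directly connected to.
network = {}
for a, b in friendships:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

def friends_of_friends(person):
    """People not directly connected to `person` who share a connection with them."""
    direct = network[person]
    indirect = set()
    for friend in direct:
        indirect |= network[friend]
    return indirect - direct - {person}

for person in sorted(network):
    print(person, "is connected to", sorted(network[person]),
          "and indirectly to", sorted(friends_of_friends(person)))
```

For example, Ana and Cam are not directly connected here, but they are linked through Ben.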