Essential Concepts

IDS Unit 4: Essential Concepts

Lesson 1: Trash

Exploring different datasets can give us insight into the same processes. Data from our Participatory Sensing campaigns rely on human sensors, which limits our ability to generalize to the greater population.

Lesson 2: Drought

Data can be used to make predictions. Official datasets rely on censuses or random samples and can be used to make generalizations.

Lesson 3: Community Connection

Data collected through Participatory Sensing campaigns will be used to create models that answer real-world problems related to our community.

Lesson 4: Evaluate and Implement the Campaign

Statistical investigative questions guide a Participatory Sensing campaign so that we can learn about a community or ourselves. These campaigns should be evaluated before implementing to make sure they are reasonable and ethically sound.

Lesson 5: Refine and Create the Campaign

Statistical investigative questions guide a Participatory Sensing campaign so that we can learn about a community or ourselves. These campaigns should be tried out before they are implemented, to make sure they collect the data they are meant to collect, and then refined accordingly.

Lesson 6: Statistical Predictions using One Variable

Anyone can make a prediction. But statisticians measure the success of their predictions. This lesson encourages the classroom to consider different measures of success.

Lesson 7: Statistical Predictions by Applying the Rule

If we use the mean squared error rule, then the mean of our current data is the best prediction of future values. If we use the mean absolute error rule, then the median of the current data is the best prediction of future values.
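The rule above can be checked numerically. The sketch below is a minimal illustration, not part of the lesson materials: the data values and the function names are made up, and it simply tries many candidate predictions and keeps the one with the lowest score under each rule.

```python
from statistics import mean, median

# Hypothetical observations, made up for illustration
data = [2, 3, 3, 5, 12]

def mean_squared_error(guess, values):
    """Average of the squared distances between the guess and each value."""
    return mean((v - guess) ** 2 for v in values)

def mean_absolute_error(guess, values):
    """Average of the absolute distances between the guess and each value."""
    return mean(abs(v - guess) for v in values)

# Candidate predictions from 2.0 to 12.0 in steps of 0.1
candidates = [c / 10 for c in range(20, 121)]

best_for_mse = min(candidates, key=lambda c: mean_squared_error(c, data))
best_for_mae = min(candidates, key=lambda c: mean_absolute_error(c, data))

print("Best guess under squared error:", best_for_mse, "| mean:", mean(data))
print("Best guess under absolute error:", best_for_mae, "| median:", median(data))
```

With these values, the winning candidate under squared error matches the mean and the winning candidate under absolute error matches the median. The outlier (12) pulls the mean upward but leaves the median unchanged.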

Lesson 8: Statistical Predictions Using Two Variables

When predicting values of a variable y, if y is linearly associated with x, then we can get improved predictions by using our knowledge about x. For every value of x, find the mean of the y values for that value of x. If the resulting means follow a trend, we can model this trend to generalize to unseen values of x.
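As a rough sketch of "find the mean of the y values for each value of x", the example below groups made-up (x, y) pairs by x and averages each group. The data and variable names are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (x, y) pairs, made up for illustration
pairs = [(1, 10), (1, 12), (2, 14), (2, 16), (3, 19), (3, 21)]

# Collect the y values observed at each value of x
ys_by_x = defaultdict(list)
for x, y in pairs:
    ys_by_x[x].append(y)

# The mean of y at each x is our prediction for that x
mean_y_by_x = {x: mean(ys) for x, ys in sorted(ys_by_x.items())}
print(mean_y_by_x)
```

Here the means rise steadily as x increases, which is the kind of trend that could be modeled to predict y at values of x that were not observed.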

Lesson 9: Spaghetti Line

We can often use a straight line to summarize a trend. Fitting a straight line to a scatterplot by eye ("eyeballing" it) is one way to do this.

Lesson 10: What’s the Best Line?

The regression line can be used to make good predictions about values of y for any given value of x. This works for exactly the same reason the mean works well for one variable: the regression line's predictions make the mean squared error as small as possible.
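To make this concrete, the sketch below computes a least squares line with the standard slope and intercept formulas and compares its mean squared error with that of an "eyeballed" line. The data, the eyeballed line, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

x_bar, y_bar = mean(xs), mean(ys)

# Least squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

def line_mse(a, b):
    """Mean squared error of the line y = a + b * x on the data."""
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

regression_mse = line_mse(intercept, slope)
eyeballed_mse = line_mse(1.0, 1.0)  # a guessed line: y = 1 + x

print("Regression line MSE:", regression_mse)
print("Eyeballed line MSE:", eyeballed_mse)
```

Any other choice of intercept and slope gives a mean squared error at least as large as the regression line's on this data.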

Lesson 11: What’s the Trend?

A positive association means that y tends to increase as x increases. A negative association means that y tends to decrease as x increases. Knowing the direction of an association helps us anticipate future outcomes or behaviors more accurately.

Lesson 12: How Strong Is It?

A high absolute value for correlation means a strong linear trend. A value close to 0 means a weak linear trend.
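The sketch below computes the correlation coefficient by hand for three made-up datasets: one with a strong positive linear trend, one with a strong negative linear trend, and one with little linear trend. The data and the helper name `correlation` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5, 6]
strong_positive = [2, 4, 6, 8, 10, 12]   # points fall exactly on a rising line
strong_negative = [12, 10, 8, 6, 4, 2]   # points fall exactly on a falling line
weak = [5, 1, 4, 2, 6, 3]                # no clear linear pattern

r_positive = correlation(xs, strong_positive)
r_negative = correlation(xs, strong_negative)
r_weak = correlation(xs, weak)

print(r_positive, r_negative, r_weak)
```

The first two correlations have an absolute value of 1 (up to rounding) even though their signs differ, while the third is close to 0.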

Lesson 13: Improving your Model

If a linear model is fit to a non-linear trend, it will not do a good job of predicting. For this reason, we need to look at a scatterplot to identify non-linear trends and then choose a model that matches the trend.
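As one illustration, the sketch below fits a straight line to made-up data that follow a curve (y = x squared), then fits a model that matches the curve by using x squared as the predictor. The data and helper names are hypothetical, and squaring x is only one of several ways to match a curved trend.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mse(xs, ys, intercept, slope):
    """Mean squared error of the line y = intercept + slope * x."""
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical curved data: y is exactly x squared
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Straight line fit to the curved trend
a, b = fit_line(xs, ys)
linear_mse = mse(xs, ys, a, b)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Model that matches the trend: use x squared as the predictor
xs_squared = [x ** 2 for x in xs]
a2, b2 = fit_line(xs_squared, ys)
curved_mse = mse(xs_squared, ys, a2, b2)

print("Linear model MSE:", linear_mse)
print("Residuals of linear model:", residuals)
print("Curved model MSE:", curved_mse)
```

The linear model's residuals are positive at both ends and negative in the middle, a pattern that signals a curved trend. The model that matches the curve has an error of essentially zero.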

Lesson 14: More Variables to Make Better Predictions

We can use scatterplots to assess which variables might lead to strong predictive models. Sometimes using several predictors in one model can produce stronger models.

Lesson 15: Combination of Variables

If multiple predictors are associated with the response variable, combining them in one model can produce a better predictive model, as measured by a smaller mean squared error.
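The sketch below compares a one-predictor model with a two-predictor model on made-up data in which y depends on both predictors. All names and values are hypothetical, and the least squares fit is done with a small hand-written solver so the example needs only the standard library.

```python
from statistics import mean

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def fit(rows, ys):
    """Least squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def mse(rows, ys, coefs):
    """Mean squared error of the fitted model on the data."""
    preds = [coefs[0] + sum(c * x for c, x in zip(coefs[1:], row)) for row in rows]
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Hypothetical data: y is roughly 1 + 2*x1 + 3*x2 plus small noise
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 2]
ys = [12.4, 7.7, 19.2, 12.1, 25.8, 19.3]

one_predictor = [[a] for a in x1]
two_predictors = [[a, b] for a, b in zip(x1, x2)]

mse_one = mse(one_predictor, ys, fit(one_predictor, ys))
mse_two = mse(two_predictors, ys, fit(two_predictors, ys))

print("MSE using x1 only:", mse_one)
print("MSE using x1 and x2:", mse_two)
```

One caution: adding a predictor to a least squares model never increases the mean squared error on the data used to fit it, so the more convincing comparison is on data the model has not seen.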

Lesson 16: Football or Futbol?

Some trends are not linear, so the approaches we’ve used so far won’t be helpful. We need to model such trends differently. Decision trees are a non-linear tool for classifying observations into groups.

Lesson 17: Grow Your Own Decision Tree

We can compare the usefulness of different decision trees by counting the number of misclassifications each one makes.
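A minimal sketch of this comparison follows. The athletes, their measurements, and the two one-question "trees" are all made up for illustration; real trees would usually ask several questions in sequence.

```python
# Hypothetical athletes: (height in inches, weight in pounds, sport).
# All values are made up for illustration.
athletes = [
    (75, 250, "football"), (73, 230, "football"),
    (76, 300, "football"), (70, 205, "football"),
    (69, 160, "soccer"), (71, 170, "soccer"),
    (74, 180, "soccer"), (68, 150, "soccer"),
]

def tree_by_weight(height, weight):
    """A one-question tree that splits on weight."""
    return "football" if weight >= 200 else "soccer"

def tree_by_height(height, weight):
    """A one-question tree that splits on height."""
    return "football" if height >= 72 else "soccer"

def misclassifications(tree, data):
    """Count the observations whose predicted sport differs from the actual sport."""
    return sum(1 for h, w, sport in data if tree(h, w) != sport)

errors_weight = misclassifications(tree_by_weight, athletes)
errors_height = misclassifications(tree_by_height, athletes)

print("Weight tree misclassifications:", errors_weight)
print("Height tree misclassifications:", errors_height)
```

On this made-up data the weight tree makes fewer mistakes than the height tree, so it would be judged the more useful of the two.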

Lesson 18: Where Do I Belong?

We can identify groups, or “clusters”, in data based on a few characteristics. For example, it is easy to classify a group of people into football players and swimmers, but what if you only knew each person’s arm span? How well could you classify them into football players and swimmers now?
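One way to explore the arm span question is a simple two-group clustering on a single variable. The sketch below is a bare-bones version of the k-means idea with two clusters. The arm spans are made up, and the function name `two_means` is hypothetical.

```python
from statistics import mean

# Hypothetical arm spans in inches, made up for illustration
arm_spans = [68, 69, 70, 71, 78, 79, 80, 81]

def two_means(values, iterations=10):
    """Split values into two clusters by repeatedly assigning each value
    to the nearer center and then moving each center to its cluster mean."""
    centers = [min(values), max(values)]  # start at the two extremes
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [mean(g) for g in groups]
    return centers, groups

centers, groups = two_means(arm_spans)
print("Cluster centers:", centers)
print("Clusters:", groups)
```

These made-up values separate cleanly into two groups. If the arm spans of the two sports overlapped, the clusters would mix the groups together, which is why a single characteristic may not classify people well.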

Lesson 19: Our Class Network

Networks are made when observations are interconnected. In a social setting, we can examine how two people are connected by tracing the relationships among the other people in the network.
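A small sketch of this idea: the code below stores a made-up set of friendships as a network and uses a breadth-first search to find the shortest chain of relationships linking two people. The names, the friendships, and the function `shortest_path` are hypothetical.

```python
from collections import deque

# Hypothetical friendships, made up for illustration
friendships = [("Ana", "Ben"), ("Ben", "Cam"), ("Cam", "Dee"), ("Ana", "Eli")]

# Build the network: each person maps to the set of people they are connected to
network = {}
for a, b in friendships:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

def shortest_path(network, start, goal):
    """Breadth-first search for the shortest chain of connections."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(network[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two people are not connected

path = shortest_path(network, "Eli", "Dee")
print(" - ".join(path))
```

Eli and Dee have no direct relationship in this made-up network, but they are connected through Ana, Ben, and Cam.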