LAB 4F: Some Models Have Curves
Directions: Follow along with the slides and answer the questions in bold font in your journal.
Making models do yoga
In the previous lab, we saw that prediction models could be improved by including additional variables.
– But using straight lines for all the variables in a model might not really fit what's happening in the data.
In this lab, we'll learn how we can turn our lm() models using straight lines into lm() models using quadratic curves.
Load the movie data and split it into two sets:
– A set named training that includes 75% of the data.
– And a set named testing that includes the remaining 25%.
– Remember to use
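As a rough sketch of the 75/25 split described above, the code below uses only base R. The data frame sim is made-up stand-in data (the real movie data is not reproduced here), and the names sim, n_train and train_ids are assumptions chosen for illustration. Calling set.seed() first makes the random split repeatable.

```r
# Made-up stand-in for the movie data, so this sketch runs on its own.
set.seed(123)  # makes the random numbers (and the split) repeatable
sim <- data.frame(critics_rating = runif(200, min = 0, max = 100))
sim$audience_rating <- 30 + 0.9 * sim$critics_rating -
  0.005 * sim$critics_rating^2 + rnorm(200, sd = 8)

# Pick 75% of the row numbers at random for training.
n_train   <- round(0.75 * nrow(sim))
train_ids <- sample(nrow(sim), size = n_train)

training <- sim[train_ids, ]   # the chosen rows
testing  <- sim[-train_ids, ]  # every row that was not chosen

nrow(training)
nrow(testing)
```

Because sample() draws row numbers without replacement, no row can end up in both sets.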
Problems with lines
Calculate the slope and intercept of a linear model that predicts
– Then create a scatterplot of the two variables using the testing data and use add_line() to include the line of best fit based on the
– Describe, in words, how the line fits the data. Are there any values for critics_rating that would make obviously poor predictions?
Compute the MSE of the model for the testing data and write it down for later.
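One way the slope, intercept and testing MSE could be computed is sketched below with base R. It assumes made-up stand-in data (sim) rather than the real movie data, and the names line_fit, line_pred and line_mse are illustrative. The MSE is the average of the squared differences between the actual and predicted values.

```r
# Made-up stand-in data and a 75/25 split (assumptions for illustration).
set.seed(123)
sim <- data.frame(critics_rating = runif(200, min = 0, max = 100))
sim$audience_rating <- 30 + 0.9 * sim$critics_rating -
  0.005 * sim$critics_rating^2 + rnorm(200, sd = 8)
train_ids <- sample(nrow(sim), size = 150)
training  <- sim[train_ids, ]
testing   <- sim[-train_ids, ]

# Fit the straight-line model on the training data only.
line_fit <- lm(audience_rating ~ critics_rating, data = training)
coef(line_fit)  # first value is the intercept (a), second is the slope (b)

# Predict for the testing data, then average the squared errors.
line_pred <- predict(line_fit, newdata = testing)
line_mse  <- mean((testing$audience_rating - line_pred)^2)
line_mse
```

The model never sees the testing rows while it is being fit, which is why the testing MSE is a fair check of its predictions.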
You don't need to be a full-fledged Data Scientist to realize that trying to fit a line to curved data is a poor modeling choice.
– If our data is curved, we should try to model it with a curve.
So instead of using an
y = a + bx
We could use an
y = a + bx + cx^2
This is called a quadratic curve.
Making bend-y models
To fit a quadratic model in R, we can use the
– Fill in the blanks below to predict audience_rating using a quadratic polynomial for
lm(____ ~ poly(____, 2), data = training)
What is the role of the number 2 in the
Write down the model equation in the form:
y = a + bx + cx^2
Assign this model a name and calculate the MSE for the
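A detail that may help when writing down the equation: by default, poly() builds orthogonal polynomial terms, so the coefficients it reports are not the a, b and c of y = a + bx + cx^2. Adding raw = TRUE makes poly() use x and x^2 directly. The sketch below, on made-up stand-in data with illustrative names (fit_orth, fit_raw), suggests that both versions give the same predictions, so the testing MSE does not depend on which one is used.

```r
# Made-up stand-in data and a 75/25 split (assumptions for illustration).
set.seed(123)
sim <- data.frame(critics_rating = runif(200, min = 0, max = 100))
sim$audience_rating <- 30 + 0.9 * sim$critics_rating -
  0.005 * sim$critics_rating^2 + rnorm(200, sd = 8)
train_ids <- sample(nrow(sim), size = 150)
training  <- sim[train_ids, ]
testing   <- sim[-train_ids, ]

# Default: orthogonal polynomial terms (coefficients are harder to read).
fit_orth <- lm(audience_rating ~ poly(critics_rating, 2), data = training)

# raw = TRUE: plain x and x^2, so the coefficients are a, b and c.
fit_raw <- lm(audience_rating ~ poly(critics_rating, 2, raw = TRUE),
              data = training)
coef(fit_raw)

# Both fits describe the same curve, so their predictions agree.
pred_orth <- unname(predict(fit_orth, newdata = testing))
pred_raw  <- unname(predict(fit_raw, newdata = testing))
all.equal(pred_orth, pred_raw)

quad_mse <- mean((testing$audience_rating - pred_raw)^2)
quad_mse
```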
Comparing lines and curves
Create a scatterplot with audience_rating on the y-axis and critics_rating on the x-axis using your
– Add the line of best fit for the training data to the plot.
– Then use the name of the model in the code below to add your quadratic model:
Compare how the line of best fit and the quadratic model fit the data. Use the difference in each model's testing MSE to describe why one model fits better than the other.
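For readers working outside the lab's own plotting tools, here is a minimal base R sketch of the same picture: a scatterplot with the fitted line and the fitted quadratic curve drawn on top. It swaps in plot(), abline() and lines() instead of the lab's functions, uses made-up stand-in data, and the names line_fit, quad_fit, curve_x and curve_y are assumptions for illustration.

```r
# Made-up stand-in data and a 75/25 split (assumptions for illustration).
set.seed(123)
sim <- data.frame(critics_rating = runif(200, min = 0, max = 100))
sim$audience_rating <- 30 + 0.9 * sim$critics_rating -
  0.005 * sim$critics_rating^2 + rnorm(200, sd = 8)
train_ids <- sample(nrow(sim), size = 150)
training  <- sim[train_ids, ]
testing   <- sim[-train_ids, ]

# Both models are fit on the training data.
line_fit <- lm(audience_rating ~ critics_rating, data = training)
quad_fit <- lm(audience_rating ~ poly(critics_rating, 2), data = training)

# Scatterplot of the testing data with the straight line on top.
plot(testing$critics_rating, testing$audience_rating,
     xlab = "critics_rating", ylab = "audience_rating")
abline(line_fit, col = "blue")

# Draw the curve by predicting at evenly spaced x values and joining them.
curve_x <- data.frame(critics_rating = seq(0, 100, length.out = 101))
curve_y <- predict(quad_fit, newdata = curve_x)
lines(curve_x$critics_rating, curve_y, col = "red")
```

Where the blue line sits far from the points but the red curve stays close, the line is making its worst predictions.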
On your own
Create a model that predicts
3-degree polynomial (called a cubic model) for the critics_rating using the training data.
– By using a plot, describe why you think a 3-degree polynomial will make better predictions for the testing data.
– Compute the MSE for the model with a 3-degree polynomial and use the MSE to justify whether the 3-degree polynomial fits the
– Using the linear model from above that has the smallest MSE, add a different numerical variable to the model and recompute the MSE. Does modeling the variable you chose as a quadratic polynomial improve the MSE further?
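The comparisons in this section could be organized as in the sketch below, which computes the testing MSE for polynomials of degree 1, 2 and 3 and then for a model with a second variable. Everything here is an assumption for illustration: the data are made up, runtime is a hypothetical second numerical variable (it may not match any column in the real data), and test_mse, mse_by_degree and two_var_fit are illustrative names.

```r
# Made-up stand-in data; runtime is a hypothetical second variable.
set.seed(123)
sim <- data.frame(critics_rating = runif(200, min = 0, max = 100),
                  runtime        = runif(200, min = 80, max = 180))
sim$audience_rating <- 30 + 0.9 * sim$critics_rating -
  0.005 * sim$critics_rating^2 + rnorm(200, sd = 8)
train_ids <- sample(nrow(sim), size = 150)
training  <- sim[train_ids, ]
testing   <- sim[-train_ids, ]

# Helper: testing MSE for any fitted model.
test_mse <- function(fit) {
  mean((testing$audience_rating - predict(fit, newdata = testing))^2)
}

# Fit degree 1 (line), 2 (quadratic) and 3 (cubic) on the training data.
mse_by_degree <- sapply(1:3, function(d) {
  test_mse(lm(audience_rating ~ poly(critics_rating, d), data = training))
})
names(mse_by_degree) <- c("line", "quadratic", "cubic")
mse_by_degree

# Add the second variable, also modeled as a quadratic polynomial.
two_var_fit <- lm(audience_rating ~ poly(critics_rating, 2) + poly(runtime, 2),
                  data = training)
two_var_mse <- test_mse(two_var_fit)
two_var_mse
```

A higher degree or an extra variable always fits the training data at least as well, but the testing MSE can go up or down, which is why the comparison is made on the testing data.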