LAB 4F: Some Models Have Curves
Directions: Follow along with the slides and answer the questions in bold font in your journal.
Making models do yoga

In the previous lab, we saw that prediction models could be improved by including additional variables.
– But using straight lines for all the variables in a model might not really fit what's happening in the data.

In this lab, we'll learn how we can turn our lm() models using straight lines into lm() models using quadratic curves.

Load the movie data and split it into two sets:
– A set named training that includes 75% of the data.
– And a set named testing that includes the remaining 25%.
– Remember to use set.seed.
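
As a rough illustration of the split described above, here is a minimal base-R sketch. The data frame movie_demo is simulated purely for this example (it is not the lab's movie data), and using sample() on row numbers is just one possible approach, not necessarily the one shown in the slides.

```r
# Simulated stand-in for the movie data (hypothetical values, not the real data set)
set.seed(123)  # makes the random split reproducible
movie_demo <- data.frame(
  critics_rating  = runif(200, 0, 100),
  audience_rating = runif(200, 0, 100)
)

# Randomly choose 75% of the row numbers for the training set
n <- nrow(movie_demo)
train_rows <- sample(1:n, size = round(0.75 * n))

training <- movie_demo[train_rows, ]   # 75% of the rows
testing  <- movie_demo[-train_rows, ]  # the remaining 25%

nrow(training)
nrow(testing)
```

Because the seed is set before sampling, rerunning the code gives the same training and testing sets each time.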
Problems with lines

Calculate the slope and intercept of a linear model that predicts audience_rating based on critics_rating for the training data.
– Then create a scatterplot of the two variables using the testing data and use add_line() to include the line of best fit based on the training data.
– Describe, in words, how the line fits the data. Are there any values for critics_rating that would make obviously poor predictions?

Compute the MSE of the model for the testing data and write it down for later.
Adding flexibility

You don't need to be a full-fledged Data Scientist to realize that trying to fit a line to curved data is a poor modeling choice.
– If our data is curved, we should try to model it with a curve.

So instead of using an lm() like
y = a + bx

We could use an lm() like
y = a + bx + cx^2

This is called a quadratic curve.
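
To see what the extra cx^2 term does, the short sketch below evaluates both equations at a few x values. The coefficients a, b, and c are made up for illustration; they are not estimates from the movie data.

```r
# Hypothetical coefficients chosen only to illustrate the two equations
a <- 50
b <- 0.5
c <- -0.004

x <- c(0, 50, 100)

line_values  <- a + b * x            # y = a + bx
curve_values <- a + b * x + c * x^2  # y = a + bx + cx^2

line_values   # 50 75 100
curve_values  # 50 65 60
```

The line keeps rising at a constant rate, while the quadratic term bends the curve so that it rises and then falls over the same range of x.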
Making bendy models

To fit a quadratic model in R, we can use the poly() function.
– Fill in the blanks below to predict audience_rating using a quadratic polynomial for critics_rating.
lm(____ ~ poly(____, 2), data = training)

What is the role of the number 2 in the poly() function?

Write down the model equation in the form:
y = a + bx + cx^2
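
One detail that may help when reading off a, b, and c: by default, poly() builds orthogonal polynomials, so the coefficients that lm() reports are not directly the a, b, and c of the equation above. Setting raw = TRUE in poly() uses x and x^2 themselves. The sketch below uses simulated data (an assumption for illustration) to show that both versions describe the same curve.

```r
# Simulated data (hypothetical; not the movie data)
set.seed(2)
x <- runif(100, 0, 100)
y <- 30 + 0.2 * x + 0.003 * x^2 + rnorm(100, sd = 4)
demo <- data.frame(x = x, y = y)

# Default poly(): orthogonal polynomial terms
ortho_model <- lm(y ~ poly(x, 2), data = demo)

# raw = TRUE: ordinary powers x and x^2
raw_model <- lm(y ~ poly(x, 2, raw = TRUE), data = demo)

coef(ortho_model)  # coefficients on the orthogonal terms
coef(raw_model)    # a, b, c in y = a + bx + cx^2

# The two models give the same fitted values
all.equal(fitted(ortho_model), fitted(raw_model))
```

Since the fitted values agree, predictions and MSE are the same either way; only the way the coefficients are expressed differs.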

Assign this model a name and calculate the MSE for the testing data.
Comparing lines and curves

Create a scatterplot with audience_rating on the y-axis and critics_rating on the x-axis using your testing data.
– Add the line of best fit for the training data to the plot.
– Then use the name of the model in the code below to add your quadratic model:
add_curve(____)

Compare how the line of best fit and the quadratic model fit the data. Use the difference in each model's testing MSE to describe why one model fits better than the other.
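
The comparison described above might look like the following sketch. It uses simulated data with a deliberately strong curve, and it swaps in base R's plot(), abline(), and lines() rather than the lab's add_line() and add_curve() functions. The helper mse() is a hypothetical function defined just for this example.

```r
# Simulated, strongly curved data (hypothetical; not the movie data)
set.seed(3)
x <- runif(200, 0, 100)
demo <- data.frame(x = x, y = 40 + 0.01 * (x - 30)^2 + rnorm(200, sd = 5))
train_rows <- sample(1:200, 150)
training <- demo[train_rows, ]
testing  <- demo[-train_rows, ]

# Fit both models on the training data
line_model <- lm(y ~ x, data = training)
quad_model <- lm(y ~ poly(x, 2), data = training)

# Hypothetical helper: mean squared error of a model on a data set
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}
mse_line <- mse(line_model, testing)
mse_quad <- mse(quad_model, testing)
c(line = mse_line, quadratic = mse_quad)

# Scatterplot of the testing data with both fits drawn on top
plot(testing$x, testing$y, xlab = "x", ylab = "y")
abline(line_model, lty = 2)                         # dashed straight line
grid <- data.frame(x = seq(0, 100, length.out = 200))
lines(grid$x, predict(quad_model, newdata = grid))  # solid quadratic curve
```

With data this curved, the quadratic model follows the bend that the straight line misses, which shows up as a smaller testing MSE.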
On your own

Create a model that predicts audience_rating using a 3-degree polynomial (called a cubic model) for critics_rating using the training data.
– By using a plot, describe why you think a 2- or 3-degree polynomial will make better predictions for the testing data.
– Compute the MSE for the model with a 3-degree polynomial and use the MSE to justify whether the 2- or 3-degree polynomial fits the testing data better.
– Using the linear model from above that has the smallest MSE, add a different numerical variable to the model and recompute the MSE. Does modeling the variable you chose as a quadratic polynomial improve the MSE further?
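
For the syntax only, here is a sketch of how a cubic model and a model with a second numerical variable could be written. The data, the variable names x, z, and y, and the helper mse() are all hypothetical, so the MSE values say nothing about the movie data.

```r
# Simulated data with two numerical predictors (hypothetical; not the movie data)
set.seed(4)
n <- 200
demo <- data.frame(x = runif(n, 0, 100), z = runif(n, 60, 180))
demo$y <- 40 + 0.01 * (demo$x - 30)^2 + 0.05 * demo$z + rnorm(n, sd = 5)
train_rows <- sample(1:n, 150)
training <- demo[train_rows, ]
testing  <- demo[-train_rows, ]

# Hypothetical helper: mean squared error of a model on a data set
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

quad_model   <- lm(y ~ poly(x, 2), data = training)               # 2-degree polynomial
cubic_model  <- lm(y ~ poly(x, 3), data = training)               # 3-degree polynomial
two_var      <- lm(y ~ poly(x, 2) + z, data = training)           # add z as a straight-line term
two_var_quad <- lm(y ~ poly(x, 2) + poly(z, 2), data = training)  # add z as a quadratic term

testing_mse <- c(
  quadratic    = mse(quad_model, testing),
  cubic        = mse(cubic_model, testing),
  two_var      = mse(two_var, testing),
  two_var_quad = mse(two_var_quad, testing)
)
testing_mse
```

A higher-degree polynomial always fits the training data at least as well, so the testing MSE is the fairer way to judge whether the extra flexibility actually helps.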