LAB 4A: If the Line Fits…
Lab 4A  If the line fits ...
Directions: Follow along with the slides and answer the questions in bold font in your journal.
How to make predictions

Anyone can make predictions.
– Data scientists use data to inform their predictions by using the information learned from the sample to make predictions for the whole population.

In this lab, we'll learn how to make predictions by finding the line of bestfit.
– You will also learn how to use the information from one variable to make predictions about another variable.
Predicting heights
 Use the
data()
function to load thearm_span
data. 
This data comes from a sample of 90 people in the Los Angeles area.
– The measurements of
height
andarmspan
are in inches.– A person's
armspan
is the maximum distance between their fingertips when they spread their arms out wide. 
Make a plot of the
height
variable.– If you had to predict the height of someone in the LA area, what single height would you choose and why?
– Would you describe this as a good guess? What might you try and improve your predictions?
Predicting heights knowing arm spans

Create two subsets of our
arm_span
data:– One for
armspan >= 61 & armspan <= 63
.– A second for
armspan >= 64 & armspan <= 66
. 
Create a histogram for the
height
of people in each subset. Answer the following based on the data:– What
height
would you predict if you knew a person had anarmspan
around 62 inches?– What
height
would you predict if you knew a person had anarmspan
around 65 inches?– Does knowing someone's
armspan
help you predict their height. Why or why not?
Fitting lines

Notice that there is a trend that people with a larger
armspan
also tend to have a larger meanheight
.– One way of describing this sort of trend is with a line.

Data scientists often fit lines to their data to make predictions.
– What we mean by fit is to come up with a line that's close to as many of the data points as possible.

Create an scatterplot for
height
andarmspan
. Then run the following code. Draw a line by clicking twice on the Plot pane.add_line()
Predicting with lines

Draw a line that you think is a good fit and write down its equation. Using this equation:
– Predict how tall a person with a 62 and a person with a 65 inch
armspan
would be. 
Using a line to make predictions also lets us make predictions for
armspan
s that aren't in our data.– How tall would you predict a person with a 63.5 inch
armspan
to be? 
Compare your answers with a neighbor's. Did both of you come up with the same equation for a line? If not, can you tell which line fits the data best?
Regression lines

If you were to go around your class, each student would have created a different line that they feel fit the data best.
– Which is a problem because everyone's line will make slightly different predictions.

To avoid this variation in predictions, data scientists will use regression lines.
– These line connects the mean
height
of people with similararm_spans
.– Fill in the blanks below to create the a regression line using an
lm
, or linear model:lm(____ ~ ____, data = arm_span)
Predicting with regression lines

Use the output of the code from the previous slide to write down the equation of the regression line in the form
y = a + bx.

Add this line to a scatterplot by filling in the blanks below:
add_line(intercept = ____, slope = ____)

Predict the height of a person with a 63.5 inch
armspan
and compare it with a neighbor. Ensure you both arrive at the same predicted value. 
Measure your
armspan
and use the regression line to predict your height. How close was the prediction?