Lab 1E: What’s the Relationship?
Lab 1E - What's the Relationship?
Directions: Follow along with the slides and answer the questions in bold font in your journal.
Finding patterns in data.
-
To discover (really) interesting observations or relationships in data, we need to find them!
– Which is difficult if we only look at the raw data.
-
The best tool for finding patterns is often ... your own eyes.
– Plots are an excellent way to help your eye search for patterns.
-
In this lab, we'll learn how to include more variables in our plots to make them more informative.
-
Import the data from your class' Food Habits campaign and name it
food
.
Where's the variables?
- How many variables were used to create this plot? Which variables were used and how were they used?
Multiple variable plots
-
The previous graph is an example of a multiple variable plot, which means that more than a single variable was used. In this case:
-
Variable 1: height
-
Variable 2: gender
-
Multiple variable plots are tools for finding relationships between data.
-
Let's take our
food
data and make some new multiple variable plots you haven't created before!
Scatterplots
Creating scatterplots
-
Scatterplots are useful for viewing how one numerical variable relates to another numerical variable.
-
Fill in the blanks to create a scatterplot with
sodium
on the y-axis andsugar
on the x-axis.xyplot(____ ~ ____, data = food)
Scatterplots in action
-
Use a scatterplot to answer the following questions:
– Do snacks that have more
calories
also have moretotal_fat
? Why do you think that?– What happens if you swap the
calories
andtotal_fat
variables in your code? Does the relationship between the variables change?– Does the relationship between
calories
andtotal_fat
change when the snack is eitherSalty
orSweet
? Write down the code you used to answer this question.
4-variable scatterplots
-
When we make scatterplots, we can include:
– 1 numerical variable on the x-axis
– 1 numerical variable on the y-axis
– Use 1 categorical variable to facet our scatterplot
– Change the color of the points based on another categorical variable
-
To change the color of our points, we can include the
groups
argument much like we did for bargraphs (use the search feature in the History pane if you need help). -
Create a scatterplot that uses these 4 variables:
sodium
,sugar
,healthy_level
,salty_sweet
.
Multiple facets
-
It can sometimes be helpful to facet on more than 1 variable.
– Splitting the the data using 2 facets can give us additional insights that might otherwise be hidden.
-
Create a
dotPlot
orhistogram
of the calories variable, but facet the data using:healthy_level + salty_sweet
-
How does the
healthy_level
of aSalty
orSweet
snack impact the number ofcalories
in the snack?
On your own
-
Answer the following questions by creating an appropriate graph or graphs.
– Do healthier snacks
cost
more or less than less healthy snacks?– What other variables seem to be related to the
cost
of a snack? Describe their relationships.