Lab 1B: Get the Picture?
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Where'd we leave off ...
In the previous lab, we started to get acquainted with the layout of RStudio and some of the commands.
In this lab, we'll learn about different types of variables.
– Such as those that are measured by numbers and others that have values that are categories.
We'll also look at ways to visualize these different types of data using plots (a word data scientists use interchangeably with the word graph).
Find the History tab in RStudio and click on it. Figure out how to use the information to reload the cdc data.
Numerical variables have values that are measured in units.
Categorical variables have values that describe or categorize our observations.
Look at your cdc data and find the columns for height and gender (use the History pane again if you need help displaying your data).
– Is height a numerical or categorical variable? Why?
– Is gender a numerical or categorical variable? Why?
– List either the different categories or what you think the measured units are for each.
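The distinction can also be checked in code. The sketch below is only an illustration: the data frame toy and its columns commute_minutes and lunch are made up for this example and are not part of the cdc data.

```r
# Hypothetical stand-in data (not the lab's cdc data)
toy <- data.frame(
  commute_minutes = c(12, 35, 20, 8),                           # measured in units (minutes)
  lunch = factor(c("home", "school", "school", "home"))         # values are categories
)

is.numeric(toy$commute_minutes)  # TRUE: arithmetic such as a mean makes sense
is.numeric(toy$lunch)            # FALSE: the values are labels, not measurements
levels(toy$lunch)                # the distinct categories
mean(toy$commute_minutes)        # a summary that only works for numerical values
```

A numerical variable supports arithmetic such as mean(), while a categorical variable is described by listing its levels.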
Which is which?
Run the code you used in the previous lab to display the names of your cdc data's variables (use the code displayed in the History pane to resubmit previously typed commands). Use the code's output to help you complete the following:
– Write down 3 variables that you think are categorical variables and why.
– Write down 3 variables that you think are numerical variables and why.
One way to get a good summary of your data is to look at the data's structure.
– One way to view this info would be to click on the little blue arrow next to cdc in the Environment pane.
– Another way would be to run the following in the console:
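A minimal sketch of such a call, assuming base R's str() function and a small made-up data frame named toy standing in for the lab's data (with the lab's data loaded, the same call would be applied to cdc instead):

```r
# Hypothetical stand-in data (not the lab's cdc data)
toy <- data.frame(
  commute_minutes = c(12, 35, 20, 8),
  lunch = factor(c("home", "school", "school", "home"))
)

# str() prints one line per variable: its name, its type, and its first few values
str(toy)
```

In the printed structure, num marks a numerical variable and Factor marks a categorical one.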
Look at the structure of your cdc data and answer:
What information does the structure give you?
Were you able to correctly guess which variables were categorical and numeric? Which ones did you mislabel?
Visualizing data is a really helpful way to learn about our variables.
Choose one numeric and one categorical variable from the data and create both a bargraph and a histogram for each variable.
– Which function, either bargraph or histogram, is better at visualizing categorical variables? Which is better at visualizing numerical variables?
We have options
Make a graph that shows the distribution of people’s weight.
– Describe the distribution of weight. Make sure to describe the shape, center and spread of the distribution.
Options can be added to plotting functions to change their appearance. The code below includes the nint option, which controls the number of intervals in a numerical plot.
– Options, also known as arguments, are additional pieces of information you provide to a function, separated by commas.
Type the command below in your console and then answer the questions that follow:
histogram(~weight, data = cdc, nint = 3)
How did including the option nint = 3 change the histogram?
How did nint = 3 impact how you would describe the shape, center and spread?
Try other values for nint. What value produced the best graph? Why?
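To see the effect of an option outside the cdc data, here is a sketch on simulated values. It assumes the lattice package's histogram(), which accepts the same formula interface and nint argument as the command above; the lab's own histogram() may be supplied by a different package. The data frame toy and its score column are made up.

```r
library(lattice)  # assumption: lattice's histogram(), which takes an nint argument

# Hypothetical simulated data (not the lab's cdc data)
set.seed(1)
toy <- data.frame(score = rnorm(200, mean = 70, sd = 10))

# Same variable, same data; only the nint option changes
few_bins  <- histogram(~ score, data = toy, nint = 3)
many_bins <- histogram(~ score, data = toy, nint = 20)

# Typing few_bins or many_bins at the console draws each plot
```

Each argument after the formula is separated by a comma, and changing only nint leaves the data untouched while changing how finely it is binned.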
How often do people text & drive?
Make a graph that shows how often people in our data texted while driving.
– What does the y-axis represent?
– What does the x-axis tell us?
– Would you say that most people never texted while driving? What does the word most mean?
– Approximately what percent of the people texted while driving for 20 or more days? (Hint: There are 13,677 students in our data.)
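Turning a bar's height into a percent is a single division. The sketch below uses the total from the hint, but the count of 500 is a made-up number chosen only to show the arithmetic; read the real count off your own graph.

```r
total_students <- 13677   # from the hint above
count_in_group <- 500     # hypothetical count, for illustration only

# percent = (count in the group / total count) * 100
percent <- count_in_group / total_students * 100
round(percent, 1)         # 3.7
```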
Does texting and driving differ by gender?
Fill in the blanks with the correct variables to create a side-by-side bargraph:
bargraph (~ ____ , data = ____ , groups = ____ )
Write a sentence explaining how boys and girls differ when it comes to texting while driving.
Would you say that most girls never text and drive? Would you say that most boys never text and drive?
How did including the groups argument in your code change the graph?
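A side-by-side bargraph is a picture of a two-way table of counts. As a rough sketch of the numbers behind such a graph, the example below uses base R's table() and prop.table() on a made-up data frame; toy, snack, and grade are hypothetical names and are not in the cdc data.

```r
# Hypothetical stand-in data (not the lab's cdc data)
toy <- data.frame(
  snack = c("fruit", "chips", "fruit", "chips", "fruit", "fruit"),
  grade = c("9th", "9th", "10th", "10th", "9th", "10th")
)

# Counts for every snack/grade combination: one bar per cell in a grouped bargraph
counts <- table(snack = toy$snack, grade = toy$grade)
counts

# Share of each snack within each grade (every column sums to 1),
# which makes groups of different sizes comparable
shares <- prop.table(counts, margin = 2)
shares
```

Comparing shares rather than raw counts matters when the groups have different numbers of people.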
Do males and females have similar heights?
To answer this, what we'd like to do is visualize the distributions of heights, separately, for males and females.
– This way, we can easily compare them.
Use the groups argument to create a histogram of the height of males and females.
– Can you use this graphic to answer the question at the top of the slide? Why or why not?
– Is grouping numeric values, such as heights, as helpful as grouping categorical variables, such as texting & driving?
Do males and females have similar heights?, continued
Why does this work for bargraphs but not histograms?
The groups argument uses color to differentiate between groups.
– With bargraphs, each group is split with bars next to each other on the x-axis.
– With histograms, the x-axis is a continuous set of numbers, so the bars overlap, making it difficult to compare center and spread.
Fill in the blanks with the correct variables to create a split histogram to answer the questions below:
histogram (~ ____ | ____ , data = ____ )
Do you think males and females have similar heights? Use the plot you create to justify your answer.
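The vertical bar in the formula asks for one panel per category. The sketch below shows the pattern on simulated data, assuming the lattice package's histogram(), which supports this formula form; the lab's own histogram() may come from a different package. The names toy, score, and section are made up.

```r
library(lattice)  # assumption: lattice's histogram() with a conditioning formula

# Hypothetical simulated data (not the lab's cdc data)
set.seed(2)
toy <- data.frame(
  score   = c(rnorm(100, mean = 60, sd = 8), rnorm(100, mean = 75, sd = 8)),
  section = factor(rep(c("morning", "afternoon"), each = 100))
)

# ~ numerical variable | categorical variable: one histogram panel per category,
# drawn on a shared x-axis so centers and spreads can be compared
split_plot <- histogram(~ score | section, data = toy)

# Typing split_plot at the console draws the panels
```

Because each group gets its own panel, the bars no longer overlap the way they do when groups are distinguished by color alone.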
Just like we did for the histogram, is it possible to create a split bargraph? Try to create a bargraph of drive_text that's split by gender to find out.
On your own:
In this lab, we looked at the texting & driving habits of boys and girls.
What other factors do you think might affect how often people text and drive?
– Choose one variable from the cdc data, make a graph, and use the graph to describe how drive_text use differs with this variable.