Lesson 21: Learning to Love XML

Objective:

Students will understand the need for data to be stored in different ways - specifically, why it makes sense for web data to be formatted as XML.

Materials:

Online Data-ing handout (LMR_3.19_Online Data-ing)

Note: This should have been completed during the previous class.
Mountain Peak XML data found at:
https://labs.idsucla.org/extras/webdata/mountains.html

Note: Open with Google Chrome or Firefox browsers, NOT with Safari.
Projector
Mountains – HTML vs. XML handout (LMR_3.20_Mountains - HTML vs. XML)

Vocabulary:

XML

Essential Concepts:

XML is a markup language that we use with our campaigns. We create basic XML "tags" that label each piece of data, which helps us store data in a format we understand.

Lesson:

Allow time for student teams to present their findings from the Online Data-ing handout (LMR_3.19) if there was not sufficient time during the previous lesson.
Remind students that in the previous lesson they learned about a variety of ways that data can be presented online.
They've been working with comma-separated values (CSV) files and R data frames. Last time and in the lab, they worked with HTML tables. Today they are going to learn how the data displayed in an HTML table can be stored as XML.
XML, or Extensible Markup Language, is a popular format for storing data on the Internet. It is useful because the data it stores can be displayed as readable web pages, and also because it allows programmers to easily update values in the data table if those values change.
In pairs, ask students to brainstorm ways in which data found online differs from the data we see in RStudio. Then, create a class brainstorm from the student pair responses.
After the brainstorm, emphasize the following:
1. RStudio’s default way to work with data is as large data frames (tables) where rows represent observations and columns represent variables.
2. Data that is viewed online often has a different structure.
3. Data structures found on the web might be displayed in tables, such as those on Wikipedia, or streams, such as Twitter, and might even include data spread across multiple sections of a web page, such as Yelp.
Show students, on a projector, the Mountain Peak XML data found at
https://labs.idsucla.org/extras/webdata/mountains.html

Ask students to look at the data and determine if they have seen it before. Hint: They have! It was the data they scraped during Lab 3E.
Once students figure out that the XML is just the same data as the website they scraped during Lab 3E, distribute the Mountains – HTML vs. XML handout (LMR_3.20), which displays both HTML and XML versions of the data.

Note: The handout only includes the first 3 mountains.

Ask student pairs to answer the following:
1. Why are certain XML tags indented in the XML version of the data? The indentations show which tags are nested inside other tags, which tells us how to structure the HTML table. For example, all the mountains are contained in the <data> section, but each particular mountain is further tagged within its own <mountain> and </mountain> tags. All information stored between those two tags will be displayed as one row of the HTML table.
2. What is the role of tags (ex. <state>) and end tags (ex. </state>) in the XML code? Tags tell us where a certain type of data begins, and end tags tell us where it ends. In other words, they tell us where to find the specific values of a variable (ex. Alaska would be the value of the “state” variable since it is between the <state> and </state> tags).
3. Where are the variable names? The variable names are the names of the tags found between each pair of <mountain> and </mountain> tags. Specifically, the first variable is “peak” and the last variable is “rank.”
4. Where are the observations? The observations are located within each of the variable tags. For example, the observation “Mount McKinley (Denali)” is found between the <peak> and </peak> tags.
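For teachers who want to show these four answers in code, here is a minimal sketch. It is written in Python rather than the R used in the labs, and the XML snippet is a hypothetical abbreviation of the handout: it uses only the <data>, <mountain>, <peak>, <state>, and <rank> tags named above, and the rank value of 1 is an assumption for illustration. The actual file contains more variables and more mountains.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated snippet: only tags named in this lesson are used,
# and the rank value is an assumption for illustration.
xml_text = """
<data>
  <mountain>
    <peak>Mount McKinley (Denali)</peak>
    <state>Alaska</state>
    <rank>1</rank>
  </mountain>
</data>
"""

root = ET.fromstring(xml_text)  # root is the outer <data> element

# Each <mountain> element nested inside <data> becomes one row of the table.
rows = []
for mountain in root.findall("mountain"):
    # The child tag names are the variable names; the text between each
    # tag and its end tag is the value for this mountain.
    row = {child.tag: child.text for child in mountain}
    rows.append(row)

variable_names = list(rows[0].keys())
print(variable_names)
print(rows[0]["state"])
```

The nesting that the indentation makes visible is what the loop follows: one pass per <mountain> element, one dictionary entry per variable tag inside it.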
Assign student pairs one of the above questions to share out with the class. Student pairs that did not receive an assignment must participate using the Agree/Disagree strategy.
As a class, discuss the answers to the questions above.
XML formats make it easier to display data on the web in a pleasant manner and make it easier for programmers to find and alter data if the values change or if, for example, they wish to add a new row to a table.
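As an optional illustration of finding, altering, and adding data, the sketch below uses Python's standard XML library (an assumption of convenience; the labs themselves use R). The one-mountain snippet is a hypothetical abbreviation that uses only tags named in this lesson, and the added row ("Example Peak") is made up purely to show the mechanics.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-mountain snippet; the rank value is an assumption.
xml_text = (
    "<data><mountain>"
    "<peak>Mount McKinley (Denali)</peak>"
    "<state>Alaska</state>"
    "<rank>1</rank>"
    "</mountain></data>"
)
root = ET.fromstring(xml_text)

# Find a value by its tag and change it in place.
first_mountain = root.find("mountain")
first_mountain.find("peak").text = "Denali"

# Add a new row: a new <mountain> element with one child tag per variable.
# These values are invented placeholders, not real data.
new_mountain = ET.SubElement(root, "mountain")
for tag, value in [("peak", "Example Peak"),
                   ("state", "Example State"),
                   ("rank", "2")]:
    ET.SubElement(new_mountain, tag).text = value

updated_xml = ET.tostring(root, encoding="unicode")
print(updated_xml)
```

Because every value is labeled by its tag, the program can locate the peak name without counting columns or commas, which is the advantage the lesson attributes to XML.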

Class Scribes:

One team of students will give a brief talk to discuss what they think the 3 most important topics of the day were.

Homework:

For the next 3 days, students will collect data using the class’s newly created Participatory Sensing campaign (see Lessons 16-18).

For homework, students should reflect on how XML and HTML data are displayed. They should discuss when each format is appropriate.