Unit 4, Section 5: Ties that Bind
Instructional Days: 3
Clustering is another way to classify data into groups. We classify observations based on their numerical characteristics and how similar they are to one another. The k-means algorithm finds a mean for each of k clusters: it starts with randomly chosen initial means, assigns each point to its nearest mean, and then moves each mean to the average of the points assigned to it, repeating until the assignments stop changing.
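The k-means steps described above can be sketched in a few lines of code. This is a minimal illustration in Python, not the RStudio k-means function used in the lessons; the function name `kmeans`, its parameters, and the six data points are all hypothetical and chosen only to show how the means move.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster 2-D points into k groups; returns (means, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points at random as the initial means.
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest mean (squared distance).
        new_labels = [
            min(range(k),
                key=lambda j: (x - means[j][0]) ** 2 + (y - means[j][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are settled
        labels = new_labels
        # Step 3: move each mean to the average of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return means, labels

# Made-up measurements forming two visibly separate groups (assumption).
points = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.0),
          (9.0, 9.5), (8.5, 9.0), (9.2, 8.8)]
means, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
print(means)
```

Which cluster is numbered 0 and which is numbered 1 depends on the random starting means, so only the grouping itself is meaningful, not the label numbers.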
Networks classify people into groupings based on who knows whom. Each person is a node, and an edge (link) connects two nodes when a relationship between the two people is present.
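One way to picture a "who knows whom" network is as a list of pairs of people. The Python sketch below is an illustration only, not the network tools used in RStudio: the names, the `build_network` and `groupings` helpers, and the choice to treat a grouping as everyone reachable through a chain of acquaintances are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical "who knows whom" pairs; the names are invented for illustration.
edges = [("Ana", "Ben"), ("Ben", "Cara"), ("Dev", "Eli")]

def build_network(edges):
    """Return a map from each person (node) to the set of people they know."""
    network = defaultdict(set)
    for a, b in edges:
        # A relationship links both people to each other.
        network[a].add(b)
        network[b].add(a)
    return dict(network)

def groupings(network):
    """Split people into groups connected through chains of acquaintances."""
    seen, groups = set(), []
    for start in network:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            person = stack.pop()
            if person in group:
                continue
            group.add(person)
            # Visit everyone this person knows who is not yet in the group.
            stack.extend(network[person] - group)
        seen |= group
        groups.append(group)
    return groups

network = build_network(edges)
groups = groupings(network)
print(groups)
```

Here Ana and Cara land in the same group even though they do not know each other directly, because both know Ben.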
Students will participate in the Find the Clusters Activity described in Lesson 14. They will determine which points in a plot should be grouped as football players and which points should be grouped as swimmers.
S-IC 2: Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation.
Understand what RStudio is doing when using the k-means function to find clusters in a group of data and when creating networks in order to learn how to classify data into groups.
• Use the k-means function to find clusters in a group of data.
• Plot the data with the cluster assignments based on the k-means function.
Network analysis is used by many private and public entities. The National Security Agency, for example, uses it to find terrorist networks and to have maximum impact on their communications. The k-means algorithm is a technique for grouping entities according to the similarity of their attributes. For example, k-means can be used to divide countries into groups of similar countries so that fair comparisons can be made.
Students will use complex sentences to construct summary statements about their understanding of data, how it is collected, how it is used, and how to work with it.
Students will engage in partner and whole group discussions and presentations to express their understanding of data science concepts.
Students will use complex sentences to write informative short reports that use data science concepts and skills.
Legend for Activity Icons